Football Squares with MATLAB

Posted by Ned Gulley, January 7, 2013

Super Bowl Squares
Get the Data
Regular Expressions to the Rescue!
Populate the Results Matrix
Compute the Probability Matrix
Add Numbers to the Plot
The Bottom Line

Super Bowl Squares

In my last post I wrote about English football. This time I'm talking about the American version. Here in the U.S. it's playoff season for professional football, and that means greasy food, beer, big-screen televisions, and football squares.

And what are football squares, you may ask? They're a simple way for a group of people to wager on the outcome of a ballgame. Consider the following plot.

a = invhilb(10)<0;
% Why invhilb? See this Cody problem:
%   https://www.mathworks.com/matlabcentral/cody/problems/4-make-a-checkerboard-matrix
tick = 0:9;
imagesc(tick,tick,a)
colormap([1; 0.8]*[1 1 1])
set(gca, ...
    'XAxisLocation','top', ...
    'XTick',tick, ...
    'YTick',tick)
axis square
xlabel('Last Digit of Team A''s Score')
ylabel('Last Digit of Team B''s Score')

It has 100 small squares in it, each one corresponding to a pair of one-digit numbers. These one-digit numbers, in turn, correspond to the last digit in the final score of one of the two teams. Before the game, everyone buys one or more squares until they've all been sold. Now, if the Alligators (team A) go on to defeat the Buckaroos (team B) 17-10, then the owner of the square at location (7,0) would be the winner.
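
To make the mapping concrete, here is a minimal sketch that turns a final score into a square. The variable names (scoreA, scoreB, board) are mine, chosen for illustration, and the score is the hypothetical Alligators-Buckaroos result from above.

```matlab
% Hypothetical final score from the example: Alligators 17, Buckaroos 10
scoreA = 17;
scoreB = 10;

% The square is determined by the last digit of each score
digitA = mod(scoreA,10);
digitB = mod(scoreB,10);
fprintf('Winning square: (%d,%d)\n',digitA,digitB);

% MATLAB indices start at 1, so digit d lives at index d+1.
% As in the plot, Team A's digit runs along the x-axis (columns)
% and Team B's digit along the y-axis (rows).
board = zeros(10);
board(digitB+1,digitA+1) = 1;
```

The +1 offset shows up again later when the real scores are tallied into a 10-by-10 matrix.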

As you can imagine, some score pairs are much more likely than others. For this reason, in practice the squares are usually sold off at random. You don't get to pick which score pair you will receive.
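
A random sale is easy to mimic. This is only a sketch under the assumption that 100 buyers take one square each; the owner IDs and variable names are made up for illustration.

```matlab
% Hypothetical pool: 100 buyers with IDs 1..100, one square each
numSquares = 100;
owners = 1:numSquares;

% Shuffle the buyers and lay them out on the 10-by-10 board
ownerBoard = reshape(owners(randperm(numSquares)),10,10);

% Look up who holds the square for Team A digit 7, Team B digit 0
% (rows = Team B digit + 1, columns = Team A digit + 1)
holder = ownerBoard(0+1,7+1);
```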

All this sets the scene for a Super Bowl party from a few years ago. The Green Bay Packers were playing the Pittsburgh Steelers, and I had acquired a square. But not just any square. My square was linked to the score pair (2,2).

This struck me as a rare score pair. But how rare? Being quantitatively minded, and armed with my favorite technical computing tool, I went looking for data.

A little web searching turned up a site with every single NFL football game played since 1920, nearly 15,000 games. A savvy reader may observe that the game has changed a lot during that interval. Never mind that! Let's do the calculations and see what we get.

Get the Data

First grab the HTML.

url = '#game_scores::none';
html = urlread(url);

Regular Expressions to the Rescue!

By carefully examining the structure of the HTML, we can make a regular expression target that will extract the information we need.

target = [ ...
    '<tr  class="">\s*' ...
    '<td align="right"  csk.*?>.*?</td>\s*' ...
    '<td align="right"  csk.*?>.*?</td>\s*' ...
    '<td align="right" >(\d+)</td>\s*' ...
    '<td align="right" >(\d+)</td>\s*' ...
    '<td align="right" >\d+</td>\s*' ...
    '<td align="right" >\d+</td>\s*' ...
    '<td align="right"  csk.*?><a href=".*?">(\d+)</a></td>\s*' ...
    ];
tk = regexp(html,target,'tokens');
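
If the shape of the output isn't obvious, a toy example may help. The sample string and numbers below are invented for illustration and are much simpler than the real page markup, but the 'tokens' option behaves the same way: one cell per match, and inside it one string per parenthesized group.

```matlab
% Made-up snippet standing in for two table rows (not the real markup)
sample = ['<td>17</td><td>10</td><td>254</td> ' ...
          '<td>20</td><td>17</td><td>243</td>'];

% Each parenthesized group becomes one token
pattern = '<td>(\d+)</td><td>(\d+)</td><td>(\d+)</td>';
tkDemo = regexp(sample,pattern,'tokens');

numMatches = numel(tkDemo);             % one cell per matched row
firstRow = tkDemo{1};                   % 1-by-3 cell of strings
lastCount = str2double(tkDemo{2}{3});   % third token of the second match
```

So tk{i}{1}, tk{i}{2}, and tk{i}{3} are the winning score, the losing score, and the number of games that ended that way, all still stored as text.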

Populate the Results Matrix

Armed with the textual data from the HTML, we can insert it into a matrix with counts for all the possible outcomes.

score = zeros(100);
oneDigitScore = zeros(10);

for i = 1:length(tk)
    % str2double parses plain numeric text without the eval that str2num uses
    winning = str2double(tk{i}{1});
    winningMod10 = mod(winning,10);
    losing = str2double(tk{i}{2});
    losingMod10 = mod(losing,10);
    game_count = str2double(tk{i}{3});

    % 100-by-100 score grid with actual final scores
    score(winning+1,losing+1) = game_count;

    % 10-by-10 score grid with mod 10 final scores
    oneDigitScore(winningMod10+1,losingMod10+1) = oneDigitScore(winningMod10+1,losingMod10+1) + game_count/2;
    oneDigitScore(losingMod10+1,winningMod10+1) = oneDigitScore(losingMod10+1,winningMod10+1) + game_count/2;

end
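
The game_count/2 split deserves a word. My reading is that the data lists scores as winner and loser, while a square is tied to a particular team, and nobody knows beforehand which team will win, so each tally is shared evenly between (w,l) and (l,w). Here is a small sketch with one made-up tally to show the effect.

```matlab
% Hypothetical tally: a 17-10 final that occurred 4 times (made-up count)
winning = 17;
losing = 10;
game_count = 4;

demo = zeros(10);
w = mod(winning,10) + 1;   % index for digit 7
l = mod(losing,10) + 1;    % index for digit 0

% Share the count evenly between (w,l) and (l,w)
demo(w,l) = demo(w,l) + game_count/2;
demo(l,w) = demo(l,w) + game_count/2;

isSymmetric = isequal(demo,demo');   % matrix equals its transpose
total = sum(demo(:));                % no games gained or lost
```

When both digits match, as in a 42-32 game, the two updates hit the same cell and it receives the full count.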

Compute the Probability Matrix

Calculate percentages based on the total number of games and visualize the results.

prob = oneDigitScore/sum(oneDigitScore(:))*100;
imagesc(0:9,0:9,prob)
colormap(summer(64))
colorbar

set(gca, ...
    'XAxisLocation','top', ...
    'XTick',tick, ...
    'YTick',tick)
axis square
xlabel('Last Digit of Team A''s Score')
ylabel('Last Digit of Team B''s Score')

Just to be safe, let's verify that the sum of the probability matrix is 100%.

fprintf('Sum of all probabilities (percent): %2.1f\n',sum(prob(:)));

Sum of all probabilities (percent): 100.0
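
Individual squares can also be read straight out of the matrix. The sketch below uses a made-up symmetric count matrix (built from magic(10), purely as a stand-in for the real tallies) so that it runs on its own; with the real data you would index prob the same way.

```matlab
% Stand-in counts, NOT real football data: any symmetric 10-by-10 will do
c = magic(10);
countsDemo = c + c';
probDemo = countsDemo/sum(countsDemo(:))*100;

% Probability (in percent) of the digit pair (2,2): digit d is index d+1
p22 = probDemo(2+1,2+1);

% Most likely square: linear index of the maximum, converted to digits
[pMax,idx] = max(probDemo(:));
[r,k] = ind2sub(size(probDemo),idx);
bestPair = [r-1 k-1];
```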

Add Numbers to the Plot

No surprise: the likeliest outcome is the pair (7,0) or (0,7). What about (2,2)? It's looking pretty grim. Let's throw some numbers on the plot to find out.

colorbar off

[rows,cols] = size(prob);
for i = 1:rows
    for j = 1:cols
        text(j-1,i-1,sprintf('%1.2f',prob(i,j)), ...
            'FontSize',8, ...
            'Color','red', ...
            'HorizontalAlignment','center');
    end
end

set(gca,'XAxisLocation','top')
xlabel('Last Digit of Steelers Score')
ylabel('Last Digit of Packers Score')

% Outline my square, (2,2)
patch([2 3 3 2 2]-0.5,[2 2 3 3 2]-0.5,'red', ...
    'FaceColor','none','LineWidth',2,'EdgeColor','yellow')
% Outline the actual result: Steelers digit 5, Packers digit 1
patch([5 6 6 5 5]-0.5,[1 1 2 2 1]-0.5,'red', ...
    'FaceColor','none','LineWidth',2,'EdgeColor','yellow')

Ouch!

The Bottom Line

All this is a long-winded way of saying that my pick, (2,2), is the absolute worst possible choice. Since the merger in 1970, there have been exactly two games that ended with (2,2). On December 5, 2004, the Buffalo Bills beat the Miami Dolphins 42-32, and on November 4, 2012, the Tampa Bay Buccaneers defeated the Oakland Raiders by the same score.

Incidentally, the actual winning result for the Steelers-Packers Super Bowl, (1,5), is also quite rare. Rare as these things go, but still eleven times more likely than (2,2).

Not that I'm bitter about it.

Addendum

LATE ADDITION: In the comments below, Sean and Matt banter about soccer scores and the Football Squares game. Here is the plot that results from English Premier League games (partial season). Numbers shown are percentages.