# Football Squares with MATLAB

Posted by **Ned Gulley**,

#### Super Bowl Squares

In my last post I wrote about English football. This time I'm talking about the American version. Here in the U.S. it's playoff season for professional football, and that means greasy food, beer, big-screen televisions, and football squares.

And what are football squares, you may ask? It's a simple mechanism to let a group of people wager on the outcome of a ballgame. Consider the following plot.

    a = invhilb(10)<0;
    % Why invhilb? See this Cody problem:
    % http://www.mathworks.com/matlabcentral/cody/problems/4-make-a-checkerboard-matrix
    tick = 0:9;
    imagesc(tick,tick,a)
    colormap([1; 0.8]*[1 1 1])
    set(gca, ...
        'XAxisLocation','top', ...
        'XTick',tick, ...
        'YTick',tick)
    axis square
    xlabel('Last Digit of Team A''s Score')
    ylabel('Last Digit of Team B''s Score')

It has 100 small squares in it, each one corresponding to a pair of one-digit numbers. These one-digit numbers, in turn, correspond to the last digit in the final score of one of the two teams. Before the game, everyone buys one or more squares until they've all been sold. Now, if the Alligators (team A) go on to defeat the Buckaroos (team B) 17-10, then the owner of the square at location (7,0) would be the winner.
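
To make the mapping from final score to square concrete, here is a minimal sketch using the Alligators-Buckaroos score from the example above. The variable names are just for illustration; the last digit is simply the score modulo 10.

```matlab
% Final score from the example above (Team A 17, Team B 10)
scoreA = 17;
scoreB = 10;

% The last digit of each score is the remainder after division by 10
digitA = mod(scoreA,10);
digitB = mod(scoreB,10);

fprintf('Winning square: (%d,%d)\n',digitA,digitB);
```

This prints the (7,0) square mentioned above.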

As you can imagine, some score pairs are much more likely than others. For this reason, in practice the squares are usually sold off at random. You don't get to pick which score pair you will receive.

All this sets the scene for a Super Bowl party from a few years ago. The Green Bay Packers were playing the Pittsburgh Steelers, and I had acquired a square. But not just any square. My square was linked to the score pair (2,2).

This struck me as a rare score pair. But how rare? Being quantitatively minded, and armed with my favorite technical computing tool, I went looking for data.

A little web searching turned up a site with every single NFL football game played since 1920, nearly 15,000 games. A savvy reader may observe that the game has changed a lot during that interval. Never mind that! Let's do the calculations and see what we get.

#### Get the Data

First grab the HTML.

```
url = 'http://www.pro-football-reference.com/boxscores/game_scores.cgi#game_scores::none';
html = urlread(url);
```

#### Regular Expressions to the Rescue!

By carefully examining the structure of the HTML, we can make a regular expression target that will extract the information we need.

    target = [ ...
        '<tr class="">\s*' ...
        '<td align="right" csk.*?>.*?</td>\s*' ...
        '<td align="right" csk.*?>.*?</td>\s*' ...
        '<td align="right" >(\d+)</td>\s*' ...
        '<td align="right" >(\d+)</td>\s*' ...
        '<td align="right" >\d+</td>\s*' ...
        '<td align="right" >\d+</td>\s*' ...
        '<td align="right" csk.*?><a href=".*?">(\d+)</a></td>\s*' ...
        ];
    tk = regexp(html,target,'tokens');
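
If the `'tokens'` option is unfamiliar, this small sketch shows the shape of what `regexp` hands back. The HTML fragment, the simplified pattern, and the numbers in it are all made up for illustration; they are not the real page markup.

```matlab
% Hypothetical HTML fragment with two table rows (made-up numbers)
sample = [ ...
    '<tr><td>20</td><td>17</td><td>250</td></tr>' ...
    '<tr><td>27</td><td>24</td><td>180</td></tr>'];

% Each parenthesized group (\d+) becomes one token
pattern = '<tr><td>(\d+)</td><td>(\d+)</td><td>(\d+)</td></tr>';
tk = regexp(sample,pattern,'tokens');

% tk has one cell per match; each cell holds the tokens for that match
for i = 1:length(tk)
    fprintf('winning=%s losing=%s games=%s\n',tk{i}{1},tk{i}{2},tk{i}{3});
end
```

Note that the tokens come back as text, which is why they need to be converted to numbers before any arithmetic.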

#### Populate the Results Matrix

Armed with the textual data from the HTML, we can insert it into a matrix with counts for all the possible outcomes.

    score = zeros(100);
    oneDigitScore = zeros(10);
    for i = 1:length(tk)
        winning = str2num(tk{i}{1});
        winningMod10 = mod(winning,10);
        losing = str2num(tk{i}{2});
        losingMod10 = mod(losing,10);
        game_count = str2num(tk{i}{3});
        % 100-by-100 score grid with actual final scores
        score(winning+1,losing+1) = game_count;
        % 10-by-10 score grid with mod 10 final scores
        oneDigitScore(winningMod10+1,losingMod10+1) = oneDigitScore(winningMod10+1,losingMod10+1) + game_count/2;
        oneDigitScore(losingMod10+1,winningMod10+1) = oneDigitScore(losingMod10+1,winningMod10+1) + game_count/2;
    end
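
The same 10-by-10 grid can also be built without an explicit loop. The following is a rough sketch of an alternative using `accumarray`, not the code that produced the plots here. The `games` matrix and its values are made up for illustration.

```matlab
% Made-up rows of (winning score, losing score, game count)
games = [ ...
    17 10 3; ...
    27 20 1; ...
    42 32 2];

w = mod(games(:,1),10) + 1;   % row index from the winning score's last digit
l = mod(games(:,2),10) + 1;   % column index from the losing score's last digit
n = games(:,3);               % number of games with that final score

% Count each game once at (w,l), then average with the transpose so that
% half of each count lands on (w,l) and half on (l,w)
counts = accumarray([w l],n,[10 10]);
oneDigitScore = (counts + counts')/2;
```

Averaging with the transpose has the same effect as adding `game_count/2` to both mirrored squares in the loop: diagonal squares such as (2,2) keep their full count, and the total still equals the number of games.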

#### Compute the Probability Matrix

Calculate percentages based on the total number of games and visualize the results.

    prob = oneDigitScore/sum(oneDigitScore(:))*100;
    imagesc(0:9,0:9,prob)
    colormap(summer(64))
    colorbar
    set(gca, ...
        'XAxisLocation','top', ...
        'XTick',tick, ...
        'YTick',tick)
    axis square
    xlabel('Last Digit of Team A''s Score')
    ylabel('Last Digit of Team B''s Score')

Just to be safe, let's verify that the sum of the probability matrix is 100%.

```
fprintf('Sum of all probabilities (percent): %2.1f\n',sum(prob(:)));
```

Sum of all probabilities (percent): 100.0
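
For a check that doesn't rely on reading printed output, a sketch like the following compares against 100 with a tolerance and also confirms that the matrix is symmetric. The toy count matrix and the tolerance are assumptions for illustration only.

```matlab
% Toy 10-by-10 count matrix (made-up values)
counts = zeros(10);
counts(8,1) = 2;
counts(1,8) = 2;
counts(3,3) = 2;

% Convert counts to percentages of the total
prob = counts/sum(counts(:))*100;

% Compare against 100 with a small tolerance instead of exact equality,
% since the percentages are floating-point values
tol = 1e-10;
isNormalized = abs(sum(prob(:)) - 100) < tol;

% Mirrored squares should carry identical probabilities
isSymmetric = isequal(prob,prob');
```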

#### Add Numbers to the Plot

No surprise: the likeliest outcome is the pair (7,0) or (0,7). What about (2,2)? It's looking pretty grim. Let's throw some numbers on the plot to find out.

    colorbar off
    [rows,cols] = size(prob);
    for i = 1:rows
        for j = 1:cols
            text(j-1,i-1,sprintf('%1.2f',prob(i,j)),...
                'FontSize', 8, ...
                'Color','red', ...
                'HorizontalAlignment','center');
        end
    end
    set(gca,'XAxisLocation','top')
    xlabel('Last Digit of Steelers Score')
    ylabel('Last Digit of Packers Score')
    patch([2 3 3 2 2]-0.5,[2 2 3 3 2]-0.5,'red', ...
        'FaceColor','none','LineWidth',2,'EdgeColor','yellow')
    patch([5 6 6 5 5]-0.5,[1 1 2 2 1]-0.5,'red', ...
        'FaceColor','none','LineWidth',2,'EdgeColor','yellow')
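
Rather than squinting at the plot, you could also ask MATLAB for the best and worst squares directly. This is a minimal sketch on a toy matrix; the values are made up and not normalized, so only the indexing logic carries over to the real `prob`.

```matlab
% Toy probability matrix in percent (made-up values, not normalized)
prob = ones(10);        % start with 1% in every square
prob(8,1) = 3.5;        % boost (7,0) ...
prob(1,8) = 3.5;        % ... and its mirror (0,7)
prob(3,3) = 0.1;        % make (2,2) the rarest square

% Linear index of the largest and smallest entries
% (for ties, max and min return the first one found)
[pMax,iMax] = max(prob(:));
[pMin,iMin] = min(prob(:));

% Convert linear indices to row/column; subtract 1 to get last digits
[rMax,cMax] = ind2sub(size(prob),iMax);
[rMin,cMin] = ind2sub(size(prob),iMin);

fprintf('Most likely:  (%d,%d) at %1.2f%%\n',rMax-1,cMax-1,pMax);
fprintf('Least likely: (%d,%d) at %1.2f%%\n',rMin-1,cMin-1,pMin);
```

Because the matrix is symmetric, the most likely square always has a mirror image with the same value, and `max` reports only one of the pair.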

Ouch!

#### The Bottom Line

All this is a long-winded way of saying that my pick, (2,2), is the absolute worst possible choice. Since the merger in 1970, there have been *exactly two games* that ended with (2,2). On December 5, 2004, the Buffalo Bills beat the Miami Dolphins 42-32, and on November 4, 2012, the Tampa Bay Buccaneers defeated the Oakland Raiders by the same score.

Incidentally, the actual winning result for the Steelers-Packers Super Bowl, (1,5), is also quite rare. Rare as these things go, but still eleven times more likely than (2,2).

Not that I'm bitter about it.

#### Addendum

LATE ADDITION: In the comments below, Sean and Matt banter about soccer scores and the Football Squares game. Here is the plot that results from English Premier League games (partial season). Numbers shown are percentages.

Published with MATLAB® R2012b

## 5 Comments

**1** of 5

Great post! It would be fun to have team filters on each axis to see how the squares probability change for specific match ups. Maybe your square 2,2 isn’t so bad when the Patriots play Seattle?

**2** of 5

Soccer squares just wouldn’t be that much fun:

    ______Team A_______
    T|  |  0   |  1   |
    E|--+------+------|
    A| 0| 0.25 | 0.25 |
    M|--+------+------|
     | 1| 0.25 | 0.25 |
    B|__|______|______|

:)

**3** of 5

Close, Sean, but not quite. Using some resources from Ned’s previous football post:

    [Date,HomeTeam,AwayTeam,FTHG,FTAG] = import_football_data('E0.csv');
    fsquare = accumarray([FTHG,FTAG]+1,1,[10,10]);
    fsquare = 0.5*(fsquare+fsquare')/length(FTHG);
    figure
    colormap(summer)
    imagesc(0:9,0:9,fsquare)

But, yeah. I can see why football squares aren’t as popular outside the US. Now if only I can find the data for rugby squares…

**4** of 5

Hey Matt, I ran your code (with a few minor tweaks) and posted it above at the end of the article.

**5** of 5

Cool! And because I’m a massive geek, I couldn’t let the idea of Rugby Squares die. Thanks be to the internet, provider of all obscure data! Interestingly, the distribution for rugby seems to be a lot more uniform than for American football, from a minimum of 0.5% for (1,4) or (4,1), to a maximum of 1.8% for (0,0). Your (2,2) pick isn’t as bad, but it’s still pretty bad: 0.68%.

Your odds were slightly better (1.05%) in the 70s and 80s, during the days of 4-point tries.

I’m still trying to figure out the overall pattern. There’s a definite preference towards 0, 3, 6, and 9. But then, for some reason, there’s (2,6) and (6,2) right up there with 1.74%.

Are you wishing you’d never started this yet…?
