Football Squares with MATLAB

Super Bowl Squares

In my last post I wrote about English football. This time I'm talking about the American version. Here in the U.S. it's playoff season for professional football, and that means greasy food, beer, big-screen televisions, and football squares.

And what are football squares, you may ask? It's a simple mechanism to let a group of people wager on the outcome of a ballgame. Consider the following plot.

a = invhilb(10)<0;
% Why invhilb? See this Cody problem:
%   http://www.mathworks.com/matlabcentral/cody/problems/4-make-a-checkerboard-matrix
tick = 0:9;
imagesc(tick,tick,a)
colormap([1; 0.8]*[1 1 1])
set(gca, ...
'XAxisLocation','top', ...
'XTick',tick, ...
'YTick',tick)
axis square
xlabel('Last Digit of Team A''s Score')
ylabel('Last Digit of Team B''s Score')
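For readers curious about the invhilb trick: the entries of the inverse Hilbert matrix alternate in sign, so testing for negative entries yields a checkerboard. Here is a minimal sketch of an equivalent construction using only index arithmetic. The names r, c, b, and sameBoard are mine, not part of the original code.

```matlab
a = invhilb(10) < 0;          % checkerboard from the sign pattern
[c, r] = meshgrid(1:10);      % column and row indices for every cell
b = mod(r + c, 2) == 1;       % true wherever row + column is odd
sameBoard = isequal(a, b);    % compare the two constructions
```

Either form works for the plot above; the index version just makes the alternating pattern explicit.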


It has 100 small squares in it, each one corresponding to a pair of one-digit numbers. These one-digit numbers, in turn, correspond to the last digit in the final score of one of the two teams. Before the game, everyone buys one or more squares until they've all been sold. Now, if the Alligators (team A) go on to defeat the Buckaroos (team B) 17-10, then the owner of the square at location (7,0) would be the winner.
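The mapping from a final score to a square is just the remainder after dividing by 10. A small sketch for the Alligators-Buckaroos example (the variable names are illustrative):

```matlab
scoreA = 17;   % Alligators (team A)
scoreB = 10;   % Buckaroos (team B)

% The last digit of each score is its remainder modulo 10
winningSquare = [mod(scoreA,10), mod(scoreB,10)];
fprintf('Winning square: (%d,%d)\n', winningSquare);
```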

As you can imagine, some score pairs are much more likely than others. For this reason, in practice the squares are usually sold off at random. You don't get to pick which score pair you will receive.

All this sets the scene for a Super Bowl party from a few years ago. The Green Bay Packers were playing the Pittsburgh Steelers, and I had acquired a square. But not just any square. My square was linked to the score pair (2,2).

This struck me as a rare score pair. But how rare? Being quantitatively minded, and armed with my favorite technical computing tool, I went looking for data.

A little web searching turned up a site with every single NFL football game played since 1920, nearly 15,000 games. A savvy reader may observe that the game has changed a lot during that interval. Never mind that! Let's do the calculations and see what we get.

Get the Data

First grab the HTML.

url = 'http://www.pro-football-reference.com/boxscores/game_scores.cgi#game_scores::none';
html = urlread(url);  % read the page source into a string


Regular Expressions to the Rescue!

By carefully examining the structure of the HTML, we can make a regular expression target that will extract the information we need.

target = [ ...
'<tr  class="">\s*' ...
'<td align="right"  csk.*?>.*?</td>\s*' ...
'<td align="right"  csk.*?>.*?</td>\s*' ...
'<td align="right" >(\d+)</td>\s*' ...
'<td align="right" >(\d+)</td>\s*' ...
'<td align="right" >\d+</td>\s*' ...
'<td align="right" >\d+</td>\s*' ...
'<td align="right"  csk.*?><a href=".*?">(\d+)</a></td>\s*' ...
];
tk = regexp(html,target,'tokens');
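To see what the 'tokens' option returns, here is a minimal sketch on a made-up, much simpler HTML fragment. The string sampleHtml and its numbers are hypothetical and do not come from the real page. Each parenthesized group in the pattern becomes one token, and each match becomes one cell of tokens.

```matlab
% Hypothetical two-row table: winning score, losing score, game count
sampleHtml = ['<tr><td>27</td><td>24</td><td>192</td></tr>' ...
              '<tr><td>20</td><td>17</td><td>250</td></tr>'];
pattern = '<tr><td>(\d+)</td><td>(\d+)</td><td>(\d+)</td></tr>';

tkSample = regexp(sampleHtml, pattern, 'tokens');

nMatches = numel(tkSample);             % one cell per table row
firstRow = str2double(tkSample{1});     % numeric values of the first row
```

The real target above works the same way; it is longer only because it has to step over the columns we don't need.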


Populate the Results Matrix

Armed with the textual data from the HTML, we can insert it into a matrix with counts for all the possible outcomes.

score = zeros(100);
oneDigitScore = zeros(10);

for i = 1:length(tk)
    winning = str2double(tk{i}{1});
    winningMod10 = mod(winning,10);
    losing = str2double(tk{i}{2});
    losingMod10 = mod(losing,10);
    game_count = str2double(tk{i}{3});

    % 100-by-100 score grid with actual final scores
    score(winning+1,losing+1) = game_count;

    % 10-by-10 score grid with mod 10 final scores.
    % Either team could be "Team A", so split each count evenly
    % between (winning,losing) and (losing,winning).
    oneDigitScore(winningMod10+1,losingMod10+1) = oneDigitScore(winningMod10+1,losingMod10+1) + game_count/2;
    oneDigitScore(losingMod10+1,winningMod10+1) = oneDigitScore(losingMod10+1,winningMod10+1) + game_count/2;

end
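The two half-count updates are worth a second look. A sketch on three made-up score lines (the games matrix is hypothetical) shows their effect: the grid comes out symmetric, a score pair with matching last digits gets its full count on the diagonal, and the total still equals the number of games.

```matlab
% Hypothetical rows: [winning score, losing score, number of games]
games = [27 24 3; 20 17 5; 42 32 2];

counts = zeros(10);
for k = 1:size(games,1)
    w = mod(games(k,1),10) + 1;   % +1 because MATLAB indices start at 1
    l = mod(games(k,2),10) + 1;
    counts(w,l) = counts(w,l) + games(k,3)/2;
    counts(l,w) = counts(l,w) + games(k,3)/2;
end

isSymmetric = isequal(counts, counts.');
totalGames  = sum(counts(:));
```

Here the 42-32 line lands twice on the (2,2) cell, so both halves add back up to the full count of 2.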


Compute the Probability Matrix

Calculate percentages based on the total number of games and visualize the results.

prob = oneDigitScore/sum(oneDigitScore(:))*100;
imagesc(0:9,0:9,prob)
colormap(summer(64))
colorbar

set(gca, ...
'XAxisLocation','top', ...
'XTick',tick, ...
'YTick',tick)
axis square
xlabel('Last Digit of Team A''s Score')
ylabel('Last Digit of Team B''s Score')


Just to be safe, let's verify that the sum of the probability matrix is 100%.

fprintf('Sum of all probabilities (percent): %2.1f\n',sum(prob(:)));

Sum of all probabilities (percent): 100.0


No surprise: the likeliest outcome is the pair (7,0) or (0,7). What about (2,2)? It's looking pretty grim. Let's throw some numbers on the plot to find out.

colorbar off

[rows,cols] = size(prob);
for i = 1:rows
    for j = 1:cols
        text(j-1,i-1,sprintf('%1.2f',prob(i,j)),...
            'FontSize', 8, ...
            'Color','red', ...
            'HorizontalAlignment','center');
    end
end

set(gca,'XAxisLocation','top')
xlabel('Last Digit of Steelers Score')
ylabel('Last Digit of Packers Score')

patch([2 3 3 2 2]-0.5,[2 2 3 3 2]-0.5,'red', ...
'FaceColor','none','LineWidth',2,'EdgeColor','yellow')
patch([5 6 6 5 5]-0.5,[1 1 2 2 1]-0.5,'red', ...
'FaceColor','none','LineWidth',2,'EdgeColor','yellow')


Ouch!

The Bottom Line

All this is a long-winded way of saying that my pick, (2,2), is the absolute worst possible choice. Since the merger in 1970, there have been exactly two games that ended with (2,2). On December 5, 2004, the Buffalo Bills beat the Miami Dolphins 42-32, and on November 4, 2012 the Tampa Bay Buccaneers defeated the Oakland Raiders by the same score.

Incidentally, the actual winning result for the Steelers-Packers Super Bowl, (1,5), is also quite rare. Rare as these things go, but still eleven times more likely than (2,2).
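If you want to compare two squares yourself, the only catch is the off-by-one between score digits (0 to 9) and MATLAB indices (1 to 10), plus remembering that rows run along the y-axis. A minimal sketch, using a stand-in matrix P rather than the real prob matrix, so the ratio it produces is not the real one. The helper squareProb is hypothetical.

```matlab
% Stand-in 10-by-10 percentage matrix (NOT the real NFL data)
P = magic(10);
P = P / sum(P(:)) * 100;

% Look up a square by its x-axis digit and y-axis digit
squareProb = @(P, xDigit, yDigit) P(yDigit + 1, xDigit + 1);

% How many times more likely is square (5,1) than square (2,2)?
ratio = squareProb(P, 5, 1) / squareProb(P, 2, 2);
```

With the real prob matrix in place of P, the same lookup gives the comparison quoted above.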

Not that I'm bitter about it.

LATE ADDITION: In the comments below, Sean and Matt banter about soccer scores and the Football Squares game. Here is the plot that results from English Premier League games (partial season). Numbers shown are percentages.

MATLAB Christmas Trees

While online doing some last-minute shopping, I found some Christmas trees that were created using MATLAB. I thought they would be perfect to share with everyone in our community during this holiday season. What a perfect gift if you are still looking for something for someone special! You can even get the MATLAB code to create your own version!

From XMas Tree, this tree was created by Marc Latzel on File Exchange.

Also from File Exchange, Anselm shares his Christmas Tree Plot.

From YouTube, gknor has shared his MATLAB Xmas Tree Animation.

Do you have a favorite MATLAB Christmas tree? Vote for one of these trees or share a link to your own creation as a comment below.

Happy holidays to everyone from the Do You Speak MATLAB blog team!

Trendy now gets data hourly, desktop gadgets & more

Trendy is a MATLAB Central application that scrapes a time series dataset from website(s) you specify. Trendy uses this data to maintain a plot for you, which it updates automatically with each new data point captured. You don’t even need a MATLAB license to use this powerful tool.

For instance, there’s a plot of several trends for keeping a watchful eye on the US vs Canada vs the UK national debt, normalized per person.

Data Gathering Fidelity: Hourly, Daily, Weekly & Monthly

Up to now, MATLAB Central had been running trend gathering code only once daily.  With the latest release, you can instruct Trendy how often to collect data.  This means you can now use Trendy to capture time series data with higher fidelity.

Now, when you Create a Trend:

you will see a fourth step:

This new step gives you control over when and how often Trendy collects data from the website you’ve specified.  Since there is no point collecting hourly data on something which changes less often, please configure this according to your needs.

Another great use of Trendy is to automatically graph your rank on MATLAB Central Cody.  Follow that link and you’ll have created a trend plot in Trendy easier than ever!

Debugging: Show me what was sent to MATLAB

Sometimes while creating a trend or a plot in Trendy, you may “Test Code & Show Results” for testing purposes:

Click “Show me what was sent to MATLAB,” above, and it becomes even easier to prototype and debug Trendy code in concert with MATLAB running on your desktop.  With this new feature, you see exactly how Trendy renders your code to our server-side MATLAB, e.g.:

Using Trend data in MATLAB:

A while back, I’d taken interest in Sam Mirsky’s World Mood 2 trend plot.  I was aware of the idea of using social media metrics as input into predictive modelling of financial markets.  While I would never advocate a trading or investment strategy based solely on one indicator, I had in July already been considering taking a long position in the S&P500 personally (via the SPY ETF).  At the time, I lacked conviction in the idea so let that opportunity pass absent a confirming indicator.

Drawing on the idea of using social media metrics as a trading indicator, it is interesting to use MATLAB to overlay the Surprise Tweets/min from Sam’s plot with the daily close of SPY:

Note I’ve lightened data in each series for which corresponding values in the other series were unavailable.

You can replicate the above plot in MATLAB as follows.  First, download Sam’s Surprise Tweets/min data from his Trend:

then in MATLAB,

twd = struct;
twv = struct;
twd.surprise = csv2cell('Surprise Tweets_min.csv','fromfile');
names = fieldnames(twd);
for j = 1:numel(names)
    twv.(names{j}) = zeros(size(twd.(names{j}),1),2);
    for i = 1:size(twd.(names{j}),1)
        twd.(names{j}){i,1} = datenum(twd.(names{j}){i,1},'yyyy-mm-dd HH:MM:SS');
        if ~isempty(sscanf(twd.(names{j}){i,2},'[%f]'))
            twd.(names{j}){i,2} = sscanf(twd.(names{j}){i,2},'[%f]');
        else
            twd.(names{j}){i,2} = 0.0;  % clean nulls
        end
        twv.(names{j})(i,2) = twd.(names{j}){i,2};
        twv.(names{j})(i,1) = twd.(names{j}){i,1};
        if twv.(names{j})(i,1) > 735182.483032407
            twv.(names{j})(i,2) = 0;    % omit where no corresponding
        end
    end
end
% note "fetch" requires the MATLAB datafeed toolbox
data = fetch(yahoo, 'spy','Close',twv.surprise(1,1),twv.surprise(size(twv.surprise(:,1),1),1));
for i = 1:size(data,1)
    if data(i,1) < 735004
        data(i,2) = NaN;  % omit where no corresponding
    end
    if data(i,1) > 735190
        data(i,2) = NaN;  % omit where no corresponding
    end
end
[AX, H1, H2] = plotyy(twv.surprise(:,1),twv.surprise(:,2),data(:,1),data(:,2));
set(AX, 'xTickLabel','')
datetick('x','mmm','keepticks')
set(get(AX(1),'Ylabel'),'String','Surprise Tweets/min')
set(get(AX(2),'Ylabel'),'String','SPY Close \$/share')
set(H1,'Marker','.','Color','blue','LineStyle','none' )
set(H2,'LineStyle','-')
set(AX(1),'ylim', [300 2000])
set(AX(2),'ylim', [125 150])
set(AX(1),'YTick',[0:500:2500])
set(AX(2),'YTick',[125:5:150])
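The heart of the parsing loop above is two calls: datenum turns a timestamp string into a serial date number, and sscanf pulls the number out of a bracketed value, returning empty when there is nothing to read. A minimal sketch on one made-up row (the timestamp and values are hypothetical, not taken from Sam’s data):

```matlab
% Hypothetical CSV fields: a timestamp and a bracketed value
stamp   = '2012-11-05 12:00:00';
goodVal = '[1234.5]';
nullVal = '[null]';

t = datenum(stamp, 'yyyy-mm-dd HH:MM:SS');   % serial date number
v = sscanf(goodVal, '[%f]');                 % numeric value inside the brackets

% sscanf returns empty when no number follows the bracket
if isempty(sscanf(nullVal, '[%f]'))
    vNull = 0.0;   % same "clean nulls" rule as the loop above
else
    vNull = sscanf(nullVal, '[%f]');
end
```

The hard-coded limits such as 735182.483032407 in the code above are serial date numbers of this same kind.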

In case you are wondering, I do not believe there is any such thing as a “holy grail” indicator with the power to foretell financial markets.  I believe an effective trading or investing strategy is constructed cautiously using an array of clues validated over time, each in context, as useful for gaining a statistical edge.  For me, the above is only one clue to be confirmed with others; only time will tell its predictive reliability. The materials shared here are for general information purposes only and do not constitute any investment advice. That said, the data available as of July in concert with other factors I personally track would have been sufficient to have favorably affected my conviction at that time.  I’m compelled to monitor Surprise Tweets/min.  Which brings us to the next topic,

Trendy Gadgets: Windows Vista & Windows 7

As human nature would have it, I’d routinely forget to open my browser, navigate to the plot page & see what had changed on Sam’s plot discussed above.  It’s why I failed to notice the spike in Surprise Tweets/min in a timely manner, which cost me the opportunity discussed above.  While Trendy is a valuable tool, Sam’s plot page is well outside my daily workflow.

I wanted to become aware of changes to Trendy content in a more timely manner.  To bring this value within my personal workflow, I developed a Windows 7 desktop gadget to keep my favorite Trendy plot always within view:

On Windows 7, a desktop gadget is implemented as a zip archive with file extension “.gadget”.  In the case of Trendy, the archive contains HTML, CSS and JavaScript which replicates the web experience of viewing the plot image.  If you’re interested, you can learn more about gadgets here.  I’d personally thank anyone who feels inspired to share something similar on File Exchange for Mac, Linux, Android and/or Windows 8 “live tile” Metro/Surface formats.

The MATLAB Central Trendy team felt others may like this ability too, so if you’re using Windows Vista or 7 you will now find following each Trendy plot an Extras: Download Gadget tailored to the specific plot you’re viewing.

How Trendy Saved My Life

OK, it didn’t really save my life. But Trendy’s users have helped me understand MATLAB coding and plotting… and to some, that’s right up there with having their lives saved.

I’m Rob Nickerson, a Senior Usability Specialist at MathWorks, and I worked on web applications on mathworks.com like the webstore, training, and careers areas. However, Trendy is a slightly different animal; users of Trendy write small snippets of MATLAB code to gather data, and then Trendy automatically gathers and plots that data indefinitely.  When I was approached to work on the first release of Trendy, I was excited (and intimidated, to be honest) by the challenge of working on a web application so closely tied to MATLAB. Did I need to be a MATLAB coding expert? Did I need to know how to use hold on/hold off, or how to write complex HTML scraping code in MATLAB?

From the world of the obscure: My plot for the sales rank for "Overboard" on Amazon after being mentioned on the Oscars.

Thankfully, the answer to those questions was “not really”. The important part of my job is to not be an expert user myself, but rather to identify who the real users are, and what it is they want to do. In the past two years we’ve recruited MATLAB Central users like you to come to MathWorks and evaluate Trendy using paper mockups, development builds, and production releases. The feedback from those usability sessions directly influences the features and functionality we incorporate into Trendy, whether it’s during a major release or minor tune-up releases.  For example, user testing told us that being able to collect a data point once a day was limiting, and you gave us examples like “I want to collect my fantasy football stats weekly”, or “I want to track a stock price over the course of a day”. With that feedback in mind, we now allow data samples to occur hourly, daily, weekly, or even monthly.

This process of gathering and distilling user feedback has not only educated me about our users and their needs, but also about MATLAB and plotting in general. We now have lots of great content to learn from, as our users have made some pretty incredible things in the short time it’s been available, constantly changing the answer to the question “What can I use Trendy for?”:

Since the nature of Trendy is to create and share your work with the world, it’s easy for someone just starting with MATLAB (me, for example) to reverse-engineer the work of someone who’s been doing this for a while (Aurelien Queffurust or Semin Ibisevic, for example) and create my own trends. For example, I have a plot comparing Apple and Google stock prices; the plot was kind of boring and not terribly informative until a Trendy user suggested I “normalize the trend lines to make things more interesting”. To do that, I had to figure out what normalizing the plot meant… and to do THAT, I found someone else’s plot that had code for normalizing trend lines. After borrowing that code, I was in business.
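For anyone else wondering what “normalizing” means here, one common reading is to rescale each series so that it starts at the same value, which puts two stocks with different prices on a comparable footing. A minimal sketch under that assumption, with made-up prices; this is not the code I borrowed, and the variable names are illustrative.

```matlab
% Hypothetical closing prices on the same four days
apple  = [600 610 590 620];
google = [680 700 690 720];

% Rescale each series so its first value is 100
appleNorm  = apple  / apple(1)  * 100;
googleNorm = google / google(1) * 100;

% Percent change since the first day, now directly comparable
appleChange  = appleNorm(end)  - 100;
googleChange = googleNorm(end) - 100;
```

Plotting appleNorm and googleNorm together then shows relative movement rather than two lines at very different heights.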

From the world of the useful: Trendy user Edric Ellis plots local river heights, to determine when bridges are rendered impassable.

Trendy is teaching me more about MATLAB and plotting than I ever would have imagined, and I have users like you to thank for it. If you have questions or comments about Trendy, please let us know… and thanks again for saving my life.

November 26th, 2012

I’m pleased to share a recent Cody development:

You’re playing along, and one day you notice Cody has begun to acknowledge your triumphs.  If you’ve not yet seen this, take a look:

1. Click “My Cody”

and you may be pleasantly surprised to learn you’ve earned one or more badges acknowledging your progress.  Want to know how to earn a  specific badge?  Simply hover over it and the criteria are revealed.  Here is an example from one of the more challenging badges:

As you gain MATLAB prowess by playing Cody, you may find your trophy case is filling up:

Such an accomplishment might well be worth showcasing on your favorite social network or at your next job interview.

1. Click “Players”

2. Select your friend from the list

It’s that easy.

We hope you enjoy earning badges as much as we enjoyed creating them.  Got any suggestions for other badges?  Leave us a note!

England, Football, and Comma-Separated Tables

I recently had the pleasure of visiting the UK for the MATLAB Expo in Birmingham, England. As part of that visit, I gave a talk introducing the new features in MATLAB R2012b. As much as I enjoyed that, the most exciting part of my day was meeting Yi Cao, one of the real rock stars of the MATLAB Programming Contest. Here we are shortly after my talk.

Yi Cao's analysis of the Peg Solitaire contest (which he won) is still a classic.

Since I was in England talking about the newest version of MATLAB, I was inspired to use some of the latest features to analyze (sorry, analyse) football scores. And by football, I mean the game where you actually kick the ball with your foot.

One of the new features I'm most excited about is the new Data Import tool. I was itching to try it out on some real data. At about this time, by good fortune, one of my MathWorks UK colleagues pointed me to an online CSV file with the latest results from the Premier League.

Import the Data

With the improved Import Tool, importing the data really is as simple as double-clicking on the CSV file.

I'm interested in only six of a vast forest of columns in the table:

1. Date
2. Home team name
3. Away team name
4. Home team score, FTHG
5. Away team score, FTAG
6. Result, FTR (H = home team win, A = away team win, D = draw)

The Date column is particularly noteworthy because importing dates has always been painful.

A touch of the mouse gave me an IMPORT_FOOTBALL_DATA function that I can embed in this example file.

filename = 'E0.csv';
[Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR] = import_football_data(filename);


I can honestly say that I would never have written this data importing code. I'm capable of writing it, and if critical work depended on it, I would have. It's too much of a nuisance most of the time. But now it's so easy with the Data Import tool, away I go.

Number of Goals per Game

With the data safely in hand, we can start making pictures. First of all, how many games are we talking about?

nGames = length(FTR)

nGames =

109



What's the histogram like for total goals in a game?

hist(FTHG + FTAG,0:7)
title('Histogram of Total Goals in a Soccer Game');


Is there an evident home and away skew?

subplot(2,1,1)
hist(FTHG,0:7)
ylim([0 50])
title('Goals by the Home Team');
subplot(2,1,2)
hist(FTAG,0:7)
ylim([0 50])
title('Goals by the Away Team');


Yes, there is. The away team is much more likely to be shut out than the home team. No surprises here: you are more likely to win a home game than an away game.

draw = sum(strcmp(FTR,'D'))
home = sum(strcmp(FTR,'H'))
away = sum(strcmp(FTR,'A'))
clf
bar([home away draw])
ylim([0 50])
set(gca,'XTickLabel',{'home','away','draw'})
colormap(jet)

draw =

38

home =

45

away =

26
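The raw counts are easier to compare as shares of all 109 games. A quick sketch using the three counts printed above (the variable names are illustrative):

```matlab
counts = [45 26 38];                 % home wins, away wins, draws (from above)
pct = counts / sum(counts) * 100;    % share of all games, in percent

fprintf('home %.1f%%, away %.1f%%, draw %.1f%%\n', pct);
```

That works out to roughly 41% home wins, 24% away wins, and 35% draws for this partial season.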



Like pies? I don't, but here you go...

clf
pie([home away draw],{sprintf('home\nteam\nwins'),'away team wins','draw'})
colormap(summer(3))


Let's put it into a matrix. This is effectively a two-dimensional histogram of final scores.

ma = max(FTAG);
mh = max(FTHG);
mt = max(ma,mh);
g = zeros(mt+1);

for i = 1:length(FTHG)
    ag = FTAG(i) + 1;
    hg = FTHG(i) + 1;
    g(ag,hg) = g(ag,hg) + 1;
end

clf
image(0:mt,0:mt,g);
colormap(flipud(gray(max(g(:)))));
xlabel('Home Goals')
ylabel('Away Goals')
axis xy
axis square

g

g =

8    13     6     5     1     1     0
3    18     8     2     3     0     1
3     8    10     3     2     0     0
2     3     4     2     0     0     0
0     0     1     0     0     0     0
1     0     1     0     0     0     0
0     0     0     0     0     0     0
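As an aside, the counting loop can also be written as a single accumarray call, which tallies how often each (row, column) index pair occurs. A minimal sketch on five made-up results; the small vectors here are hypothetical stand-ins for the imported FTHG and FTAG columns.

```matlab
% Hypothetical results for five games (column vectors, as imported)
homeGoals = [1; 2; 0; 1; 3];
awayGoals = [1; 0; 0; 1; 1];

mtDemo = max([homeGoals; awayGoals]);

% Rows index away goals, columns index home goals; +1 for 1-based indexing
gDemo = accumarray([awayGoals homeGoals] + 1, 1, [mtDemo+1 mtDemo+1]);
```

The loop version is arguably easier to read; accumarray is handy once the pattern is familiar.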



So the most common score is 1-1. That's soccer for you. The preponderance of home team victories is also evident here.
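The matrix also offers a handy consistency check. Draws sit on the diagonal, home wins above it (more home goals than away goals), and away wins below it. A short sketch using the matrix printed above (gCheck is simply a copy of g typed in by hand):

```matlab
gCheck = [8 13  6  5  1  1  0
          3 18  8  2  3  0  1
          3  8 10  3  2  0  0
          2  3  4  2  0  0  0
          0  0  1  0  0  0  0
          1  0  1  0  0  0  0
          0  0  0  0  0  0  0];

draws    = trace(gCheck);                  % equal scores
homeWins = sum(sum(triu(gCheck, 1)));      % column (home) > row (away)
awayWins = sum(sum(tril(gCheck, -1)));     % row (away) > column (home)
```

These recover the same 38 draws, 45 home wins, and 26 away wins counted from the FTR column earlier.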

Finally, I wanted to clean up this plot to make it easier to read. I remembered Rob Henson's excellent HEATMAPTEXT contribution to the File Exchange. This does a much nicer job than my poor plot above.

clf
heatmaptext(g,'FontColor','red');
colormap(flipud(bone));
xlabel('Home Goals')
ylabel('Away Goals')
set(gca,'XTickLabel',0:6)
set(gca,'YTickLabel',0:6)
axis on
axis xy


In creating this document, I appreciated the convenient new PUBLISH tab on the MATLAB toolstrip.

Conclusions

None of this is groundbreaking in terms of analysis, but it was a lot of fun and extremely easy. With the Data Import tool, the new toolstrip, publishing, and a little spice from the File Exchange, it brought together so much of what makes MATLAB appealing. Simple code leveraged through a powerful environment with eye-catching results.

MATLAB Knots Contest Winners

And the winners of the MATLAB Knots Contest are…

Raphaël Candelier – Grand Prize Winner

Hannes Naudé – Prince of Darkness

Alfonso Nieto-Castañón – Twilight Prize

Richard Zapor – Early Bird and Sunday Push

Yi Cao – Saturday Leap

Congratulations to Raphaël for winning the Knots Contest with his submission Cheeeese.

Raphaël is a researcher in Physics and Biophysics at the University Paris VI (France). He currently works on zebrafish neuroimagery and routinely uses MATLAB. Believe it or not, this was his first MATLAB contest.

You can find out more about the contest action and mini-contests by reading the contest blog. Thanks to everyone who participated in the contest!

If you want to learn more about players who have won MATLAB Contests over the years, check out our Contest Hall of Fame.

A Meeting of the Contest Masterminds

The contest is winding down. The queue has been shut down one last time and we are waiting for the queue to finish processing so that we can announce the final Grand Prize Winner.

While we are waiting, I thought I would share a tweet from yesterday's MATLAB Expo in the UK. Ned, our contest mastermind, was there and connected with Yi Cao from Cranfield University, who has been a long-time participant in our MATLAB Contest and on File Exchange.

What fun to be able to run into each other off-line during the contest!

MATLAB Knots Contest has started

The Fall 2012 MATLAB Contest is inspired by the game Planarity. But it was also inspired by the problem of the Gordian Knot. The problem is this: Given a deranged hairball of a knot, can you untie it?

In more software-friendly terms, the problem can be restated as this: Given a list of points and their connectivity as supplied by an adjacency matrix, move them around so that the lines do not cross.
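To make the problem concrete, here is a minimal sketch that counts crossing lines for a given layout of points. It is only an illustration of the idea, not the contest's actual scoring code. The adjacency matrix, the coordinates, and the helpers orient and crosses are all made up for this example.

```matlab
% Sign of the turn p -> q -> r (+1 left, -1 right, 0 collinear)
orient = @(p,q,r) sign((q(1)-p(1))*(r(2)-p(2)) - (q(2)-p(2))*(r(1)-p(1)));
% Segments ab and cd cross if each one straddles the other
crosses = @(a,b,c,d) orient(a,b,c)*orient(a,b,d) < 0 && ...
                     orient(c,d,a)*orient(c,d,b) < 0;

A = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0];    % hypothetical adjacency matrix
layouts = {[0 0; 1 1; 1 0; 0 1], ...         % tangled: the two lines form an X
           [0 0; 0 1; 1 0; 1 1]};            % untangled: two parallel lines

[ei, ej] = find(triu(A));                     % list each line once
nCross = zeros(1, numel(layouts));
for k = 1:numel(layouts)
    xy = layouts{k};
    for e = 1:numel(ei)-1
        for f = e+1:numel(ei)
            if crosses(xy(ei(e),:), xy(ej(e),:), xy(ei(f),:), xy(ej(f),:))
                nCross(k) = nCross(k) + 1;
            end
        end
    end
end
```

Moving the points from the first layout to the second removes the single crossing, which is the kind of improvement the problem asks for.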

Our contest will run from 16:00 UTC Wednesday, 31 October through to 16:00 UTC Wednesday, 7 November. Join the contest fun, and compete with the best programmers from all over the world!

Good luck everyone!

MATLAB Mobile for Android

It’s here! MATLAB Mobile is now available on the Android™ platform.

From your Android smartphone or tablet, you can now connect to a MATLAB session on the cloud or to MATLAB running on your desktop computer. For more information, visit the MATLAB Mobile page. For an overview, watch this video.

We would love to know how you are using MATLAB Mobile for Android. Leave us a comment here, with your thoughts and feedback.

