MATLAB Spoken Here

England, Football, and Comma-Separated Tables

Posted by Ned Gulley

I recently had the pleasure of visiting the UK for the MATLAB Expo in Birmingham, England. As part of that visit, I gave a talk introducing the new features in MATLAB R2012b. As much as I enjoyed that, the most exciting part of my day was meeting Yi Cao, one of the real rock stars of the MATLAB Programming Contest. Here we are shortly after my talk.

Yi Cao's analysis of the Peg Solitaire contest (which he won) is still a classic.

Since I was in England talking about the newest version of MATLAB, I was inspired to use some of the latest features to analyze (sorry, analyse) football scores. And by football, I mean the game where you actually kick the ball with your foot.

One of the new features I'm most excited about is the new Data Import tool. I was itching to try it out on some real data. At about this time, by good fortune, one of my MathWorks UK colleagues pointed me to an online CSV file with the latest results from the Premier League.

Import the Data

With the improved Import Tool, importing the data really is as simple as double-clicking on the CSV file.

I'm interested in only six of a vast forest of columns in the table:

  1. Date
  2. Home team name
  3. Away team name
  4. Home team score, FTHG
  5. Away team score, FTAG
  6. Result, FTR (H = home team win, A = away team win, D = draw)

The Date column is particularly noteworthy because importing dates has always been painful.

A touch of the mouse gave me an IMPORT_FOOTBALL_DATA function that I can embed in this example file.

filename = 'E0.csv';
[Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR] = import_football_data(filename);

I can honestly say that I would never have written this data import code by hand. I'm capable of writing it, and if critical work depended on it, I would have, but most of the time it's too much of a nuisance. Now that the Data Import tool makes it so easy, away I go.
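For the curious, here is a minimal sketch of the kind of code such an import function might contain. It is not the actual output of the Import Tool. It assumes a file holding only the six columns of interest, with dates laid out as dd/mm/yyyy, and the file name sample_scores.csv, the team names, and the scores are all made up so that the snippet runs on its own.

```matlab
% Hypothetical stand-in for E0.csv: three made-up rows, six columns only.
sampleFile = fullfile(tempdir,'sample_scores.csv');
fid = fopen(sampleFile,'w');
fprintf(fid,'Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR\n');
fprintf(fid,'18/08/2012,TeamA,TeamB,2,0,H\n');
fprintf(fid,'19/08/2012,TeamC,TeamD,1,1,D\n');
fprintf(fid,'25/08/2012,TeamB,TeamC,0,3,A\n');
fclose(fid);

% Read it back: three text columns, two numeric columns, one text column.
% The header line is skipped.
fid = fopen(sampleFile,'r');
C = textscan(fid,'%s %s %s %f %f %s','Delimiter',',','HeaderLines',1);
fclose(fid);

% Unpack into the same variable names used in this post.
Date     = datenum(C{1},'dd/mm/yyyy');  % assumed day/month/year layout
HomeTeam = C{2};
AwayTeam = C{3};
FTHG     = C{4};                        % home goals
FTAG     = C{5};                        % away goals
FTR      = C{6};                        % 'H', 'A', or 'D'
```

The real file has many more columns, so a format string like this one would need a skipped field for every column that is not wanted.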

Number of Goals per Game

With the data safely in hand, we can start making pictures. First of all, how many games are we talking about?

nGames = length(FTR)
nGames =

   109

What's the histogram like for total goals in a game?

hist(FTHG + FTAG,0:7)
title('Histogram of Total Goals in a Soccer Game');

Is there an evident home and away skew?

subplot(2,1,1)
hist(FTHG,0:7)
ylim([0 50])
title('Goals by the Home Team');
subplot(2,1,2)
hist(FTAG,0:7)
ylim([0 50])
title('Goals by the Away Team');

Yes, there is. The away team is much more likely to be shut out than the home team. No surprises here: you are more likely to win a home game than an away game.

draw = sum(strcmp(FTR,'D'))
home = sum(strcmp(FTR,'H'))
away = sum(strcmp(FTR,'A'))
clf
bar([home away draw])
ylim([0 50])
set(gca,'XTickLabel',{'home','away','draw'})
colormap(jet)
draw =

    38


home =

    45


away =

    26
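To put those counts on a common scale, they can be converted to percentages. This is a small sketch that hard-codes the three counts from the output above. The variable name pct is introduced here for illustration only.

```matlab
% Counts copied from the output above.
home = 45;
away = 26;
draw = 38;
nGames = home + away + draw;   % 109, matching length(FTR)

% Share of each outcome, rounded to the nearest whole percent.
pct = round(100 * [home away draw] / nGames)
% pct = 41 24 35 (home wins, away wins, draws)
```

So roughly four games in ten end in a home win, against about one in four for the away team.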

Like pies? I don't, but here you go...

clf
pie([home away draw],{sprintf('home\nteam\nwins'),'away team wins','draw'})
colormap(summer(3))

Let's put it into a matrix. This is effectively a two-dimensional histogram of final scores.

ma = max(FTAG);
mh = max(FTHG);
mt = max(ma,mh);
g = zeros(mt+1);

for i = 1:length(FTHG)
   ag = FTAG(i) + 1;
   hg = FTHG(i) + 1;
   g(ag,hg) = g(ag,hg) + 1;
end

clf
image(0:mt,0:mt,g);
colormap(flipud(gray(max(g(:)))));
xlabel('Home Goals')
ylabel('Away Goals')
axis xy
axis square

g
g =

     8    13     6     5     1     1     0
     3    18     8     2     3     0     1
     3     8    10     3     2     0     0
     2     3     4     2     0     0     0
     0     0     1     0     0     0     0
     1     0     1     0     0     0     0
     0     0     0     0     0     0     0
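The counting loop above can also be written without a loop. The sketch below compares the loop with a call to ACCUMARRAY on a handful of made-up scores (not taken from E0.csv). The names gLoop and gVec are introduced here just to keep the two results apart.

```matlab
% Made-up scores for five games (not taken from E0.csv).
FTHG = [2; 1; 0; 1; 3];   % home goals
FTAG = [0; 1; 3; 1; 0];   % away goals
mt = max([FTHG; FTAG]);

% Loop version, as in the post: rows are away goals, columns are home goals.
gLoop = zeros(mt+1);
for i = 1:length(FTHG)
    ag = FTAG(i) + 1;
    hg = FTHG(i) + 1;
    gLoop(ag,hg) = gLoop(ag,hg) + 1;
end

% Vectorized version: each [row col] subscript pair adds 1 to its cell.
gVec = accumarray([FTAG FTHG] + 1, 1, [mt+1 mt+1]);

% Both approaches should give the same matrix.
same = isequal(gLoop,gVec)
```

The size argument keeps the result square even when the highest score appears in only one of the two columns.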

So the most common score is 1-1. That's soccer for you. The preponderance of home team victories is also evident here: the entries above the diagonal of g, where home goals exceed away goals, outweigh those below it.
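As a sanity check, the earlier tallies can be read straight off this matrix. The sketch below re-enters g from the output above. The variable names are chosen here for illustration.

```matlab
% Score matrix copied from the output above
% (rows: away goals 0..6, columns: home goals 0..6).
g = [8 13  6 5 1 1 0
     3 18  8 2 3 0 1
     3  8 10 3 2 0 0
     2  3  4 2 0 0 0
     0  0  1 0 0 0 0
     1  0  1 0 0 0 0
     0  0  0 0 0 0 0];

draws    = trace(g);                 % equal scores lie on the diagonal: 38
homeWins = sum(sum(triu(g,1)));      % home goals > away goals: 45
awayWins = sum(sum(tril(g,-1)));     % away goals > home goals: 26

% Most common score, reported as [home away] goals.
[~,idx] = max(g(:));
[r,c] = ind2sub(size(g),idx);
mostCommon = [c r] - 1;              % [1 1]

% Shut-outs: first row is away team scoring 0, first column is home team scoring 0.
awayShutOuts = sum(g(1,:));          % 34
homeShutOuts = sum(g(:,1));          % 17
```

The three totals match the draw, home, and away counts computed from FTR, and the away team is shut out twice as often as the home team.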

Finally, I wanted to clean up this plot to make it easier to read. I remembered Rob Henson's excellent HEATMAPTEXT contribution to the File Exchange. This does a much nicer job than my poor plot above.

clf
heatmaptext(g,'FontColor','red');
colormap(flipud(bone));
xlabel('Home Goals')
ylabel('Away Goals')
set(gca,'XTickLabel',0:6)
set(gca,'YTickLabel',0:6)
axis on
axis xy

In creating this document, I appreciated the convenient new PUBLISH tab on the MATLAB toolstrip.

Conclusions

None of this is groundbreaking in terms of analysis, but it was a lot of fun and extremely easy. With the Data Import tool, the new toolstrip, publishing, and a little spice from the File Exchange, it brought together so much of what makes MATLAB appealing. Simple code leveraged through a powerful environment with eye-catching results.


Published with MATLAB® R2012b

These postings are the author's and don't necessarily represent the opinions of MathWorks.