England, Football, and Comma-Separated Tables
I recently had the pleasure of visiting the UK for the MATLAB Expo in Birmingham, England. As part of that visit, I gave a talk introducing the new features in MATLAB R2012b. As much as I enjoyed that, the most exciting part of my day was meeting Yi Cao, one of the real rock stars of the MATLAB Programming Contest. Here we are shortly after my talk.
Yi Cao's analysis of the Peg Solitaire contest (which he won) is still a classic.
Since I was in England talking about the newest version of MATLAB, I was inspired to use some of the latest features to analyze (sorry, analyse) football scores. And by football, I mean the game where you actually kick the ball with your foot.
One of the new features I'm most excited about is the new Data Import tool. I was itching to try it out on some real data. At about this time, by good fortune, one of my MathWorks UK colleagues pointed me to an online CSV file with the latest results from the Premier League.
I'm interested in only six of a vast forest of columns in the table:
The Date column is particularly noteworthy because importing dates has always been painful.
A touch of the mouse gave me an IMPORT_FOOTBALL_DATA function that I can embed in this example file.
Is there an evident home and away skew?
Yes there is. The away team is much more likely to be shut out than the home team. No surprises here: you are more likely to win a home game than an away game.
Like pies? I don't, but here you go...
Let's put it into a matrix. This is effectively a two-dimensional histogram of final scores.
So the most common score is 1-1. That's soccer for you. The preponderance of home team victories is also evident here.
Finally, I wanted to clean up this plot to make it easier to read. I remembered Rob Henson's excellent HEATMAPTEXT contribution to the File Exchange. This does a much nicer job than my poor plot above.
In creating this document, I appreciated the convenient new PUBLISH tab on the MATLAB toolstrip.

Import the Data
With the improved Import Tool, importing the data really is as simple as double-clicking on the CSV file.
- Date
- Home team name
- Away team name
- Home team score, FTHG
- Away team score, FTAG
- Result, FTR (H = home team win, A = away team win, D = draw)
filename = 'E0.csv';
[Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR] = import_football_data(filename);
I can honestly say that I would never have written this data importing code. I'm capable of writing it, and if critical work depended on it, I would have. It's too much of a nuisance most of the time. But now it's so easy with the Data Import tool, away I go.
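For readers without the Import Tool, here is a rough sketch of what a hand-written import might look like. It is not the code the tool generated. It assumes a trimmed, hypothetical six-column layout (the real file has many more columns), an assumed day/month/two-digit-year date format, and placeholder rows invented purely for illustration.

```matlab
% Hypothetical sample rows in the assumed layout:
% Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR  (invented placeholders, not real results)
sample = sprintf('%s\n', ...
    '18/08/12,TeamA,TeamB,2,1,H', ...
    '19/08/12,TeamC,TeamD,0,0,D', ...
    '25/08/12,TeamB,TeamC,1,3,A');

% Three string columns, two numeric columns, one string column
C = textscan(sample, '%s %s %s %f %f %s', 'Delimiter', ',');

% Convert the date strings to serial date numbers (format is an assumption)
Date     = datenum(C{1}, 'dd/mm/yy');
HomeTeam = C{2};
AwayTeam = C{3};
FTHG     = C{4};   % full-time home goals
FTAG     = C{5};   % full-time away goals
FTR      = C{6};   % full-time result: 'H', 'A', or 'D'
```

To read from a file instead, the same format string could be passed to textscan along with a file identifier from fopen, with extra fields skipped as needed.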
Number of Goals per Game
With the data safely in hand, we can start making pictures. First of all, how many games are we talking about?
nGames = length(FTR)
nGames = 109
What's the histogram like for total goals in a game?
hist(FTHG + FTAG,0:7)
title('Histogram of Total Goals in a Soccer Game');
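If you want the numbers behind the picture, HIST also returns the bin counts when called with an output argument. The sketch below uses a short, invented vector of total goals (not the Premier League data) to show how the average goals per game can be recovered from the binned counts.

```matlab
% Hypothetical total-goals values for eight games (invented for illustration)
totalGoals = [0 1 1 2 2 2 3 5];
centers = 0:7;

% With an output argument, hist returns the counts instead of plotting
counts = hist(totalGoals, centers);

% Average goals per game, computed from the binned counts
meanGoals = sum(centers .* counts) / sum(counts);
```

In newer MATLAB releases, HISTOGRAM and HISTCOUNTS are the recommended replacements for HIST.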

subplot(2,1,1)
hist(FTHG,0:7)
ylim([0 50])
title('Goals by the Home Team');
subplot(2,1,2)
hist(FTAG,0:7)
ylim([0 50])
title('Goals by the Away Team');

draw = sum(strcmp(FTR,'D'))
home = sum(strcmp(FTR,'H'))
away = sum(strcmp(FTR,'A'))
clf
bar([home away draw])
ylim([0 50])
set(gca,'XTickLabel',{'home','away','draw'})
colormap(jet)
draw = 38
home = 45
away = 26
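These three tallies can be cross-checked against the score matrix g that this post prints (rows are away goals, columns are home goals). As a small sketch: draws sit on the diagonal, home wins above it, and away wins below it. The variable names here are my own, and g is copied by hand from the printed output.

```matlab
% Score matrix copied from the post: rows = away goals 0..6, columns = home goals 0..6
g = [8 13  6 5 1 1 0
     3 18  8 2 3 0 1
     3  8 10 3 2 0 0
     2  3  4 2 0 0 0
     0  0  1 0 0 0 0
     1  0  1 0 0 0 0
     0  0  0 0 0 0 0];

draws    = trace(g);                 % equal scores lie on the diagonal
homeWins = sum(sum(triu(g, 1)));     % home goals > away goals: above the diagonal
awayWins = sum(sum(tril(g, -1)));    % away goals > home goals: below the diagonal
nGames   = sum(g(:));

% Share of each outcome as a percentage of all games
share = 100 * [homeWins awayWins draws] / nGames;
```

The totals agree with the counts above (45, 26, and 38 out of 109 games), so roughly 41% of games were home wins, 24% away wins, and 35% draws.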

clf
pie([home away draw],{sprintf('home\nteam\nwins'),'away team wins','draw'})
colormap(summer(3))

ma = max(FTAG);
mh = max(FTHG);
mt = max(ma,mh);
g = zeros(mt+1);
for i = 1:length(FTHG)
    ag = FTAG(i) + 1;
    hg = FTHG(i) + 1;
    g(ag,hg) = g(ag,hg) + 1;
end
clf
image(0:mt,0:mt,g);
colormap(flipud(gray(max(g(:)))));
xlabel('Home Goals')
ylabel('Away Goals')
axis xy
axis square
g
g =
     8    13     6     5     1     1     0
     3    18     8     2     3     0     1
     3     8    10     3     2     0     0
     2     3     4     2     0     0     0
     0     0     1     0     0     0     0
     1     0     1     0     0     0     0
     0     0     0     0     0     0     0
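The counting loop can also be written without a loop. The sketch below swaps in ACCUMARRAY as an alternative technique and checks that it gives the same matrix as the loop. The five scores are invented for illustration and the names gLoop and gAcc are my own.

```matlab
% Hypothetical scores for five games (invented for illustration)
FTHG = [2; 0; 1; 1; 3];   % home goals
FTAG = [1; 0; 1; 1; 0];   % away goals
mt = max([FTHG; FTAG]);

% Loop version, following the approach in the post
gLoop = zeros(mt+1);
for i = 1:length(FTHG)
    ag = FTAG(i) + 1;
    hg = FTHG(i) + 1;
    gLoop(ag,hg) = gLoop(ag,hg) + 1;
end

% Vectorized version: add 1 at each (away+1, home+1) subscript pair
gAcc = accumarray([FTAG+1, FTHG+1], 1, [mt+1, mt+1]);

% Both approaches should give the same two-dimensional histogram
sameResult = isequal(gLoop, gAcc);
```

The third argument fixes the output size, so score lines that never occur still show up as zeros.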

clf
heatmaptext(g,'FontColor','red');
colormap(flipud(bone));
xlabel('Home Goals')
ylabel('Away Goals')
set(gca,'XTickLabel',0:6)
set(gca,'YTickLabel',0:6)
axis on
axis xy

