Top Files and Authors

Sean's going to take this week to celebrate the top files and authors of the File Exchange.

As you may know by now, MATLAB Central is celebrating its 15th birthday. Let's start by making it a File Exchange-based birthday cake!

HappyBirthday({'MATLAB' 'Central'}, 15)

Top Files

I figured an interesting thing to look at would be the top files of all time and how downloads are distributed across files, based on the total number of downloads for each file.

T = readtable('fx_downloads.xlsx');
T = sortrows(T,'total_downloads','descend');

barh(T.total_downloads(1:15));
ax = gca;
ax.YTickLabel = T.title(1:15);
ax.YDir = 'reverse';
ax.XAxis.Exponent = 0;
ax.YAxis.TickLabelInterpreter = 'none';
title('Top 15 Files')

It's not a surprise to me at all to see export_fig at the top. We'll dig into it a bit more later. There are also three Arduino support packages up there. This isn't too surprising either given the popularity of Arduinos in recent years.

What about the distribution of the number of downloads across all of the files? Let's look at a histogram of the number of files binned by number of downloads. Note the log scale.

histogram(T.total_downloads, [logspace(0,5,30) inf])
set(gca, 'XScale', 'log')
ylabel('Number of Files')
title('Download Distribution')
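To make the binning above concrete, here is a minimal sketch on made-up download counts. It uses fewer bins than the plot (one per decade) so the result is easy to check by hand; the variable names and data are hypothetical and not taken from fx_downloads.xlsx.

```matlab
% Toy download counts, invented for illustration only
downloads = [2 3 8 12 150 900 2500 40000 250000];

% One edge per decade from 10^0 to 10^5, plus an open-ended last bin
edges = [logspace(0,5,6) inf];

% Count how many files fall into each bin
counts = histcounts(downloads, edges);
disp(counts)
```

Appending inf to the edges keeps the handful of very popular files in the final bin instead of dropping them off the end of the histogram.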

Top Authors

Author = varfun(@sum,T,'GroupingVariables','Creators_name','InputVariables','total_downloads');
summary(Author)
Variables:

    Creators_name: 10468x1 cell string

    GroupCount: 10468x1 double
        Values:

            min         1
            median      1
            max       189

    sum_total_downloads: 10468x1 double
        Values:

            min                1
            median          1154
            max       6.9059e+05



So it looks like there are 10468 unique authors. Most people submit only one file and one person has submitted 189 files. Who's that?

disp(Author(Author.GroupCount == 189,:))
         Creators_name          GroupCount    sum_total_downloads
    ________________________    __________    ___________________

    'Antonio Trujillo-Ortiz'    189           3.7709e+05
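If the grouping call is unfamiliar, the following toy sketch shows where the GroupCount and sum_total_downloads columns come from. The table Toy and its contents are invented for illustration; only the varfun call pattern mirrors the one used above.

```matlab
% Toy table; author names and counts are made up
Creators_name   = {'Ann'; 'Bob'; 'Ann'; 'Cy'; 'Ann'};
total_downloads = [100; 20; 5; 7; 40];
Toy = table(Creators_name, total_downloads);

% Sum downloads per author; varfun adds GroupCount (rows per group)
% and prefixes the summed variable's name with the function name
ToyAuthor = varfun(@sum, Toy, ...
    'GroupingVariables', 'Creators_name', ...
    'InputVariables', 'total_downloads');
disp(ToyAuthor)
```

Here GroupCount is the number of files each author submitted, and sum_total_downloads is the total over those files.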



What does the distribution of submitted files per author look like?

histogram(Author.GroupCount)
set(gca,'XScale','log')
axis tight
xlabel('Number of Files')
ylabel('Number of Authors')
title('Number of Files per Author')

Author = sortrows(Author,'sum_total_downloads','descend');

barh(Author.sum_total_downloads(1:15));
ax = gca;
ax.YTickLabel = Author.Creators_name(1:15);
ax.YDir = 'reverse';
ax.XAxis.Exponent = 0;
ax.YAxis.TickLabelInterpreter = 'none';
title('Top 15 Authors')

So what about export_fig? It used to belong to Oliver Woodford, the original author. In August 2015, Yair Altman took over maintenance and ownership of it. It's only fair that we give Oliver credit for the years he owned it.

I have another file that has export_fig's history. Read it in and convert the date to datetime for logical indexing and plotting. The original format was 'yyyyMmm', e.g. 2016M07 for July 2016.

HistoryExportFig = readtable('monthly-export_fig_Downloads.xlsx');
summary(HistoryExportFig)
Variables:

    MonthName_Download: 88x1 datetime
        Values:

            min       01-Apr-2009
            median    16-Nov-2012
            max       01-Jul-2016

    SourceFileId: 88x1 double
        Description:  Original column heading: 'Source File Id'
        Values:

            min       23629
            median    23629
            max       23629

    FileDownloadCount: 88x1 double
        Values:

            min          555
            median    2163.5
            max         4082
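The date conversion described earlier is not shown in the listing. One way it might look, sketched on a few made-up strings rather than the real spreadsheet column, and assuming the dates arrive as text such as 2016M07:

```matlab
% Made-up month strings in the style described above
raw = {'2009M04'; '2015M08'; '2016M07'};

% The literal M is quoted inside the format; MM is the two-digit month
months = datetime(raw, 'InputFormat', 'yyyy''M''MM');

% Once the values are datetimes, logical indexing against a cutoff is direct
cutoff = datetime(2015,8,1);
isBefore = months < cutoff;
disp(months(isBefore))
```

Each parsed value lands on the first day of its month, which agrees with the min and max dates in the summary.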



How has export_fig been used over time?

plot(HistoryExportFig.MonthName_Download, HistoryExportFig.FileDownloadCount)
Aug15 = datetime(2015,8,1);
hold on
h = plot([Aug15 Aug15],ylim);
legend(h,'Yair Takes Over','location','northwest')
xlabel('Time')
title('Monthly Export Fig Downloads')

So it looks like export_fig use is in decline. But don't worry, I don't think it's Yair's fault! MATLAB R2014b introduced a new graphics system. With it, printing has improved considerably, which has removed use cases where export_fig really helped, such as antialiasing. As users migrate to newer releases, I'd expect to see this trend continue.

So what happens if we give Oliver credit for the export_fig downloads leading up to August, 2015?

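% Sum the file downloads before ownership transferred
preAug15 = sum(HistoryExportFig.FileDownloadCount(HistoryExportFig.MonthName_Download < Aug15));

% Add it to Oliver's count
idxOliver = find(strcmp(Author.Creators_name,'Oliver Woodford'));
Author.sum_total_downloads(idxOliver) = Author.sum_total_downloads(idxOliver) + preAug15;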

% Re-sort
Author = sortrows(Author,'sum_total_downloads','descend');

Is it enough to bring Oliver into the top 15?

idxOliver = find(strcmp(Author.Creators_name,'Oliver Woodford'));
disp(['Oliver''s ranking: ' num2str(idxOliver)])
Oliver's ranking: 23


Not quite, but it moves him up from 129th to 23rd!