File Exchange Pick of the Week

Our best user submissions

Top Files and Authors

Sean's going to take this week to celebrate the top files and authors of the File Exchange.

As you may know by now, MATLAB Central is celebrating its 15th birthday. Let's start by making a File Exchange-based birthday cake!

HappyBirthday({'MATLAB' 'Central'}, 15)

Top Files

I figured an interesting thing to look at would be the top files of all time and how downloads are distributed across files, based on the total number of downloads for each file.

T = readtable('fx_downloads.xlsx');
T = sortrows(T,'total_downloads','descend');

And the 15 most downloaded files are:

barh(T.total_downloads(1:15));
ax = gca;
ax.YTickLabel = T.title(1:15);
ax.YDir = 'reverse';
ax.XAxis.Exponent = 0;
ax.YAxis.TickLabelInterpreter = 'none';
xlabel('Total Downloads')
title('Top 15 Files')

It's not a surprise to me at all to see export_fig at the top. We'll dig into it a bit more later. There are also three Arduino support packages up there. This isn't too surprising either given the popularity of Arduinos in recent years.

What about the distribution of downloads across all of the files? Let's look at a histogram of the number of files binned by number of downloads. Note the log scale on the x-axis.

histogram(T.total_downloads, [logspace(0,5,30) inf])
set(gca, 'XScale', 'log')
xlabel('Total Downloads')
ylabel('Number of Files')
title('Download Distribution')
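
To see what those logarithmically spaced edges do, here is a minimal sketch on made-up numbers (not File Exchange data). It uses histcounts with one edge per decade plus a final inf edge, so anything at or above 1e5 lands in the last bin. The variable names are just for illustration.

```matlab
% Made-up download counts, purely for illustration
downloads = [1 3 8 12 95 150 2400 31000 250000];

% One edge per decade from 10^0 to 10^5, plus Inf to catch the largest files
edges = [logspace(0,5,6) inf];

% Count how many files fall between each pair of neighboring edges
counts = histcounts(downloads, edges);
disp(counts)   % 3 2 1 1 1 1
```

Each bin spans a factor of 10, which is why the bars look evenly wide once the x-axis is switched to a log scale.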

Top Authors

So which authors have the most files and downloads?

Sum the total number of downloads, grouping by author.

Author = varfun(@sum,T,'GroupingVariables','Creators_name','InputVariables','total_downloads');
summary(Author)
Variables:

    Creators_name: 10468x1 cell string

    GroupCount: 10468x1 double
        Values:

            min         1         
            median      1         
            max       189         

    sum_total_downloads: 10468x1 double
        Values:

            min                1           
            median          1154           
            max       6.9059e+05           

So it looks like there are 10468 unique authors. Most people submit only one file and one person has submitted 189 files. Who's that?

disp(Author(Author.GroupCount == 189,:))
         Creators_name          GroupCount    sum_total_downloads
    ________________________    __________    ___________________

    'Antonio Trujillo-Ortiz'    189           3.7709e+05         
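
If the varfun call with 'GroupingVariables' is new to you, here is a small sketch of the same grouping on a toy table. The authors and counts are invented, and Toy and ToyAuthor are hypothetical names; only the column names are borrowed from the real table.

```matlab
% Toy table with invented authors and download counts
Creators_name   = {'Ann'; 'Bob'; 'Ann'; 'Cy'; 'Ann'};
total_downloads = [100; 20; 50; 7; 5];
Toy = table(Creators_name, total_downloads);

% One output row per author: GroupCount is the number of files,
% sum_total_downloads is the sum over that author's files
ToyAuthor = varfun(@sum, Toy, 'GroupingVariables', 'Creators_name', ...
    'InputVariables', 'total_downloads');
disp(ToyAuthor)
```

Here Ann ends up with a GroupCount of 3 and 155 total downloads, while Bob and Cy each have a single file.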

What does the distribution of submitted files per author look like?

histogram(Author.GroupCount)
set(gca,'XScale','log')
axis tight
xlabel('Number of Files')
ylabel('Number of Authors')
title('Number of Files per Author')

What about the most downloaded author?

Author = sortrows(Author,'sum_total_downloads','descend');

barh(Author.sum_total_downloads(1:15));
ax = gca;
ax.YTickLabel = Author.Creators_name(1:15);
ax.YDir = 'reverse';
ax.XAxis.Exponent = 0;
ax.YAxis.TickLabelInterpreter = 'none';
xlabel('Total Downloads')
title('Top 15 Authors')

So what about export_fig? It used to belong to Oliver Woodford, the original author. In August 2015, Yair Altman took over maintenance and ownership of it. It's only fair that we give Oliver credit for the years he owned it.

I have another file that has export_fig's history. Read it in and convert the date to datetime for logical indexing and plotting. The original format is the four-digit year, a literal 'M', and the two-digit month, e.g. 2016M07 for July 2016.

HistoryExportFig = readtable('monthly-export_fig_Downloads.xlsx');
HistoryExportFig.MonthName_Download = datetime(HistoryExportFig.MonthName_Download,'InputFormat','yyyy''M''MM');
summary(HistoryExportFig)
Variables:

    MonthName_Download: 88x1 datetime
        Description:  Original column heading: 'Month Name - Download'
        Values:

            min       01-Apr-2009         
            median    16-Nov-2012         
            max       01-Jul-2016         

    SourceFileId: 88x1 double
        Description:  Original column heading: 'Source File Id'
        Values:

            min       23629         
            median    23629         
            max       23629         

    FileDownloadCount: 88x1 double
        Description:  Original column heading: 'File Download Count'
        Values:

            min          555             
            median    2163.5             
            max         4082             
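
The doubled single quotes in the 'InputFormat' value are what mark the M as literal text rather than a month specifier. A minimal sketch on a few hand-typed stamps (stamps and d are illustrative names, not part of the spreadsheet):

```matlab
% Hand-typed stamps in the same year-M-month layout as the spreadsheet
stamps = {'2009M04'; '2015M08'; '2016M07'};

% yyyy = four-digit year, 'M' = literal letter M, MM = two-digit month
d = datetime(stamps, 'InputFormat', 'yyyy''M''MM');
disp(d)
```

Because the stamps carry no day, each value lands on the first of its month, which matches the min and max dates in the summary.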

How has export_fig usage changed over time?

plot(HistoryExportFig.MonthName_Download, HistoryExportFig.FileDownloadCount)
Aug15 = datetime(2015,8,1);
hold on
h = plot([Aug15 Aug15],ylim);
legend(h,'Yair Takes Over','location','northwest')
xlabel('Time')
ylabel('Monthly Downloads')
title('Monthly Export Fig Downloads')

So it looks like export_fig use is in decline. But don't worry, I don't think it's Yair's fault! MATLAB R2014b introduced a new graphics system. With it, printing is much improved, which has removed some use cases where export_fig really helped, for example antialiasing. As users migrate to newer releases, I'd expect to see this trend continue.

So what happens if we give Oliver credit for the export_fig downloads leading up to August, 2015?

% Sum the file downloads before ownership transferred
beforeAug15 = HistoryExportFig.MonthName_Download < datetime(2015,8,1);
export_fig_Oliver = sum(HistoryExportFig.FileDownloadCount(beforeAug15));

% Add it to Oliver's count
idxOliver = find(strcmp(Author.Creators_name,'Oliver Woodford'));
Author.sum_total_downloads(idxOliver) = Author.sum_total_downloads(idxOliver)+export_fig_Oliver;

% Re-sort
Author = sortrows(Author,'sum_total_downloads','descend');

Is it enough to bring Oliver into the top 15?

idxOliver = find(strcmp(Author.Creators_name,'Oliver Woodford'));
disp(['Oliver''s ranking: ' num2str(idxOliver)])
Oliver's ranking: 23

Not quite, but it moves him up from 129th to 23rd!
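
The credit-and-re-rank steps can be sketched on toy data. Everything here is invented for illustration (the authors A through D, their totals, and the monthly counts). The sketch assumes the monthly values are stamped on the first of each month, so a cutoff of August 1 keeps everything through July.

```matlab
% Invented monthly history for one file, stamped on the first of each month
month = datetime(2015, (6:9)', 1);
count = [100; 200; 300; 400];

% Sum the months strictly before the cutoff (June + July)
credit = sum(count(month < datetime(2015,8,1)));   % 300

% Invented author totals, sorted from most to fewest downloads
name  = {'A'; 'B'; 'C'; 'D'};
total = [900; 700; 350; 100];
Rank  = table(name, total);
Rank  = sortrows(Rank, 'total', 'descend');
before = find(strcmp(Rank.name, 'D'));             % 4

% Credit author D, re-sort, and look up the new position
isD = strcmp(Rank.name, 'D');
Rank.total(isD) = Rank.total(isD) + credit;
Rank  = sortrows(Rank, 'total', 'descend');
after = find(strcmp(Rank.name, 'D'));              % 3
```

Since the table is sorted in descending order, the row index returned by find is the ranking, which is the same trick used for Oliver.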

Comments

What has been your "Top File" ever? Is there any other way you'd like me to slice this data? Let us know here!




Published with MATLAB® R2016a
