File Exchange Pick of the Week

Our best user submissions

“Most Active/Interactive” File Exchange Entry

Jiro's pick this week is "Command-line peak fitter for time-series signals" by Tom O'Haver.

Continuing the celebration of MATLAB Central's 15th birthday and following last week's blog post by Sean, I'd like to focus on all of the great interactions people have had through File Exchange entries. Although you may not think of the File Exchange as the next big social network, people have collaborated and held conversations through the comments and ratings sections of the entries. When I see a File Exchange entry with a lot of comments, I take it as a sign that the file is getting a lot of interest from other users. If I also see a lot of responses from the author, that suggests the author is actively involved in improving the file and helping people use it. A lot of updates to the entry likewise suggests that the author is actively maintaining the file.

So, I wanted to see which files had the most interaction between the author and the users. Disclaimer: Not all of the metrics used here are purely quantitative. I've introduced some qualitative fudge factors.

The Data

I gathered my data the brute-force way, using MATLAB of course. I went through all possible File Exchange IDs and scraped each web page for its comments and updates.

load FEX

Here's what the first few entries look like.

FEX(1:5,:)
ans = 
              Name               FEXID           Author            Comments        Updates  
    _________________________    _____    ____________________    ___________    ___________
    'central_diff.m'             12       'Robert Canfield'       [3x4 table]    [5x3 table]
    'interpsinc.m'               13       'Michael Minardi'       [1x4 table]    [1x3 table]
    'Polybase'                   15       'Giampiero Campa'       [8x4 table]    [7x3 table]
    'Toolbox BOD Version 2.8'    16       'Gert-Helge Geitner'    [1x4 table]    [7x3 table]
    'connectnames.m'             17       'Douglas Harriman'      [1x4 table]    [0x3 table]

Here are the comments from the first entry (central_diff.m, which was Picked a couple of weeks ago).

FEX.Comments{1}
ans = 
       Date              Name                                           Comment                                   Rating
    __________    ___________________    _____________________________________________________________________    ______
    2004-09-16    'godlove njie teku'    ' '                                                                      4     
    2006-08-09    'Shyang-Wen Tseng'     '<p>This is a very good and usefull add-on function.  Thank you.</p>'    4     
    2007-08-06    'Alvaro Valcarce'      '<p>I think that line 98 should be (notice the "=" sign)</p>…'           4     

And the updates for that entry.

FEX.Updates{1}
ans = 
       Date       Version                                   Description                                
    __________    _______    __________________________________________________________________________
    NaT           ''         '<p>update description</p>'             
    NaT           ''         '<p>description</p>'                    
    NaT           ''         '<p>updating description</p>'           
    2001-08-21    ''         '<p>updating</p>'                       
    2015-10-01    '2.0'      '<p>Second-order accurate forward and backward difference formulae are u…'

The Metric

To help me find the entries with the most "interactions", I first calculated the number of comments and updates from the data.

FEX.NumComments = cellfun(@height, FEX.Comments);
FEX.NumUpdates = cellfun(@height, FEX.Updates);

Next, I wanted to know how many of the comments on each entry were written by the entry's author.

FEX.NumAuthorComments = cellfun(@(a,c) nnz(strcmp(a,c.Name)), ...
    FEX.Author, FEX.Comments);
FEX.NumUserComments = FEX.NumComments - FEX.NumAuthorComments;
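
To see what those cellfun calls do, here is a minimal sketch on a toy table. The author names, commenter names, and ratings are made up for illustration and are not from the real data; only the counting pattern is the same as above.

```matlab
% Toy data (hypothetical names and ratings): two entries, each with its
% own table of comments stored in a cell, like FEX.Comments
c1 = table({'Ann'; 'Cy'; 'Ann'}, [5; 4; 5], 'VariableNames', {'Name','Rating'});
c2 = table({'Dee'}, 3, 'VariableNames', {'Name','Rating'});
Author = {'Ann'; 'Bob'};
Comments = {c1; c2};
T = table(Author, Comments);

% height counts the rows of each nested table
T.NumComments = cellfun(@height, T.Comments);              % [3; 1]

% strcmp compares the author name against every commenter name,
% and nnz counts the matches
T.NumAuthorComments = cellfun(@(a,c) nnz(strcmp(a,c.Name)), ...
    T.Author, T.Comments);                                 % [2; 0]
T.NumUserComments = T.NumComments - T.NumAuthorComments;   % [1; 1]
```

Note that strcmp is an exact, case-sensitive match, so an author who comments under a slightly different display name would be counted as a user here.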

Most Comments

Let's see which entry had the most comments.

FEX = sortrows(FEX,'NumComments','descend');
barh(FEX.NumComments(10:-1:1))
title('Number of Comments')

% Truncate the file names to the first 20 characters (for labeling)
fexNames = cellfun(@(x) x(1:min(20,length(x))),FEX.Name(10:-1:1),'UniformOutput',false);

% Axes properties
ax = gca;
ax.YLim = [0 11];
ax.YTickLabel = fexNames;
ax.TickLabelInterpreter = 'none';
ax.YTickLabelRotation = 30;

Not surprisingly, export_fig.
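
The label truncation used for the tick labels is easy to check on its own. This is a small sketch with a hypothetical helper name (truncate); the min call is what keeps short names from causing an out-of-range index.

```matlab
% truncate is a hypothetical helper name, not part of the original script
truncate = @(s,n) s(1:min(n,length(s)));

names = {'export_fig'; 'Command-line peak fitter for time-series signals'};
labels = cellfun(@(s) truncate(s,20), names, 'UniformOutput', false);
% labels{1} is 'export_fig' (unchanged, shorter than 20 characters)
% labels{2} is 'Command-line peak fi'
```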

Most Updates

How about the most updates?

FEX = sortrows(FEX,'NumUpdates','descend');
barh(FEX.NumUpdates(10:-1:1))
title('Number of Updates')

% Truncate the file names to the first 20 characters (for labeling)
fexNames = cellfun(@(x) x(1:min(20,length(x))),FEX.Name(10:-1:1),'UniformOutput',false);

% Axes properties
ax = gca;
ax.YLim = [0 11];
ax.YTickLabel = fexNames;
ax.TickLabelInterpreter = 'none';
ax.YTickLabelRotation = 30;

"DICOM to NIfTI converter" just beats export_fig.

Highest percentage of comments by the original author

One way to see how involved the original author was with the user comments is to look at the percentage of comments written by the author. (Yes, an author can be heavily involved without actually responding to comments on the File Exchange. He or she can choose to respond via email or simply update the files.) To avoid a bias toward entries with only a few comments, where a couple of author replies can produce a high percentage, I have included an arbitrary qualification cutoff of 20 comments.

FEX.AuthorCommentRatio = FEX.NumAuthorComments ./ FEX.NumComments;

% Fix 0/0 (-> NaN) to 0
FEX.AuthorCommentRatio(isnan(FEX.AuthorCommentRatio)) = 0;

% Only look at entries with 20 or more comments
FEX = FEX(FEX.NumComments >= 20,:);

FEX = sortrows(FEX,'AuthorCommentRatio','descend');
FEX(1:5,{'Name','Author','NumComments','NumAuthorComments','NumUpdates'})
ans = 
                   Name                             Author              NumComments    NumAuthorComments    NumUpdates
    ___________________________________    _________________________    ___________    _________________    __________
    'ipf(arg1,arg2,arg3,arg4)'             'Tom O'Haver'                23             14                   39        
    'nth_element'                          'Peter Li'                   26             14                    7        
    'Tree Controls for User Interfaces'    'Robyn Jackey'               29             15                    6        
    'Wavelet Based  Image Segmentation'    'Ashutosh Kumar Upadhyay'    23             11                   16        
    'iPeak'                                'Tom O'Haver'                36             17                   30        

Great job folks!
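
To illustrate why the cutoff matters, here is a minimal sketch of the ratio and cutoff steps above on made-up counts (these numbers are hypothetical, not from the File Exchange data).

```matlab
% Hypothetical counts for four entries
NumComments       = [0; 4; 25; 40];
NumAuthorComments = [0; 4; 10;  8];
T = table(NumComments, NumAuthorComments);

% Ratio of author comments; 0/0 produces NaN, which is reset to 0
T.AuthorCommentRatio = T.NumAuthorComments ./ T.NumComments;
T.AuthorCommentRatio(isnan(T.AuthorCommentRatio)) = 0;

% Without a cutoff, the 4-comment entry (ratio 1.0) would top the ranking.
% Keeping only entries with 20 or more comments removes it.
T = T(T.NumComments >= 20,:);
T = sortrows(T,'AuthorCommentRatio','descend');
```

After the filter, only the 25-comment and 40-comment entries remain, ranked by ratio (0.4, then 0.2).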

Let me add another arbitrary qualification cutoff: a minimum of 10 updates.

FEX = FEX(FEX.NumUpdates >= 10,:);
FEX(1:5,{'Name','Author','NumComments','NumAuthorComments','NumUpdates'})
ans = 
                           Name                                    Author              NumComments    NumAuthorComments    NumUpdates
    __________________________________________________    _________________________    ___________    _________________    __________
    'ipf(arg1,arg2,arg3,arg4)'                            'Tom O'Haver'                 23            14                   39        
    'Wavelet Based  Image Segmentation'                   'Ashutosh Kumar Upadhyay'     23            11                   16        
    'iPeak'                                               'Tom O'Haver'                 36            17                   30        
    'Command-line peak fitter for time-series signals'    'Tom O'Haver'                120            54                   41        
    'Fast Bilateral Filter'                               'Kunal Chaudhury'             20             9                   14        

Wow, Tom is up there 3 times!! I'm a little intrigued by the 4th one, which has 120 comments and 41 updates. Let's take a closer look at the timing of those comments and updates.

% Process the 4th entry

% Break up the comments into user comments and author comments
authorCommentID = strcmp(FEX.Author{4},FEX.Comments{4}.Name);
userComments = FEX.Comments{4}(~authorCommentID,:);
authorComments = FEX.Comments{4}(authorCommentID,:);

% Create plot
h1 = scatter(datenum(userComments.Date),ones(1,height(userComments)),...
    'MarkerFaceColor','b','MarkerEdgeColor','none','MarkerFaceAlpha',0.25);
hold on
h2 = scatter(datenum(authorComments.Date),1.5*ones(1,height(authorComments)),...
    'MarkerFaceColor','r','MarkerEdgeColor','none','MarkerFaceAlpha',0.25);
h3 = plot([FEX.Updates{4}.Date FEX.Updates{4}.Date]',...
    repmat([0;0.5],1,height(FEX.Updates{4})),'Color',[.3 .7 .3],...
    'DatetimeTickFormat','uuuu');
hold off

% Axis properties
ax = gca;
ax.YLim = [0 2];
ax.YTick = [1 1.5];
ax.YTickLabel = {'Users','Author'};
ax.YTickLabelRotation = 60;
ax.YGrid = 'on';
title({FEX.Name{4},FEX.Author{4}})   % 4th entry, as in the rest of this block
ylabel('Comments')
xlabel('Date')
legend([h1;h2;h3(1)],'User Comments','Author Comments','Updates')

We can see that there is a nice balance of comments from users and from Tom. The updates seem to arrive at fairly regular intervals, including some recent ones. This is a sign that Tom has been heavily involved in interacting with users and keeping the file up to date.

Thank you, Tom, for being a great citizen of MATLAB Central and the File Exchange! You are what makes this community thrive.

Comments

Give this a try and let us know what you think here or leave a comment for Tom.




Published with MATLAB® R2016a
