In the last few weeks, Sean and Jiro have done some nice posts about File Exchange activity over the years. Sean wrote about top files and authors and then Jiro followed up with most active entries. Jiro’s post reminds me of one I did three years ago on File Exchange acknowledgment trees. Both his post and mine show how groups form around contributions. Taken together, these posts tell the story of a long-lived and active community.
Sean’s piece starts off with a file, fx_downloads.xlsx, containing the all-time download counts for every file on the File Exchange. How did he get his hands on that? That information doesn’t appear publicly on the site. Answer: He was able to get it because he has friends on the File Exchange team, and he asked them very nicely.
We do show download numbers on the site, of course, but they’re the download counts for the last 30 days only. From time to time we get this question: why don’t we show the all-time download count for items on the File Exchange? People want to know how they’re doing. They want to have a sense of their impact on the world. They especially want to know how they’re doing relative to other people. We have the data. Why don’t we show it?
1. MAKE WAY FOR THE NEW (FILES)
One of the most important ways that people decide if a file is worth downloading is by checking how many times other people have done the same thing. When a site shows all-time download counts, it’s shining a bright light on files that have been around for a long time. If you add a file tomorrow afternoon, you’re going to have a hard time competing with a file that’s been here since 2002. We want to encourage new thinking, new connections, and growth. So we wanted to reduce this strong bias in favor of old files. No matter how well-established your file is, it’s only as good as its track record over the last 30 days. The flip side of this is that, as long as your file is at least 30 days old, you’re on equal footing with every file on the site. Nobody has a baked-in longevity benefit.
2. GET RID OF THE OLD (DATA)
We sometimes notice strange patterns in the download data. There might be unusual spikes of activity that look suspiciously like someone is “helping” their download count with a script. Other times we see extremely odd patterns that don’t look like cheating… they just look like a bot gone wild. For no apparent reason, counts across many unrelated files will balloon one month and calm down the next. To leave in counts resulting from fraud or bot glitches would be misleading. But to carefully remove them from the record would be difficult and expensive. The easiest thing to do is just disregard data that’s older than 30 days. Strange patterns still erupt from time to time, but their influence never lasts more than a month.
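To see why a rolling window limits the damage, here is a minimal sketch in Python. It is not the File Exchange's actual code: the function name `downloads_last_30_days`, the `daily_counts` dictionary, and all of the numbers are made up for illustration. It assumes we have one download count per file per day and simply sums the most recent 30 days.

```python
from datetime import date, timedelta


def downloads_last_30_days(daily_counts, today, window_days=30):
    """Sum the downloads for the window_days days ending on today (inclusive)."""
    cutoff = today - timedelta(days=window_days)
    return sum(n for day, n in daily_counts.items() if cutoff < day <= today)


# Hypothetical data: a steady 10 downloads per day for 90 days,
# plus a one-day spike of 5000 that looks like a bot or a script.
start = date(2024, 1, 1)
daily_counts = {start + timedelta(days=i): 10 for i in range(90)}
daily_counts[start + timedelta(days=5)] += 5000

# The spike inflates the displayed count while it is inside the window...
during_spike = downloads_last_30_days(daily_counts, start + timedelta(days=20))

# ...but it has aged out completely a month or so later.
after_spike = downloads_last_30_days(daily_counts, start + timedelta(days=60))

# An all-time total would carry the spike forever.
all_time = sum(daily_counts.values())

print(during_spike, after_spike, all_time)
```

With this made-up data, the windowed count returns to the file's normal level once the spike is more than 30 days old, while the all-time total stays inflated unless someone goes back and scrubs it by hand.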
It’s not a perfect system, but by showing only the last 30 days’ worth of download data, we make room for cool new submissions, and we avoid the misleading data glitches of yesteryear. So now you know.