This week, instead of picking a single file for highlighting, I would like to pose a couple of questions.
Many years ago, in 2007, shortly after I started working for MathWorks, I needed to calculate the cumulative sum of a vector while ignoring NaNs. Quite honestly, I don’t even remember why I needed to do so; perhaps it was to help a customer.
Searching the MATLAB documentation, I quickly found my way to nansum in the Statistics and Machine Learning Toolbox. (This, of course, was before “Machine Learning” was included in the name of that tool!) But that function didn’t do what I needed.
Undaunted, I wrote my own version of nancumsum and promptly shared it on the File Exchange. Following a flurry of emails addressing this contribution, I modified the function to provide additional modes of operation. In its current form, there are four modes:
- REPLACE NaNs with zeros (this is the default behavior).
- MAINTAIN NaNs as position holders (that is, skip NaNs without resetting the sum).
- RESET sum on NaNs, replacing NaNs with zeros.
- RESET sum on NaNs, maintaining NaNs as position holders.
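To make the four modes concrete, here is a minimal loop-based sketch of how they could behave on a vector. The function name nancumsumSketch is hypothetical and this is not the File Exchange implementation; it only mirrors the descriptions in the list above, assumes a vector input, and numbers the modes 1 to 4 in the order listed.

```matlab
function y = nancumsumSketch(x, mode)
% nancumsumSketch  Illustrative cumulative sum of a vector with NaN handling.
%   Hypothetical helper for explanation only, not the File Exchange nancumsum.
%   mode 1: replace NaNs with zeros (default)
%   mode 2: maintain NaNs as position holders, no reset
%   mode 3: reset the sum on NaNs, replacing NaNs with zeros
%   mode 4: reset the sum on NaNs, maintaining NaNs as position holders
if nargin < 2
    mode = 1;
end
resetOnNaN = (mode >= 3);               % modes 3 and 4 restart the running sum
keepNaN = (mode == 2) || (mode == 4);   % modes 2 and 4 leave NaN in the output

y = zeros(size(x));
total = 0;
for k = 1:numel(x)
    if isnan(x(k))
        if resetOnNaN
            total = 0;                  % start a new segment after this NaN
        end
        if keepNaN
            y(k) = NaN;                 % NaN stays visible in the result
        else
            y(k) = total;               % NaN contributes nothing to the sum
        end
    else
        total = total + x(k);
        y(k) = total;
    end
end
end
```

Saved as nancumsumSketch.m, a call such as nancumsumSketch([3 5 NaN 9 0 NaN], 3) walks the vector once and restarts the running total at each NaN.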
Just because it was easy to do, I also shared nancumprod, which multiplies cumulatively in a similar vein.
In the ten years since I shared those files, they have been downloaded thousands of times. (nancumsum is clearly way ahead of nancumprod in the number of downloads.) And the reviews have been quite good. (MATLABber extraordinaire Urs Schwarz pronounced it “a trouvaille of paramount importance to this community,” and suggested that it “should have been in [MATLAB] stock for a long time.” Thanks for that, Urs! ;) ) The functions–especially nancumsum–remain popular to this day.
So now I’m wondering why they’ve been so popular for so long. Just how are people using nancumsum and nancumprod?
And more importantly: are those who are still using those functions aware that the cumsum function now has a “nanflag” to specify that NaNs are to be ignored? (In fact, ‘omitnan’ works as of R2016b in cumsum, cummin, and cummax; and as of R2017a in prod and cumprod.)
A = [3 5 NaN 9 0 NaN]
A =
     3     5   NaN     9     0   NaN

B = nancumsum(A,2,1) % Work along the second (row) dimension, using mode 1
B =
     3     8     8    17    17    17

C = cumsum(A,'omitnan')
C =
     3     8     8    17    17    17
We see that the default mode of nancumsum is obviated by the new support for ignoring NaNs in cumsum!
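The product counterpart should work the same way. As a small sketch, assuming R2017a or later (where, as noted above, cumprod accepts 'omitnan'), an ignored NaN leaves the running product unchanged, which is the same as multiplying by 1:

```matlab
A = [3 5 NaN 9 0 NaN];

% Cumulative product that skips NaNs; each NaN position repeats the
% running product accumulated so far.
P = cumprod(A, 'omitnan');   % expected: 3 15 15 135 0 0
```

Note that the zero in A drives every later product to zero, regardless of how the NaNs are handled.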
There are other modes of nancumsum, not supported by the MATLAB functions:
B2 = nancumsum(A,2,2)
B2 =
     3     8   NaN    17    17   NaN

B3 = nancumsum(A,2,3)
B3 =
     3     8     0     9     9     0

B4 = nancumsum(A,2,4)
B4 =
     3     8   NaN     9     9   NaN
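For anyone who wants these results without the File Exchange files, the other modes can be approximated with built-in functions. The following is only a sketch under stated assumptions: A is a row vector of finite values (an Inf would turn into NaN in the subtraction), and the variable names mode2, mode3, and mode4 are made up for illustration.

```matlab
A = [3 5 NaN 9 0 NaN];
nanMask = isnan(A);

% Mode 2: ignore NaNs in the sum, then put them back as position holders.
mode2 = cumsum(A, 'omitnan');
mode2(nanMask) = NaN;

% Modes 3 and 4: restart the running sum after each NaN.
running = cumsum(A, 'omitnan');              % running sum with no resets
lastNaN = cummax(nanMask .* (1:numel(A)));   % index of the latest NaN so far (0 if none yet)
padded = [0 running];                        % padded(k+1) equals running(k); padded(1) covers "no NaN yet"
mode3 = running - padded(lastNaN + 1);       % remove everything accumulated up to the latest NaN

mode4 = mode3;
mode4(nanMask) = NaN;                        % same reset, but keep NaNs visible
```

Because the reset is done by subtracting two running totals, the floating-point results can differ slightly from a true restart at zero when the data are not integers. That trade-off may be one argument for keeping dedicated modes.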
Does anyone have any thoughts to share on the subject? Should I leave the files on the Exchange, or should I remove them to promote usage of the fully supported MATLAB functions? Do the alternate modes justify keeping the files alive?
Your thoughts and comments on the topic are very welcome!
Published with MATLAB® R2017a
3 Comments
Please leave the files on the file exchange. People will occasionally need them for legacy code, especially when passed on by a colleague who might not have built a neat library of their file exchange downloads. Also there will be users who are stuck with an old version of Matlab because of slow admin.
Perhaps there’s a role for a file exchange archive section for code contributions that are now in default Matlab, or superseded by better code contributions?
I’d leave them, but add a section right at the front of a readme file that says “many of these routines have been replaced by built-in matlab functions. You may be better off using them rather than these.”
Thanks for the input. I’m inclined to leave them, with a note (as you suggested, Bruce). I’m still interested in hearing use cases, though. Are there rationales for adding modes 2–4 to the MATLAB functions?