This week, instead of Picking a single file for highlighting, I (Brett) would like to pose a couple of questions.
Many years ago, in 2007, shortly after I started working for MathWorks, I needed to calculate the cumulative sum of a vector while ignoring NaNs. Quite honestly, I don’t even remember why I needed to do so; perhaps it was to help a customer.
Searching the MATLAB documentation, I quickly found my way to nansum in the Statistics and Machine Learning Toolbox. (This, of course, was before “Machine Learning” was included in the name of that tool!) But nansum returns only the total, not the running (cumulative) sum I needed.
Undaunted, I wrote my own version, nancumsum, and promptly shared it on the File Exchange. Following a flurry of emails about this contribution, I modified the function to provide additional modes of operation. In its current form, there are four modes:
- REPLACE NaNs with zeros (this is the default behavior).
- MAINTAIN NaNs as position holders (that is, skip NaNs without resetting the sum).
- RESET the sum at each NaN, replacing NaNs with zeros.
- RESET the sum at each NaN, maintaining NaNs as position holders.
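To make the four modes concrete, here is a minimal sketch for a vector input. It is not the File Exchange code: the name nancumsumSketch is hypothetical, and unlike nancumsum it takes no dimension argument, assuming the input is a row or column vector.

```matlab
function B = nancumsumSketch(A, mode)
% NANCUMSUMSKETCH  Hypothetical sketch of the four nancumsum modes (vectors only).
%   Not the File Exchange implementation; it only illustrates the behavior.
%   mode 1: replace NaNs with zeros (default)
%   mode 2: keep NaNs as position holders; the sum continues across them
%   mode 3: reset the sum at each NaN; NaNs become zeros
%   mode 4: reset the sum at each NaN; NaNs stay NaN
if nargin < 2
    mode = 1;                      % default mode, as in nancumsum
end
if ~isvector(A)
    error('nancumsumSketch:notVector', 'A must be a row or column vector.');
end
if ~isscalar(mode) || ~ismember(mode, 1:4)
    error('nancumsumSketch:badMode', 'mode must be 1, 2, 3, or 4.');
end

nanMask = isnan(A);
B = zeros(size(A));                % preserves row/column orientation
runningSum = 0;
for k = 1:numel(A)
    if nanMask(k)
        if mode >= 3
            runningSum = 0;        % modes 3 and 4 restart the sum at a NaN
        end
    else
        runningSum = runningSum + A(k);
    end
    B(k) = runningSum;             % a NaN position gets the current sum (or 0 after a reset)
end

if mode == 2 || mode == 4
    B(nanMask) = NaN;              % modes 2 and 4 keep NaNs as position holders
end
end
```

For example, nancumsumSketch([3 5 NaN 9 0 NaN], 3) returns [3 8 0 9 9 0]. Note that modes 2 and 4 differ from modes 1 and 3 only in the final step that writes the NaNs back into their original positions.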
Just because it was easy to do, I also shared nancumprod, which computes cumulative products in the same way.
In the ten years since I shared those files, they have been downloaded thousands of times. (nancumsum is well ahead of nancumprod in downloads.) And the reviews have been quite good. (MATLABber extraordinaire Urs Schwarz pronounced it “a trouvaille of paramount importance to this community,” and suggested that it “should have been in [MATLAB] stock for a long time.” Thanks for that, Urs! ;) ) The functions, especially nancumsum, remain popular to this day.
So now I’m wondering why they’ve been so popular for so long. Just how are people using nancumsum and nancumprod?
And more importantly: are those who are still using those functions aware that the cumsum function now has a “nanflag” to specify that NaNs are to be ignored? (In fact, ‘omitnan’ works as of R2016b in cumsum, cummin, and cummax; and as of R2017a in prod and cumprod.)
A = [3 5 NaN 9 0 NaN]
A =
     3     5   NaN     9     0   NaN
B = nancumsum(A,2,1)   % Operate along dimension 2 (across each row), using mode 1
B =
     3     8     8    17    17    17
C = cumsum(A,'omitnan')
C =
     3     8     8    17    17    17
We see that the default mode of nancumsum is obviated by the new support for ignoring NaNs in cumsum!
The other modes of nancumsum, however, are not supported by the MATLAB functions:
B2 = nancumsum(A,2,2)
B2 =
     3     8   NaN    17    17   NaN
B3 = nancumsum(A,2,3)
B3 =
     3     8     0     9     9     0
B4 = nancumsum(A,2,4)
B4 =
     3     8   NaN     9     9   NaN
Does anyone have any thoughts to share on the subject? Should I leave the files on the Exchange, or should I remove them to promote usage of the fully supported MATLAB functions? Do the alternate modes justify keeping the files alive?
Your thoughts and comments on the topic are very welcome!
Published with MATLAB® R2017a