Loren on the Art of MATLAB

September 23rd, 2011

Another Possible Surprise – Ignored NaN Values

Ever have some data that might contain NaN values? You start doing computations with these data, expecting the |NaN|s to propagate, only to find later that the |NaN|s went only so far. Here's what's going on, for better or worse.

NaNs Play Different Roles

I know of 3 typical uses of NaN values:

  • to represent missing data
  • result of computational ambiguity (e.g., 0/0)
  • graphics directive to separate line segments in a single vector of point values
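As a minimal sketch of the second case (the variable names here are just for illustration), this is how such NaN values can arise, and why isnan rather than == is the way to find them:

```matlab
a = 0/0;                   % indeterminate form, yields NaN
b = Inf - Inf;             % another indeterminate form, also NaN
sameAsItself = (a == a);   % false: NaN never compares equal, even to itself
found = isnan([a b 1]);    % isnan is the reliable test for NaN entries
```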

Some MATLAB Functions Ignore NaN Values

There are several MATLAB functions that ignore NaN values, including

min, max, any, and all

Here are a few little examples in action.

any([0 nan])
all([0 nan])
all([1 nan])
any([1 nan])
ans =
     0
ans =
     0
ans =
     1
ans =
     1
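For comparison, here is a small sketch with min and max next to sum, which does propagate NaN by default. The vector v is made up for illustration.

```matlab
v = [3 NaN 1];
lo = min(v);               % 1, the NaN is skipped
hi = max(v);               % 3, the NaN is skipped
total = sum(v);            % NaN, because sum does not skip NaN by default
allNaN = min([NaN NaN]);   % NaN, since there is nothing else to return
```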

There are also a bunch of functions in Statistics Toolbox that ignore |NaN|s, including these:

nanmean, nanmin, nanmax, nancov, nanstd, nansum, nanvar, nanmedian
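If you don't have Statistics Toolbox, one rough way to get similar results for a vector is to mask out the NaN entries with isnan first. This is only a sketch of the idea, not how the toolbox functions are implemented, and the sample data are made up.

```matlab
x = [2 NaN 4 6];
valid = ~isnan(x);         % logical mask of the usable entries
s  = sum(x(valid));        % comparable to nansum(x) for a vector
m  = mean(x(valid));       % comparable to nanmean(x) for a vector
md = median(x(valid));     % comparable to nanmedian(x) for a vector
```

For matrices the masking has to be done per column or row, since x(valid) flattens the result into a vector.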

Why Ignore NaNs?

The reason min, max, any and all ignore NaN values is the graphics case, where line segments in a line are separated by NaN, and finding the min and max values is useful for setting axis limits.
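To illustrate the graphics case, here is a small sketch with made-up data: two line segments stored in a single pair of vectors, with limits computed straight from the data despite the NaN separators.

```matlab
% Two line segments stored in one vector pair, separated by NaN
x = [0 1 NaN 3 4];
y = [2 5 NaN -1 0];
xLimits = [min(x) max(x)];    % [0 4]
yLimits = [min(y) max(y)];    % [-1 5]
% plot(x, y); axis([xLimits yLimits])   % would draw two separate segments
```

If min and max propagated NaN instead, both limit vectors would be NaN and could not be passed to axis.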

How Do You Use NaNs?

Have you found the tools you need to work with your data containing NaN values? Do you need to work around them or do you embrace their presence? Let me know here.


Published with MATLAB® 7.13

9 Responses to “Another Possible Surprise – Ignored NaN Values”

  1. Aurélien replied on :

    Hi!

    I work around NaN values when using the TriScatteredInterp function to perform data interpolation.

    And I must say that the MATLAB documentation is definitely excellent for this subject
    “Addressing Problems in Scattered Data Interpolation”
    http://www.mathworks.fr/help/techdoc/math/bsou4rj-1.html#bsphz74-1

  2. Laurens Bakker replied on :

    Hi Loren,

    Thanks for bringing up NaN. It has been a great help speeding up code for me. I often need to simulate functions in space, and approximate this by letting them play out on a lattice. Let’s take imitation as an example. An entity in space gradually becomes more like its vicinity.

    function newC = imitate( inertia, ownState, neighbourState )
        newC = inertia*ownState + (1-inertia)*nanmean(neighbourState,1);
    end
    

    Every point in such a lattice has a set number of neighbours (e.g. 4 or 8 in the case of a square lattice). The information (State in the example) needs to be grouped in columns so that it can be operated on in parallel. The variable neighbourState is then a [Nneighbours x Nentities] matrix containing the required information.

    So far so good, so why nanmean instead of mean? Well, memory is limited so the lattice must have a boundary, at which the simulated entities have fewer than the prescribed number of neighbours. Substituting NaNs for the values of these non-existent neighbours allows my function to deal with this case without any performance penalty and, more importantly, without any adjustments to the code.

  3. Jan Simon replied on :

    I use NaN values for points that were not recorded in measured 3D trajectories. Unfortunately, in arithmetic operations the NaNs need much more processing time:

    x = rand(1, 1e6);
    tic; for i = 1:100, y = x + 3; end; toc
    % >> 2.44 sec
    tic; for i = 1:100, y = x + NaN; end; toc
    % >> 14.44 sec
    

    Matlab 2009a, WinXP, Core2Duo 2.3 GHz
    Obviously PLUS is not implemented in SSE, because in contrast to the floating point unit, SSE operations process NaNs (and Infs) at the same speed as valid numbers. On AMD processors the slowdown should be less significant because the FPU is designed differently.

    Therefore for large arrays it is cheaper to use zeros and store the information about missing points in a separate LOGICAL array.

    Kind regards, Jan

  4. Daniel Armyr replied on :

    NaNs appear all the time in calculations. It is nice that there are special functions to handle them in Statistics Toolbox. However, I think this functionality should be incorporated into the normal min/max, but as an option flag on the end.

  5. Brett Shoelson replied on :

    Loren, I just wanted to point out that I have shared a couple of relevant files–NANCUMSUM and NANCUMPROD–on the MATLAB Central File Exchange. They allow for 4 different treatments of NaNs.
    Cheers,
    Brett

  6. Hans replied on :

    Dear Loren,

    A late reply to your NaN post. I wanted to point out that NaN can also slow down your program substantially, and to ask why. I created a large matrix that included a lot of NaNs. Next, I added the matrix to my figure userdata so that my UI could use it when necessary. It appeared that my UI (a kind of paint application) became extremely slow even though I did not use the matrix. I only obtained the userdata every time the mouse moved. Removing the NaNs from the matrix was the solution. I assume this happens because Matlab needs more time to do some kind of checking of the matrix. Can you explain?

    thanks,
    Hans

  7. Loren Shure replied on :

    Hans-

    I believe it is more work on the part of the chips and their libraries to spend time dealing with NaNs. I don’t think it’s particularly anything special MATLAB itself is doing. I’ll see if I can verify that.

    –Loren

  8. Hans replied on :

    Dear Loren,

    Thanks for the prompt reply. Actually I don’t really know how NaNs are supported; I always assumed this was done by the Matlab software. Do I understand you correctly that this is implemented in hardware?

    thanks,

    -Hans

  9. Loren Shure replied on :

    Hans-

    MATLAB handles a tiny bit of it. More is handled in the libraries we build MATLAB on top of. SOME might be done in hardware as well. Not a clear answer, I realize.

    –Loren


MathWorks
Loren Shure works on design of the MATLAB language at MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.