Loren on the Art of MATLAB

May 17th, 2011

What a Difference It Makes!

I've been exploring some older functions in MATLAB recently and trying to characterize them. The function diff is one of them. And I realized it has a really unusual behavior when the optional dim input argument is omitted.


From the Documentation

Look in the documentation for diff under the section entitled "Tips". You will see this.

Since each iteration of diff reduces the length of X along dimension dim, it is possible to specify an order n sufficiently high to reduce dim to a singleton (size(X,dim) = 1) dimension. When this happens, diff continues calculating along the next nonsingleton dimension.

!!!

What! Try these statements.

size(diff(ones(2,3,4),5))
ans =
     1     1     2
diff(ones(2,3,4),6)
ans =
     0
diff(ones(2,3,4),7)
ans =
     []
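
One way to see where these sizes come from is to apply diff one step at a time, without a dim argument, and record the size after each call. This is only a sketch; the variable name sizesSeen is made up for illustration.

```matlab
Y = ones(2,3,4);
sizesSeen = cell(1,7);          % hypothetical name, one entry per diff call
for k = 1:7
    Y = diff(Y);                % no dim: works on the first non-singleton dimension
    sizesSeen{k} = size(Y);
    fprintf('after %d call(s): [%s]\n', k, num2str(sizesSeen{k}));
end
```

The first call uses up dimension 1, the next two use up dimension 2, and everything after that eats into dimension 3 until nothing is left.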

I know this behavior was not in the initial implementation of diff since I wrote it and would never be that cruel. But it did get added later. I can't see how it is useful at all. I will always use a specific dim input myself to ensure robustness of my code. But perhaps that's a failure of my imagination.

Is This Behavior Useful?

If anyone can come forth with a useful application of this aspect of the function diff, and prove it to me with working code in an application, I will spot the first such respondent with some MATLAB bling. Let me know here.



Published with MATLAB® 7.12

19 Responses to “What a Difference It Makes!”

  1. Gautam Vallabha replied on :

    My imagination is failing as well, but I wonder if this behavior was just an attempt to harmonize the following two guarantees in the doc for diff:

    1) diff(X) returns the differences calculated along the first non-singleton (size(X,dim) > 1) dimension of X.

    2) Y = diff(X,n) applies diff recursively n times, resulting in the nth difference. Thus, diff(X,2) is the same as diff(diff(X))

    Taken separately, each of the guarantees seems reasonable, but together they imply that A and B in the code below are guaranteed to be equal:

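    x = ones(2,3,4);
    A = diff(x,3);              % third-order difference in one call
    B = diff(diff(diff(x)));    % three first-order differences in a row
    assert(isequal(A, B))       % same size and same values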
    

    The alternative is to relax one of the above two guarantees in the case of DIFF(x,N), when N > 1.

    Gautam

  2. Loren replied on :

    Thanks, Gautam!

    Interesting point.

    –Loren

  3. Jotaf replied on :

    I can think of quite a few examples where it’s going to be catastrophic. Say you have a 2D array X with N vectors of M samples each. You innocently take the second derivative diff(X,2). In the case when M <= 2 it will eat away at the next dimension and it will no longer be an array of N vectors! Any subsequent code relying on that assumption will be broken. It also doesn’t make sense physically: where did that last vector go? Vectorized code often relies on the fact that one (or more) dimensions stay the same size and you can work on the others freely.

    I understand it must have been a desire to make diff(x,n) consistent with recursive application of diff(x). Personally, I don’t like to rely on Matlab finding the first non-singleton dimension; if it’s a vector it’s fine, but if it’s a matrix I always prefer to specify the dimension myself :)
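
A minimal sketch of the failure Jotaf describes, using made-up data: each X holds N = 5 column vectors of M samples, and the only thing that changes between the two cases is M. Passing dim explicitly keeps the differencing on the sample dimension.

```matlab
N = 5;
X3 = rand(3,N);                 % M = 3 samples per column
X2 = rand(2,N);                 % M = 2 samples per column

sz3 = size(diff(X3,2));         % [1 5]: both differences taken down the columns
sz2 = size(diff(X2,2));         % [1 4]: the second difference ran across the columns

% With dim given, diff stays on dimension 1. When there are too few
% samples the result is simply empty, which is easy to test for.
D = diff(X2,2,1);
if isempty(D)
    disp('not enough samples for a second difference');
end
```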

  4. Loren replied on :

    Jotaf-

    I understand users can be surprised with diff. But I am looking to see if there are particular instances that DEPEND on this capability?

    Can you point to a place in your own code where you depend on this, or do you actively avoid it by specifying the dim input?

    –Loren

  5. Jotaf replied on :

    I tried to concoct a code snippet using that feature, but my imagination failed me. In day-to-day code I actively avoid such quirks by always specifying DIM if the main input is not a vector. Still, it would be neat to see someone come up with an imaginative use of this feature :)

  6. the cyclist replied on :

    Not related to diff(), but along the theme of surprises that can come from singleton dimensions:

    In a company where I used to work, our “style guide” banned the use of the squeeze() function by our MATLAB programmers, because of unexpected behaviors when a particular dimension was not normally length one, but could be. We found that permute() was a more robust way to handle any situation where squeeze might be contemplated.

  7. Loren replied on :

    Thanks, Cyclist.

    I also don’t use squeeze and use reshape explicitly instead. I think you may find reshape takes less time than permute sometimes (but I am not sure of that).

    –Loren
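
To make the squeeze concern concrete, here is a small sketch with made-up sizes. A slice A(i,:,:) of an M-by-N-by-P array is normally meant to become N-by-P, but squeeze changes its orientation when N happens to be 1, whereas permute and reshape say explicitly where each dimension goes.

```matlab
P = 4;
A = ones(2,1,P);                        % N = 1 this time
slice = A(1,:,:);                       % 1-by-1-by-P

s1 = size(squeeze(slice));              % [4 1]: both singleton dimensions removed
s2 = size(permute(slice,[2 3 1]));      % [1 4]: N-by-P as intended
s3 = size(reshape(slice,[],P));         % [1 4]: same result via reshape
```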

  8. Daniel Armyr replied on :

    OK, since no one has managed anything, here goes one convoluted case:

    Assume that you have a suspension bridge in the jungle made from ropes and pieces of wood. The shape of the bridge is defined by the heights above the ground of each point where a plank meets the rope. We therefore describe the shape of the bridge as a 2xn matrix where n is the number of planks in the bridge. Calling the matrix A, it is clear that diff(A) is a 1xn matrix describing the sideways slant of each plank. Calling diff(A,2) we get a 1x(n-1) matrix that describes the difference in slant along the bridge, which is the twist of the bridge.

    I told you it was forced, but this is a calculation that can efficiently be done using the current behaviour of diff.

    –DA
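
Daniel's bridge can be sketched with made-up heights (the numbers below are purely illustrative). The second call relies on the spill-over behavior; the last line spells out the same computation with explicit dim arguments.

```matlab
A = [1 2 4 7;                   % heights along one rope (illustrative values)
     1 3 6 10];                 % heights along the other rope

slant = diff(A);                % 1-by-4: side-to-side difference per plank
twist = diff(A,2);              % 1-by-3: change in slant along the bridge

twistExplicit = diff(diff(A,1,1),1,2);   % same computation, dimensions stated
```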

  9. Loren replied on :

    Daniel-

    Thanks for showing us an example. I’m still not convinced that the current diff behavior is “worth” it.

    –loren

  10. Daniel Armyr replied on :

    I want to be very, very clear that I agree with the rest of you that this is bizarre behaviour, and can at best be motivated by the legalese-style argument that Gautam Vallabha posted.

    –DA

  11. Stuart Murray replied on :

    Thanks for this Loren. I agree it doesn’t promote robust programming – I think it’s always a good idea for your code to know along which dimension diff should operate, rather than leaving it to MATLAB to figure it out when it runs out of dimensions.

    A slightly esoteric, yet simple application where the behaviour wouldn’t cause problems would be the following: suppose we have a sequence of numbers arranged as columns in an array

    a = 
    
               3           7        2923       17583
             -17         223        5007       24703
             -47         693        7993       33757
             -57        1543       12103       45063
    

    We might inspect the numbers for some underlying difference pattern by repeated use of diff; eventually:

    >> diff(a,4)
    
    ans =
    
        96    96    96
    

    leading us to believe the sequence might be defined by a polynomial of fourth degree. Admittedly this is much better solved starting with

    a=a(:)
    

    but my imagination also ran into the buffers!
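
For comparison, the a(:) route Stuart mentions can be sketched as follows, using the array from his comment. Stacking the columns into one vector keeps every difference along the sequence itself, so no dimension is ever exhausted.

```matlab
a = [  3     7   2923  17583;
     -17   223   5007  24703;
     -47   693   7993  33757;
     -57  1543  12103  45063];

d4 = diff(a(:),4,1);            % fourth difference of the 16-term sequence
isConstant = all(d4 == d4(1));  % true when all fourth differences agree
```

A constant fourth difference is what a fourth-degree polynomial produces, and here it can be read off all 12 entries of d4 rather than from three numbers that mix two dimensions.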

  12. Loren replied on :

    Thanks, Stuart-

    Interesting idea. Thankfully, as you said, there is a “better” way to do the task :-)

    –Loren

  13. Andy replied on :

    There is a diff-erent (see what I did there?) quirk of diff that I find annoying: I don’t like that the iteration argument comes before the dimension argument. For some functions that could operate on various dimensions of an array (sum, mean, etc.) the dimension argument is the second argument. For other functions (diff, std, etc.) the dimension argument is third. I think in all of these cases the dimension argument should come second.

    This is somewhat related to the quirk that you’ve discussed. If the call to diff were diff(A, dim, iters), then whenever somebody iterated diff they would have to specify the dimension, so the iterations would act consistently on A. (Or they could enter [] or something similar for dim, explicitly stating that they want this inconsistent behavior.)

    MATLAB would also gain overall consistency with these functions taking arguments in a consistent way. In this context, it makes sense for the dimension argument to come first because it is common to more functions that operate on arrays. The iteration argument for diff (or the flag value for std) are specific arguments to those functions that are not as common.
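
The inconsistency Andy describes is easy to trip over, because the second argument of diff is the order and not the dimension. A short sketch on a magic square (the variable names are illustrative):

```matlab
A = magic(4);

rowSums  = sum(A,2);            % dim is the 2nd argument: 4-by-1
rowStd   = std(A,0,2);          % dim is the 3rd argument, after the weight flag: 4-by-1
rowDiff  = diff(A,1,2);         % dim is the 3rd argument, after the order: 4-by-3
notByRow = diff(A,2);           % second difference down the columns: 2-by-4
```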

  14. Loren replied on :

    Andy-

    Thanks for your input. I will make sure the relevant group discusses this at MathWorks.

    –Loren

  15. Joe Kirk replied on :

    What if someone wanted to use this multi-dimensional recursive feature of the diff function to determine if a matrix has certain specific properties? For example, the matrix returned by the function pascal has a recursive diff result equal to 1:

    n = 6;
    p = pascal(n);
    s = size(p);
    d = diff(p,sum(s-1))
    isSpecial = (d == 1)
    

    If someone was attempting to generalize the pascal matrix (or some other type of matrix) to higher dimensions, for example, they may find it useful to check their result using this kind of test. It’s a stretch, but I cannot think of anything else.

  16. Loren replied on :

    Joe-

    Thanks for the idea. I hadn’t thought about that (I agree it’s probably still a stretch, but still…).

    –Loren

  17. Nathaniel replied on :

    I realise it’s good to bring attention to “potentially annoying” behaviour of functions, and to have a challenge of guessing.

    But presumably MathWorks uses (and has used since you wrote MATLAB’s first diff) thorough version control.
    Can’t you easily find when the change happened, and by whom … it could be interesting to resolve the guesses about why it /was/ done (although it’s still interesting to have discussed the possible good and bad points, regardless of the author’s intention).

    I see that as of R2010a, diff is a built-in,
    $Revision: 5.25.4.4 $ $Date: 2005/06/21 19:23:53 $

  18. Loren replied on :

    Nathaniel-

    Of course we did go through our records – the built-in version was just mimicking the last MATLAB code version. We do know when the behavior was introduced but sadly there was no reason stated in the records or the code.

    –Loren

  19. Fred Sigworth replied on :

    I would much prefer that diff(X) did not return a shorter vector, but a vector of the same size as X. As it is, every time I use diff I am padding the returned vector with zeros. And, if it did not shorten the returned vector, the undesired behavior regarding dimensions would also go away!
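
The padding Fred describes might look like the sketch below. The trailing zero is one arbitrary choice of padding, and the variable names are illustrative.

```matlab
x  = [1 4 9 16];
dx = [diff(x,1,2), 0];                          % pad so dx has the same size as x

X  = [1 4 9 16; 2 3 5 8];
dX = cat(2, diff(X,1,2), zeros(size(X,1),1));   % same idea for each row of a matrix
```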


Loren Shure works on design of the MATLAB language at MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.