I've been exploring some older functions in MATLAB recently and trying to characterize them. One of them is diff, and I realized that it has a really unusual behavior when the optional dim input argument is omitted.
From the Documentation
Look in the documentation for diff under the section entitled "Tips". You will see this.
Since each iteration of diff reduces the length of X along dimension dim, it is possible to specify an order n sufficiently high to reduce dim to a singleton (size(X,dim) = 1) dimension. When this happens, diff continues calculating along the next nonsingleton dimension.
!!!
What! Try these statements.
size(diff(ones(2,3,4),5))
ans =
1 1 2
diff(ones(2,3,4),6)
ans =
0
diff(ones(2,3,4),7)
ans =
[]
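A short loop makes the walk across dimensions visible. This is a sketch on the same ones(2,3,4) input: each step shortens the current working dimension by one and, once that dimension reaches 1, diff moves on to the next non-singleton dimension.

```matlab
% Trace how diff(X,n) moves across dimensions when dim is omitted.
X = ones(2,3,4);
for n = 1:7
    fprintf('n = %d: size = %s\n', n, mat2str(size(diff(X,n))));
end
% Sizes for n = 1..6:
%   [1 3 4], [1 2 4], [1 1 4], [1 1 3], [1 1 2], [1 1]
% n = 7 returns an empty array.
```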
I know this behavior was not in the initial implementation of diff since I wrote it and would never be that cruel. But it did get added later. I can't see how it is useful at all. I will always use a specific dim input myself to ensure robustness of my code. But perhaps that's a failure of my imagination.
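For comparison, passing dim explicitly pins the calculation to a single dimension. A minimal sketch, assuming (as the built-in behaves in the releases I am aware of) that diff(X,n,dim) returns an empty array once n reaches size(X,dim) instead of spilling into other dimensions:

```matlab
% With dim given, diff never leaves that dimension.
X = ones(2,3,4);
d1 = diff(X,1,1);   % one difference down dimension 1: size [1 3 4]
d5 = diff(X,5,1);   % order exceeds size(X,1): the result is empty
                    % rather than borrowing from dimensions 2 and 3
disp(size(d1))
disp(isempty(d5))
```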
Is This Behavior Useful?
If anyone can come forth with a useful application of this aspect of the function diff, and prove it to me with working code in an application, I will spot the first such respondent with some MATLAB bling. Let me know in the comments.
Published with MATLAB® 7.12



My imagination is failing as well, but I wonder if this behavior was just an attempt to harmonize the following two guarantees in the doc for diff:
1) diff(X) returns the differences calculated along the first non-singleton (size(X,dim) > 1) dimension of X.
2) Y = diff(X,n) applies diff recursively n times, resulting in the nth difference. Thus, diff(X,2) is the same as diff(diff(X)).
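Taken separately, each of the guarantees seems reasonable, but together they imply that A = diff(X,2) and B = diff(diff(X)) must be equal for any X, even when the first difference reduces the first non-singleton dimension to a singleton, so that the second difference has to move on to the next dimension.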
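A minimal sketch of that equivalence, assuming a 2-by-4 input so that the very first difference collapses dimension 1. Under the current rules both expressions end up differencing along dimension 2:

```matlab
X = [1 4 9 16;
     2 3 5 7];
A = diff(X,2);          % second difference, dim omitted
B = diff(diff(X));      % first difference applied twice
% diff(X) is the 1-by-4 row [1 -1 -4 -9], so the second pass
% works along dimension 2 and both A and B are [-2 -3 -5].
disp(isequal(A,B))
```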
The alternative is to relax one of the above two guarantees in the case of DIFF(x,N), when N > 1.
Gautam
Thanks, Gautam!
Interesting point.
–Loren
I can think of quite a few examples where it's going to be catastrophic. Say you have a 2D array X holding N vectors of M samples each, and you innocently take the second derivative diff(X,2). When M <= 2, it will eat away at the next dimension, and the result will no longer be an array of N vectors! Any subsequent code relying on that assumption will be broken. It also doesn't make sense physically: where did that last vector go? Vectorized code often relies on the fact that one (or more) dimensions stay the same size, so you can work on the others freely.
I understand it must have been a desire to make diff(x,n) consistent with recursive application of diff(x). Personally, I don't like to rely on MATLAB finding the first non-singleton dimension: if it's a vector that's fine, but if it's a matrix I always prefer to specify the dimension myself :)
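A sketch of the failure mode described above, assuming the N vectors are stored as the columns of an M-by-N array and that M happens to be 2:

```matlab
% Five signals (columns), each with only M = 2 samples.
X = [1 2 3 4 5;
     3 5 9 4 0];
implicitDim = diff(X,2);    % second pass runs across the columns
explicitDim = diff(X,2,1);  % stays in dimension 1, so nothing is left
disp(size(implicitDim))     % 1 4: no longer one result per signal
disp(isempty(explicitDim))
```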
Jotaf-
I understand users can be surprised by diff. But I am looking for particular instances that DEPEND on this capability.
Can you point to a place in your own code where you depend on this, or do you actively avoid it by specifying the dim input?
–Loren
I tried to concoct a code snippet using that feature, but my imagination failed me. In day-to-day code I actively avoid such quirks by always specifying DIM if the main input is not a vector. Still, it would be neat to see someone come up with an imaginative use of this feature :)
Not related to diff(), but along the theme of surprises that can come from singleton dimensions:
In a company where I used to work, our “style guide” banned the use of the squeeze() function by our MATLAB programmers, because of unexpected behaviors when a particular dimension was not normally length one, but could be. We found that permute() was a more robust way to handle any situation where squeeze might be contemplated.
Thanks, Cyclist.
I also don’t use squeeze and use reshape explicitly instead. I think you may find reshape takes less time than permute sometimes (but I am not sure of that).
–Loren
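The squeeze hazard can be sketched with a 1-by-N-by-K array (say N channels by K samples; this layout is an assumption made for illustration). When N happens to be 1, squeeze drops that dimension too and the samples end up in the rows, while permute and reshape keep the N-by-K layout regardless:

```matlab
N = 1;  K = 4;
data = rand(1, N, K);

viaSqueeze = squeeze(data);           % 4-by-1: the channel dimension vanished
viaPermute = permute(data, [2 3 1]);  % N-by-K, here 1-by-4
viaReshape = reshape(data, N, K);     % N-by-K, here 1-by-4

disp(size(viaSqueeze))
disp(size(viaPermute))
disp(size(viaReshape))
```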
OK, since no one has managed anything, here goes one convoluted case:
Assume that you have a suspension bridge in the jungle made from ropes and pieces of wood. The shape of the bridge is defined by the heights above the ground of each point where a plank meets the rope. We therefore describe the shape of the bridge as a 2xn matrix where n is the number of planks in the bridge. Calling the matrix A, it is clear that diff(A) is a 1xn matrix describing the sideways slant of each plank. Calling diff(A,2), we get a 1x(n-1) matrix that describes the difference in slant along the bridge, which is the twist of the bridge.
I told you it was forced, but this is a calculation that can efficiently be done using the current behaviour of diff.
–DA
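The bridge example can be sketched with made-up heights (the numbers below are invented for illustration): row 1 holds the left rope, row 2 the right rope, one column per plank.

```matlab
A = [0 1 3 6;     % left-rope heights at each plank
     0 2 3 4];    % right-rope heights at each plank
slant = diff(A)   % 1-by-4: right minus left, the sideways slant of each plank
twist = diff(A,2) % 1-by-3: change in slant from one plank to the next
% The same twist with every dimension spelled out:
twistExplicit = diff(diff(A,1,1),1,2);
```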
Daniel-
Thanks for showing us an example. I’m still not convinced that the current diff behavior is “worth” it.
–loren
I want to be very, very clear that I agree with the rest of you that this is bizarre behaviour, which can at best be motivated by the legalese-style argument that Gautam Vallabha posted.
–DA
Thanks for this Loren. I agree it doesn’t promote robust programming – I think it’s always a good idea for your code to know along which dimension diff should operate, rather than leaving it to MATLAB to figure it out when it runs out of dimensions.
A slightly esoteric, yet simple application where the behaviour wouldn't cause problems would be the following: suppose we have a sequence of numbers arranged as columns in an array:
a =
     3     7   2923  17583
   -17   223   5007  24703
   -47   693   7993  33757
   -57  1543  12103  45063
We might inspect the numbers for some underlying difference pattern by repeated use of diff; eventually:
>> diff(a,4)
ans =
    96    96    96
leading us to believe the sequence might be defined by a polynomial of fourth degree. Admittedly this is much better solved starting with the column vector a(:), but my imagination also ran into the buffers!
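The "better" route can be sketched by unrolling a into the single column-major sequence it came from and differencing along one fixed dimension. The constant fourth difference of that sequence is 24, not 96: in diff(a,4) the last step compares third differences taken from the tops of successive columns, which sit four places apart in the sequence, so it sees 4 × 24. As a cross-check, polyfit recovers integer coefficients:

```matlab
a = [  3     7   2923  17583
     -17   223   5007  24703
     -47   693   7993  33757
     -57  1543  12103  45063];
s = a(:);                           % column-major: the original 16-term sequence
d4 = diff(s,4,1)                    % twelve fourth differences, all equal to 24
p = round(polyfit((1:16)', s, 4))   % [1 -5 0 0 7], i.e. s(n) = n^4 - 5n^3 + 7
```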
Thanks, Stuart-
Interesting idea. Thankfully, as you said, there is a “better” way to do the task :-)
–Loren
There is a diff-erent (see what I did there?) quirk of diff that I find annoying: I don’t like that the iteration argument comes before the dimension argument. For some functions that could operate on various dimensions of an array (sum, mean, etc.) the dimension argument is the second argument. For other functions (diff, std, etc.) the dimension argument is third. I think in all of these cases the dimension argument should come second.
This is somewhat related to the quirk that you've discussed. If the call to diff were diff(A, dim, iters), you would make sure that whenever somebody iterates diff, they have specified the dimension, so that the iterations act consistently on A. (Or they could enter [] or something for dim, explicitly stating that they want this inconsistent behavior.)
MATLAB would also gain overall consistency with these functions taking arguments in a consistent way. In this context, it makes sense for the dimension argument to come before the function-specific ones because it is common to more functions that operate on arrays. The iteration argument for diff (or the flag value for std) are specific arguments to those functions that are not as common.
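The suggested ordering can be sketched as a thin wrapper. diffAlong is a hypothetical name used only for this illustration; it is not a MATLAB function and simply forwards to the real diff(X,n,dim). The other calls show where dim sits today:

```matlab
% Hypothetical wrapper with the dimension argument second, as proposed above.
diffAlong = @(X, dim, n) diff(X, n, dim);

A = magic(4);
s  = sum(A, 2);           % dim is the 2nd argument
m  = mean(A, 2);          % dim is the 2nd argument
sd = std(A, 0, 2);        % dim is the 3rd argument (weight flag comes first)
d  = diff(A, 1, 2);       % dim is the 3rd argument (order comes first)
dw = diffAlong(A, 2, 1);  % same result as d, with dim in 2nd position
```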
Andy-
Thanks for your input. I will make sure the relevant group discusses this at MathWorks.
–Loren
What if someone wanted to use this multi-dimensional recursive feature of the diff function to determine if a matrix has certain specific properties? For example, the matrix returned by the function pascal has a recursive diff result equal to 1.
If someone was attempting to generalize the pascal matrix (or some other type of matrix) to higher dimensions, for example, they may find it useful to check their result using this kind of test. It’s a stretch, but I cannot think of anything else.
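One way to write such a check, assuming the intended order is 2*n - 2: the first n - 1 differences run down the rows of pascal(n) and leave the row [0 ... 0 1], and the remaining n - 1 run along that row and leave the scalar 1. It relies on the dimension-hopping behavior, since no dim is passed:

```matlab
% Check that diff(pascal(n), 2*n-2) reduces to the scalar 1.
for n = 2:8
    r = diff(pascal(n), 2*n - 2);
    fprintf('n = %d: %s\n', n, mat2str(r));
end
% Intermediate step for n = 4: diff(pascal(4),3) is [0 0 0 1].
```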
Joe-
Thanks for the idea. I hadn’t thought about that (I agree it’s probably still a stretch, but still…).
–Loren
I realise it’s good to bring attention to “potentially annoying” behaviour of functions, and to have a challenge of guessing.
But presumably MathWorks uses (and has done since you wrote MATLAB's first diff) thorough version control.
Can’t you easily find when the change happened, and by whom … it could be interesting to resolve the guesses about why it /was/ done (although it’s still interesting to have discussed the possible good and bad points, regardless of the author’s intention).
I see that as of R2010a, diff is a built-in,
$Revision: 5.25.4.4 $ $Date: 2005/06/21 19:23:53 $
Nathaniel-
Of course we did go through our records – the built-in version was just mimicking the last MATLAB code version. We do know when the behavior was introduced but sadly there was no reason stated in the records or the code.
–Loren
I would much prefer that diff(X) did not return a shorter vector, but a vector of the same size as X. As it is, every time I use diff I am padding the returned vector with zeros. And, if it did not shorten the returned vector, the undesired behavior regarding dimensions would also go away!
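A sketch of the zero-padding approach, with dim given explicitly so the result keeps the size of X; prepending the zero row also lets cumsum undo the difference. MATLAB's gradient also returns an array the same size as its input, but it uses central differences in the interior, so it is not a drop-in replacement.

```matlab
X = magic(3);
% Prepend a row of zeros so D has the same size as X.
D = [zeros(1, size(X,2)); diff(X, 1, 1)];
% Undo the difference: add the first row back to the running sum.
Xback = bsxfun(@plus, X(1,:), cumsum(D, 1));
disp(isequal(size(D), size(X)))
disp(isequal(Xback, X))
```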