I can think of a lot of functions that rearrange data in MATLAB. I've long suspected that not all of these are well-known, though some are clearly daily tools. Maybe it's time to be sure they get exposure.
My List of Functions for Rearranging Data
Here's my off-the-top-of-my-head, incomplete list.
Frequency of Use?
My guess is that circshift is one on the list that gets used least often. It's called a circular shift because elements that fall off at one end appear at the other end, wrapping around the values. Let's play with it to see what it can do. I'll use unique numbers in the sample matrix so we can follow them around.
A = reshape(1:16,4,4)'
A =
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
From the help, I get to shift each dimension "up" or "down". Let's first just shift values in each column down by 2.
Adown2 = circshift(A,2)
Adown2 =
9 10 11 12
13 14 15 16
1 2 3 4
5 6 7 8
Now let's shift just row values - to the right by 3.
Aright3 = circshift(A,[0 3])
Aright3 =
2 3 4 1
6 7 8 5
10 11 12 9
14 15 16 13
How about shifting left by 1?
Aleft1 = circshift(A,[0 -1])
Aleft1 =
2 3 4 1
6 7 8 5
10 11 12 9
14 15 16 13
We can see that shifting left by 1 is the same as shifting right by 3 when the number of columns is 4.
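The equivalence can be checked directly. Here is a small sketch, assuming the same 4-by-4 A as above; the names nCols, k, and viaIndex are mine, introduced only for illustration. It confirms that a column shift by k matches a shift by mod(k,nCols), and that plain modular indexing gives the same rearrangement.

```matlab
A = reshape(1:16,4,4)';
nCols = size(A,2);

% Left by 1 and right by 3 give the same matrix when there are 4 columns.
sameShift = isequal(circshift(A,[0 -1]), circshift(A,[0 3]));

% More generally, a column shift by k matches a shift by mod(k,nCols).
k = -1;
sameMod = isequal(circshift(A,[0 k]), circshift(A,[0 mod(k,nCols)]));

% The same rearrangement written with modular indexing (illustrative only).
viaIndex = A(:, mod((0:nCols-1) - k, nCols) + 1);
sameIndex = isequal(viaIndex, circshift(A,[0 k]));
```

All three flags should come back true for this A.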
I can do a combination shift, with rows and columns.
Ad2l1 = circshift(A,[2 -1])
Ad2l1 =
10 11 12 9
14 15 16 13
2 3 4 1
6 7 8 5
What Do You Use?
What functions or techniques do you use to rearrange your data most often? Do you have a favorite function in this category that I didn't list? Let me know here.
Published with MATLAB® 7.11



Excellent article. I do something similar for topics such as different types of random numbers, interpolating/lookup, etc. I will be adding this to my list.
I may have mentioned this before to either support or on a blog, but I think _padarray_ from the Image Processing Toolbox would be an excellent addition to base MATLAB. I think it is up there in usefulness with repmat and the others you listed above.
Stephen
I often use fliplr and flipud, as well as buffer (from SP toolbox). :)
Stephen and Memming-
Excellent additions to the list!
Stephen- be sure to use the support link on the right of my blog to place the padarray idea as an enhancement request (it’s best if it comes from a real user and not me!).
–Loren
In my experience, “squeeze” is rarely a wise choice of rearrangement; reshaping (by moving the singleton dimension to the end) is almost always a superior choice to squeezing.
Reshaping instead of squeezing is a safeguard against the unintended consequence of a usually-not-singleton dimension accidentally getting squeezed away when you do not want it to be.
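A small sketch of the failure mode described here, under an assumed layout of my own (trials x 1 x samples, which is not from the original comment). With several trials, squeeze and reshape agree; with a single trial, squeeze also removes the trial dimension.

```matlab
% Hypothetical layout: trials x 1 x samples, to be flattened to trials x samples.
X = rand(3,1,5);
szSqueeze = size(squeeze(X));                         % [3 5], as intended
szReshape = size(reshape(X,size(X,1),size(X,3)));     % [3 5]

% With a single trial, the first dimension is a singleton too.
X1 = rand(1,1,5);
szSqueeze1 = size(squeeze(X1));                       % [5 1]: trial dimension lost
szReshape1 = size(reshape(X1,size(X1,1),size(X1,3))); % [1 5]: shape preserved
```

The reshape call states the intended output shape explicitly, so it does not depend on which dimensions happen to be singletons.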
Reshape is often very useful with BSXFUN, as in the following example (taken from a newsgroup post). The original poster had this code, which he wished to improve upon:
p1hf = [1,2;3,4];
nb = 2; % It might go up to 100 or so; need a smarter solution
p2hf = zeros(nb, nb, nb, nb);
for a=1:nb
    for b=1:nb
        for c=1:nb
            for d=1:nb
                p2hf(a,b,c,d) = 2*p1hf(a,b)*p1hf(c,d) - p1hf(a,c)*p1hf(b,d);
            end
        end
    end
end
If we use BSXFUN and RESHAPE correctly, this can all be done much more compactly (and faster for large arrays) as follows:
p3 = bsxfun(@times,2*p1hf,reshape(p1hf,1,1,nb,nb)) -...
     bsxfun(@times,reshape(p1hf,nb,1,nb),reshape(p1hf,1,nb,1,nb));
I use sort and sortrows a lot.
I also agree that padarray should come in the base distribution.
‘:’ should definitely be in there! I have applications where I want to project 3-D distributions to 2-D images. That goes neatly with a sparse forward matrix and ‘:’
img = zeros(256);
% size(I3D): [Nx, Ny, Nz]
% size(pM): [Nx*Ny*Nz, 256*256]
img(:) = pM.'*I3D(:); % transpose needed because pM is [Nx*Ny*Nz, 256*256]
Folks-
Just FYI, I certainly meant : to be covered under indexing. Another reference for indexing is the indexing category of this blog:
http://blogs.mathworks.com/loren/category/indexing/
I agree about squeeze as well. I prefer functions whose behavior I know for certain for all shapes, and where that behavior is what I want. Sometimes squeeze at the command line is OK for me.
–Loren
Lauren,
When combined with ‘cumsum’, ‘circshift’ is very useful for performing a loop-less sliding-average.
Gary
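One way this combination might look (a sketch only, not necessarily Gary's own code; the names x, w, c, d, and avg are assumptions for illustration). A leading zero is prepended to the cumulative sum so that subtracting a copy shifted by the window length w leaves the sum over each window; the first w entries wrap around and are discarded.

```matlab
x = [1 2 3 4 5]';        % sample data (column vector)
w = 3;                   % window length

c = [0; cumsum(x)];      % c(k) holds the sum of x(1:k-1)
d = c - circshift(c,w);  % for k > w, d(k) = sum of x(k-w:k-1)
avg = d(w+1:end) / w;    % drop the wrapped-around entries
```

For this input the three window averages are 2, 3, and 4. For long or floating-point series, differences of large cumulative sums can lose some precision, so this is worth checking against a direct computation.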
Sorry for misspelling your name in the previous post.
Gary
I would be remiss in my duties if I did not point out the “Stack” and “Unstack” methods for the dataset array.
A dataset array can be viewed as a table of values. Rows represent different observations or cases, while columns represent different measured variables. The stack and unstack methods allow you to reshape the dataset array, transforming categorical data stored in the dataset array into variables or vice versa. (This is sometimes referred to as tall-to-wide conversion.)
The primary motivation for stack and unstack is shaping the data for a specific technique that you want to apply. For example, in many cases a “tall” data format is easier to work with when you are performing statistical calculations; however, a wide format works better for Exploratory Data Analysis.
The Statistics Toolbox documentation has a nice write up describing the stack and unstack methods. You’ll also find some demo code that you can experiment with.
http://www.mathworks.com/help/toolbox/stats/dataset.unstack.html
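As a rough illustration of the tall/wide idea, here is a sketch that assumes the Statistics Toolbox dataset array is available. The station and temperature values, and the variable names Station, Jan, Feb, Temp, and Month, are made up for the example.

```matlab
% Assumes the Statistics Toolbox dataset array is available.
% Hypothetical wide data: one row per station, one column per month.
wide = dataset({[1;2;3],'Station'}, {[10;12;9],'Jan'}, {[11;14;8],'Feb'});

% Wide to tall: Jan and Feb become one data variable plus an indicator.
tall = stack(wide, {'Jan','Feb'}, ...
    'NewDataVarName','Temp', 'IndVarName','Month');

% Tall back to wide: one new variable per level of Month.
wide2 = unstack(tall, 'Temp', 'Month');
```

Here tall has one row per station and month combination, which is the shape many statistical routines expect, while wide2 returns to one row per station.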
rot90(), although it’s only 2-D :( For n-dimensional arrays I have to loop over the extra dimensions and sometimes permute/ipermute to get the correct orientation…
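For the 3-D case, the loop can sometimes be avoided. The following sketch (my own, with illustrative names A, P, and B) rotates every 2-D page by 90 degrees counterclockwise by transposing each page with permute and then reversing the row order, and checks the result against rot90 applied page by page.

```matlab
A = reshape(1:24,2,3,4);          % 2-by-3 pages, 4 of them

% Swap rows and columns of every page, then reverse the row order.
P = permute(A,[2 1 3]);
B = P(end:-1:1,:,:);              % each page rotated 90 degrees counterclockwise

% Compare against rot90 applied page by page.
ok = true;
for k = 1:size(A,3)
    ok = ok && isequal(B(:,:,k), rot90(A(:,:,k)));
end
```

This is written for 3-D input; arrays with more dimensions would need additional colons in the indexing step.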
Hi Loren
I used circshift to create points along a line not parallel to matrix x or y axes. Something like this:
Cyclist, Matt, Oliver, Bjorn, Gary, Richard, Stan, Matteo-
All interesting thoughts you shared, along with the earlier ones of others already acknowledged. Thanks!
I especially appreciate the ones with code snippets that can help give other users concrete ideas.
–Loren
Direct indexing of function results would be great:
According to the forum this can be done with function-handles, but that seems quite weird to me.
Detlef
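The function-handle workaround mentioned above can be sketched roughly as follows. The helper name index is hypothetical, not a built-in; it simply forwards its extra arguments as subscripts.

```matlab
% Hypothetical helper: index into the result of an expression in one step.
index = @(x,varargin) x(varargin{:});

v = index(magic(4),2,3);     % element in row 2, column 3 of magic(4)
r = index(magic(4),2,':');   % whole second row; ':' is passed as a string
```

One limitation of this pattern is that the keyword end cannot be used in the subscripts, since the array being indexed is not known where the arguments are written.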
Detlef-
We have this enhancement request in our database.
–Loren
As others pointed out, ‘squeeze’ should really be avoided for multiple reasons. I got a huge performance improvement after getting rid of squeeze inside a nested for-loop (ok, not really good style, either) – a single ‘squeeze’ does not take much time – however it quickly adds up and reshaping your array to get rid of squeeze is always faster.
I often use a rather unorthodox method of concatenating two dimensions of an array – using permute and the colon operator:
I guess I could use cat, but for some reason this method is easier to visualize for me. I always struggle with the proper syntax of function calls that I’m not using that often.
‘permute’ is probably one of the rearranging functions I’m using most often – e.g. for preparing an array for bsxfun:
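A sketch of what these two patterns might look like (not necessarily Philipp's own code; the array A, the offsets v, and the other names are assumptions for illustration). Indexing a 3-D array with two colons merges its trailing dimensions, and permute controls which dimensions end up merged or aligned.

```matlab
A = reshape(1:24,2,3,4);               % 2-by-3 pages, 4 of them

% Merge dimensions 2 and 3: pages placed side by side (2-by-12).
B = A(:,:);
sameH = isequal(B, cat(2, A(:,:,1), A(:,:,2), A(:,:,3), A(:,:,4)));

% Merge dimensions 1 and 3: pages stacked vertically (8-by-3).
P = permute(A,[2 1 3]);                % transpose every page
C = P(:,:).';                          % merge, then transpose back
sameV = isequal(C, cat(1, A(:,:,1), A(:,:,2), A(:,:,3), A(:,:,4)));

% Preparing a vector for bsxfun: move it to the third dimension so that
% one offset is subtracted from each page.
v = [1; 2; 3; 4];
D = bsxfun(@minus, A, permute(v,[3 2 1]));   % permute(v,...) is 1-by-1-by-4
```

The cat calls are there only to check the results; the colon form avoids having to list the pages one by one.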
Philipp-
Thanks for sharing your code pattern!
–loren
Hi,
I am a newbie to MATLAB and working on oceanographic data quality control. I work with time series observation data. My data interval is every 3 hours (8 observations per day), but on certain days there will be some missing observations. I want to select only those days where the number of observations is more than 4, so that I can compute the daily average for that particular day. Could you please help with this?
Jayaram-
If you are missing data and the times are filled with NaNs, you can just check each day to see if sum(~isnan(day)) > 4 and select those days.
–Loren
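The suggestion above could be sketched like this, assuming the series is complete in time (exactly 8 slots per day) with NaN marking the missing observations. The sample values and the names obs, days, nValid, keep, and dailyMean are made up for illustration.

```matlab
% Hypothetical 3-day series, 8 observations per day, NaN where data is missing.
obs = [1 2 3 4 5 6 7 8, ...
       NaN NaN NaN NaN 2 2 2 2, ...
       NaN 3 3 3 3 3 NaN NaN]';

days = reshape(obs,8,[]);          % one column per day
nValid = sum(~isnan(days),1);      % number of observations present each day
keep = nValid > 4;                 % days with more than 4 observations

% Mean of the available observations, ignoring NaNs.
filled = days;
filled(isnan(filled)) = 0;
dailyMean = sum(filled,1) ./ nValid;
dailyMean(~keep) = NaN;            % mark days that do not qualify
```

In this sample the first and third days qualify (8 and 5 observations), while the second day has only 4 and is marked NaN.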