Loren on the Art of MATLAB

September 30th, 2010

Rearranging Data

I can think of a lot of functions that rearrange data in MATLAB. I've long suspected that not all of these are well-known, though some are clearly daily tools. Maybe it's time to be sure they get exposure.

My List of Functions for Rearranging Data

Here's my off-the-top-of-my-head, incomplete list.

Frequency of Use?

My guess is that circshift is the function on the list that gets used least often. It's called a circular shift because elements that fall off one end reappear at the other end, so the values wrap around. Let's play with it to see what it can do. I'll use unique numbers in the sample matrix so we can follow them around.

A = reshape(1:16,4,4)'
A =
     1     2     3     4
     5     6     7     8
     9    10    11    12
    13    14    15    16

According to the help, I can shift along each dimension in either direction, "up" or "down". Let's first just shift the values in each column down by 2.

Adown2 = circshift(A,2)
Adown2 =
     9    10    11    12
    13    14    15    16
     1     2     3     4
     5     6     7     8

Now let's shift just the values within each row, to the right by 3.

Aright3 = circshift(A,[0 3])
Aright3 =
     2     3     4     1
     6     7     8     5
    10    11    12     9
    14    15    16    13

How about shifting left by 1?

Aleft1 = circshift(A,[0 -1])
Aleft1 =
     2     3     4     1
     6     7     8     5
    10    11    12     9
    14    15    16    13

We can see that shifting left by 1 is the same as shifting right by 3 when the number of columns is 4.
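
The wrap-around rule can be sketched in a few lines. This is only an illustration: the variable names (n, idx, and so on) are made up here, and it assumes the same 4-by-4 sample matrix as above.

```matlab
A = reshape(1:16,4,4)';      % same sample matrix as above
n = size(A,2);               % number of columns (4 here)

% A shift of k columns and a shift of mod(k,n) columns land in the same place.
leftBy1  = circshift(A,[0 -1]);
rightBy3 = circshift(A,[0 mod(-1,n)]);   % mod(-1,4) is 3
sameResult = isequal(leftBy1,rightBy3);

% The same shift written as plain indexing (idx is an illustrative name).
idx = mod((0:n-1) + 1, n) + 1;           % [2 3 4 1]
viaIndexing = A(:,idx);
```

Writing the shift as an index vector makes it easy to see which source column ends up in each position.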

I can do a combination shift, with rows and columns.

Ad2l1 = circshift(A,[2 -1])
Ad2l1 =
    10    11    12     9
    14    15    16    13
     2     3     4     1
     6     7     8     5
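
As a quick sanity check (a sketch, not part of the published script), the combined shift should match doing the row shift and the column shift one after the other. The variable names here are illustrative.

```matlab
A = reshape(1:16,4,4)';      % same sample matrix as above

combined   = circshift(A,[2 -1]);                    % both shifts in one call
sequential = circshift(circshift(A,[2 0]),[0 -1]);   % down 2, then left 1

shiftsAgree = isequal(combined,sequential);
```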

What Do You Use?

What functions or techniques do you use to rearrange your data most often? Do you have a favorite function in this category that I didn't list? Let me know here.


Published with MATLAB® 7.11

20 Responses to “Rearranging Data”

  1. StephenLL replied on :

    Excellent article. I do something similar for topics such as: different types of random numbers, interpolating/lookup, etc…. I will be adding this to my list.

    I may have mentioned this before to either support or on a blog, but I think _padarray_ from the image processing toolbox would be an excellent addition to base MATLAB. I think it is up there in usefulness as repmat and others you listed above.

    Stephen

  2. Memming replied on :

    I often use fliplr and flipud, as well as buffer (from SP toolbox). :)

  3. Loren replied on :

    Stephen and Memming-

    Excellent additions to the list!

    Stephen- be sure to use the support link on the right of my blog to place the padarray idea as an enhancement request (it’s best if it comes from a real user and not me!).

    –Loren

  4. the cyclist replied on :

    In my experience, “squeeze” is rarely a wise choice of rearrangement; reshaping (by moving the singleton dimension to the end) is almost always a superior choice to squeezing.

    Reshaping instead of squeezing is a safeguard against the unintended consequence of a usually-not-singleton dimension accidentally getting squeezed away when you do not want it to be.
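
A small sketch of the hazard described above, under made-up assumptions: the data is laid out as trials x 1 x samples, and dropMiddle is a hypothetical helper name rather than a built-in function.

```matlab
% Hypothetical data: trials x 1 x samples, where the middle dimension
% is the singleton we actually want to drop.
many = zeros(4,1,5);
one  = zeros(1,1,5);     % same layout, but only a single trial

sizeSqueezeMany = size(squeeze(many));   % [4 5]
sizeSqueezeOne  = size(squeeze(one));    % [5 1], the trial dimension vanished too

% Naming the target shape keeps the trial dimension in both cases.
dropMiddle = @(X) reshape(X, size(X,1), size(X,3));
sizeReshapeMany = size(dropMiddle(many));   % [4 5]
sizeReshapeOne  = size(dropMiddle(one));    % [1 5]
```

With a single trial, squeeze silently changes which dimension holds the samples, while the explicit reshape keeps the layout predictable.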

  5. matt fig replied on :

    Reshape is often very useful for use with BSXFUN, as in the following example (taken from a newsgroup post). The original poster had this code, which he wished to improve upon

    
    p1hf = [1,2;3,4];
    nb = 2; % It might go up to 100 or so need a smarter solution
    p2hf = zeros(nb, nb, nb, nb);
    for a=1:nb
       for b=1:nb
          for c=1:nb
             for d=1:nb
                p2hf(a,b,c,d) = 2*p1hf(a,b)*p1hf(c,d) - p1hf(a,c)*p1hf(b,d);
             end
          end
       end
    end
    

    If we use BSXFUN and RESHAPE correctly, this can all be done much more compactly (and faster for large arrays) as follows

    
    p3 = bsxfun(@times,2*p1hf,reshape(p1hf,1,1,nb,nb)) -...
             bsxfun(@times,reshape(p1hf,nb,1,nb),reshape(p1hf,1,nb,1,nb)); 
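
To check that the two versions above agree, here is a sketch that runs both on a small test matrix and compares the results. The choice of magic(3) as input is arbitrary and not from the newsgroup post.

```matlab
nb = 3;
p1hf = magic(nb);        % arbitrary integer test values

% Reference result from the nested loops.
p2hf = zeros(nb,nb,nb,nb);
for a = 1:nb
    for b = 1:nb
        for c = 1:nb
            for d = 1:nb
                p2hf(a,b,c,d) = 2*p1hf(a,b)*p1hf(c,d) - p1hf(a,c)*p1hf(b,d);
            end
        end
    end
end

% Vectorized version: each reshape places an index on its own dimension.
p3 = bsxfun(@times, 2*p1hf, reshape(p1hf,1,1,nb,nb)) - ...
     bsxfun(@times, reshape(p1hf,nb,1,nb), reshape(p1hf,1,nb,1,nb));

loopAndBsxfunAgree = isequal(p2hf,p3);
```

The idea is that reshape(p1hf,1,1,nb,nb) makes the matrix vary along dimensions 3 and 4 (indices c and d), reshape(p1hf,nb,1,nb) along dimensions 1 and 3 (a and c), and reshape(p1hf,1,nb,1,nb) along dimensions 2 and 4 (b and d); bsxfun then expands the singleton dimensions.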
    
  6. Oliver Woodford replied on :

    I use sort and sortrows a lot.

    I also agree that padarray should come in the base distribution.

  7. BjornG replied on :

    ‘:’ should definitely be in there! I have applications where I want to project 3-D distributions to 2-D images. That goes neatly with a sparse forward matrix and ‘:’
    img = zeros(256);
    % size(I3D): [Nx, Ny, Nz]
    % size(pM): [Nx*Ny*Nz, 256*256]
    img(:) = pM.'*I3D(:); % transposed, given the sizes listed above

  8. Loren replied on :

    Folks-

    Just fyi, I certainly meant : to be part of indexing. Another reference for indexing is the indexing category on this blog:

    http://blogs.mathworks.com/loren/category/indexing/

    I agree about squeeze as well – I tend to use functions where I know for sure what they will do with all shapes, and where that behavior is what I want. Sometimes squeeze at the command line is ok for me.

    –Loren

  9. Gary replied on :

    Lauren,
    When combined with ‘cumsum’, ‘circshift’ is very useful for performing a loop-less sliding-average.
    Gary
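
One possible reading of this suggestion, as a sketch only: the sample vector x, the window length w, and the variable names are all assumptions made for illustration, and the input is assumed to be a column vector.

```matlab
x = (1:10)';             % sample column vector, made up for illustration
w = 3;                   % window length (an assumed value)

c = cumsum(x);           % c(i) = x(1) + ... + x(i)
cLag = circshift(c,w);   % cLag(i) = c(i-w) for i > w
cLag(1:w) = 0;           % discard the values that wrapped around

windowSum  = c - cLag;               % windowSum(i) = x(i-w+1) + ... + x(i) for i >= w
slidingAvg = windowSum(w:end) / w;   % keep only the full windows
```

Zeroing the first w entries matters because circshift wraps the tail of the cumulative sum back to the front, and those wrapped values do not belong to any window.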

  10. Gary replied on :

    Sorry for misspelling your name in the previous post.
    Gary

  11. Richard replied on :

    I would be remiss in my duties if I did not point out the “Stack” and “Unstack” methods for the dataset array.

    A dataset array can be viewed as a table of values. Rows represent different observations or cases while columns represent different measured variables. The stack and unstack methods allow you to reshape the dataset array, transforming categorical data stored in the dataset array into variables or vice versa. (This is sometimes referred to as tall-to-wide conversion.)

    The primary motivation for stack and unstack is shaping the data for a specific technique that you want to apply. For example, in many cases a “tall” data format is easier to work with when you are performing statistical calculations; however, a wide format works better for Exploratory Data Analysis.

    The Statistics Toolbox documentation has a nice write up describing the stack and unstack methods. You’ll also find some demo code that you can experiment with.
    http://www.mathworks.com/help/toolbox/stats/dataset.unstack.html

  12. stan replied on :

    rot90(), although it’s only 2d :( for n-dimensional arrays I have to loop over the extra dimensions and sometimes permute/ipermute to get the correct orientation…
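
A sketch of one way to avoid that loop for a single counterclockwise quarter turn of every 2-D page, assuming a 3-D input. The sample array and variable names are made up for illustration.

```matlab
A = reshape(1:24,2,3,4);     % sample 2-by-3-by-4 array

% rot90 on a matrix equals a transpose followed by an up-down flip,
% so the same two steps can be applied to all pages at once.
B = permute(A,[2 1 3]);      % transpose every page
B = B(end:-1:1,:,:);         % flip every page upside down

% Reference result from looping over the extra dimension.
Bloop = zeros(3,2,4);
for k = 1:size(A,3)
    Bloop(:,:,k) = rot90(A(:,:,k));
end

pagesAgree = isequal(B,Bloop);
```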

  13. Matteo replied on :

    Hi Loren

    I used circshift to create points along a line not parallel to matrix x or y axes. Something like this:

    test=zeros(1,512);
    c1=zeros(512);
    test(1,216)=1;
    for i=1:15
    temp=circshift(test,[0 4*i]);
    c1(32*i,:)=temp;
    end
    imagesc(c1);colormap(gray);
    
  14. Loren replied on :

    Cyclist, Matt, Oliver, Bjorn, Gary, Richard, Stan, Matteo-

    All interesting thoughts you shared, along with the earlier ones of others already acknowledged. Thanks!

    I especially appreciate the ones with code snippets that can help give other users concrete ideas.

    –Loren

  15. Detlef replied on :

    Direct indexing of function results would be great:

    (inv([1 2;3 4]))(:,1)
    

    According to the forum this can be done with function-handles, but that seems quite weird to me.

    Detlef
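
The function-handle workaround mentioned above might look like the following sketch. The names firstCol and getCol are hypothetical helpers, not built-in functions.

```matlab
% firstCol is a hypothetical helper name, not a built-in function.
firstCol = @(M) M(:,1);
c1 = firstCol(inv([1 2;3 4]));     % first column of the inverse

% A more general variant that takes the column number as an argument.
getCol = @(M,k) M(:,k);
c2 = getCol(inv([1 2;3 4]),2);     % second column of the inverse
```

The handle receives the function result as an ordinary argument, so the indexing happens inside the handle without a named temporary variable.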

  16. Loren replied on :

    Detlef-

    We have this enhancement request in our database.

    –Loren

  17. Philipp replied on :

    As others pointed out, ‘squeeze’ should really be avoided for multiple reasons. I got a huge performance improvement after getting rid of squeeze inside a nested for-loop (ok, not really good style, either) – a single ‘squeeze’ does not take much time – however it quickly adds up and reshaping your array to get rid of squeeze is always faster.

    I often use a rather unorthodox method of concatenating two dimensions of an array – using permute and the colon operator:

    % concatenating dim. 1 and 3:
    matrix = permute(matrix,[2 1 3]);
    matrix = matrix(:,:);
    matrix = permute(matrix,[2 1]);
    

    I guess I could use cat, but for some reason this method is easier to visualize for me. I always struggle with the proper syntax of function calls that I’m not using that often.

    ‘permute’ is probably one of the rearranging functions I’m using most often – e.g. for preparing an array for bsxfun:

    % size(array1) = [100 2 2]
    % size(array2) = [2 2]
    % multiplication with bsxfun:
    bsxfun(@times, array1, permute(array2,[3 1 2]));
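
For comparison, here is a sketch showing that the permute-and-colon pattern for merging dimensions 1 and 3 gives the same result as a reshape-based version and as an explicit cat. The sample array and variable names are made up for illustration.

```matlab
M = reshape(1:24,2,3,4);        % sample 2-by-3-by-4 array

% The pattern from the comment above.
P = permute(M,[2 1 3]);         % 3-by-2-by-4
P = P(:,:);                     % 3-by-8, dimensions 2 and 3 merged
P = permute(P,[2 1]);           % 8-by-3

% Two other spellings of the same rearrangement.
viaReshape = reshape(permute(M,[1 3 2]), [], size(M,2));
viaCat     = cat(1, M(:,:,1), M(:,:,2), M(:,:,3), M(:,:,4));

allAgree = isequal(P,viaReshape) && isequal(P,viaCat);
```

All three stack the pages of M on top of each other; which one reads best is largely a matter of taste.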
    
  18. Loren replied on :

    Philipp-

    Thanks for sharing your code pattern!

    –loren

  19. Jayaram replied on :

    Hi,

    I am a newbie to MATLAB and am working on oceanographic data quality control. I work with time series observation data. My data interval is 3 hours (8 observations per day), but on certain days there will be some missing observations. I want to select only those days where the number of observations is more than 4, so that I can compute the daily average for each such day. Could you please help with this?

  20. Loren replied on :

    Jayaram-

    If you are missing data and the times are filled with NaNs, you can just check each day to see if sum(~isnan(day)) > 4 and select those days.

    –Loren
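
A sketch of how that check could be applied to a whole series at once. It assumes the series starts at the first observation of a day, has exactly 8 slots per day, and marks missing values with NaN; the data and variable names are made up for illustration.

```matlab
% Made-up data: 3 days of 3-hourly values, NaN where an observation is missing.
obs = [1 2 3 4 5 6 7 8, ...
       NaN NaN NaN NaN 1 2 3 4, ...
       2 NaN 2 2 2 2 NaN 2]';

perDay = reshape(obs,8,[]);            % one column per day
nValid = sum(~isnan(perDay),1);        % observations available on each day
keep   = nValid > 4;                   % days with more than 4 observations

filled = perDay;
filled(isnan(filled)) = 0;             % so NaNs do not contribute to the sum
dailyMean = sum(filled,1) ./ nValid;   % mean over the available values
dailyMean(~keep) = NaN;                % mark the rejected days
```

Reshaping to one column per day lets the count and the average run over all days without a loop.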


Loren Shure works on design of the MATLAB language at MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.