Loren on the Art of MATLAB

February 20th, 2008

Under-appreciated accumarray

The MATLAB function accumarray seems to be under-appreciated. accumarray allows you to aggregate items in an array in the way that you specify.

Newsgroup Statistics

Since accumarray was introduced in MATLAB (7.0, R14), there have been over 100 threads in the MATLAB newsgroup where accumarray arose as a solution.

Recent Questions

One of the more recent threads asks how to aggregate values in one list based on another list. Suppose the lists are

group = [1 2 2 2 3 3]'
data = [6 43 3 4 2 5]'
group =
     1
     2
     2
     2
     3
     3
data =
     6
    43
     3
     4
     2
     5

and the goal is to sum the data in each group. Let's first create the first input argument. accumarray wants an array of subscripts, one row per data value, saying which output element that value belongs to. Since we're just producing a column vector with 3 values, we just append a column of ones to the group vector.

indices = [group ones(size(group))]
indices =
     1     1
     2     1
     2     1
     2     1
     3     1
     3     1

Next We Accumulate

Since the default function for accumulation is sum, we can use the simplest form of accumarray to get the desired results.

sums = accumarray(indices, data)
sums =
     6
    50
     7
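
A column of ones is not the only way to supply the subscripts. As a minimal sketch (the variable names sumsFromPairs and sumsFromVector are mine, not part of the original example), passing group on its own as a column vector of 1-D subscripts should give the same sums:

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

% Subscripts given as (row, column) pairs, with the column always 1.
sumsFromPairs = accumarray([group ones(size(group))], data);

% Subscripts given as a single column of row numbers.
sumsFromVector = accumarray(group, data);

sameResult = isequal(sumsFromPairs, sumsFromVector)
```

The two-column form matters once the output really is a matrix; for a column-vector result the second column carries no extra information.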

Another Way to Accumulate

We can instead accumulate the results by adding 2 input arguments to the function call. These are a size vector for the output array and a function handle specifying the accumulating function.

sums1 = accumarray(indices, data, [numel(unique(group)) 1], @sum)
sums1 =
     6
    50
     7

It's easy to see that the results from the two function calls are the same.

isequal(sums, sums1)
ans =
     1
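
The size vector can also change the result, not just restate the default. The sketch below uses made-up variable names (padded, gappyGroup, gappyData, gapSums) and assumes only the documented calling forms. It shows two things: asking for more rows than there are groups pads the output with zeros, and numel(unique(group)) equals the largest group number only when the groups are numbered 1, 2, ..., n without gaps. When there are gaps, a size built from numel(unique(...)) would be too small for the largest subscript, so max is used instead.

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

% Ask for 5 output rows even though only groups 1..3 occur.
% Rows that receive no data are filled with 0 when no fill value is given.
padded = accumarray(group, data, [5 1])

% Hypothetical group numbers with a gap (there is no group 2).
gappyGroup = [1 3 3]';
gappyData  = [10 20 30]';

% numel(unique(gappyGroup)) is 2, smaller than the largest subscript (3),
% so the output is sized from max(gappyGroup) instead.
gapSums = accumarray(gappyGroup, gappyData, [max(gappyGroup) 1])
```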

Other Accumulation Functions

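Sometimes, summing the results isn't what I'm looking for. Having puzzled out the 4-input call syntax, I can now simply replace the accumulation function. To find the maximum values in each group, I use the first call below. The later calls swap in anonymous functions that test each group for non-finite values: ~any(isfinite(x)) is true only when no value in the group is finite, and all(isfinite(x)) is true only when every value is.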

maxData = accumarray(indices, data, [numel(unique(group)) 1], @max)
maxData =
     6
    43
     5
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
    @(x)~any(isfinite(x)))
maxData =
     0
     0
     0
data(end) = Inf
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
    @(x)~any(isfinite(x)))
data =
     6
    43
     3
     4
     2
   Inf
maxData =
     0
     0
     0
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
    @(x)all(isfinite(x)))
maxData =
     1
     1
     0
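
Two more accumulations that fit the same pattern are group counts and group means. This is a sketch with my own variable names (counts, means); it restores the original data vector so that it does not depend on the Inf assigned above.

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

% Count the members of each group: a scalar val of 1 is reused for every
% subscript, and the default accumulation (sum) adds up the ones.
counts = accumarray(group, 1)

% Group means: [] for the size argument requests the default output size.
means = accumarray(group, data, [], @mean)
```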

Derivative Work

John D'Errico wrote a more general function, consolidator, available on the MathWorks File Exchange, that allows some extra aggregation. For example, consolidator can aggregate elements that are within a specified tolerance of each other, not just identical.
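
To give a feel for tolerance-based grouping using accumarray alone, here is a rough sketch. It is not consolidator and does not reproduce its interface or algorithm; it simply rounds the keys onto a grid of width tol before grouping, so two values that straddle a grid boundary can still land in different groups. The data and the names x, y, tol, binned, and grp are invented for illustration.

```matlab
x   = [1.00 1.01 2.50 2.52 4.00]';   % keys that are nearly, but not exactly, equal
y   = [10 20 30 40 50]';             % values to aggregate
tol = 0.1;                           % assumed grid width

% Snap each key to the nearest multiple of tol.
binned = round(x / tol);

% Turn the snapped keys into consecutive group numbers 1, 2, 3, ...
[binValues, firstIdx, grp] = unique(binned);

% Sum the values whose keys fell on the same grid point.
ySums = accumarray(grp, y)
```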

Do You accum?

Some other obvious accumulation functions you might use include sum, max, min, prod. What functions do you use in situations when you aggregate with accumarray? Let me know here.

Published with MATLAB® 7.5

26 Responses to “Under-appreciated accumarray”

  1. Ben replied on :

    Neat function!

  2. Dani Hak replied on :

    Yesterday I built a matrix of ones and zeros and multiplied by it to perform accumulation. accumarray seems a better solution, but the first option also works if data has multiple columns.

    How can I use accumarray if data has multiple columns?

    Thanks,
    Dani.

  3. Dani Hak replied on :

    Hello Loren,
    regarding my last post,
    tried using anonymous function:

    f= @(x) accumarray(indices,data(:,x));
    sum=arrayfun(f,1:size(data,2),'UniformOutput','false');

    but it is not working,
    Dani.

  4. Daniel Armyr replied on :

    Hi.
    This was a really nifty trick I didn’t know about. Thank you and keep them coming.

    However, it did take me a few minutes and some studying of the documentation to realize that the line “indices = [group ones(size(group))]” only serves to produce more visually pleasant output. On my first read I thought there was some deep and important meaning to that second column.

    Just wanted to mention that if the next reader gets confused as well.

    Sincerely
    Daniel Armyr

  5. Loren replied on :

    Thanks for the comments, folks.

    Dani,

    I recommend you look at the reference page for accumarray. There are many examples there, including ones with matrices and not just vectors.

    –Loren
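
For the multi-column question above, one possible approach is to give accumarray a (row, column) subscript for every element of the data matrix, so that each column is accumulated separately in a single call. This is only a sketch: the second data column and the names nCols, rowSub, colSub, and colSums are invented for illustration.

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5; 1 1 1 1 1 1]';   % 6-by-2: two data columns
nCols = size(data, 2);

% rowSub(i,j) is the group of row i; colSub(i,j) is the column number j.
[rowSub, colSub] = ndgrid(group, 1:nCols);

% Each element of data is summed into output(group, column).
colSums = accumarray([rowSub(:) colSub(:)], data(:))
```

Each column of colSums holds the per-group sums of the matching column of data.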

  6. Oliver A. Chapman, P.E. replied on :

    This is another example of very poor documentation. I’ve read the FRP for “accumarray” several times now as well as your column and I still can’t figure out what the function is doing.

    Since the documentation has this brief comment: “accumarray sums values from val using the default behavior of sum,” I suspect that the accumulating and aggregating you are referring to is actually summing. If so, you need to clearly say so.

    Countering my above suspicion are the 3 syntax options that allow the specification of an alternate function. So, what does it mean to accumulate on one hand using “sum” and on the other using “sin?” Is it effectively doing something like “sum(sin(x))?” In your column, you provided 4 examples of substituting alternate functions: “max,” “any” and “all.” But, I’m not following any of them.

    Most importantly, I can’t follow the data flow to understand how the vector of indices controls the accumulation. The first example on the FRP does provide some help in this. But, how would one ever construct a meaningful or useful index vector?

    The FRP and your column don’t show us a problem where this function is really useful. The way it is currently explained, it appears to be a solution in search of a problem. I’ve similarly criticized function handles. It took me the better part of a year to understand what they did and I still can’t imagine a case where they would be useful since they seem to obscure the data flow.

    Finally, with 6 syntax options, the FRP requires in excess of 20 examples that systematically lead the user from trivial to sophisticated for each of syntax options.

  7. Loren replied on :

    Oliver-

    You can’t use sin with accumarray. The reason is stated in this part of the description:

    “A = accumarray(subs,val,sz,fun) applies function fun to each subset of elements of val. You must specify the fun input using the @ symbol (e.g., @sin). The function fun must accept a column vector and return a numeric, logical, or character scalar, or a scalar cell.”

    The problems accumarray helps to solve are ones posed by users such as the examples from the newsgroup — where they want to aggregate contents in a collection subject to some criteria that they have for their particular problem. That’s why hist alone was not enough.

    Nonetheless, I do hear that you are unhappy with the documentation.

    –Loren

  8. Oliver A. Chapman, P.E. replied on :

    Loren,

    It appears we aren’t communicating well.

    You say that you can’t use sin with accumarray and then you quote the documentation where they do exactly that. So, clearly, accumarray can be used with sin. But, I do not understand the data flow, i.e., what the details of this accumulation are.

    I’ve pretty much concluded that accumarray is not doing something like (using the syntax from the FRP):

    for ii = subs
    A(ii) = sum(val(subs(1 : ii)));
    end

    Before I wrote my last comment, I did review some of the entries in the newsgroup. Yet, I didn’t find one where I could understand the data flow.

    Maybe you could review the specification for accumarray and that would provide some text that would help explain what the function is doing.

    By the way, MATLAB has the best documentation of any software that I’ve ever used and I tell my bosses that several times a year. But, it isn’t perfect and the shortcomings in documentation are a productivity issue.

    Like I’ve done before, I’ll tell this to Scott when he visits us next week.

    Thanks.

  9. Loren replied on :

    Oliver-

    Thanks for your explanation.

    I see @sin as how to specify a function handle, but not in one of the examples. I stand by the documentation I previously quoted:

    “A = accumarray(subs,val,sz,fun) applies function fun to each subset of elements of val. You must specify the fun input using the @ symbol (e.g., @sin). The function fun must accept a column vector and return a numeric, logical, or character scalar, or a scalar cell.”

    which says vector in -> scalar out, something that sin does not do.

    –Loren

  10. Oliver A. Chapman, P.E. replied on :

    I give up.

    I’ll ask Craig next week when he visits us.

  11. Dan K replied on :

    Loren,
    I think I’m understanding the frustration which Oliver is expressing. accumarray seems to represent a very non-intuitive process. As best I can determine, what it is doing is the following.

    % subs is an array of indices
    % vals is the data you want to work with

    For the first row of subs:
    Find all of the rows of subs that match that row and call them subset (pretend that rows 1, 3, and 5 all match with [2 1]).
    Take all of the values in vals from that subset (e.g. vals([1,3,5])) and call that tempvals.
    Apply whatever your function is to tempvals and assign that to the output at the index defined by the row of subs which you are working with (i.e. out(2,1) = sum(vals([1,3,5]))).
    Keep going through until you’ve found all of the unique rows of subs.

    Is that an accurate description of what this is doing?

    Thanks,
    Dan
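
Dan's steps can be sketched as a loop over the unique rows of subs. This is an illustration of the description only, not how accumarray is implemented internally. The data are invented so that rows 1, 3, and 5 of subs all equal [2 1], as in his example, and the names out, uniqueRows, and rowGroup are mine.

```matlab
subs = [2 1; 1 1; 2 1; 1 2; 2 1];   % rows 1, 3, and 5 all point at out(2,1)
vals = [10 20 30 40 50]';
fun  = @sum;

% Output sized by the largest subscript in each column, pre-filled with 0.
out = zeros(max(subs, [], 1));

% rowGroup(i) is the position of subs(i,:) within uniqueRows.
[uniqueRows, firstIdx, rowGroup] = unique(subs, 'rows');

for k = 1:size(uniqueRows, 1)
    tempvals = vals(rowGroup == k);                        % values sharing this subscript
    out(uniqueRows(k,1), uniqueRows(k,2)) = fun(tempvals); % one scalar per output element
end

matches = isequal(out, accumarray(subs, vals))
```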

  12. Eric S. replied on :

    I think that ACCUMARRAY has one of the worst documentation pages in MATLAB. I also find ACCUMARRAY to be probably the number one most confusing function in MATLAB - there might be causation behind this correlation ;)

    I also agree with Oliver that using @sin as an example for the function handle input is bad form, since it is not a valid choice for the function. Furthermore, the ‘issparse’ input argument description is confusing - it’s not testing sparsity but asking for it, so it should be described by something like ‘createsparse’ instead. And also, the forms that use ‘sz’ but not ‘fillval’ don’t specify what will then be used to fill - it turns out by experimentation to be zero, but this should be called out.

    For your example, why did you choose to make it slightly more confusing by adding the column of ones to ‘group’ to make ‘indices’? I believe that ‘accumarray(group,data)’ gives the same result, and is clearer in its intent. Also clearer may have been to use [] for the size input argument, since you want the default behavior and are not specifying an explicitly sized output array.

    Finally, I believe that consolidator-like tolerance functionality should be added to accumarray.

    Thanks,
    Eric - former TMWer.

  13. Loren replied on :

    Eric-

    After Oliver noted the @sin instance, it was entered into the bug database at MathWorks and will be fixed.

    I created the inputs deliberately so people could see (but maybe didn’t) how to work with that indices input more easily than I felt the documentation showed. I may have guessed incorrectly, at least for some of you.

    –Loren

  14. Loren replied on :

    Dan-

    Your understanding of accumarray and how it works is correct.

    –Loren

  15. Oliver A. Chapman, P.E. replied on :

    I still say that you should take the 1st example from the FRP and show us the equivalent code that produces the same result.

    I spent 2 hours last night trying to hack out such equivalent code and I never got close.

    And, this is exactly why I always say that documentation is a productivity issue. If The MathWorks can write a fancy routine like accumarray, they certainly can write a FRP that explains it for the novice user.

  16. Loren replied on :

    Oliver-

    I am not sure if this is what you are looking for, but try this:

    %% Here's the "Data" for Example 1
    val = 101:105;
    subs = [1; 2; 4; 2; 4]
    %% Here's the accumarray Solution
    % We are accumulating values from val based on like-values
    % in subs
    A = accumarray(subs, val)
    %% Another Method
    % Here's the outline of perhaps a more "standard" way to
    % think of this.
    %
    % * First find out how many unique indices there are in
    % subs.  The length
    % of this array corresponds to the maximum index value in
    % subs.  This is the size of the output array.
    % * Pre-allocate the output array to be the correct size.
    % * Loop through the *values* in subs, which range from
    % 1:max(subs).
    %
    %% Find How Many Times Each Index is Repeated
    % For indices, find out how many of each value.
    n = hist(subs,max(subs))
    %%
    % Verify that the maximum value in subs is indeed the number
    % of bins from calling hist.
    max(subs) == length(n)
    %%
    % Create output array.
    Aother = zeros(length(n),1);
    %%
    % Add each entry in val to the "appropriate" output entry
    for k = 1:length(n);
        % Find correct subscript for adding to Aother
        % logical index for val containing n(k) nonzeros
        ind = (subs == k)
        % verify that the number of nonzeros is the expected
        % amount
        tf = n(k) == nnz(ind)
        % here are the values in the data from the
        % corresponding indices
        valk = val(ind)
        % sum up these values
        Aother(k) = sum(val(ind));
    end
    %% Compare Output Values
    agree = isequal(Aother,A)
    

    –Loren

  17. Oliver A. Chapman, P.E. replied on :

    Loren,

    That is just what I was looking for. Thank you very much.

    Now I’m on the way to understanding what it does. Now I’ve got to understand why this would be useful.

    A more compact version of the code for the vector only form of subs, is:

    for ii = 1 : length(hist(subs, max(subs)));
    A(ii) = sum(val((subs == ii)));
    end

  18. Loren replied on :

    Glad that helped, Oliver. You are right for the vector case. I put extra statements in as a way of explanation.

    –Loren

  19. Ilya Rozenfeld replied on :

    I find accumarray extremely useful and use it quite often. However, I would concur with other commenters that the documentation could be better. When this function first appeared it took me a while to understand what it does and how it works.

  20. Ivan Pan replied on :

    After reading through all the comments, I think I understand the concept of accumarray. It will be pretty useful. However, I still don’t understand the syntax.

    >> sums = accumarray(indices, data);
    >> sum1 = accumarray(group, data);
    >> isequal(sums, sum1)
    ans =
    1

    I don’t understand the meaning of the indices or the illustration of it. This is how I can explain the syntax. The indices or subscripts is to shape the output. That’s why “data” has to be 1-d array. At first, I thought the indices to be (row, col) of data. It is in fact the (row, col) of “sums”.

    >> indices = [ones(size(group)), group];
    >> sums = accumarray(indices, data)
    sums =
    6 50 7

    I’m also confused about the size input. what is the difference of using “numel” and “size”? I read the documentation and I couldn’t understand the difference.

    >> sum1 = accumarray(indices, data, [numel(unique(group)) 1], @sum)
    >> sum2 = accumarray(indices, data, size(unique(group)), @sum)
    >> isequal(sum1, sum2)
    ans =
    1

    Also what does the size input do if indices already shape the output? Or maybe I misunderstand what “indices” does.

    ip

  21. Loren replied on :

    Ivan-

    Since the data in this case is a column vector, we can either index with just group (1-d indexing) or using subscripts (row, col, where column is always one here). Each entry of indices (or group) says which output element in sums the corresponding data value belongs to.

    size produces m x n or more output values — one for each dimension. numel totals up all the elements and is equal to prod(size(…))

    –Loren

  22. Wolfgang replied on :

    Hi Loren

    I use the accumarray function very often since it is another way to efficiently make neighborhood operations besides sparse matrices. I’d like to see two things in the future:

    1. accumarray should be able to take various functions so that it is possible to replace following statements

    A = accumarray(ind,val,[n 1],@max);
    B = accumarray(ind,val,[n 1],@min);

    by something like this

    [A,B] = accumarray(ind,val,[n 1],{@min @max});

    2. accumarray should be able to make some kind of reference to the indices in val. A few months ago I posted a question on the newsgroup
    http://www.mathworks.com/matlabcentral/newsreader/view_thread/155890
    Peter Perkins came up with a neat but not very elegant idea of how to handle this. But this could be perhaps treated much better.

    Best regards,
    Wolfgang

  23. StephenLL replied on :

    You can accomplish (1) by creating a scalar cell as output from an anonymous function that combines your two functions: min and max.

    f = @(x) { [min(x) max(x)] } % take note of the brackets

    C = accumarray(ind,val,[n 1], f );

    C{1} will have both the min and max in it for the 1st subgroup.
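
If a plain numeric array is more convenient than a cell array, the cells can be stacked afterwards. The sketch below borrows ind and val from the test.m example further down in this thread and assumes n = 3 groups; minmax is a name I made up.

```matlab
ind = [1 2 3 1 2 3 1 2 3]';
val = (0.1:0.1:0.9)';
n   = 3;

% Each group returns one cell holding a 1-by-2 row: [min max].
f = @(x) {[min(x) max(x)]};
C = accumarray(ind, val, [n 1], f);

% Stack the n rows into an n-by-2 matrix: column 1 is min, column 2 is max.
minmax = vertcat(C{:})
```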

  24. Bobby Cheng replied on :

    You can achieve (2) using test.m below. I have not tested performance. But the functionality is there. Similar solution exists for anonymous functions.

    —Bob.

    function A = test

    ind = [1 2 3 1 2 3 1 2 3]';
    val = [0.1:0.1:0.9]';
    A = accumarray(ind,val,[3 1],@findmax);

    function ix = findmax(s);
    ix = find(val == max(s),1);
    end
    end

  25. Wolfgang replied on :

    - Stephen
    Thanks a lot. Your suggestion actually does the job but is much more memory demanding. When the number of unique indices becomes large I would prefer calling accumarray twice.

    - Bob
    thanks, too. But your suggestion has the same problem as the one I suggested. Since accumarray works on the subsets of val it returns the indices inside these subsets as shown by Peter in the aforementioned post.

  26. Ilya Rozenfeld replied on :

    I tried to post it on the newsfeed but didn’t work for some reason.

    Wolfgang,

    How about this:

    A = accumarray(ind,1:numel(val),[3 1], @(x) findmax(x,val))

    where

    function ix = findmax(indx, s)
    [m,ix] = max(s(indx));
    ix = indx(ix);
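
The same idea can be written with anonymous functions alone, which keeps everything in one script. This is a sketch, not Ilya's code: it reuses ind and val from the test.m example above, and the helper names argmaxOf and pickIndex are hypothetical.

```matlab
ind = [1 2 3 1 2 3 1 2 3]';
val = (0.1:0.1:0.9)';

% Position of the first maximum within a vector.
argmaxOf = @(v) find(v == max(v), 1);

% idx holds the positions in val that belong to one group; return the
% position in val (not in the subset) of that group's largest value.
pickIndex = @(idx) idx(argmaxOf(val(idx)));

% Accumulate positions rather than values, so the result indexes into val.
A = accumarray(ind, (1:numel(val))', [3 1], pickIndex)
```

Because positions into val are passed in, the result refers to the full vector and not to the per-group subsets.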

Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.
