The MATLAB function accumarray seems to be under-appreciated. accumarray allows you to aggregate items in an array in the way that you specify.
Newsgroup Statistics
Since accumarray has been in MATLAB (7.0, R14), there have been over 100 threads in the MATLAB newsgroup where accumarray arose as a solution.
Recent Questions
One of the more recent threads asks how to aggregate values in one list based on another list. Suppose the lists are
group = [1 2 2 2 3 3]'
data = [6 43 3 4 2 5]'
group =
1
2
2
2
3
3
data =
6
43
3
4
2
5
and the goal is to sum the data in each group. Let's first create the first input argument. accumarray wants an array of subscripts specifying which output element each data value belongs to. Since we're just producing a column vector with 3 values, we just append a column of ones to the group vector.
indices = [group ones(size(group))]
indices =
1 1
2 1
2 1
2 1
3 1
3 1
Next We Accumulate
Since the default function for accumulation is sum, we can use the simplest form of accumarray to get the desired results.
sums = accumarray(indices, data)
sums =
6
50
7
Another Way to Accumulate
We can instead accumulate the results by adding 2 input arguments to the function call. These are a size vector for the output array and a function handle specifying the accumulating function.
sums1 = accumarray(indices, data, [numel(unique(group)) 1], @sum)
sums1 =
6
50
7
It's easy to see that the results from the two function calls are the same.
isequal(sums, sums1)
ans =
1
Other Accumulation Functions
Sometimes, summing the results isn't what I'm looking for. Having puzzled out the 4 input call syntax, I can now simply replace the accumulation function. To find the maximum values in each group, I use this code.
maxData = accumarray(indices, data, [numel(unique(group)) 1], @max)
maxData =
6
43
5
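The accumulation function can also be an anonymous function. This one flags groups that contain no finite values.
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
    @(x)~any(isfinite(x)))
maxData =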
0
0
0
data(end) = Inf
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
    @(x)~any(isfinite(x)))
data =
6
43
3
4
2
Inf
maxData =
0
0
0
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
    @(x)all(isfinite(x)))
maxData =
1
1
0
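Any function that maps a column vector to a scalar can serve as the accumulator. As a small sketch (the names counts, means, and ranges are chosen here for illustration, data is reset to its original values without the Inf, and group is passed directly as the subscripts, which is equivalent here to using indices), a few other per-group summaries might look like this:

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

% Count the members of each group by accumulating ones.
counts = accumarray(group, 1);

% Mean of each group; [] lets accumarray choose the output size.
means = accumarray(group, data, [], @mean);

% Range (max minus min) of each group via an anonymous function.
ranges = accumarray(group, data, [], @(x) max(x) - min(x));
```

For this data the counts are 1, 3, and 2, and the ranges are 0, 40, and 3.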
Derivative Work
John D'Errico wrote a more general function, consolidator, available on the MathWorks File Exchange, that allows you to do some extra aggregation. For example, consolidator allows the aggregation of elements when they are within a specified tolerance and not just identical.
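To give a feel for tolerance-based aggregation, here is a rough sketch that uses only unique and accumarray. It is not consolidator's algorithm or interface. It simply bins x values by rounding to a grid of width tol, and the data and variable names are made up for illustration. Values that straddle a bin edge would land in different groups, which a real tolerance-aware routine may handle differently.

```matlab
x   = [1.00 1.01 2.00 2.02 3.50]';   % hypothetical keys, nearly duplicated
y   = [10 20 30 40 50]';             % hypothetical values to aggregate
tol = 0.05;                          % assumed bin width

% Map each x to a bin number, then to consecutive group indices.
[~, ~, idx] = unique(round(x / tol));

ySum  = accumarray(idx, y);              % sum of y within each bin
xMean = accumarray(idx, x, [], @mean);   % representative x for each bin
```

Here the five points collapse into three groups with sums 30, 70, and 50.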
Do You accum?
Some other obvious accumulation functions you might use include sum, max, min, prod. What functions do you use in situations when you aggregate with accumarray? Let me know here.
Published with MATLAB® 7.5

Neat function!
Yesterday I built a matrix of ones and zeros and multiplied by it to perform the accumulation. accumarray seems a better solution, but the first option also works if data has multiple columns.
How can I use accumarray if data has multiple columns?
Thanks,
Dani.
Hello Loren,
regarding my last post,
tried using anonymous function:
f= @(x) accumarray(indices,data(:,x));
sum=arrayfun(f,1:size(data,2),'UniformOutput','false');
but it is not working,
Dani.
Hi.
This was a really nifty trick I didn’t know about. Thank you and keep them coming.
However, it did take me a few minutes and some studying of the documentation to realize that the line “indices = [group ones(size(group))]” only serves to produce more visually pleasant output. On my first read I thought there was some deep and important meaning to that second column.
Just wanted to mention that if the next reader gets confused as well.
Sincerely
Daniel Armyr
Thanks for the comments, folks.
Dani,
I recommend you look at the reference page for accumarray. There are many examples there, including ones with matrices and not just vectors.
–Loren
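For the multi-column question, two possible approaches are sketched below. The two-column data matrix is made up for illustration. The first builds explicit (row, column) subscripts for every element. The second loops over columns with arrayfun, passing the logical value false (rather than the string 'false') for 'UniformOutput'.

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5; 1 1 1 1 1 1]';   % hypothetical 6-by-2 data

nGroups = max(group);
nCols   = size(data, 2);

% Option 1: one accumarray call with explicit (row, col) subscripts.
[rowSub, colSub] = ndgrid(group, 1:nCols);
sums1 = accumarray([rowSub(:) colSub(:)], data(:), [nGroups nCols]);

% Option 2: one accumarray call per column, collected in a cell array.
f = @(c) accumarray(group, data(:, c), [nGroups 1]);
perColumn = arrayfun(f, 1:nCols, 'UniformOutput', false);
sums2 = [perColumn{:}];
```

Both produce a 3-by-2 result with one row per group and one column per data column.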
This is another example of very poor documentation. I’ve read the FRP for “accumarray” several times now as well as your column and I still can’t figure out what the function is doing.
Since the documentation has this brief comment, “accumarray sums values from val using the default behavior of sum,” I suspect that the accumulating and aggregating you are referring to is actually summing. If so, you need to clearly say so.
Countering my above suspicion are the 3 syntax options that allow the specification of an alternate function. So, what does it mean to accumulate on one hand using “sum” and on the other using “sin?” Is it effectively doing something like “sum(sin(x))?” In your column, you provided 4 examples of substituting alternate functions: “max,” “any,” and “all.” But, I’m not following any of them.
Most importantly, I can’t follow the data flow to understand how the vector of indices controls the accumulation. The first example on the FRP does provide some help in this. But, how would one ever construct a meaningful or useful index vector?
The FRP and your column don’t show us a problem where this function is really useful. The way it is currently explained, it appears to be a solution in search of a problem. I’ve similarly criticized function handles. It took me the better part of a year to understand what they did and I still can’t imagine a case where they would be useful since they seem to obscure the data flow.
Finally, with 6 syntax options, the FRP requires in excess of 20 examples that systematically lead the user from trivial to sophisticated for each of the syntax options.
Oliver-
You can’t use sin with accumarray. The reason is stated in this part of the description:
“A = accumarray(subs,val,sz,fun) applies function fun to each subset of elements of val. You must specify the fun input using the @ symbol (e.g., @sin). The function fun must accept a column vector and return a numeric, logical, or character scalar, or a scalar cell.”
The problems accumarray helps to solve are ones posed by users such as the examples from the newsgroup — where they want to aggregate contents in a collection subject to some criteria that they have for their particular problem. That’s why hist alone was not enough.
Nonetheless, I do hear that you are unhappy with the documentation.
–Loren
Loren,
It appears we aren’t communicating well.
You say that you can’t use sin with accumarray and then you quote the documentation where they do exactly that. So, clearly, accumarray can be used with sin. But, I do not understand the data flow, e.g., what the details of this accumulation are.
I’ve pretty much concluded that accumarray is not doing something like (using the syntax from the FRP):
for ii = subs
A(ii) = sum(val(subs(1 : ii)));
end
Before I wrote my last comment, I did review some of the entries in the newsgroup. Yet, I didn’t find one where I could understand the data flow.
Maybe you could review the specification for accumarray and that would provide some text that would help explain what the function is doing.
By the way, MATLAB has the best documentation of any software that I’ve ever used and I tell my bosses that several times a year. But, it isn’t perfect and the shortcomings in documentation are a productivity issue.
Like I’ve done before, I’ll tell this to Scott when he visits us next week.
Thanks.
Oliver-
Thanks for your explanation.
I see @sin as how to specify a function handle, but not in one of the examples. I stand by the documentation I previously quoted:
“A = accumarray(subs,val,sz,fun) applies function fun to each subset of elements of val. You must specify the fun input using the @ symbol (e.g., @sin). The function fun must accept a column vector and return a numeric, logical, or character scalar, or a scalar cell.”
which says vector in -> scalar out, something that sin does not do.
–Loren
I give up.
I’ll ask Craig next week when he visits us.
Loren,
I think I’m understanding the frustration which Oliver is expressing. accumarray seems to represent a very non-intuitive process. As best I can determine, what it is doing is the following.
% subs is an array of indices
% vals is the data you want to work with
For the first row of subs:
Find all of the rows of subs that match that row and call them subset (pretend that rows 1, 3, and 5 all match with [2 1]).
Take all of the values in vals from that subset (e.g. vals([1,3,5])) and call that tempvals.
Apply whatever your function is to tempvals and assign that to the output at the index defined by the row of subs which you are working with (i.e. out(2,1) = sum(vals([1,3,5]))).
Keep going through until you’ve found all of the unique rows of subs.
Is that an accurate description of what this is doing?
Thanks,
Dan
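The steps described above can be sketched as a loop over the unique rows of subs. This is only an illustration of the described behavior, not how accumarray is implemented internally. It reuses the data from the post, and the names out, uniqueRows, and rowId are made up here.

```matlab
subs = [1 1; 2 1; 2 1; 2 1; 3 1; 3 1];
vals = [6 43 3 4 2 5]';
fun  = @max;                      % any vector-in, scalar-out function

% Output is sized by the largest subscript in each dimension, filled with 0.
out = zeros(max(subs, [], 1));

% rowId(i) says which unique row of subs the i-th value belongs to.
[uniqueRows, ~, rowId] = unique(subs, 'rows');

for k = 1:size(uniqueRows, 1)
    tempvals = vals(rowId == k);                          % values sharing this subscript
    out(uniqueRows(k, 1), uniqueRows(k, 2)) = fun(tempvals);
end

sameAsAccumarray = isequal(out, accumarray(subs, vals, [], fun));
```

With @max as the function, out holds 6, 43, and 5, matching the maxData result in the post.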
I think that ACCUMARRAY has one of the worst documentation pages in MATLAB. I also find ACCUMARRAY to be probably the number one most confusing function in MATLAB - there might be causation behind this correlation ;)
I also agree with Oliver that using @sin as an example for the function handle input is bad form, since it is not valid for the function. Furthermore, the ‘issparse’ input argument description is confusing - it’s not testing sparsity but asking for it, so it should be described by something like ‘createsparse’ instead. And also, the forms that use ‘sz’ but not ‘fillval’ don’t specify what will then be used to fill - it turns out by experimentation to be zero, but this should be called out.
For your example, why did you choose to make it slightly more confusing by adding the column of ones to ‘group’ to make ‘indices’? I believe that ‘accumarray(group,data)’ gives the same result, and is clearer in its intent. Also clearer may have been to use [] for the size input argument, since you want the default behavior and are not specifying an explicitly sized output array.
Finally, I believe that consolidator-like tolerance functionality should be added to accumarray.
Thanks,
Eric - former TMWer.
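A short sketch of the points about [] and the fill value, using the subs and val values from the first documentation example quoted elsewhere in this thread. The names A0 and A1 are made up for illustration.

```matlab
subs = [1; 2; 4; 2; 4];
val  = (101:105)';

% [] for the size lets accumarray size the output from max(subs).
% Index 3 receives no data, so it is filled with the default, 0.
A0 = accumarray(subs, val, [], @max);

% A fifth argument replaces that default fill value.
A1 = accumarray(subs, val, [], @max, NaN);

% The column of ones is optional when the output is a column vector.
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';
sameResult = isequal(accumarray(group, data), ...
                     accumarray([group ones(size(group))], data));
```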
Eric-
After Oliver noted the @sin instance, it was entered into the bug database at MathWorks and will be fixed.
I created the inputs deliberately so people could see (but maybe didn’t) how to work with that indices input more easily than I felt the documentation showed. I may have guessed incorrectly, at least for some of you.
–Loren
Dan-
Your understanding of accumarray and how it works is correct.
–Loren
I still say that you should take the 1st example from the FRP and show us the equivalent code that produces the same result.
I spent 2 hours last night trying to hack out such equivalent code and I never got close.
And, this is exactly why I always say that documentation is a productivity issue. If The MathWorks can write a fancy routine like accumarray, they certainly can write a FRP that explains it for the novice user.
Oliver-
I am not sure if this is what you are looking for, but try this:
%% Here's the "Data" for Example 1
val = 101:105;
subs = [1; 2; 4; 2; 4]

%% Here's the accumarray Solution
% We are accumulating values from val based on like-values
% in subs
A = accumarray(subs, val)

%% Another Method
% Here's the outline of perhaps a more "standard" way to
% think of this.
%
% * First find out how many unique indices there are in
%   subs. The length of this array corresponds to the
%   maximum index value in subs. This is the size of the
%   output array.
% * Pre-allocate the output array to be the correct size.
% * Loop through the *values* in subs, which range from
%   1:max(subs).
%

%% Find How Many Times Each Index is Repeated
% For indices, find out how many of each value.
n = hist(subs,max(subs))

%%
% Verify that the maximum value in subs is indeed the number
% of bins from calling hist.
max(subs) == length(n)

%%
% Create output array.
Aother = zeros(length(n),1);

%%
% Add each entry in val to the "appropriate" output entry
for k = 1:length(n);
    % Find correct subscript for adding to Aother
    % logical index for val containing n(k) nonzeros
    ind = (subs == k)
    % verify that the number of nonzeros is the expected
    % amount
    tf = n(k) == nnz(ind)
    % here are the values in the data from the
    % corresponding indices
    valk = val(ind)
    % sum up these values
    Aother(k) = sum(val(ind));
end

%% Compare Output Values
agree = isequal(Aother,A)
–Loren
Loren,
That is just what I was looking for. Thank you very much.
Now I’m on the way to understanding what it does. Now I’ve got to understand why this would be useful.
A more compact version of the code for the vector only form of subs, is:
for ii = 1 : length(hist(subs, max(subs)));
A(ii) = sum(val((subs == ii)));
end
Glad that helped, Oliver. You are right for the vector case. I put extra statements in as a way of explanation.
–Loren
I find accumarray extremely useful and use it quite often. However, I would concur with other commenters that the documentation could be better. When this function first appeared it took me a while to understand what it does and how it works.
After reading through all the comments, I think I understand the concept of accumarray. It will be pretty useful. However, I still don’t understand the syntax.
>> sums = accumarray(indices, data);
>> sum1 = accumarray(group, data);
>> isequal(sums, sum1)
ans =
1
I don’t understand the meaning of the indices or the illustration of it. This is how I can explain the syntax. The indices or subscripts is to shape the output. That’s why “data” has to be 1-d array. At first, I thought the indices to be (row, col) of data. It is in fact the (row, col) of “sums”.
>> indices = [ones(size(group)) group];
>> sums = accumarray(indices, data)
sums =
6 50 7
I’m also confused about the size input. what is the difference of using “numel” and “size”? I read the documentation and I couldn’t understand the difference.
>> sum1 = accumarray(indices, data, [numel(unique(group)) 1], @sum)
>> sum2 = accumarray(indices, data, size(unique(group)), @sum)
>> isequal(sum1, sum2)
ans =
1
Also what does the size input do if indices already shape the output? Or maybe I misunderstand what “indices” does.
ip
Ivan-
Since the data in this case is a column vector, we can either index with just group (1-d indexing) or use subscripts (row, col, where the column is always one here). The indices (or group) specify which output element in sums the corresponding data value belongs to.
size produces m x n or more output values — one for each dimension. numel totals up all the elements and is equal to prod(size(…))
–Loren
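As a quick sketch of the size question (the variable names here are made up for illustration): both expressions produce the same size vector in this example, and the size argument matters mainly when the output should be larger than the largest subscript.

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

sz1 = [numel(unique(group)) 1];   % [3 1]
sz2 = size(unique(group));        % also [3 1], since unique(group) is a 3-by-1 column

% A larger size pads the output with zeros beyond the largest subscript.
padded = accumarray(group, data, [5 1]);
```

Here padded has five elements: 6, 50, 7, followed by two zeros.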
Hi Loren
I use the accumarray function very often since it is another way to efficiently make neighborhood operations besides sparse matrices. I’d like to see two things in the future:
1. accumarray should be able to take various functions so that it is possible to replace the following statements
A = accumarray(ind,val,[n 1],@max);
B = accumarray(ind,val,[n 1],@min);
by something like this
[A,B] = accumarray(ind,val,[n 1],{@max @min});
2. accumarray should be able to make some kind of reference to the indices in val. A few months ago I posted a question on the newsgroup
http://www.mathworks.com/matlabcentral/newsreader/view_thread/155890
Peter Perkins came up with a neat but not very elegant idea of how to handle this. But this could be perhaps treated much better.
Best regards,
Wolfgang
You can accomplish (1) by creating a scalar cell as output from an anonymous function that combines your two functions: min and max.
f = @(x) { [min(x) max(x)] } % take note of the brackets
C = accumarray(ind,val,[n 1], f );
C{1} will have both the min and max in it for the 1st subgroup.
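A small sketch of how the cell output might be unpacked into separate min and max vectors. The sample ind and val are borrowed from elsewhere in this thread, and M, A, and B are illustrative names. vertcat is used on the assumption that every group has at least one value, since empty cells would simply be skipped.

```matlab
ind = [1 2 3 1 2 3 1 2 3]';
val = (0.1:0.1:0.9)';
n   = 3;

f = @(x) {[min(x) max(x)]};          % scalar cell holding [min max]
C = accumarray(ind, val, [n 1], f);

M = vertcat(C{:});                   % n-by-2 matrix: [min max] per group
B = M(:, 1);                         % per-group minimum
A = M(:, 2);                         % per-group maximum
```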
You can achieve (2) using test.m below. I have not tested performance. But the functionality is there. A similar solution exists for an anonymous function.
—Bob.
function A = test
ind = [1 2 3 1 2 3 1 2 3]';
val = [0.1:0.1:0.9]';
A = accumarray(ind,val,[3 1],@findmax);
    function ix = findmax(s)
        ix = find(val == max(s),1);
    end
end
- Stephen
Thanks a lot. Your suggestion actually does the job but is much more memory demanding. When the number of unique indices becomes large I would prefer calling accumarray twice.
- Bob
thanks, too. But your suggestion has the same problem as the one I suggested. Since accumarray works on the subsets of val it returns the indices inside these subsets as shown by Peter in the aforementioned post.
I tried to post it on the newsfeed but didn’t work for some reason.
Wolfgang,
How about this:
A = accumarray(ind,1:numel(val),[3 1], @(x) findmax(x,val))
where
function ix = findmax(indx, s)
[m,ix] = max(s(indx));
ix = indx(ix);
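The same idea can be sketched without a separate function file by accumulating positions and using an anonymous function. This is a variation on the suggestion above, not the code as posted, and argmax is a name made up here.

```matlab
ind = [1 2 3 1 2 3 1 2 3]';
val = (0.1:0.1:0.9)';

% Accumulate positions into val rather than the values themselves,
% then return the position whose value is the maximum of its group.
argmax = @(idx) idx(find(val(idx) == max(val(idx)), 1));
A = accumarray(ind, (1:numel(val))', [3 1], argmax);
```

For this data A is [7; 8; 9], the positions in val of each group's maximum.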