# Under-appreciated accumarray

Posted by **Loren Shure**,

The MATLAB function `accumarray` seems to be under-appreciated. `accumarray` allows you to aggregate items in an array in the way that you specify.

### Newsgroup Statistics

Since `accumarray` was introduced in MATLAB 7.0 (R14), there have been over 100 threads in the MATLAB newsgroup where `accumarray` arose as a solution.

### Recent Questions

One of the more recent threads asks how to aggregate values in one list based on another list. Suppose the lists are

```
group = [1 2 2 2 3 3]'
data = [6 43 3 4 2 5]'
```

```
group =
     1
     2
     2
     2
     3
     3
data =
     6
    43
     3
     4
     2
     5
```

and the goal is to `sum` the `data` in each `group`. Let's first create the first input argument. `accumarray` wants an array of subscripts that says which output element each value of `data` belongs to. Since we're just producing a column vector with 3 values, we just append a column of `ones` to the `group` vector.

```
indices = [group ones(size(group))]
```

```
indices =
     1     1
     2     1
     2     1
     2     1
     3     1
     3     1
```

### Next We Accumulate

Since the default function for accumulation is `sum`, we can use the simplest form of `accumarray` to get the desired results.

```
sums = accumarray(indices, data)
```

```
sums =
     6
    50
     7
```
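
For a column-vector result, the column of ones is optional: `accumarray` also accepts the group vector itself as the subscripts. A minimal sketch using the same data as above (the names `sumsDirect` and `sumsExplicit` are introduced here only for illustration):

```matlab
% Same data as in the post
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

% Pass the group vector directly: one subscript per data value
sumsDirect = accumarray(group, data);

% Pass explicit [row, column] subscripts, as the post does
indices = [group ones(size(group))];
sumsExplicit = accumarray(indices, data);

% Both calls return the 3-by-1 vector [6; 50; 7]
disp(isequal(sumsDirect, sumsExplicit))
```

The two-column form becomes necessary only when the output should have more than one column.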

### Another Way to Accumulate

We can instead accumulate the results by adding 2 input arguments to the function call. These are a size vector for the output array and a function handle specifying the accumulating function.

```
sums1 = accumarray(indices, data, [numel(unique(group)) 1], @sum)
```

```
sums1 =
     6
    50
     7
```

It's easy to see that the results from the two function calls are the same.

```
isequal(sums, sums1)
```

```
ans =
     1
```
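
The size argument matters when some groups never occur. The sketch below asks for five output rows although only groups 1 to 3 are present, and uses the optional fifth argument (the fill value) to mark the empty groups. The names `padded` and `flagged` are illustrative only.

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

% Request 5 output rows even though only groups 1..3 occur.
% Rows with no contributing data get the default fill value 0.
padded = accumarray(group, data, [5 1]);              % [6; 50; 7; 0; 0]

% The fifth argument replaces the default 0 for empty groups.
flagged = accumarray(group, data, [5 1], @sum, NaN);  % [6; 50; 7; NaN; NaN]
```

Using `NaN` as the fill value distinguishes "no data in this group" from a group whose values genuinely sum to zero.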

### Other Accumulation Functions

Sometimes, summing the results isn't what I'm looking for. Having puzzled out the 4 input call syntax, I can now simply replace the accumulation function. To find the maximum values in each `group`, I use this code.

```
maxData = accumarray(indices, data, [numel(unique(group)) 1], @max)
```

```
maxData =
     6
    43
     5
```

The accumulation function does not have to be a built-in reduction; any function that maps a vector to a scalar works. The remaining calls in this section reuse the name `maxData` but test each `group` for finite values: first whether a group contains no finite values at all, then (after the last element of `data` is set to `Inf`) whether all of its values are finite.

```
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
@(x)~any(isfinite(x)))
```

```
maxData =
     0
     0
     0
```

```
data(end) = Inf
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
@(x)~any(isfinite(x)))
```

```
data =
     6
    43
     3
     4
     2
   Inf
maxData =
     0
     0
     0
```

```
maxData = accumarray(indices, data, [numel(unique(group)) 1], ...
@(x)all(isfinite(x)))
```

```
maxData =
     1
     1
     0
```
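
A few more accumulation functions, sketched with the original `data` values (before the `Inf` assignment). The names `counts`, `means`, and `ranges` are introduced here for illustration. Passing `[]` as the size lets `accumarray` infer it from the subscripts.

```matlab
group = [1 2 2 2 3 3]';
data  = [6 43 3 4 2 5]';

% Accumulating the constant 1 counts the members of each group
counts = accumarray(group, 1);                               % [1; 3; 2]

% Per-group mean
means = accumarray(group, data, [], @mean);                  % [6; 50/3; 3.5]

% Any vector-to-scalar anonymous function works, e.g. the spread
ranges = accumarray(group, data, [], @(x) max(x) - min(x));  % [0; 40; 3]
```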

### Derivative Work

John D'Errico made a more general function, `consolidator`, found on the MathWorks File Exchange, that allows you to do some extra aggregation. For example, `consolidator` allows the aggregation of elements when they are within a specified tolerance and not just identical.
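
To give a feel for tolerance-based aggregation using only `accumarray`, here is a rough sketch. It is not `consolidator`'s algorithm: it simply snaps values to a grid of width `tol` before grouping, so two values that straddle a grid boundary can still land in different groups. The data, the tolerance, and all variable names are assumptions made for illustration.

```matlab
% Hypothetical sample data: nearly equal x values should share a group
x = [1.00 1.02 2.48 2.51 4.00]';
y = [10 20 30 50 7]';
tol = 0.1;                    % assumed tolerance, chosen for illustration

% Crude binning: snap x to a grid of width tol, then number the bins
[~, ~, g] = unique(round(x / tol));    % g = [1; 1; 2; 2; 3]

yMean = accumarray(g, y, [], @mean);   % [15; 40; 7]
xMean = accumarray(g, x, [], @mean);   % representative x for each bin
```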

### Do You accum?

Some other obvious accumulation functions you might use include `sum`, `max`, `min`, `prod`. What functions do you use in situations when you aggregate with `accumarray`?

Published with MATLAB® 7.5

**Category:** Less Used Functionality, New Feature

## 59 Comments

**16** of 59

```
%% Here's the "Data" for Example 1
val = 101:105;
subs = [1; 2; 4; 2; 4]

%% Here's the accumarray Solution
% We are accumulating values from val based on like-values
% in subs
A = accumarray(subs, val)

%% Another Method
% Here's the outline of perhaps a more "standard" way to
% think of this.
%
% * First find out how many unique indices there are in
%   subs. The length of this array corresponds to the
%   maximum index value in subs. This is the size of the
%   output array.
% * Pre-allocate the output array to be the correct size.
% * Loop through the *values* in subs, which range from
%   1:max(subs).
%
%% Find How Many Times Each Index is Repeated
% For indices, find out how many of each value.
n = hist(subs,max(subs))

%%
% Verify that the maximum value in subs is indeed the number
% of bins from calling hist.
max(subs) == length(n)

%%
% Create output array.
Aother = zeros(length(n),1);

%%
% Add each entry in val to the "appropriate" output entry
for k = 1:length(n);
    % Find correct subscript for adding to Aother
    % logical index for val containing n(k) nonzeros
    ind = (subs == k)
    % verify that the number of nonzeros is the expected
    % amount
    tf = n(k) == nnz(ind)
    % here are the values in the data from the
    % corresponding indices
    valk = val(ind)
    % sum up these values
    Aother(k) = sum(val(ind));
end

%% Compare Output Values
agree = isequal(Aother,A)
```

--Loren
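
As a side note to the loop above, the per-index counts can also come from `accumarray` itself by accumulating the constant 1. This avoids relying on `hist`, whose bins are spaced evenly between the smallest and largest data values and therefore line up with the integer indices for this data but not necessarily for other index vectors. A minimal sketch (here `val` is written as a column, which is an assumption of this sketch rather than part of the original comment):

```matlab
val  = (101:105)';
subs = [1; 2; 4; 2; 4];

% Occurrences of each index 1..max(subs)
n = accumarray(subs, 1);      % [1; 2; 0; 2]

% Sum of val for each index; index 3 never occurs, so it stays 0
A = accumarray(subs, val);    % [101; 206; 0; 208]
```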

**39** of 59

```
% Francis Esmonde-White, May 6, 2010
% example input.
val = 101:105;
subs = [1 1 1; 2 1 2; 2 3 2; 2 1 2; 2 3 2];

% equivalent functionality to basic accumarray
output_dimensions = max(subs);
output = zeros(output_dimensions);
for ix=1:numel(output_dimensions)
    subs2{ix} = subs(:,ix)';
end
ind = sub2ind(output_dimensions,subs2{:});
ind_list=unique(ind);
for ix = ind_list
    % note that the operation is done here. The sum function can be
    % substituted for (an)other function(s).
    output(ix)=sum(val(ind==ix));
end
```

**43** of 59

```
A = accumarray(subs, val);
% or, hopefully sometime soon,
A = ifeval(subs, val, [1, 4], @sum);
% A(1) = sum(val(subs==1)) = 101
% A(2) = sum(val(subs==2)) = val(2)+val(4) = 102+104 = 206
% A(3) = sum(val(subs==3)) = 0
% A(4) = sum(val(subs==4)) = val(3)+val(5) = 103+105 = 208
```

**44** of 59

```
A = ifeval(subs, val, [1, 4], @sum)
% A(1) = sum(val(subs==1)) = 0 + val(1) = 101
% A(2) = sum(val(subs==2)) = 0 + val(2)+val(4) = 102+104 = 206
% A(3) = sum(val(subs==3)) = 0 + [] = 0
% A(4) = sum(val(subs==4)) = 0 + val(3)+val(5) = 103+105 = 208
```

You could then show the same example in a more generalized context like you showed earlier:

```
% * First find out how many unique indices there are in
%   subs.
n = hist(subs,max(subs))
% The length of this array corresponds to the maximum
% index value in subs. This is the size of the output array.
% * Pre-allocate the output array A2 to be the correct size
A2 = zeros(length(n),1);
%
% Add each entry in val to the "appropriate" output entry
for k = 1:length(n);
    % Find correct subscript for adding to A2
    % logical index for val containing n(k) nonzeros
    ind = (subs == k)
    % here are the values in the data from the
    % corresponding indices
    val_k = val(ind)
    % sum up these values
    A2(k) = sum(val_k)
end
% check to make sure both approaches give the same answer;
isequal(A,A2)
```

Regards, Eric

**45** of 59

```
A=accumarray([4e5;4e5]*[1 1],1,[4e5 4e5],[],[],true);
```

which should give me a sparse array with only one element at (4e5,4e5) with value 2, but instead gave me the error. A way around this problem is to use

```
A=sparse([4e5;4e5],[4e5;4e5],1);
```

which gives me the result I want. Now A(4e5,4e5)=2. However, with this method you cannot use any function other than summation to accumulate your results. Accumarray is a very powerful function that I like to use, but it would be nice if the maximum value could be increased. By the way, I noticed that another computer (64 bits and a newer Matlab version) did allow a larger number, but also here, the 'sparse' function can still surpass 'accumarray'.

Regards, Ezra
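
For readers unfamiliar with the six-argument form used in this comment, here is a small-scale sketch of sparse output combined with a non-sum accumulation function, which is what `sparse` alone cannot do. The 4-by-4 size and the values are made up for illustration, and the sketch says nothing about the size limit the comment ran into.

```matlab
subs = [3 3; 3 3; 1 2];   % two values land on (3,3), one on (1,2)
val  = [5; 9; 4];

% Arguments: subscripts, values, output size, function, fill value, issparse.
% With issparse = true the fill value must be 0 (or []).
S = accumarray(subs, val, [4 4], @max, 0, true);

issparse(S)   % true
full(S)       % 9 at (3,3), 4 at (1,2), zeros elsewhere
```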

**50** of 59

```
>> strs = {'a' 'bb' 'ccc'}';
>> s = strs(randi(3,10,1));
>> x = randn(10,1);
>> c = [s num2cell(x)]
c =
    'bb'     [ 0.74808]
    'ccc'    [-0.19242]
    'ccc'    [ 0.88861]
    'ccc'    [-0.76485]
    'bb'     [ -1.4023]
    'a'      [ -1.4224]
    'a'      [ 0.48819]
    'a'      [-0.17738]
    'ccc'    [-0.19605]
    'a'      [  1.4193]
```

Now pretend we don't know where c came from. Create the group indices from the strings ...

```
>> [u,~,i] = unique(c(:,1));
```

... and group the numeric data by the indices

```
>> groups = accumarray(i,cell2mat(c(:,2)),size(u),@(t) {t});
>> groups{:}
ans =
      -1.4224
     -0.17738
      0.48819
       1.4193
ans =
      0.74808
      -1.4023
ans =
     -0.19242
     -0.19605
      0.88861
     -0.76485
```

Or compute grouped means

```
>> groupMeans = accumarray(i,cell2mat(c(:,2)),size(u),@mean)
groupMeans =
     0.076938
      -0.3271
    -0.066178
```

Now, ACCUMARRAY requires the second input to be numeric, so in the above, you have to convert the second column of your cell array to a numeric column. I imagine you have this one cell array because you wanted to mix string and numeric data in a single array.

If you have access to the Statistics Toolbox, you may find that using a dataset array instead of a cell array makes your life easier. In fact, there's a function called GRPSTATS that will do both of the above for you. You may also find that using a nominal (or ordinal) array for your string data, rather than a cell array, makes your life easier too. The dataset array would then contain one nominal and one numeric column. For example:

```
>> s = nominal(s);
>> d = dataset(s,x)
d = 
    s      x       
    bb      0.74808
    ccc    -0.19242
    ccc     0.88861
    ccc    -0.76485
    bb      -1.4023
    a       -1.4224
    a       0.48819
    a      -0.17738
    ccc    -0.19605
    a        1.4193
>> groupMeans = grpstats(d,'s')
groupMeans = 
           s      GroupCount    mean_x   
    a      a      4              0.076938
    bb     bb     2               -0.3271
    ccc    ccc    4             -0.066178
```

Hope this helps.

**54** of 59

```
c =
    'ccc'    [16]    [71]    'n'
    'ccc'    [98]    [ 4]    'p'
    'a'      [96]    [28]    'n'
    'ccc'    [49]    [ 5]    'n'
    'bb'     [81]    [10]    'n'
    'a'      [15]    [83]    'n'
    'a'      [43]    [70]    'n'
    'bb'     [92]    [32]    'p'
    'ccc'    [80]    [96]    'p'
    'ccc'    [96]    [ 4]    'p'
```

When I put the grouped elements in their respective bins, the corresponding elements in columns 3 and 4 also need to go into the bins in the same rows as the elements of col 2. Is this possible? Thank you.

**55** of 59

```
groups2 = accumarray(n,cell2mat(c1(:,2)),size(u),@(t) {t});
groups3 = accumarray(n,cell2mat(c1(:,3)),size(u),@(t) {t});
groups4 = accumarray(n,cell2mat(c1(:,4)),size(u),@(t) {t});
```

The other solution I can think of is to first get the list of unique strings using 'unique' and then use a loop along with 'strcmpi' to find the indices for all the occurrences of each unique string in the main cell array and accumulate the rows manually in separate cells or the fields of a structure. Other than these two solutions, is there a better way? Thanks a lot!

**58** of 59

```
[u,~,i] = unique(c(:,1));
r = (1:size(c,1))';
groups = accumarray(i,r,size(u),@(t) { c(t,2:4) });
```
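
The trick in this reply is to accumulate row numbers rather than data, so that each group can pull whole rows out of the cell array. A self-contained sketch on a three-row stand-in for the array in the question follows. The data and the names `g` and `rows` are hypothetical, and `sort(t)` is an addition of this sketch: `accumarray` does not promise to hand values to the function in their original order unless the subscripts are sorted.

```matlab
% Hypothetical 3-row stand-in for the 4-column cell array in the question
c = {'b' 1 10 'n'; 'a' 2 20 'p'; 'b' 3 30 'p'};

[u, ~, g] = unique(c(:,1));     % u = {'a'; 'b'}, g = [2; 1; 2]
rows = (1:size(c,1))';

% Accumulate row numbers; each group returns its rows of c, columns 2..4.
% sort(t) keeps the rows in their original order within each group.
groups = accumarray(g, rows, size(u), @(t) {c(sort(t), 2:4)});

groups{2}    % rows 1 and 3 of c, columns 2 to 4
```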
