Comments on: Under-appreciated accumarray

By: Par

Par — Mon, 14 Nov 2011 23:34:06 +0000

Got it Peter, completed the exercise :) Your solution is quite elegant. I am happy that I finally understood this function. The fact that you can pass a function handle makes this function so powerful!

By: Peter Perkins

Peter Perkins — Mon, 14 Nov 2011 16:32:59 +0000

Par –

1) You should not depend on the internal implementation of accumarray. Stick to what the help tells you.

2) There are probably several ways to do what you’re asking about, but here’s the way I’d approach it (I’ll leave it as an exercise to figure out how this works):

   [u,~,i] = unique(c(:,1));
   r = (1:size(c,1))';
   groups = accumarray(i,r,size(u),@(t) { c(t,2:4) });
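One way to read the three lines above, sketched on a small made-up cell array (the data and the variable names `idx` and `groupsOrdered` are illustrative, not from the original post): `unique` turns the strings into group numbers, `r` holds row numbers, and accumarray hands each group's row numbers to the anonymous function, which uses them to slice whole rows out of `c`.

```matlab
% Illustrative data: column 1 is the grouping key, columns 2:4 travel together.
c = {'ccc' 16 71 'n'
     'a'   96 28 'n'
     'ccc' 98  4 'p'
     'bb'  81 10 'n'
     'a'   15 83 'n'};

[u,~,idx] = unique(c(:,1));      % u = {'a';'bb';'ccc'}, idx = group number of each row
r = (1:size(c,1))';              % row numbers 1..5, the "values" being accumulated

% Each call of the anonymous function receives the row numbers of one group
% and returns the matching rows of columns 2:4, wrapped in a cell.
groups = accumarray(idx, r, size(u), @(t) {c(t,2:4)});

% Row order within a group is not guaranteed when idx is unsorted, so sort
% the row numbers inside the function if the original order matters.
groupsOrdered = accumarray(idx, r, size(u), @(t) {c(sort(t),2:4)});
```

Here `groupsOrdered{k}` is a cell array holding every row of columns 2 to 4 whose key is `u{k}`, so the three columns stay aligned without calling accumarray once per column.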

By: Peter Perkins

Peter Perkins — Mon, 14 Nov 2011 16:20:43 +0000

Gursu, meanangle appears to be a submission to the MATLAB File Exchange. I know nothing about it, but since accumarray accepts a function handle, I can only assume that you could pass accumarray a function handle to meanangle.
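A minimal sketch of passing an angle-averaging function to accumarray. The interface of the File Exchange `meanangle` is not shown in this thread, so the anonymous function `circMeanDeg` below is a hypothetical stand-in, and the sample data are made up; if `meanangle` accepts a vector and returns a scalar, `@meanangle` could presumably be passed in the same position.

```matlab
% Hypothetical stand-in for meanangle: circular mean of angles in degrees,
% computed from the mean sine and mean cosine. NaNs are dropped first, which
% is an assumption about what a "nanmeanangle" would be expected to do.
circMeanDeg = @(a) atan2d(mean(sind(a(~isnan(a)))), mean(cosd(a(~isnan(a)))));

subs   = [1; 1; 2; 2; 2];            % group index of each observation
angles = [350; 10; 80; 100; NaN];    % degrees; the NaN is ignored

groupAngles = accumarray(subs, angles, [], circMeanDeg);
% groupAngles(1) is close to 0 (a plain mean of 350 and 10 would give 180)
% groupAngles(2) is close to 90
```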

By: gursu

gursu — Sun, 13 Nov 2011 21:45:28 +0000

Is it possible to use @meanangle, or better, @nanmeanangle, as the function passed to accumarray?

By: Par

Par — Sat, 12 Nov 2011 15:13:48 +0000

Hello Again Peter,

Continuing on my latest reply, I figure there are two ways to solve the problem. One is to first sort the original cell array based on its first column so that when I use ‘unique’ to get the ‘n’ vector, it will be sorted. Then, I can use 3 separate accumarray statements as follows:

groups2 = accumarray(n,cell2mat(c1(:,2)),size(u),@(t) {t});
groups3 = accumarray(n,cell2mat(c1(:,3)),size(u),@(t) {t});
groups4 = accumarray(n,cell2mat(c1(:,4)),size(u),@(t) {t});

The other solution I can think of is to first get the list of unique strings using ‘unique’ and then use a loop along with ‘strcmpi’ to find the indices for all the occurrences of each unique string in the main cell array and accumulate the rows manually in separate cells or the fields of a structure.
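For reference, the loop-based alternative described above might look like the following sketch. The sample data and the struct field names `key` and `rows` are made up for illustration.

```matlab
% Made-up sample data in the same layout as c: key, two numbers, a flag.
c1 = {'ccc' 16 71 'n'
      'a'   96 28 'n'
      'ccc' 98  4 'p'
      'bb'  81 10 'n'};

u = unique(c1(:,1));                      % distinct keys, sorted
groups = struct('key', u, 'rows', []);    % one struct element per key

for k = 1:numel(u)
    hit = strcmpi(c1(:,1), u{k});         % logical mask of rows matching key k
    groups(k).rows = c1(hit, 2:4);        % matching rows, in original order
end
```

Note that `strcmpi` ignores case while `unique` does not, so `strcmp` is the safer choice if two keys can differ only by case. The loop makes one pass over the key column per unique string, which may be slow when there are many distinct keys.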

Other than these two solutions, is there a better way?

Thanks a lot!

By: Par

Par — Fri, 11 Nov 2011 23:16:08 +0000

Sorry Peter. I completely overlooked another requirement in my problem description earlier. Actually, I have two more columns in the array 'c', for example, 'c' looks like this:


c = 
    'ccc'    [16]    [71]    'n'
    'ccc'    [98]    [ 4]    'p'
    'a'      [96]    [28]    'n'
    'ccc'    [49]    [ 5]    'n'
    'bb'     [81]    [10]    'n'
    'a'      [15]    [83]    'n'
    'a'      [43]    [70]    'n'
    'bb'     [92]    [32]    'p'
    'ccc'    [80]    [96]    'p'
    'ccc'    [96]    [ 4]    'p'

When I put the grouped elements in their respective bins, the corresponding elements in columns 3 and 4 also need to go into the bins in the same rows as the elements of col 2. Is this possible? Thank you.

By: Par

Par — Fri, 11 Nov 2011 22:14:19 +0000

Got it! Now I understand the utility of this function much more clearly. Thank you so much for your explanations.

About the unpredictable order of the resulting subset elements: I take it, then, that the inner workings of accumarray are not as simple as scanning the input array element by element and dropping each element into its appropriate bin as the code moves forward. I am guessing that would have been quite inefficient?

By: Peter Perkins

Peter Perkins — Fri, 11 Nov 2011 18:28:21 +0000

Par, the help for ACCUMARRAY says, “Note: If the subscripts in SUBS are not sorted, FUN should not depend on the order of the values in its input data.” What that means is what you observed: for FUN == @(t) {t}, the order of the elements in the output is not predictable. HOWEVER, if the subscripts _are_ sorted, then the order of the elements will be what you expect. So, if the order of the elements in the output is critical, you can always sort SUBS, and reorder VAL in parallel to that using the second output from SORT.
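The sort-then-accumulate advice above can be sketched as follows. The data and the variable names `subsSorted`, `order`, and `valsSorted` are illustrative; the sketch relies on SORT keeping equal elements in their original relative order.

```matlab
subs = [3; 1; 3; 2; 1; 3];           % unsorted group indices (illustrative)
vals = [10; 20; 30; 40; 50; 60];     % values in order of occurrence

[subsSorted, order] = sort(subs);    % second output of SORT is the permutation
valsSorted = vals(order);            % reorder VAL in parallel to SUBS

groups = accumarray(subsSorted, valsSorted, [], @(t) {t});
% With sorted subscripts, each cell lists its values in the order they
% appeared in vals: groups{1} = [20;50], groups{2} = 40, groups{3} = [10;30;60]
```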

The following demonstrates that using ISSORTED is a win, at least for large arrays:

>> x = randn(10000000,1);
>> xs = sort(x);
>>
>> tic, y = sort(x); toc
Elapsed time is 0.429657 seconds.
>> tic, y = sort(xs); toc
Elapsed time is 0.112595 seconds.
>> tic, if ~issorted(x), y = sort(x); end, toc
Elapsed time is 0.420848 seconds.
>> tic, if ~issorted(xs), y = sort(xs); end, toc
Elapsed time is 0.021241 seconds.

By: Par

Par — Fri, 11 Nov 2011 16:08:05 +0000

Thanks a lot, Peter. That really helps! Your example is also a great aid to understand how this works. Appreciate your help.

Just one more question: The elements in the subsets, i.e., in the “groups” cell array, do not seem to be in any particular order. I thought they might be sorted, either ascending or descending, or kept in their order of occurrence in the “c” array, but they seem to be randomly ordered. Is that so? Could you please shed some light on this? Of course, it is no big deal to sort them with the sort command, but if I knew they were already sorted, I wouldn’t bother, as the data I am dealing with is quite large, about 10-15 million elements.

By the way, another quick question: If we are unsure about the sort status of an array, then I know we can use the ‘issorted’ command to check. Now if the array happens to be already sorted, then does the redundant checking cost any computation time? Even if the array is very large?

Thanks!

By: Peter Perkins

Peter Perkins — Thu, 10 Nov 2011 13:43:30 +0000

Par, yes you can. Your data are perhaps not in the most convenient form, but I'll say more in a moment. First, set up some data like what you described.

>> strs = {'a' 'bb' 'ccc'}';
>> s = strs(randi(3,10,1));
>> x = randn(10,1);
>> c =  [s num2cell(x)]
c = 
    'bb'     [ 0.74808]
    'ccc'    [-0.19242]
    'ccc'    [ 0.88861]
    'ccc'    [-0.76485]
    'bb'     [ -1.4023]
    'a'      [ -1.4224]
    'a'      [ 0.48819]
    'a'      [-0.17738]
    'ccc'    [-0.19605]
    'a'      [  1.4193]

Now pretend we don't know where c came from. Create the group indices from the strings ...

>> [u,~,i] = unique(c(:,1));

... and group the numeric data by the indices

>> groups = accumarray(i,cell2mat(c(:,2)),size(u),@(t) {t});
>> groups{:}
ans =
      -1.4224
     -0.17738
      0.48819
       1.4193
ans =
      0.74808
      -1.4023
ans =
     -0.19242
     -0.19605
      0.88861
     -0.76485

Or compute grouped means

>> groupMeans = accumarray(i,cell2mat(c(:,2)),size(u),@mean)
groupMeans =
     0.076938
      -0.3271
    -0.066178

Now, ACCUMARRAY requires the second input to be numeric, so in the above, you have to convert the second column of your cell array to a numeric column. I imagine you have this one cell array because you wanted to mix string and numeric data in a single array. If you have access to the Statistics Toolbox, you may find that using a dataset array instead of a cell array makes your life easier. In fact, there's a function called GRPSTATS that will do both of the above for you. You may also find that using a nominal (or ordinal) array for your string data, rather than a cell array, makes your life easier too. The dataset array would then contain one nominal and one numeric column. For example:

>> s = nominal(s);
>> d = dataset(s,x)
d = 
    s      x       
    bb      0.74808
    ccc    -0.19242
    ccc     0.88861
    ccc    -0.76485
    bb      -1.4023
    a       -1.4224
    a       0.48819
    a      -0.17738
    ccc    -0.19605
    a        1.4193
>> groupMeans = grpstats(d,'s')
groupMeans = 
           s      GroupCount    mean_x   
    a      a      4              0.076938
    bb     bb     2               -0.3271
    ccc    ccc    4             -0.066178

Hope this helps.