Loren on the Art of MATLAB

Turn ideas into MATLAB

Note

Loren on the Art of MATLAB has been archived and will not be updated.

Debugging Grouped Operations

Today's guest post comes from Sean de Wolski, one of Loren's fellow Application Engineers. You might recognize him from MATLAB Answers and the Pick of the Week blog!

One of my colleagues approached me last month and asked for help debugging an error with splitapply. The splitapply function takes group information and applies a function to each group in the data (sort of like a pivot table). Note that everything here also applies to the lower-level but more powerful function accumarray.

The documentation provides numerous simple examples of what splitapply does, so check them out if you're not familiar with it.
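
For readers who want something self-contained to run first, here is a minimal sketch on made-up data. The variables flavor and score are hypothetical and have nothing to do with the Bears data set. It shows findgroups producing group numbers and labels, splitapply computing one mean per group, and what I believe is the equivalent accumarray call for vector data.

```matlab
% Hypothetical toy data, not taken from the Bears data set
flavor = categorical(["mint"; "lime"; "mint"; "lime"; "mint"]);
score  = [1; 10; 2; 20; 3];

% findgroups returns a group number for every row plus the group labels
[G, flavorLabels] = findgroups(flavor);

% One mean per group; each call returns a scalar, so the results stack cleanly
meanBySplitapply = splitapply(@mean, score, G);

% Equivalent accumarray call when the data is a vector
meanByAccumarray = accumarray(G, score, [], @mean);

disp(table(flavorLabels, meanBySplitapply, meanByAccumarray))
```

Because every function call here returns a scalar, the per-group results always concatenate. The trouble described in this post starts when the function can return different sizes for different groups.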

Here is an anonymized version of the data set my colleague had. The first three variables are categorical identifiers, and the fourth holds the data associated with them.

load('Bears.mat')
disp(Bears)
     Candy        Animal        Sports                       Bytes                 
    ________    __________    ___________    ______________________________________
    Cinnamon    Black         Brown             -0.8095       0.40391             2
    Cinnamon    Polar         Brown             -2.9443      0.096455             3
    Cinnamon    Polar         Brown              1.4384       0.13197             9
    Cinnamon    Sloth         Chicago           0.32519       0.94205             1
    Cinnamon    Sloth         Chicago          -0.75493       0.95613             5
    Cinnamon    Sloth         Chicago            1.3703       0.57521             2
    Cinnamon    Sloth         Chicago           -1.7115       0.05978            10
    Cinnamon    Sloth         Chicago          -0.10224       0.23478             8
    Cinnamon    Sun           Baylor           -0.24145       0.35316             6
    Cinnamon    Sun           Baylor            0.31921       0.82119             5
    Cinnamon    Polar         Brown             0.31286      0.015403             1
    Cinnamon    Polar         Brown            -0.86488      0.043024             7
    Cinnamon    Black         Brown           -0.030051       0.16899             1
    Cinnamon    Black         Brown            -0.16488       0.64912             1
    Cinnamon    Polar         Brown             0.62771       0.73172             6
    Cinnamon    Polar         Brown              1.0933       0.64775             1
    Cinnamon    Spectacled    Coast Guard        1.1093       0.45092             9
    Cinnamon    Sloth         Chicago          -0.86365       0.54701             9
    Gummy       Sun           Baylor           0.077359       0.29632             8
    Gummy       Sun           Baylor            -1.2141       0.74469             2
    Gummy       Sun           Baylor            -1.1135       0.18896             7
    Gummy       Sun           Baylor         -0.0068493       0.68678             6
    Gummy       Sun           Baylor             1.5326       0.18351            10
    Gummy       Sloth         Chicago          -0.76967       0.36848             7
    Gummy       Sloth         Chicago           0.37138       0.62562             9
    Gummy       Spectacled    Coast Guard      -0.22558       0.78023             5
    Gummy       Spectacled    Coast Guard        1.1174      0.081126             5
    Gummy       Spectacled    Coast Guard       -1.0891       0.92939             9
    Gummy       Sloth         Chicago          0.032557       0.77571             1
    Gummy       Sloth         Chicago           0.55253       0.48679             2
    Gummy       Spectacled    Coast Guard        1.1006       0.43586             2
    Gummy       Spectacled    Coast Guard        1.5442       0.44678             4
    Gummy       Spectacled    Coast Guard      0.085931       0.30635             9
    Gummy       Spectacled    Coast Guard       -1.4916       0.50851             9
    Gummy       Spectacled    Coast Guard       -0.7423       0.51077             1
    Gummy       Spectacled    Coast Guard       -1.0616       0.81763             4
    Gummy       Spectacled    Coast Guard        2.3505       0.79483             6
    Gummy       Sloth         Chicago           -0.6156       0.64432             5
    Gummy       Spectacled    Coast Guard       0.74808       0.37861             7
    Gummy       Sloth         Chicago          -0.19242       0.81158             7
    Gummy       Spectacled    Coast Guard       0.88861       0.53283             3
    Gummy       Spectacled    Coast Guard      -0.76485       0.35073             5
    Gummy       Spectacled    Coast Guard       -1.4023         0.939             1
    Gummy       Black         Brown             -1.4224       0.87594            10
    Gummy       Sloth         Chicago           0.48819       0.55016             2
    Gummy       Sloth         Chicago          -0.17738       0.62248             2
    Gummy       Sloth         Chicago          -0.19605       0.58704             4
    Gummy       Sloth         Chicago            1.4193       0.20774             2
    Gummy       Sloth         Chicago           0.29158       0.30125             5
    Gummy       Sloth         Chicago           0.19781       0.47092             4
    Gummy       Polar         Brown              1.5877       0.23049            10
    Gummy       Polar         Brown            -0.80447       0.84431            10
    Gummy       Sloth         Chicago           0.69662       0.19476             1
    Gummy       Black         Brown             0.83509       0.22592             8
    Gummy       Black         Brown            -0.24372       0.17071             3
    Gummy       Sloth         Chicago           0.21567       0.22766             5
    Gummy       Black         Brown             -1.1658        0.4357             6
    Gummy       Sloth         Chicago            -1.148        0.3111            10
    Gummy       Sloth         Chicago           0.10487       0.92338             5
    Gummy       Sloth         Chicago           0.72225       0.43021            10
    Gummy       Sloth         Chicago            2.5855       0.18482             4
    Gummy       Sloth         Chicago          -0.66689       0.90488             8
    Gummy       Sloth         Chicago           0.18733       0.97975             7
    Gummy       Sloth         Chicago         -0.082494       0.43887             6
    Gummy       Sloth         Chicago            -1.933       0.11112             7
    Gummy       Sloth         Chicago          -0.43897       0.25806             7
    Gummy       Sloth         Chicago           -1.7947       0.40872             2
    Gummy       Sloth         Chicago           0.84038        0.5949             2
    Gummy       Sloth         Chicago          -0.88803       0.26221            10
    Gummy       Sloth         Chicago           0.10009       0.60284             2
    Gummy       Sloth         Chicago          -0.54453       0.71122             1
    Gummy       Sloth         Chicago           0.30352       0.22175             6
    Gummy       Sloth         Chicago          -0.60033       0.11742             9
    Gummy       Sloth         Chicago           0.48997       0.29668             7
    Gummy       Sloth         Chicago           0.73936       0.31878             2
    Gummy       Sloth         Chicago            1.7119       0.42417             4
    Gummy       Sloth         Chicago          -0.19412       0.50786             5
    Gummy       Sloth         Chicago           -2.1384      0.085516            10
    Gummy       Sloth         Chicago          -0.83959       0.26248             2
    Gummy       Sloth         Chicago            1.3546       0.80101             9
    Gummy       Sloth         Chicago           -1.0722       0.02922             7
    Gummy       Sloth         Chicago           0.96095       0.92885             4
    Gummy       Sloth         Chicago           0.12405       0.73033             2
    Gummy       Sloth         Chicago            1.4367       0.48861             5
    Gummy       Sloth         Chicago           -1.9609       0.57853             5
    Gummy       Black         Brown             -0.1977       0.23728             2
    Gummy       Sloth         Chicago           -1.2078       0.45885             6
    Gummy       Sloth         Chicago             2.908       0.96309             3
    Gummy       Sloth         Chicago           0.82522       0.54681             4
    Gummy       Sloth         Chicago             1.379       0.52114             6
    Gummy       Sloth         Chicago           -1.0582       0.23159             3
    Gummy       Sloth         Chicago          -0.46862        0.4889             3
    Gummy       Black         Brown            -0.27247       0.62406             7
    Gummy       Polar         Brown              1.0984       0.67914             3
    Gummy       Sloth         Chicago          -0.27787       0.39552             9
    Gummy       Sloth         Chicago           0.70154       0.36744            10
    Gummy       Sloth         Chicago           -2.0518       0.98798             8
    Gummy       Black         Brown            -0.35385      0.037739             4
    Gummy       Sloth         Chicago          -0.82359       0.88517             6
    Gummy       Sloth         Chicago           -1.5771       0.91329             2
    Gummy       Sloth         Chicago           0.50797       0.79618            10
    Gummy       Sloth         Chicago           0.28198      0.098712             9
    Gummy       Sloth         Chicago           0.03348       0.26187             9
    Gummy       Sloth         Chicago           -1.3337       0.33536             3
    Gummy       Sloth         Chicago            1.1275       0.67973             6
    Gummy       Sloth         Chicago           0.35018       0.13655             1
    Gummy       Sloth         Chicago          -0.29907       0.72123             5
    Gummy       Sloth         Chicago           0.02289       0.10676             4
    Gummy       Sloth         Chicago            -0.262       0.65376             2
    Gummy       Sloth         Chicago           -1.7502       0.49417             2
    Gummy       Sloth         Chicago          -0.28565       0.77905             5
    Gummy       Black         Brown            -0.83137       0.71504             1
    Gummy       Sloth         Chicago          -0.97921       0.90372             6
    Gummy       Sloth         Chicago           -1.1564       0.89092             5
    Gummy       Sloth         Chicago          -0.53356       0.33416             7
    Gummy       Sloth         Chicago           -2.0026       0.69875             7
    Gummy       Sloth         Chicago           0.96423       0.19781             7
    Gummy       Sloth         Chicago           0.52006      0.030541             1
    Gummy       Sloth         Chicago         -0.020028       0.74407             1
    Gummy       Sloth         Chicago         -0.034771       0.50002             4
    Gummy       Sloth         Chicago          -0.79816       0.47992             6
    Gummy       Sloth         Chicago            1.0187       0.90472             7
    Gummy       Sloth         Chicago          -0.13322       0.60987             5
    Gummy       Sloth         Chicago          -0.71453       0.61767             9
    Gummy       Sloth         Chicago            1.3514       0.85944             8
    Gummy       Sloth         Chicago          -0.22477       0.80549            10
    Cinnamon    Polar         Brown            -0.58903       0.57672             6

The operation he was trying to perform was the mean of Bytes, omitting NaNs, grouped by two of the categories.

[animalcandy, animal, candy] = findgroups(Bears.Animal,Bears.Candy);
meanbyte = splitapply(@(x)mean(x, 'omitnan'), Bears.Bytes, animalcandy);
Error using vertcat
Dimensions of arrays being concatenated are not consistent.
Error in splitapply>localapply (line 257)
            finalOut{curVar} = vertcat(funOut{:,curVar}); 
Error in splitapply (line 132)
varargout = localapply(fun,splitData,gdim,nargout);
Error in mainDebuggingGroupedOps (line 32)
meanbyte = splitapply(@(x)mean(x, 'omitnan'), Bears.Bytes, animalcandy);

Hmm, I've seen that error before, but what does it have to do with this? How do we debug it? One could put a breakpoint in the anonymous function @(x)mean(x, 'omitnan') and then step with the debugger until the error occurs.

This would likely work for a small number of groups, but as the number of groups grows it means a lot of steps, one for each function evaluation. You'd also likely have to do it twice, the second time once you know where the error occurs. Setting the debugger to stop on errors may work for splitapply, although it may not stop you in a useful spot, and it won't help with accumarray, which is built in.

A trick I like to use is to replace the function handle with one that simply wraps its input in {}, that is, @(x){x}. This takes whatever is provided and packs it into a cell, so you can see exactly what is being passed into each function evaluation for each group.

bytecell = splitapply(@(x){x}, Bears.Bytes, animalcandy);
disp(bytecell)
    [14×3 double]
    [ 1×3 double]
    [ 5×3 double]
    [ 2×3 double]
    [78×3 double]
    [ 6×3 double]
    [ 8×3 double]
    [ 3×3 double]
    [ 3×3 double]
    [ 7×3 double]

From here we can see that the second cell has only one row. Since mean operates along the first non-singleton dimension, it reduces this 1×3 row to a scalar by averaging across the row, whereas every other group returns a 1×3 row of column means. A scalar can't be vertically concatenated with 1×3 rows, so we get the error.
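
To see the dimension behavior in isolation, here is a small sketch with two made-up matrices standing in for a multi-row group and a single-row group (manyRows and oneRow are hypothetical names, not part of the Bears data). It compares the sizes that mean returns with and without an explicit dimension, and then reproduces the concatenation failure by hand.

```matlab
% Hypothetical stand-ins for two groups: one with several rows, one with a single row
manyRows = [1 2 3; 4 5 6];
oneRow   = [7 8 9];

colMeanMany = mean(manyRows);    % 1x3 row of column means
collapsed   = mean(oneRow);      % scalar: mean runs along the row instead
colMeanOne  = mean(oneRow, 1);   % 1x3: dimension 1 is forced, so the row comes back unchanged

disp(size(colMeanMany))
disp(size(collapsed))
disp(size(colMeanOne))

% Stacking the 1x3 result on top of the scalar reproduces the vertcat failure
try
    stacked = [colMeanMany; collapsed]; %#ok<NASGU>
catch err
    disp(err.message)
end
```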

The fix for this is simple: pass the dimension to mean to force it to always take the column mean. Then rebuild the table with the labels.

meanbyte = splitapply(@(x)mean(x, 1, 'omitnan'), Bears.Bytes, animalcandy);

disp(table(animal, candy, meanbyte))
      animal       candy                   meanbyte              
    __________    ________    ___________________________________
    Spectacled    Gummy        0.075572      0.55805            5
    Spectacled    Cinnamon       1.1093      0.45092            9
    Sun           Gummy         -0.1449      0.42005          6.6
    Sun           Cinnamon      0.03888      0.58718          5.5
    Sloth         Gummy       -0.081113      0.51523       5.2949
    Sloth         Cinnamon     -0.28948      0.55249       5.8333
    Black         Gummy        -0.45653       0.4153        5.125
    Black         Cinnamon     -0.33481      0.40734       1.3333
    Polar         Gummy         0.62722      0.58464       7.6667
    Polar         Cinnamon     -0.13228      0.32043       4.7143

In this case, the fix was fairly discernible from a quick inspection. If it were not, we could loop over the cell and evaluate the function on each element to see where the error occurs. If the error occurs on a specific cell's data, the loop will stop there and we can investigate. If it's in the concatenation step, that'll be obvious at the end.

fun = @(x)mean(x, 'omitnan');
meanbytecell = cell(size(bytecell));
for ii = 1:numel(bytecell)
    meanbytecell{ii} = fun(bytecell{ii});
end
disp(meanbytecell)
    [1×3 double]
    [    3.5201]
    [1×3 double]
    [1×3 double]
    [1×3 double]
    [1×3 double]
    [1×3 double]
    [1×3 double]
    [1×3 double]
    [1×3 double]

And now it is obvious why these won't concatenate.
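
With ten groups the odd one out is easy to spot by eye. With hundreds it may help to find it programmatically. The following is a sketch of one way to do that, using made-up group data (groupData is a hypothetical stand-in for the cell returned by the {} trick). It wraps each evaluation in try/catch so a function that errors on one group doesn't stop the loop, then flags any group whose output width differs from the most common width.

```matlab
% Hypothetical per-group inputs, standing in for the output of splitapply(@(x){x}, ...)
groupData = {[1 2 3; 4 5 6]; [7 8 9]; [1 1 1; 3 3 3]};
fun = @(x) mean(x, 'omitnan');

groupOut = cell(size(groupData));
failed   = false(size(groupData));
for ii = 1:numel(groupData)
    try
        groupOut{ii} = fun(groupData{ii});
    catch err
        % Record groups where the function itself errors, then keep going
        failed(ii) = true;
        fprintf('Group %d errored: %s\n', ii, err.message);
    end
end

% Flag groups whose output width differs from the most common width
outWidth  = cellfun(@(c) size(c, 2), groupOut);
oddGroups = find(outWidth ~= mode(outWidth));
disp(oddGroups)
```

Comparing against the most common width is only a heuristic. It assumes most groups behave, which was true for this data but won't be for every data set.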

I find looping over the cell in this manner to be much easier than looping over the original data set and trying to identify which elements are in which groups and indexing correctly.

I'm also a big fan of using splitapply/accumarray with cell output for making objects or plots based on grouped data where the object can't be returned directly. Continuing this example, we'll draw a histogram for each group of the original data, wrapping histogram in {}.

figure
axes('ColorOrder', parula(numel(animal)))
hold on
h = splitapply(@(x){histogram(x)}, Bears.Bytes, animalcandy);
legend([h{:}], compose("%s/%s", animal, candy));
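
The cell-output pattern carries over to accumarray with one wrinkle: accumarray hands the function a vector, not a matrix. A workaround I'd suggest, sketched here on made-up data (data and G are hypothetical), is to group the row indices and use them to slice the matrix inside the function.

```matlab
% Hypothetical toy data: a 4x2 matrix whose rows belong to groups 1 and 2
data = [1 2; 3 4; 5 6; 7 8];
G    = [1; 2; 1; 1];

% Group the row indices instead of the data, then slice the matrix with them.
% sort(idx) restores the original row order, which accumarray does not
% promise to preserve within a group.
rowIdx = (1:numel(G))';
packed = accumarray(G, rowIdx, [], @(idx) {data(sort(idx), :)});

% Rows per group, the same information the splitapply cell display gave earlier
disp(cellfun(@(c) size(c, 1), packed))
```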

As an aside, development has been working to make grouped operations easier over the last few releases with a collection of new functions.

Doing this same operation with varfun would look like this:

meanbytetable = varfun(@(x)mean(x, 1, 'omitnan'), Bears, ...
    'GroupingVariables', {'Animal', 'Candy'}, ...
    'InputVariables', {'Bytes'});
disp(meanbytetable)
      Animal       Candy      GroupCount                 Fun_Bytes             
    __________    ________    __________    ___________________________________
    Spectacled    Gummy           14         0.075572      0.55805            5
    Spectacled    Cinnamon         1           1.1093      0.45092            9
    Sun           Gummy            5          -0.1449      0.42005          6.6
    Sun           Cinnamon         2          0.03888      0.58718          5.5
    Sloth         Gummy           78        -0.081113      0.51523       5.2949
    Sloth         Cinnamon         6         -0.28948      0.55249       5.8333
    Black         Gummy            8         -0.45653       0.4153        5.125
    Black         Cinnamon         3         -0.33481      0.40734       1.3333
    Polar         Gummy            3          0.62722      0.58464       7.6667
    Polar         Cinnamon         7         -0.13228      0.32043       4.7143

Do you work with grouped data functions? Let us know here.




Published with MATLAB® R2018b

