Let Me Count the Ways
In the course of many applications, we sometimes need to tally up counts for different purposes, e.g., investigating the distribution of some data or equalizing gray tones in an image.
Contents
In version 4 of MATLAB, we introduced sparse matrices. A bit later, we introduced the first release of the Image Processing Toolbox; and for even more information, see Steve's blog. We then had several says we could count up the number of pixels in an indexed image for each color in the colormap. A few of the ways are
- Run a for loop over the number of colors and find the elements in the image with that value.
- Use hist to create a histogram with the bins corresponding to the number of colors.
- Use sparse matrices to tally the bins, especially since sparse performs accumulation as it goes.
Counting by Color - Brute Force
Here's code to show how to bin the data by brute force.
Load the data.
clown = load('clown');
lcmap = clown.map;
nc = length(lcmap);
X = clown.X;
Set up the output array for collecting the binned information.
count = zeros(1,nc);
Loop over the image elements and augment the colormap entry by 1 for each hit.
M = numel(X); for ind=1:M count(X(ind)) = count(X(ind))+1; end
Counting by Color - Accumulation Using sparse
Think first about creating an M-by-nc array, mostly zeros, with one 1 per row, in the color column to identify the color.
S = sparse(1:M,X(:),1,M,nc);
We could do this and then sum each column. However, sparse automatically accumulates duplicate elements rather than replacing them, so instead we can create a 1-by-nc sparse array, and then convert it to full when we're done.
counts = full(sparse(1,X(:),1,1,nc));
If you looked at this M-file, you would now see an mlint message for the previous line of code. Here it is.
mlhint = mlint('meanValues');
disp(mlhint.message)
ACCUMARRAY([r,c],v,[m,n]) is faster than FULL(SPARSE(r,c,v,m,n)).
It suggests to us the next technique to use.
Counting by Color - Accumulation Using accumarray
Following the message as closely as we can (though we can't have the first column of the first input scalar since the second column is a long vector), we get try the following code. We can also use the help for accumarray to guide us.
counta = accumarray([ones(M,1) X(:)], 1 , [1 nc]);
Let's make sure all the techniques get the same answer.
agreement = isequal(count, counts, counta)
agreement = 1
Computing Mean Values
There was a thread recently on the MATLAB newsgroup asking how to compute the mean value of portions of an array, the partitions described via arrays of indices. Sounds like an image histogram, only not -- because of computing the mean value instead of the total number. Here's the posted advice. help accumarray
HTH, John D'Errico
While I know that information helped the user that day, I suspect not everyone knew exactly what to do with it. So let's try it here.
Here are the details and the original code (except for the sizes). Original sizes: m = 1245; n = 1200; maxnum = 50;
m = 100; n = 100; maxnum = 22;
Create the input matrices.
observed = rand(m,n); num = 1:maxnum; arr1 = num(ceil(maxnum*rand(size(observed)))); arr2 = num(ceil(maxnum*rand(size(observed))));
Computing Mean Values - Straight-forward Double Loop
meanMatrix=zeros(maxnum,maxnum); tic for i=1:maxnum for ii=1:maxnum meanMatrix(i,ii) = mean(observed(arr1==i & arr2 ==ii)); end end looptime = toc
looptime = 0.1494
Computing Mean Values - Using accumarray
There are a whole class of functions I can use with accumarray to calculate different information about these arrays. These functions fall into a class of operations that other languages, e.g., APL, call reduction operations and they consist of functions such as sum, mean, and max, for example that return a scalar value for vector input.
tic means = accumarray([arr1(:),arr2(:)], observed(:), [maxnum maxnum], @mean, 0); accumtime = toc
accumtime = 0.0129
Notice how much faster the accumarray solution is than the loop, even with the output pre-allocated.
Ensuring Identical Results
If we compare the results, and not just the times, for the different ways to produce the mean values, we see the results are not identical.
norm(means-meanMatrix)
ans = 7.3041e-016
What's going on here? I will show you that the difference depends on the ordering of the elements by doing the same calculation with a different program to produce the mean. In this program, we make sure the data are sorted before we calculate the result.
dbtype sortmean
1 function y = sortmean(x) 2 %SORTMEAN Sort data and find mean. 3 y = mean(sort(x));
Do the calculation with sortmean with loops and with accumarray and compare results.
for i=1:maxnum for ii=1:maxnum meanMatrix(i,ii) = sortmean(observed(arr1==i & arr2 ==ii)); end end means = accumarray([arr1(:),arr2(:)], observed(:), [maxnum maxnum], @sortmean, 0);
We can see now that if we tightly control the order of the calculations for the mean value, we get the same answers, though the loop version still takes longer.
equalmeans = isequal(means,meanMatrix)
equalmeans = 1
Conclusion
Once again, there are many ways to reach the same outcome in MATLAB, with different characteristics in terms of readability, performance, etc. Which of the solutions used here would you use, or do you have some other favorite techniques? Post them here.