{"id":68,"date":"2006-12-01T16:02:52","date_gmt":"2006-12-01T21:02:52","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=68"},"modified":"2006-12-14T10:21:36","modified_gmt":"2006-12-14T15:21:36","slug":"let-me-count-the-ways","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2006\/12\/01\/let-me-count-the-ways\/","title":{"rendered":"Let Me Count the Ways"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p>In the course of many applications, we sometimes need to tally up counts for different purposes, e.g., investigating the distribution\r\n         of some data or equalizing gray tones in an image.\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#2\">Counting by Color - Brute Force<\/a><\/li>\r\n         <li><a href=\"#6\">Counting by Color - Accumulation Using sparse<\/a><\/li>\r\n         <li><a href=\"#9\">Counting by Color - Accumulation Using accumarray<\/a><\/li>\r\n         <li><a href=\"#11\">Computing Mean Values<\/a><\/li>\r\n         <li><a href=\"#14\">Computing Mean Values - Straight-forward Double Loop<\/a><\/li>\r\n         <li><a href=\"#15\">Computing Mean Values - Using accumarray<\/a><\/li>\r\n         <li><a href=\"#18\">Ensuring Identical Results<\/a><\/li>\r\n         <li><a href=\"#23\">Conclusion<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p>In version 4 of MATLAB, we introduced <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/sparse.html\"><tt>sparse<\/tt><\/a> matrices.  A bit later, we introduced the first release of the <a href=\"https:\/\/www.mathworks.com\/products\/image\/\">Image Processing Toolbox<\/a>; and for even more information, see <a href=\"https:\/\/blogs.mathworks.com\/steve\/\">Steve's blog<\/a>. We then had several says we could count up the number of pixels in an indexed image for each color in the colormap.  A few\r\n      of the ways are\r\n   <\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Run a for loop over the number of colors and find the elements in the image with that value.<\/li>\r\n         <li>Use <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/hist.html\"><tt>hist<\/tt><\/a> to create a histogram with the bins corresponding to the number of colors.\r\n         <\/li>\r\n         <li>Use <tt>sparse<\/tt> matrices to tally the bins, especially since <tt>sparse<\/tt> performs accumulation as it goes.\r\n         <\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Counting by Color - Brute Force<a name=\"2\"><\/a><\/h3>\r\n   <p>Here's code to show how to bin the data by brute force.<\/p>\r\n   <p>Load the data.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">clown = load(<span style=\"color: #A020F0\">'clown'<\/span>);\r\nlcmap = clown.map;\r\nnc = length(lcmap);\r\nX = clown.X;<\/pre><p>Set up the output array for collecting the binned information.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">count = zeros(1,nc);<\/pre><p>Loop over the image elements and augment the colormap entry by 1 for each hit.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">M = numel(X);\r\n<span style=\"color: #0000FF\">for<\/span> ind=1:M\r\n    count(X(ind)) = count(X(ind))+1;\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><h3>Counting by Color - Accumulation Using sparse<a name=\"6\"><\/a><\/h3>\r\n   <p>Think first about creating an M-by-nc array, mostly zeros, with one 1 per row, in the color column to identify the color.<\/p><pre>  S = sparse(1:M,X(:),1,M,nc);<\/pre><p>We could do this and then sum each column.  However, sparse automatically accumulates duplicate elements rather than replacing\r\n      them, so instead we can create a 1-by-nc sparse array, and then convert it to <tt>full<\/tt> when we're done.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">counts = full(sparse(1,X(:),1,1,nc));<\/pre><p>If you looked at this M-file, you would now see an <tt>mlint<\/tt> message for the previous line of code.  Here it is.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">mlhint = mlint(<span style=\"color: #A020F0\">'meanValues'<\/span>);\r\ndisp(mlhint.message)<\/pre><pre style=\"font-style:oblique\">ACCUMARRAY([r,c],v,[m,n]) is faster than FULL(SPARSE(r,c,v,m,n)).\r\n<\/pre><p>It suggests to us the next technique to use.<\/p>\r\n   <h3>Counting by Color - Accumulation Using accumarray<a name=\"9\"><\/a><\/h3>\r\n   <p>Following the message as closely as we can (though we can't have the first column of the first input scalar since the second\r\n      column is a long vector), we get try the following code.  We can also use the help for <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/accumarray.html\"><tt>accumarray<\/tt><\/a> to guide us.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">counta = accumarray([ones(M,1) X(:)], 1 , [1 nc]);<\/pre><p>Let's make sure all the techniques get the same answer.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">agreement = isequal(count, counts, counta)<\/pre><pre style=\"font-style:oblique\">agreement =\r\n     1\r\n<\/pre><h3>Computing Mean Values<a name=\"11\"><\/a><\/h3>\r\n   <p>There was a <a href=\"\">thread<\/a> recently on the MATLAB newsgroup asking how to compute the mean value of portions of an array, the partitions described via\r\n      arrays of indices. Sounds like an image histogram, only not -- because of computing the <tt>mean<\/tt> value instead of the total number.  Here's the posted advice.   help accumarray\r\n   <\/p><pre> HTH,\r\n John D'Errico<\/pre><p>While I know that information helped the user that day, I suspect not everyone knew exactly what to do with it.  So let's\r\n      try it here.\r\n   <\/p>\r\n   <p>Here are the details and the original code (except for the sizes).   Original sizes:        m = 1245;        n = 1200;   \r\n          maxnum = 50;\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">m = 100;\r\nn = 100;\r\nmaxnum = 22;<\/pre><p>Create the input matrices.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">observed = rand(m,n);\r\nnum = 1:maxnum;\r\narr1 = num(ceil(maxnum*rand(size(observed))));\r\narr2 = num(ceil(maxnum*rand(size(observed))));<\/pre><h3>Computing Mean Values - Straight-forward Double Loop<a name=\"14\"><\/a><\/h3><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">meanMatrix=zeros(maxnum,maxnum);\r\ntic\r\n<span style=\"color: #0000FF\">for<\/span> i=1:maxnum\r\n    <span style=\"color: #0000FF\">for<\/span> ii=1:maxnum\r\n        meanMatrix(i,ii) = mean(observed(arr1==i &amp; arr2 ==ii));\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span>\r\nlooptime = toc<\/pre><pre style=\"font-style:oblique\">looptime =\r\n    0.1494\r\n<\/pre><h3>Computing Mean Values - Using accumarray<a name=\"15\"><\/a><\/h3>\r\n   <p>There are a whole class of functions I can use with <tt>accumarray<\/tt> to calculate different information about these arrays.  These functions fall into a class of operations that other languages,\r\n      e.g., APL, call reduction operations and they consist of functions such as <tt>sum<\/tt>, <tt>mean<\/tt>, and <tt>max<\/tt>, for example that return a scalar value for vector input.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">tic\r\nmeans = accumarray([arr1(:),arr2(:)], observed(:), [maxnum maxnum], @mean, 0);\r\naccumtime = toc<\/pre><pre style=\"font-style:oblique\">accumtime =\r\n    0.0129\r\n<\/pre><p>Notice how much faster the <tt>accumarray<\/tt> solution is than the loop, even with the output pre-allocated.\r\n   <\/p>\r\n   <h3>Ensuring Identical Results<a name=\"18\"><\/a><\/h3>\r\n   <p>If we compare the results, and not just the times, for the different ways to produce the mean values, we see the results are\r\n      not identical.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">norm(means-meanMatrix)<\/pre><pre style=\"font-style:oblique\">ans =\r\n  7.3041e-016\r\n<\/pre><p>What's going on here?  I will show you that the difference depends on the ordering of the elements by doing the same calculation\r\n      with a different program to produce the mean.  In this program, we make sure the data are sorted before we calculate the result.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">dbtype <span style=\"color: #A020F0\">sortmean<\/span><\/pre><pre style=\"font-style:oblique\">\r\n1     function y = sortmean(x)\r\n2     %SORTMEAN Sort data and find mean.\r\n3     y = mean(sort(x));\r\n<\/pre><p>Do the calculation with <tt>sortmean<\/tt> with loops and with <tt>accumarray<\/tt> and compare results.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #0000FF\">for<\/span> i=1:maxnum\r\n    <span style=\"color: #0000FF\">for<\/span> ii=1:maxnum\r\n        meanMatrix(i,ii) = sortmean(observed(arr1==i &amp; arr2 ==ii));\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\nmeans = accumarray([arr1(:),arr2(:)], observed(:), [maxnum maxnum], @sortmean, 0);<\/pre><p>We can see now that if we tightly control the order of the calculations for the mean value, we get the same answers, though\r\n      the loop version still takes longer.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">equalmeans = isequal(means,meanMatrix)<\/pre><pre style=\"font-style:oblique\">equalmeans =\r\n     1\r\n<\/pre><h3>Conclusion<a name=\"23\"><\/a><\/h3>\r\n   <p>Once again, there are many ways to reach the same outcome in MATLAB, with different characteristics in terms of readability,\r\n      performance, etc. Which of the solutions used here would you use, or do you have some other favorite techniques?  Post them\r\n      <a href=\"?p=68#respond\">here<\/a>.\r\n   <\/p><script language=\"JavaScript\">\r\n<!-- \r\n    function grabCode_61369d25625d4b64a7fc7d26a145dfbf() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='61369d25625d4b64a7fc7d26a145dfbf ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 61369d25625d4b64a7fc7d26a145dfbf';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Loren Shure';\r\n        copyright = 'Copyright 2006 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }\r\n-->   \r\n      <\/script>\r\n<noscript>\r\n<em>A Javascript-enabled browser is required to use the \"Get the MATLAB code\" link.<\/em>\r\n<\/noscript>\r\n<p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_61369d25625d4b64a7fc7d26a145dfbf()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code<\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.3<br><\/p>\r\n<\/div>\r\n<!--\r\n61369d25625d4b64a7fc7d26a145dfbf ##### SOURCE BEGIN #####\r\n%% Let Me Count the Ways\r\n% In the course of many applications, we sometimes need to tally up counts\r\n% for different purposes, e.g., investigating the distribution of some\r\n% data or equalizing gray tones in an image.\r\n%%\r\n% In version 4 of MATLAB, we introduced \r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/sparse.html |sparse|>\r\n% matrices.  A bit later, we introduced the first release of the \r\n% <https:\/\/www.mathworks.com\/products\/image\/ Image Processing Toolbox>; and\r\n% for even more information, see \r\n% <https:\/\/blogs.mathworks.com\/steve\/ Steve's blog>.  \r\n% We then had several says we could count up the number of pixels in an \r\n% indexed image for each color in the colormap.  A few of the ways are\r\n%\r\n% * Run a for loop over the number of colors and find the elements in the\r\n% image with that value.\r\n% * Use\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/hist.html |hist|>\r\n% to create a histogram with the bins corresponding to the number of\r\n% colors.\r\n% * Use |sparse| matrices to tally the bins, especially since |sparse|\r\n% performs accumulation as it goes.  \r\n%% Counting by Color - Brute Force\r\n% Here's code to show how to bin the data by brute force.\r\n%%\r\n% Load the data.\r\nclown = load('clown');\r\nlcmap = clown.map;\r\nnc = length(lcmap);\r\nX = clown.X;  \r\n%%\r\n% Set up the output array for collecting the binned information.\r\ncount = zeros(1,nc);\r\n%%\r\n% Loop over the image elements and augment the colormap entry by 1 for each\r\n% hit.\r\nM = numel(X);\r\nfor ind=1:M\r\n    count(X(ind)) = count(X(ind))+1;\r\nend\r\n%% Counting by Color - Accumulation Using sparse\r\n% Think first about creating an M-by-nc array, mostly zeros, with\r\n% one 1 per row, in the color column to identify the color.\r\n%\r\n%    S = sparse(1:M,X(:),1,M,nc);\r\n%\r\n% We could do this and then sum each column.  However, sparse automatically\r\n% accumulates duplicate elements rather than replacing them, so instead we\r\n% can create a 1-by-nc sparse array, and then convert it to |full| when\r\n% we're done.\r\ncounts = full(sparse(1,X(:),1,1,nc));\r\n%%\r\n% If you looked at this M-file, you would now see an |mlint| message for\r\n% the previous line of code.  Here it is.\r\nmlhint = mlint('meanValues');\r\ndisp(mlhint.message)\r\n%%\r\n% It suggests to us the next technique to use.\r\n%% Counting by Color - Accumulation Using accumarray\r\n% Following the message as closely as we can (though we can't have the\r\n% first column of the first input scalar since the second column is a long\r\n% vector), we get try the following code.  We can also use the help for \r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/accumarray.html |accumarray|> \r\n% to guide us.\r\ncounta = accumarray([ones(M,1) X(:)], 1 , [1 nc]);\r\n%%\r\n% Let's make sure all the techniques get the same answer.\r\nagreement = isequal(count, counts, counta)\r\n\r\n%% Computing Mean Values\r\n% There was a \r\n% < thread>\r\n% recently on the MATLAB newsgroup asking how to compute the mean value of\r\n% portions of an array, the partitions described via arrays of indices.\r\n% Sounds like an image histogram, only not REPLACE_WITH_DASH_DASH because of computing the\r\n% |mean| value instead of the total number.  Here's the posted advice.\r\n%   help accumarray\r\n% \r\n%   HTH,\r\n%   John D'Errico\r\n%\r\n% While I know that information helped the user that day, I suspect not\r\n% everyone knew exactly what to do with it.  So let's try it here.\r\n%%\r\n% Here are the details and the original code (except for the sizes).\r\n%   Original sizes:\r\n%        m = 1245;\r\n%        n = 1200;\r\n%        maxnum = 50;\r\nm = 100;\r\nn = 100;\r\nmaxnum = 22;\r\n%%\r\n% Create the input matrices.\r\nobserved = rand(m,n);\r\nnum = 1:maxnum;\r\narr1 = num(ceil(maxnum*rand(size(observed))));\r\narr2 = num(ceil(maxnum*rand(size(observed))));\r\n%% Computing Mean Values - Straight-forward Double Loop\r\nmeanMatrix=zeros(maxnum,maxnum);\r\ntic\r\nfor i=1:maxnum\r\n    for ii=1:maxnum\r\n        meanMatrix(i,ii) = mean(observed(arr1==i & arr2 ==ii));\r\n    end\r\nend\r\nlooptime = toc\r\n\r\n%% Computing Mean Values - Using accumarray\r\n% There are a whole class of functions I can use with |accumarray| to\r\n% calculate different information about these arrays.  These functions fall\r\n% into a class of operations that other languages, e.g., APL, call\r\n% reduction operations and they consist of functions such as\r\n% |sum|, |mean|, and |max|, for example that return a scalar value for\r\n% vector input.\r\n%%\r\ntic\r\nmeans = accumarray([arr1(:),arr2(:)], observed(:), [maxnum maxnum], @mean, 0);\r\naccumtime = toc\r\n%%\r\n% Notice how much faster the |accumarray| solution is than the loop, even\r\n% with the output pre-allocated.\r\n%% Ensuring Identical Results\r\n% If we compare the results, and not just the times, for the different ways\r\n% to produce the mean values, we see the results are not identical.\r\nnorm(means-meanMatrix)   \r\n%%\r\n% What's going on here?  I will show you that the difference depends on \r\n% the ordering of the elements by doing the same calculation with a\r\n% different program to produce the mean.  In this program, we make sure the\r\n% data are sorted before we calculate the result.\r\n%%\r\ndbtype sortmean\r\n%%\r\n% Do the calculation with |sortmean| with loops and with |accumarray| and\r\n% compare results.\r\nfor i=1:maxnum\r\n    for ii=1:maxnum\r\n        meanMatrix(i,ii) = sortmean(observed(arr1==i & arr2 ==ii));\r\n    end\r\nend\r\n\r\nmeans = accumarray([arr1(:),arr2(:)], observed(:), [maxnum maxnum], @sortmean, 0);\r\n%% \r\n% We can see now that if we tightly control the order of the calculations\r\n% for the mean value, we get the same answers, though the loop version\r\n% still takes longer.\r\nequalmeans = isequal(means,meanMatrix)   \r\n%% Conclusion\r\n% Once again, there are many ways to reach the same outcome in MATLAB, with\r\n% different characteristics in terms of readability, performance, etc.\r\n% Which of the solutions used here would you use, or do you have some other\r\n% favorite techniques?  Post them <?p=68#respond here>.\r\n\r\n##### SOURCE END ##### 61369d25625d4b64a7fc7d26a145dfbf\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      In the course of many applications, we sometimes need to tally up counts for different purposes, e.g., investigating the distribution\r\n         of some data or equalizing gray tones in... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2006\/12\/01\/let-me-count-the-ways\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,15,12],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/68"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=68"}],"version-history":[{"count":0,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/68\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=68"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=68"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=68"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}