{"id":56,"date":"2006-05-02T08:00:48","date_gmt":"2006-05-02T12:00:48","guid":{"rendered":"https:\/\/blogs.mathworks.com\/steve\/?p=56"},"modified":"2023-01-09T16:08:22","modified_gmt":"2023-01-09T21:08:22","slug":"fast-local-sums","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/steve\/2006\/05\/02\/fast-local-sums\/","title":{"rendered":"Fast local sums"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n\r\n<div class=\"alert alert-info\">\r\n<span class=\"alert_icon icon-alert-info-reverse\"><\/span>\r\n<p class=\"alert_heading\"><strong>Note<\/strong><\/p>\r\n<p>See the 09-Jan-2023 post for updated information about this topic, including a discussion of integral images and integral box filtering.<\/p>\r\n<\/div>\r\n\r\n   <p>The Image Processing Toolbox function <tt>normxcorr2<\/tt> computes the two-dimensional normalized cross correlation.  One of the denominator terms in this computation is a matrix\r\n      containing local sums of the input image.\r\n   <\/p>\r\n   <p>To see what this is, let's look at an example:<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">A = magic(7)<\/pre><pre style=\"font-style:oblique\">\r\nA =\r\n\r\n    30    39    48     1    10    19    28\r\n    38    47     7     9    18    27    29\r\n    46     6     8    17    26    35    37\r\n     5    14    16    25    34    36    45\r\n    13    15    24    33    42    44     4\r\n    21    23    32    41    43     3    12\r\n    22    31    40    49     2    11    20\r\n\r\n<\/pre><p>The 3-by-3 local sum of the (2,3) element of A can be computed this way:<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">submatrix = A(1:3, 2:4);\r\nlocal_sum_2_3 = sum(submatrix(:))<\/pre><pre style=\"font-style:oblique\">\r\nlocal_sum_2_3 =\r\n\r\n   182\r\n\r\n<\/pre><p>Computing an N-by-N local sum would seem 
to require N^2 - 1 additions for each location in the matrix.  Function <tt>normxcorr2<\/tt>, though, contains code contributed by Eli Horn that uses two tricks to speed up the computation.\r\n   <\/p>\r\n   <p>The first trick is that an N-by-N local sum can be computed by first summing along one dimension, giving a vector, and then\r\n      summing the vector.  Here's what the 3-by-3 local sum example from above looks like using this method:\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">column_sums = sum(submatrix, 1)<\/pre><pre style=\"font-style:oblique\">\r\ncolumn_sums =\r\n\r\n    92    63    27\r\n\r\n<\/pre><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">local_sum_2_3 = sum(column_sums)<\/pre><pre style=\"font-style:oblique\">\r\nlocal_sum_2_3 =\r\n\r\n   182\r\n\r\n<\/pre><p>To compute N-by-N local sums for the entire matrix, you could first compute N-point column sums for each matrix location,\r\n      and then you could compute N-point row sums of the result.  That gives you an average of 2N additions per matrix element,\r\n      instead of N^2-1.\r\n   <\/p>\r\n   <p>The second trick uses <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/cumsum.html#1000123\"><tt>cumsum<\/tt><\/a> to reduce the computational cost of computing the column and row sums.  
Let's explore this trick by looking at a single row\r\n      of A, zero-padded on the left and the right:\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">row = [0, 0, 0, A(1,:), 0, 0]<\/pre><pre style=\"font-style:oblique\">\r\nrow =\r\n\r\n     0     0     0    30    39    48     1    10    19    28     0     0\r\n\r\n<\/pre><p>Now use <tt>cumsum<\/tt> to compute its cumulative sum:\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">c = cumsum(row)<\/pre><pre style=\"font-style:oblique\">\r\nc =\r\n\r\n     0     0     0    30    69   117   118   128   147   175   175   175\r\n\r\n<\/pre><p>Next, form a new vector that is [c(4)-c(1), c(5)-c(2), ..., c(end)-c(end-3)].<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">c_local_sums = c(4:end) - c(1:end-3)<\/pre><pre style=\"font-style:oblique\">\r\nc_local_sums =\r\n\r\n    30    69   117    88    59    30    57    47    28\r\n\r\n<\/pre><p>Notice that c_local_sums(3) is the sum of the first three elements of A(1,:). Similarly, c_local_sums(4) is the sum of the second,\r\n      third, and fourth elements of A(1,:). So this procedure gives us the local sums of a vector.\r\n   <\/p>\r\n   <p>The significance of this result is that it required on average only a single addition (the call to <tt>cumsum<\/tt>) and a single subtraction per vector element.  Most importantly, the number of floating-point operations per element in this\r\n      procedure is <b>independent of N, the size of the local sum window<\/b>.\r\n   <\/p>\r\n   <p>Putting these two tricks together, we can compute two-dimensional local sums by first computing column sums and then computing\r\n      row sums.  The column and row sums can be computed using the <tt>cumsum<\/tt> trick.  The overall average computational cost per matrix element is thereby reduced to 2 additions and 2 subtractions. 
\r\n      Remarkably, this cost is independent of N.\r\n   <\/p>\r\n   <p>If you want to see the particular implementation of this idea in <tt>normxcorr2<\/tt>, take a look at the subfunction <tt>local_sum<\/tt>.\r\n   <\/p>\r\n <script language=\"JavaScript\"> \r\n<!--\r\n    function grabCode_56() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='56 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 56';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Steve Eddins';\r\n        copyright = 'Copyright 2006 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n-->\r\n      <\/script>\r\n<noscript>\r\n<em>A JavaScript-enabled browser is required to use the \"Get the MATLAB code\" link.<\/em>\r\n<\/noscript>\r\n<p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_56()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code<\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.2<br><\/p>\r\n<\/div>\r\n<!--\r\n56 ##### SOURCE BEGIN #####\r\n%% Fast local sums\r\n% The Image Processing Toolbox function <https:\/\/www.mathworks.com\/help\/images\/ref\/normxcorr2.html |normxcorr2|> \r\n% computes the\r\n% two-dimensional normalized cross correlation.  
One of the denominator\r\n% terms in this computation is a matrix containing local sums of the input\r\n% image.\r\n%\r\n% To see what this is, let's look at an example:\r\n\r\nA = magic(7)\r\n\r\n%%\r\n% The 3-by-3 local sum of the (2,3) element of A can be computed this way:\r\n\r\nsubmatrix = A(1:3, 2:4);\r\nlocal_sum_2_3 = sum(submatrix(:))\r\n\r\n%%\r\n% Computing an N-by-N local sum would seem to require N^2 - 1 additions for\r\n% each location in the matrix.  Function |normxcorr2|, though, contains\r\n% code contributed by Eli Horn that uses two tricks to speed up the\r\n% computation.\r\n%\r\n% The first trick is that an N-by-N local sum can be computed by first\r\n% summing along one dimension, giving a vector, and then summing the\r\n% vector.  Here's what the 3-by-3 local sum example from above looks like \r\n% using this method:\r\n\r\ncolumn_sums = sum(submatrix, 1)\r\n\r\n%%\r\n\r\nlocal_sum_2_3 = sum(column_sums)\r\n\r\n%%\r\n% To compute N-by-N local sums for the entire matrix, you could first\r\n% compute N-point column sums for each matrix location, and then you could\r\n% compute N-point row sums of the result.  That gives you an average of 2N\r\n% additions per matrix element, instead of N^2-1.\r\n%\r\n% The second trick uses <https:\/\/www.mathworks.com\/help\/matlab\/ref\/cumsum.html#1000123 |cumsum|> \r\n% to reduce the computational cost of computing \r\n% the column and row sums.  Let's explore this trick by looking at a single\r\n% row of A, zero-padded on the left and the right:\r\n\r\nrow = [0, 0, 0, A(1,:), 0, 0]\r\n\r\n%%\r\n% Now use |cumsum| to compute its cumulative sum:\r\n\r\nc = cumsum(row)\r\n\r\n%%\r\n% Next, form a new vector that is [c(4)-c(1), c(5)-c(2), ...,\r\n% c(end)-c(end-3)].\r\n\r\nc_local_sums = c(4:end) - c(1:end-3)\r\n\r\n%%\r\n% Notice that c_local_sums(3) is the sum of the first three elements of\r\n% A(1,:). Similarly, c_local_sums(4) is the sum of the second, third, and fourth\r\n% elements of A(1,:). 
So this procedure gives us the local sums of a\r\n% vector.\r\n%\r\n% The significance of this result is that it required on average only a\r\n% single addition (the call to |cumsum|) and a single subtraction per vector\r\n% element.  Most importantly, the number of floating-point operations per\r\n% element in this procedure is *independent of N, the size of the local sum\r\n% window*.\r\n%\r\n% Putting these two tricks together, we can compute two-dimensional local\r\n% sums by first computing column sums and then computing row sums.  The\r\n% column and row sums can be computed using the |cumsum| trick.  The overall\r\n% average computational cost per matrix element is thereby reduced to 2\r\n% additions and 2 subtractions.  Remarkably, this cost is independent of\r\n% N.\r\n%\r\n% If you want to see the particular implementation of this idea in\r\n% |normxcorr2|, take a look at the subfunction |local_sum|.\r\n\r\n\r\n##### SOURCE END ##### 56\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n\r\n\r\n\r\nNote\r\nSee the 09-Jan-2023 post for updated information about this topic, including a discussion of integral images and integral box filtering.\r\n\r\n\r\n   The Image Processing Toolbox function... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/steve\/2006\/05\/02\/fast-local-sums\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[128,54,126],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/56"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/comments?post=56"}],"version-history":[{"count":3,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/56\/revisions"}],"predecessor-version":[{"id":6749,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/56\/revisions\/6749"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media?parent=56"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/categories?post=56"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/tags?post=56"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
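
The two tricks from the post (sum along one dimension, then the other; use a cumulative sum plus a lagged difference for each pass) can be sketched outside MATLAB as well. Below is a NumPy translation of the idea. It is a sketch, not the Toolbox's actual `local_sum` subfunction: the function name, the padding amounts (which make the output the size of a "full" convolution with an m-by-n box of ones), and the variable names are choices made here for illustration.

```python
import numpy as np

def local_sum(A, m, n):
    """m-by-n local sums of A via the cumsum trick (a sketch of the idea
    described in the post, not the Toolbox's actual local_sum code).
    Output size is (rows + m - 1, cols + n - 1), i.e. "full" size."""
    B = np.pad(A, ((m, m), (n, n)))       # zero-pad on all four sides
    c = np.cumsum(B, axis=0)              # running sums down each column
    s = c[m:-1, :] - c[:-m - 1, :]        # m-point column sums by lagged differencing
    c = np.cumsum(s, axis=1)              # running sums along each row
    return c[:, n:-1] - c[:, :-n - 1]     # n-point row sums by lagged differencing

# The single-row example from the post: 3-point local sums of A(1,:)
row = np.array([[30, 39, 48, 1, 10, 19, 28]])
print(local_sum(row, 1, 3))  # values: 30 69 117 88 59 30 57 47 28, as in the post
```

Each pass costs one addition (inside `cumsum`) and one subtraction per element, regardless of m and n, which is exactly the property the post highlights.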