# Fast local sums

The Image Processing Toolbox function normxcorr2 computes the two-dimensional normalized cross correlation. One of the denominator terms in this computation is a matrix containing local sums of the input image.

To see what this is, let's look at an example:

A = magic(7)
A =

30    39    48     1    10    19    28
38    47     7     9    18    27    29
46     6     8    17    26    35    37
5    14    16    25    34    36    45
13    15    24    33    42    44     4
21    23    32    41    43     3    12
22    31    40    49     2    11    20



The 3-by-3 local sum of the (2,3) element of A can be computed this way:

submatrix = A(1:3, 2:4);
local_sum_2_3 = sum(submatrix(:))
local_sum_2_3 =

182



Computing an N-by-N local sum would seem to require N^2 - 1 additions for each location in the matrix. Function normxcorr2, though, contains code contributed by Eli Horn that uses two tricks to speed up the computation.

The first trick is that an N-by-N local sum can be computed by first summing along one dimension, giving a vector, and then summing the vector. Here's what the 3-by-3 local sum example from above looks like using this method:

column_sums = sum(submatrix, 1)
column_sums =

92    63    27


local_sum_2_3 = sum(column_sums)
local_sum_2_3 =

182



To compute N-by-N local sums for the entire matrix, you could first compute N-point column sums for each matrix location, and then you could compute N-point row sums of the result. That gives you an average of 2N additions per matrix element, instead of N^2-1.

The second trick uses cumsum to reduce the computational cost of computing the column and row sums. Let's explore this trick by looking at a single row of A, zero-padded on the left and the right:

row = [0, 0, 0, A(1,:), 0, 0]
row =

0     0     0    30    39    48     1    10    19    28     0     0



Now use cumsum to compute its cumulative sum:

c = cumsum(row)
c =

0     0     0    30    69   117   118   128   147   175   175   175



Next, form a new vector that is [c(4)-c(1), c(5)-c(2), ..., c(end)-c(end-3)].

c_local_sums = c(4:end) - c(1:end-3)
c_local_sums =

30    69   117    88    59    30    57    47    28



Notice that c_local_sum(3) is the sum of the first three elements of A(1,:). Similarly, c_local_sum(4) is the sum of second, third, and fourth elements of A(1,:). So this procedure gives us the local sums of a vector.

The significance of this result is that it required on average only a single addition (the call to cumsum) and a single subtraction per vector element. Most importantly, the number of floating-point operations per element in this procedure is independent of N, the size of the local sum window.

Putting these two tricks together, we can compute two-dimensional local sums by first computing column sums and then computing row sums. The column and row sums can be computed using the cumsum trick. The overall average computational cost per matrix element is thereby reduced to 2 additions and 2 subtractions. Remarkably, this cost is independent of N.

If you want to see the particular implementation of this idea in normxcorr2, take a look at the subfunction local_sum.

Published with MATLAB® 7.2

|