# Fast local sums

**Note**

See the 09-Jan-2023 post for updated information about this topic, including a discussion of integral images and integral box filtering.

The Image Processing Toolbox function `normxcorr2` computes the two-dimensional normalized cross correlation. One of the denominator terms in this computation is a matrix
containing local sums of the input image.

To see what this is, let's look at an example:

A = magic(7)

A = 30 39 48 1 10 19 28 38 47 7 9 18 27 29 46 6 8 17 26 35 37 5 14 16 25 34 36 45 13 15 24 33 42 44 4 21 23 32 41 43 3 12 22 31 40 49 2 11 20

The 3-by-3 local sum of the (2,3) element of A can be computed this way:

submatrix = A(1:3, 2:4); local_sum_2_3 = sum(submatrix(:))

local_sum_2_3 = 182

Computing an N-by-N local sum would seem to require N^2 - 1 additions for each location in the matrix. Function `normxcorr2`, though, contains code contributed by Eli Horn that uses two tricks to speed up the computation.

The first trick is that an N-by-N local sum can be computed by first summing along one dimension, giving a vector, and then summing the vector. Here's what the 3-by-3 local sum example from above looks like using this method:

column_sums = sum(submatrix, 1)

column_sums = 92 63 27

local_sum_2_3 = sum(column_sums)

local_sum_2_3 = 182

To compute N-by-N local sums for the entire matrix, you could first compute N-point column sums for each matrix location, and then you could compute N-point row sums of the result. That gives you an average of 2N additions per matrix element, instead of N^2-1.

The second trick uses `cumsum` to reduce the computational cost of computing the column and row sums. Let's explore this trick by looking at a single row
of A, zero-padded on the left and the right:

row = [0, 0, 0, A(1,:), 0, 0]

row = 0 0 0 30 39 48 1 10 19 28 0 0

Now use `cumsum` to compute its cumulative sum:

c = cumsum(row)

c = 0 0 0 30 69 117 118 128 147 175 175 175

Next, form a new vector that is [c(4)-c(1), c(5)-c(2), ..., c(end)-c(end-3)].

c_local_sums = c(4:end) - c(1:end-3)

c_local_sums = 30 69 117 88 59 30 57 47 28

Notice that c_local_sum(3) is the sum of the first three elements of A(1,:). Similarly, c_local_sum(4) is the sum of second, third, and fourth elements of A(1,:). So this procedure gives us the local sums of a vector.

The significance of this result is that it required on average only a single addition (the call to `cumsum`) and a single subtraction per vector element. Most importantly, the number of floating-point operations per element in this
procedure is **independent of N, the size of the local sum window**.

Putting these two tricks together, we can compute two-dimensional local sums by first computing column sums and then computing
row sums. The column and row sums can be computed using the `cumsum` trick. The overall average computational cost per matrix element is thereby reduced to 2 additions and 2 subtractions.
Remarkably, this cost is independent of N.

If you want to see the particular implementation of this idea in `normxcorr2`, take a look at the subfunction `local_sum`.

## Comments

To leave a comment, please click here to sign in to your MathWorks Account or create a new one.