Fast Local Sums Example

From MATLAB Techniques for Image Processing by Steve Eddins.

A common image processing operation is computing local sums. That is, for each pixel, compute the sum of that pixel and its neigbhors.

Here's an example:

A = magic(7)
A =

    30    39    48     1    10    19    28
    38    47     7     9    18    27    29
    46     6     8    17    26    35    37
     5    14    16    25    34    36    45
    13    15    24    33    42    44     4
    21    23    32    41    43     3    12
    22    31    40    49     2    11    20

The 3-by-3 local sum of the (2,3) element of A can be computed this way:

submatrix = A(1:3,2:4)
submatrix =

    39    48     1
    47     7     9
     6     8    17

local_sum_2_3 = sum(submatrix(:))
local_sum_2_3 =

   182

Computing an N-by-N local sum would seem to require N^2 - 1 additions for each location in the matrix. However, there are a couple of tricks that can be used to speed up the computation.

The first trick is that an N-by-N local sum can be computed by first summing along one dimension, giving a vector, and then summing the vector. Here's what the 3-by-3 local sum example from above looks like using this method:

column_sums = sum(submatrix,1)
column_sums =

    92    63    27

local_sum_2_3 = sum(column_sums)
local_sum_2_3 =

   182

To compute N-by-N local sums for the entire matrix, you could first compute N-point column sums for each matrix location, and then you could compute N-point row sums of the result. That gives you an average of 2N additions per matrix element, instead of N^2-1. (For those of you who know a little signal processing theory, this is basically the computational advantage of a separable filter.)

The second trick uses cumsum to reduce the computational cost of computing the column and row sums. Let's explore this trick by looking at a single row of A, zero-padded on the left and the right:

row = [0, 0, 0, A(1,:), 0, 0]
row =

     0     0     0    30    39    48     1    10    19    28     0     0

Now use cumsum to compute its cumulative sum:

c = cumsum(row)
c =

     0     0     0    30    69   117   118   128   147   175   175   175

Next, form a new vector that is [c(4)-c(1), c(5)-c(2), ..., c(end)-c(end-3)].

c_local_sums = c(4:end) - c(1:end-3)
c_local_sums =

    30    69   117    88    59    30    57    47    28

Notice that c_local_sum(3) is the sum of the first three elements of A(1,:). Similarly, c_local_sum(4) is the sum of second, third, and fourth elements of A(1,:). So this procedure gives us the local sums of a vector.

The significance of this result is that it required on average only a single addition (the call to cumsum) and a single subtraction per vector element. Most importantly, the number of floating-point operations per element in this procedure is independent of N, the size of the local sum window.

Putting these two tricks together, we can compute two-dimensional local sums by first computing column sums and then computing row sums. The column and row sums can be computed the cumsum trick. The overall average computational cost per matrix element is thereby reduced to 2 additions and 2 subtractions. Remarkably, this cost is independent of N.