Fast local sums
Note
See the 09-Jan-2023 post for updated information about this topic, including a discussion of integral images and integral box filtering.
The Image Processing Toolbox function normxcorr2 computes the two-dimensional normalized cross correlation. One of the denominator terms in this computation is a matrix containing local sums of the input image.
To see what this is, let's look at an example:
A = magic(7)
A = 30 39 48 1 10 19 28 38 47 7 9 18 27 29 46 6 8 17 26 35 37 5 14 16 25 34 36 45 13 15 24 33 42 44 4 21 23 32 41 43 3 12 22 31 40 49 2 11 20
The 3-by-3 local sum of the (2,3) element of A can be computed this way:
submatrix = A(1:3, 2:4); local_sum_2_3 = sum(submatrix(:))
local_sum_2_3 = 182
Computing an N-by-N local sum would seem to require N^2 - 1 additions for each location in the matrix. Function normxcorr2, though, contains code contributed by Eli Horn that uses two tricks to speed up the computation.
The first trick is that an N-by-N local sum can be computed by first summing along one dimension, giving a vector, and then summing the vector. Here's what the 3-by-3 local sum example from above looks like using this method:
column_sums = sum(submatrix, 1)
column_sums = 92 63 27
local_sum_2_3 = sum(column_sums)
local_sum_2_3 = 182
To compute N-by-N local sums for the entire matrix, you could first compute N-point column sums for each matrix location, and then you could compute N-point row sums of the result. That gives you an average of 2N additions per matrix element, instead of N^2-1.
The second trick uses cumsum to reduce the computational cost of computing the column and row sums. Let's explore this trick by looking at a single row of A, zero-padded on the left and the right:
row = [0, 0, 0, A(1,:), 0, 0]
row = 0 0 0 30 39 48 1 10 19 28 0 0
Now use cumsum to compute its cumulative sum:
c = cumsum(row)
c = 0 0 0 30 69 117 118 128 147 175 175 175
Next, form a new vector that is [c(4)-c(1), c(5)-c(2), ..., c(end)-c(end-3)].
c_local_sums = c(4:end) - c(1:end-3)
c_local_sums = 30 69 117 88 59 30 57 47 28
Notice that c_local_sum(3) is the sum of the first three elements of A(1,:). Similarly, c_local_sum(4) is the sum of second, third, and fourth elements of A(1,:). So this procedure gives us the local sums of a vector.
The significance of this result is that it required on average only a single addition (the call to cumsum) and a single subtraction per vector element. Most importantly, the number of floating-point operations per element in this procedure is independent of N, the size of the local sum window.
Putting these two tricks together, we can compute two-dimensional local sums by first computing column sums and then computing row sums. The column and row sums can be computed using the cumsum trick. The overall average computational cost per matrix element is thereby reduced to 2 additions and 2 subtractions. Remarkably, this cost is independent of N.
If you want to see the particular implementation of this idea in normxcorr2, take a look at the subfunction local_sum.
댓글
댓글을 남기려면 링크 를 클릭하여 MathWorks 계정에 로그인하거나 계정을 새로 만드십시오.