The conv function and implementation tradeoffs12

Posted by Steve Eddins,

A friend from my grad school days (back in the previous millenium) is an electrical engineering professor. Students in his class recently asked him why the conv function is not implemented using FFTs.

I'm not on the team responsible for conv, but I wrote back with my thoughts, and I thought I would share them here as well.

Let's review the basics. Using the typical convolution formula to compute the one-dimensional convolution of a P-element sequence A with Q-element sequence B has a computational complexity of . However, the discrete Fourier transform (DFT) can be used to implement convolution as follows:

1. Compute the L-point DFT of A, where .

2. Compute the L-point DFT of B.

3. Multiply the two DFTs.

4. Compute the inverse DFT to get the convolution.

Here's a simple MATLAB function for computing convolution using the Fast Fourier Transform (FFT), which is simply a fast algorithm for computing the DFT.

type conv_fft
function c = conv_fft(a, b)

P = numel(a);
Q = numel(b);
L = P + Q - 1;
K = 2^nextpow2(L);

c = ifft(fft(a, K) .* fft(b, K));
c = c(1:L);



Note that the code uses the next power-of-two greater than or equal to L, although this is not strictly necessary. The fft function operates in time whether or not L is a power of two.

The overall computational complexity of these steps is . For P and Q sufficiently large, then, using the DFT to implement convolution is a computational win.

So why don't we do it?

There are several technical factors. Let's look at speed, exact computation, and memory.

Speed

One factor is that DFT-based computation is not always faster. Let's do an experiment where we compute the convolution of a 1000-element sequence with another sequence of varying length. (Get the timeit function from the MATLAB Central File Exchange.)

x = rand(1, 1000);
nn = 25:25:1000;

t_normal = zeros(size(nn));
t_fft = zeros(size(nn));

for k = 1:numel(nn)
n = nn(k);
y = rand(1, n);
t_normal(k) = timeit(@() conv(x, y));
t_fft(k) = timeit(@() conv_fft(x, y));
end
plot(nn, t_normal, nn, t_fft)
legend({'Normal computation', 'FFT-based computation'})

For sequences y shorter than a certain length, called the cross-over point, it's quicker to use the normal computation.

Exact computation

A second consideration is whether the computation is subject to floating-point round-off errors and to what degree. There are applications, for example, where the convolution of integer-valued sequences is computed. For such applications, a user would reasonably expect the output sequence to be integer-valued as well.

To illustrate, here's a simple function that computes n-th order binomial coefficients using convolution:

type binom
function c = binom(n)

c = 1;
for k = 1:n
c = conv(c, [1 1]);
end


binom(7)
ans =

1     7    21    35    35    21     7     1



I wrote a variation called binom_fft that is the same as binom except that it calls conv_fft.

format long
binom_fft(7)
ans =

Columns 1 through 4

1.000000000000000   6.999999999999998  21.000000000000000  35.000000000000000

Columns 5 through 8

35.000000000000000  21.000000000000000   7.000000000000000   0.999999999999996



Whoops! What's going on? The answer is that the FFT-based implementation of convolution is subject to floating-point round-off error.

I imagine that most MATLAB users would consider the output of binom_fft to be wrong.

Memory

The last technical consideration I want to mention is memory. Because of the padding and complex arithmetic involved in the FFT computations, the FFT-based convolution implementation requires a lot more memory than the normal computation. This may not often be a problem for one-dimensional computations, but it can be a big deal for multidimensional computations.

Final thoughts

The technical considerations listed above can all be solved in principle. The implementation could switch to using the normal method for short or integer-valued sequences, for example. And there are FFT-based techniques such as overlap-and-add to reduce the memory load.

But the problem can get quite complicated. Testing floating-point values to see if they are integers, for example, can be slow. Also, I suspect (but have not checked) that a multithreaded implemention of the normal computation will take advantage of multiple cores better than a multithreaded FFT-based method. That would change the cross-over point between the two methods. And the exact cross-over point will vary from computer to computer, making it likely that our implementation would be somewhat slower for some people for some problems.

Overall, my inclination would be to provide FFT-based convolution as a separate function rather than reimplementing conv. And that's what we've done. See the function fftfilt in the Signal Processing Toolbox.

Do you disagree with this approach? Post your comment.

Get the MATLAB code

Published with MATLAB® 7.9

Pete replied on : 1 of 12

Hi,

Interesting as always!

I have another convolution-related point / query, which is hopefully sufficiently relevant. As I understand it, imfilter takes advantage of zeros in the filter kernel (and only applies multiplications for nonzero values), whereas conv2 does not appear to. Consequently, using imfilter with a large filter containing lots of zeros can be very fast, whereas it can be mighty slow using conv2. This is encountered a lot when computing the stationary / undecimated wavelet transform. I believe the Wavelet Toolbox uses conv2.

I normally use imfilter when applying such filters, but this is not always ideal. Sometimes, I’d like to use the conv2(…, ‘valid’) option when filtering an image in which I’m not too worried about keeping the boundaries (or I have applied some other, less standard, type of boundary padding myself). However, imfilter insists in applying its own boundary padding – which slows things down when the image is very large.

So in some special cases, it seems that imfilter is much better for speed in terms of the convolution itself, whereas conv2 can (I think) avoid the need for creating another array larger than the original image. Is it possible that one or other of the functions could be modified, so that this tradeoff is no longer necessary when choosing which function to use? I’m using R2009a, so don’t know if this has been addressed in R2009b.

Jeremy replied on : 2 of 12

Another reason is that implementing the DFT causes the result to be circular convolution and not linear convolution. This can be corrected by overlap-add, but this increases complexity. However, the big reason is accuracy, as you mentioned.

Also, going through the DFT you’re assuming a periodic boundary condition, whereas conv assumes zeros outside the boundaries. This can be significant, especially when using large convolution kernels.

I’m all for giving the user the option. It would be bad if you couldn’t control which method is being used.

Steve replied on : 4 of 12

To all—After almost four years of writing this blog, I still have no ability to predict which topics will draw reader comment! ;-)

Peter—Most the comments people attempt to post here are desperate queries for help with homework or a graduate school research project (I don’t let those through), so yours is admirably on topic. ;-)

Your suggestion is a good one. We can look into making some changes so that imfilter does not need to pad. Just to set your expectations, though … the feature change deadline for R2010a was some time ago, and we’re already working on R2010b.

Jeremy and Chris—I tend to think of circular convolution and periodic boundary conditions as being basically the same issue, and it is not a reason to avoid using the DFT to compute convolution. Here’s the theory, for any reader who might be interested. (Note that I’m using mathematical notation here, not MATLAB syntax.)

If x[n] is nonzero only for 0 < = n <= P-1, and h[n] is nonzero only for 0 <= n <= Q-1, then the convolution y[n] = (x*h)[n] is nonzero only for 0 <= n <= L-1, where L = P + Q - 1. Further, y[n] for 0 <= n <= L-1 is exactly equal to the L-point circular convolution of x and h. So my conv_fft in this post produces exactly the same thing (except possibly for floating-point round-off effects) as conv.

Overlap-and-add and similar methods were invented for a different reason. Consider the case where you want to filter a signal using FFT-based convolution, but you never have the entire signal at once. For example, you are building a system to process an input audio stream. Overlap-and-add allows you to perform FFT-based convolution on the input signal in chunks.

Mark Andrews replied on : 5 of 12

Just some minor points, but I think

K = 2^nextpow2(L);


should be

K = nextpow2(L);


otherwise K is enormous :-)

Are you sure that the complexity of the FFT is O(L log L) regardless of L? I thought this was a lower bound for L a power of 2.

Mark Andrews replied on : 6 of 12

Well, it would help if I read the documentation, wouldn’t it? I thought nextpow2 returned the number 2^n, not the exponent n. My apologies.

Steve replied on : 7 of 12

Mark—No problem about nextpow2. I’ve always thought that function name was confusing, and I usually look at the help for it whenever I use it.

About FFT complexity – yes, MATLAB uses O(L log L) algorithms for all sizes. Running time is fastest for powers of two (the constant of proportionality is smallest), but time scales with (L log L) for composite and even for prime L.

If you look towards the end of the fftfilt program, you will see that there’s a check to see if the inputs are real, an if so, to remove the imaginary part of the output, since it is only round-off. One of many things to do so users don’t get surprised.

–Loren

Francisco replied on : 9 of 12

Does caching play a role in the speed of conv vs. fft_conv? It would seem that a clever order of multiplications would increase cache locality greatly in the conv function.

If not, do you even need to take cache locality into consideration for MATLAB, or is it just “premature optimization”?

Steve replied on : 10 of 12

Francisco—Cache locality is always important, and changes in cache locality can certain change the location of the cross-over point in comparing the two algorithms. It won’t change the overall scaling, though.

Premature optimization is optimizating code before you know it is producing the correct answer, or before you know the code is actually a performance bottleneck.

Brad replied on : 11 of 12

one practical issue you might want to point users to is that when using conv if the data is not sampled at a high enough resolution then huge rounding errors can occur due to the numerical intergraton used in computing conv. I havent tested to see if the fft method does the same but it probably does. I usually interpolate the data by a factor of at least 4 before performing the conv function.

Steve replied on : 12 of 12

Brad—You and I are thinking about the conv function very differently. If I understand your comment, you are thinking of conv as an approximation to continuous-time convolution. I think of conv as implementing discrete-time convolution. It does this exactly; there are no approximations involved (other than the normal floating-point round-off issues). I’ve used discrete-time convolution about a gazillion times for signal and image processing in my work, and I don’t think I’ve ever used it to simulate or approximate continuous-time convolution. The FFT-based method here is mathematically equivalent to discrete-time convolution, so there are no approximation issues there.

These postings are the author's and don't necessarily represent the opinions of MathWorks.