{"id":297,"date":"2009-11-03T14:56:58","date_gmt":"2009-11-03T19:56:58","guid":{"rendered":"https:\/\/blogs.mathworks.com\/steve\/2009\/11\/03\/the-conv-function-and-implementation-tradeoffs\/"},"modified":"2019-10-29T08:03:07","modified_gmt":"2019-10-29T12:03:07","slug":"the-conv-function-and-implementation-tradeoffs","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/steve\/2009\/11\/03\/the-conv-function-and-implementation-tradeoffs\/","title":{"rendered":"The conv function and implementation tradeoffs"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <p>A friend from my grad school days (back in the previous millennium) is an electrical engineering professor.  Students in his\r\n      class recently asked him why the <tt>conv<\/tt> function is not implemented using FFTs.\r\n   <\/p>\r\n   <p>I'm not on the team responsible for <tt>conv<\/tt>, but I wrote back with my thoughts, and I thought I would share them here as well.\r\n   <\/p>\r\n   <p>Let's review the basics.  Using the typical convolution formula to compute the one-dimensional convolution of a P-element\r\n      sequence A with Q-element sequence B has a computational complexity of <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2009\/fft_based_conv_eq04674.png\"> .  However, the discrete Fourier transform (DFT) can be used to implement convolution as follows:\r\n   <\/p>\r\n   <p>1. Compute the L-point DFT of A, where <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2009\/fft_based_conv_eq40865.png\"> .\r\n   <\/p>\r\n   <p>2. Compute the L-point DFT of B.<\/p>\r\n   <p>3. Multiply the two DFTs.<\/p>\r\n   <p>4. 
Compute the inverse DFT to get the convolution.<\/p>\r\n   <p>Here's a simple MATLAB function for computing convolution using the Fast Fourier Transform (FFT), which is simply a fast algorithm\r\n      for computing the DFT.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">type <span style=\"color: #A020F0\">conv_fft<\/span><\/pre><pre style=\"font-style:oblique\">\r\nfunction c = conv_fft(a, b)\r\n\r\nP = numel(a);\r\nQ = numel(b);\r\nL = P + Q - 1;\r\nK = 2^nextpow2(L);\r\n\r\nc = ifft(fft(a, K) .* fft(b, K));\r\nc = c(1:L);\r\n\r\n<\/pre><p>Note that the code uses the next power-of-two greater than or equal to L, although this is not strictly necessary. The <tt>fft<\/tt> function operates in <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2009\/fft_based_conv_eq00311.png\">  time whether or not L is a power of two.\r\n   <\/p>\r\n   <p>The overall computational complexity of these steps is <img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2009\/fft_based_conv_eq47887.png\"> . For P and Q sufficiently large, then, using the DFT to implement convolution is a computational win.\r\n   <\/p>\r\n   <p>So why don't we do it?<\/p>\r\n   <p>There are several technical factors. Let's look at speed, exact computation, and memory.<\/p>\r\n   <p><b>Speed<\/b><\/p>\r\n   <p>One factor is that DFT-based computation is not <b>always<\/b> faster.  
Let's do an experiment where we compute the convolution of a 1000-element sequence with another sequence of varying\r\n      length. (Get the <tt>timeit<\/tt> function from the <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/18798-timeit-benchmarking-function\">MATLAB Central File Exchange<\/a>.)\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">x = rand(1, 1000);\r\nnn = 25:25:1000;\r\n\r\nt_normal = zeros(size(nn));\r\nt_fft = zeros(size(nn));\r\n\r\n<span style=\"color: #0000FF\">for<\/span> k = 1:numel(nn)\r\n    n = nn(k);\r\n    y = rand(1, n);\r\n    t_normal(k) = timeit(@() conv(x, y));\r\n    t_fft(k) = timeit(@() conv_fft(x, y));\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">plot(nn, t_normal, nn, t_fft)\r\nlegend({<span style=\"color: #A020F0\">'Normal computation'<\/span>, <span style=\"color: #A020F0\">'FFT-based computation'<\/span>})<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2009\/fft_based_conv_01.png\"> <p>For sequences <tt>y<\/tt> shorter than a certain length, called the <i>cross-over point<\/i>, it's quicker to use the normal computation.\r\n   <\/p>\r\n   <p><b>Exact computation<\/b><\/p>\r\n   <p>A second consideration is whether the computation is subject to floating-point round-off errors and to what degree.  There\r\n      are applications, for example, where the convolution of integer-valued sequences is computed.  
For such applications, a user\r\n      would reasonably expect the output sequence to be integer-valued as well.\r\n   <\/p>\r\n   <p>To illustrate, here's a simple function that computes <a href=\"http:\/\/en.wikipedia.org\/wiki\/Binomial_coefficient\">n-th order binomial coefficients<\/a> using convolution:\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">type <span style=\"color: #A020F0\">binom<\/span><\/pre><pre style=\"font-style:oblique\">\r\nfunction c = binom(n)\r\n\r\nc = 1;\r\nfor k = 1:n\r\n    c = conv(c, [1 1]);\r\nend\r\n\r\n<\/pre><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">binom(7)<\/pre><pre style=\"font-style:oblique\">\r\nans =\r\n\r\n     1     7    21    35    35    21     7     1\r\n\r\n<\/pre><p>I wrote a variation called <tt>binom_fft<\/tt> that is the same as <tt>binom<\/tt> except that it calls <tt>conv_fft<\/tt>.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">format <span style=\"color: #A020F0\">long<\/span>\r\nbinom_fft(7)<\/pre><pre style=\"font-style:oblique\">\r\nans =\r\n\r\n  Columns 1 through 4\r\n\r\n   1.000000000000000   6.999999999999998  21.000000000000000  35.000000000000000\r\n\r\n  Columns 5 through 8\r\n\r\n  35.000000000000000  21.000000000000000   7.000000000000000   0.999999999999996\r\n\r\n<\/pre><p>Whoops! What's going on? The answer is that the FFT-based implementation of convolution is subject to floating-point round-off\r\n      error.\r\n   <\/p>\r\n   <p>I imagine that most MATLAB users would consider the output of <tt>binom_fft<\/tt> to be <b>wrong<\/b>.\r\n   <\/p>\r\n   <p><b>Memory<\/b><\/p>\r\n   <p>The last technical consideration I want to mention is memory.  Because of the padding and complex arithmetic involved in the\r\n      FFT computations, the FFT-based convolution implementation requires a lot more memory than the normal computation. 
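<\/p>\r\n   <p>As a rough back-of-the-envelope sketch (an estimate, not the actual implementation; the factor of three only roughly counts the two padded forward transforms and their product), compare the workspace used by <tt>conv_fft<\/tt> with the size of the real-valued result:\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">P = 1000;\r\nQ = 1000;\r\nL = P + Q - 1;        % length of the full convolution result\r\nK = 2^nextpow2(L);    % padded FFT length used by conv_fft\r\n\r\nbytes_result = 8*L    % real double output of conv\r\nbytes_fft = 3*16*K    % roughly three complex double arrays of length K<\/pre><p>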
This may\r\n      not often be a problem for one-dimensional computations, but it can be a big deal for multidimensional computations.\r\n   <\/p>\r\n   <p><b>Final thoughts<\/b><\/p>\r\n   <p>The technical considerations listed above can all be solved in principle.  The implementation could switch to using the normal\r\n      method for short or integer-valued sequences, for example. And there are FFT-based techniques such as <a href=\"http:\/\/en.wikipedia.org\/wiki\/Overlap-add_method\">overlap-add<\/a> to reduce the memory load.\r\n   <\/p>\r\n   <p>But the problem can get quite complicated.  Testing floating-point values to see if they are integers, for example, can be\r\n      slow.  Also, I suspect (but have not checked) that a multithreaded implementation of the normal computation will take advantage\r\n      of multiple cores better than a multithreaded FFT-based method. That would change the cross-over point between the two methods.\r\n      And the exact cross-over point will vary from computer to computer, making it likely that our implementation would be somewhat\r\n      slower for some people for some problems.\r\n   <\/p>\r\n   <p>Overall, my inclination would be to provide FFT-based convolution as a separate function rather than reimplementing <tt>conv<\/tt>.  And that's what we've done. See the function <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009b\/toolbox\/signal\/fftfilt.html\"><tt>fftfilt<\/tt><\/a> in the Signal Processing Toolbox.\r\n   <\/p>\r\n   <p>Do you disagree with this approach?  
Post your comment.<\/p><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br>Published with MATLAB&reg; 7.9<br><\/p>\r\n<\/div>","protected":false},"excerpt":{"rendered":"<p>\r\n   A friend from my grad school days (back in the previous millennium) is an electrical engineering professor.  
Students in his\r\n      class recently asked him why the conv function is not... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/steve\/2009\/11\/03\/the-conv-function-and-implementation-tradeoffs\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[689,426,693,430,543,92,691,162,68,460,190,474,130],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/297"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/comments?post=297"}],"version-history":[{"count":1,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/297\/revisions"}],"predecessor-version":[{"id":2636,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/297\/revisions\/2636"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media?parent=297"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/categories?post=297"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/tags?post=297"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}