{"id":1134,"date":"2014-09-16T15:13:11","date_gmt":"2014-09-16T20:13:11","guid":{"rendered":"https:\/\/blogs.mathworks.com\/steve\/?p=1134"},"modified":"2014-09-16T15:13:29","modified_gmt":"2014-09-16T20:13:29","slug":"fft-stories","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/steve\/2014\/09\/16\/fft-stories\/","title":{"rendered":"FFT Stories"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>My last meeting of the day ended just a little while ago, and I was starting to think seriously about figuring out a blog post for this week. What to write about? That's when I happened to see that Cleve posted <a href=\"https:\/\/blogs.mathworks.com\/cleve\/2014\/09\/15\/fft-fast-finite-fourier-transform\/\">just yesterday<\/a> about the FFT. Ah ha! I'll write about that, too (in honor of National Blog About the FFT Week).<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#6d4a328f-e6ae-401b-aace-9011dd79681d\">To Recurse or Not?<\/a><\/li><li><a href=\"#b06e7b2b-fb29-4cf7-8fda-5a6863b21f1e\">Butterflies and Laws<\/a><\/li><li><a href=\"#027fc84f-f19a-4c9e-a8dc-90d52ccef03d\">The Slowest Way (Ever?) to Fill a Vector with NaNs<\/a><\/li><\/ul><\/div><h4>To Recurse or Not?<a name=\"6d4a328f-e6ae-401b-aace-9011dd79681d\"><\/a><\/h4><p>I have three little sort-of stories to tell, the first of which was inspired directly by Cleve's post. Cleve described a very compact, elegant implementation of the divide-and-conquer idea that's central to fast Fourier transform algorithms. Here's the key code fragment of Cleve's textbook function <tt>ffttx<\/tt>:<\/p><pre>    % Recursive divide and conquer\r\n    k = (0:n\/2-1)';\r\n    w = omega .^ k;\r\n    u = ffttx(x(1:2:n-1));\r\n    v = w.*ffttx(x(2:2:n));\r\n    y = [u+v; u-v];<\/pre><p>Function <tt>ffttx<\/tt> calls itself recursively, twice. Each recursive call is on half of the original input values. 
Then <tt>ffttx<\/tt> does a little extra computation at the end to produce the final result.<\/p><p>Like Cleve, I really love this little bit of code. But it makes me think back to all the FFT code that I saw back when I was first learning digital signal processing. (That would be the mid- to late-1980s.) Here's a typical textbook sample from the 1989 edition of <i>Discrete-Time Signal Processing<\/i> by Oppenheim and Schafer:<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2014\/fortran-fft.jpg\" alt=\"\"> <\/p><p>This code is typical for the time. The focus is on doing the computation in-place and using as few multiplies as possible. One thing you don't see here: explicit recursion! The function <tt>DITFFT<\/tt> does not call itself, either directly or indirectly. That was the norm back then. No one wanted to use the memory to make copies of the two half-length arrays, and no one wanted to incur the speed penalty of many recursive function calls. Another typical thing you see in this code is something charmingly called bit-reversed sorting, but that's a topic for another day (maybe).<\/p><p>Anyway, my point is that modern FFT implementations built for today's computer architectures often do make use of explicit recursion. 
The additional cost of dividing the input array into two or more smaller copies is more than offset by the gains of more cache-friendly execution as the divide-and-conquer problems get smaller and smaller.<\/p><h4>Butterflies and Laws<a name=\"b06e7b2b-fb29-4cf7-8fda-5a6863b21f1e\"><\/a><\/h4><p>My second little story was inspired by one line of code that Cleve posted:<\/p><pre>y = [u+v; u-v];<\/pre><p>In the case where <tt>u<\/tt> is <tt>x(1)<\/tt> and <tt>v<\/tt> is <tt>x(2)<\/tt>, the computation above is just the 2-point DFT of <tt>x<\/tt>.<\/p><p>A signal flow diagram for it looks like this (also from <i>Discrete-Time Signal Processing<\/i>):<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2014\/butterfly.jpg\" alt=\"\"> <\/p><p>Digital signal processing experts have for decades referred to this little graph as a \"butterfly diagram\" because of its appearance.<\/p><p>Well, to make a little extra money for myself in grad school, I wrote a couple of chapters of the solutions manual for <i>Discrete-Time Signal Processing<\/i>. One of the chapters covered FFTs. This effort left me completely fed up with drawing butterfly diagrams and counting multiplies. One day I posted the following chart outside my cubicle:<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2014\/bill-becomes-a-law.jpg\" alt=\"\"> <\/p><p>Yes, digital signal processing people even call that mess above a \"butterfly diagram.\" We're a weird bunch.<\/p><p>That butterfly diagram indirectly leads me to my third little FFT story.<\/p><h4>The Slowest Way (Ever?) to Fill a Vector with NaNs<a name=\"027fc84f-f19a-4c9e-a8dc-90d52ccef03d\"><\/a><\/h4><p>About ten years ago, a tech support case was forwarded to me. A customer complained that MATLAB was very slow in computing the FFT of one specific vector. 
Other vectors of the same length would go really fast, but not this one.<\/p><p>So I loaded the customer's MAT-file containing the vector and started running FFTs on it. To my surprise, it really was slow! I scratched my head. Then I looked at the output values, and I was even more surprised to see that they were all NaNs! Strange! What was the point? And was this related to the speed issue?<\/p><p>I looked at the long input vector provided by the customer, and I found that just one of the input values was NaN.<\/p><p>Light bulb!<\/p><p>Take a look again at that complicated butterfly diagram above. Notice that every output value depends on every input value. That's true in general for FFT computations of any length. With the way NaNs propagate through arithmetic operations, that implies that just one NaN in the input vector means that the output of <tt>fft<\/tt> will always be completely filled with NaNs.<\/p><p>That was related to the speed issue, as it turned out. NaN calculations are often executed in a less-optimized part of the CPU, or even in software. Lots and lots of NaN calculations bogged down this particular customer's problem.<\/p><p>FFTs are a never-ending source of fun and mystery!<\/p><p>I have written quite a few blog posts about Fourier transform theory and algorithms. You can see them all in the <a href=\"https:\/\/blogs.mathworks.com\/steve\/category\/fourier-transforms\">Fourier transforms category<\/a>.<\/p><p>P.S. 
If you have ideas for even more fun ways to create a vector of NaNs, be sure to add them to the comments on this post.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_58731b74b95d47d8a34ae136ce89f695() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='58731b74b95d47d8a34ae136ce89f695 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 58731b74b95d47d8a34ae136ce89f695';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2014 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_58731b74b95d47d8a34ae136ce89f695()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2014a<br><\/p><\/div><!--\r\n58731b74b95d47d8a34ae136ce89f695 ##### SOURCE BEGIN #####\r\n%%\r\n% My last meeting of the day ended just a little while ago, and I was starting\r\n% to think seriously about figuring out a blog post for this week. What to write\r\n% about? That's when I happened to see that Cleve posted\r\n% <https:\/\/blogs.mathworks.com\/cleve\/2014\/09\/15\/fft-fast-finite-fourier-transform\/\r\n% just yesterday> about the FFT. Ah ha! I'll write about that, too (in honor of\r\n% National Blog About the FFT Week).\r\n%\r\n%% To Recurse or Not?\r\n%\r\n% I have three little sort-of stories to tell, the first of which was inspired\r\n% directly by Cleve's post. Cleve described a very compact, elegant\r\n% implementation of the divide-and-conquer idea that's central to fast Fourier\r\n% transform algorithms. 
Here's the key code fragment of Cleve's textbook\r\n% function |ffttx|:\r\n%\r\n%      % Recursive divide and conquer\r\n%      k = (0:n\/2-1)';\r\n%      w = omega .^ k;\r\n%      u = ffttx(x(1:2:n-1));\r\n%      v = w.*ffttx(x(2:2:n));\r\n%      y = [u+v; u-v];\r\n%\r\n% Function |ffttx| calls itself recursively, twice. Each recursive call is on\r\n% half of the original input values. Then |ffttx| does a litte extra computation\r\n% at the end to produce the final result.\r\n%\r\n% Like Cleve, I really love this little bit of code. But it makes me think back\r\n% to all the FFT code that I saw back when I was first learning digital signal\r\n% processing. (That would be the mid- to late-1980s.) Here's a typical textbook\r\n% sample from the 1989 edition of _Discrete-Time Signal Processing_ by Oppenheim\r\n% and Schafer):\r\n%\r\n% <<https:\/\/blogs.mathworks.com\/images\/steve\/2014\/fortran-fft.jpg>>\r\n%\r\n% This code is typical for the time. The focus is on doing the computation\r\n% in-place and using as few multiplies as possible. One thing you don't see\r\n% here: explicit recursion! The function |DITFFT| does not call itself, either\r\n% directly or indirectly. That was typical for the time. No one wanted to use\r\n% the memory to make copies of the two half-length arrays, and no one wanted to\r\n% incur the speed penalty of many recursive function calls. Another typical\r\n% thing you see in this code is something charmingly called bit-reversed\r\n% sorting, but that's a topic for another day (maybe).\r\n%\r\n% Anyway, my point is that modern FFT implementations built for today's computer\r\n% architectures often do make use of explicit recursion. 
The additional cost\r\n% of dividing the input array into two or more smaller copies is more than\r\n% offset by the gains of more cache-friendly execution as the divide-and-conquer\r\n% problems get smaller and smaller.\r\n%\r\n%% Butterflies and Laws\r\n%\r\n% My second little story was inspired by one line of code that Cleve posted:\r\n%\r\n%  y = [u+v; u-v];\r\n%\r\n% In the case where |u| is |x(1)| and |v| is |x(2)|, then the computation above\r\n% is just the 2-point DFT of |x|.\r\n%\r\n% A signal flow diagram for it looks like this (also from\r\n% _Discrete-Time Signal Processing_):\r\n%\r\n% <<https:\/\/blogs.mathworks.com\/images\/steve\/2014\/butterfly.jpg>>\r\n%\r\n% Digital signal processing experts have for decades referred to this little\r\n% graph as a \"butterfly diagram\" because of its appearance.\r\n%\r\n% Well, to make a little extra money for myself in grad school, I wrote a couple\r\n% of chapters of the solutions manual for _Discrete-Time Signal Processing_. One\r\n% of the chapters covered FFTs. This effort left me completely fed up with\r\n% drawing butterfly diagrams and counting multiplies. One day I posted the\r\n% following chart outside my cubicle:\r\n%\r\n% <<https:\/\/blogs.mathworks.com\/images\/steve\/2014\/bill-becomes-a-law.jpg>>\r\n%\r\n% Yes, digital signal processing people even call that mess above a \"butterfly\r\n% diagram.\" We're a weird bunch.\r\n%\r\n% That butterfly diagram indirectly leads me to my third little FFT story.\r\n%\r\n%% The Slowest Way (Ever?) to Fill a Vector with NaNs\r\n%\r\n% About ten years ago, a tech support case was forwarded to me. A customer\r\n% complained that MATLAB was very slow in computing the FFT of one specific\r\n% vector. Other vectors of the same length would go really fast, but not this\r\n% one.\r\n%\r\n% So I loaded the customer's MAT-file containing the vector and started running\r\n% FFTs on it. To my surprise, it really was slow! I scratched my head. 
Then I\r\n% looked at the output values, and I was even more surprised to see that they\r\n% were all NaNs! Strange! What was the point? And was this related to the speed\r\n% issue?\r\n%\r\n% I looked at the long input vector provided by the customer, and I found that\r\n% just one of the input values was NaN.\r\n%\r\n% Light bulb!\r\n%\r\n% Take a look again at that complicated butterfly diagram above. Notice that\r\n% every output value depends on every input value. That's true in general for\r\n% FFT computations of any length. With the way NaNs propagate through arithmetic\r\n% operations, that implies that just one NaN in the input vector means that the\r\n% output of |fft| will always be completely filled with NaNs.\r\n%\r\n% That was related to the speed issue, as it turned out. NaN calculations often\r\n% are executed in a less-well optimized part of the CPU, or even in software.\r\n% Lots and lots of NaN calculations bogged down this particular customer's\r\n% problem.\r\n%\r\n% FFTs are a never-ending source of fun and mystery!\r\n%\r\n% I have written quite a few blog posts about Fourier transform theory and\r\n% algorithms. You can see them all in the\r\n% <https:\/\/blogs.mathworks.com\/steve\/category\/fourier-transforms Fourier\r\n% transforms category>.\r\n%\r\n% PS. If you have ideas for even more fun ways to create a vector of NaNs, be\r\n% sure to add them to the comments on this post.\r\n\r\n##### SOURCE END ##### 58731b74b95d47d8a34ae136ce89f695\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2014\/fortran-fft.jpg\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>My last meeting of the day ended just a little while ago, and I was starting to think seriously about figuring out a blog post for this week. What to write about? 
That's when I happened to see that Cleve posted <a href=\"https:\/\/blogs.mathworks.com\/cleve\/2014\/09\/15\/fft-fast-finite-fourier-transform\/\">just yesterday<\/a> about the FFT. Ah ha! I'll write about that, too (in honor of National Blog About the FFT Week).... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/steve\/2014\/09\/16\/fft-stories\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[20],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/1134"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/comments?post=1134"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/1134\/revisions"}],"predecessor-version":[{"id":1138,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/1134\/revisions\/1138"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media?parent=1134"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/categories?post=1134"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/tags?post=1134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}