{"id":11345,"date":"2024-06-27T13:54:38","date_gmt":"2024-06-27T17:54:38","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=11345"},"modified":"2024-07-01T02:42:20","modified_gmt":"2024-07-01T06:42:20","slug":"supersum-in-defense-of-floating-point-arithmetic","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2024\/06\/27\/supersum-in-defense-of-floating-point-arithmetic\/","title":{"rendered":"SuperSum, In Defense of Floating Point Arithmetic"},"content":{"rendered":"<div class=\"content\"><!--introduction-->\r\n<p>Floating point arithmetic doesn't get the respect it deserves. Many people consider it mysterious, fuzzy, unpredictable. These misgivings often occur in discussion of vector sums. Our provocatively named <i>SuperSum<\/i> is intended to calm these fears.<\/p>\r\n<!--\/introduction-->\r\n<h3>Contents<\/h3>\r\n<div>\r\n<ul>\r\n<li>\r\n<a href=\"#6cc58603-9b70-4a29-bc08-fae5fe3d5db0\">Ledgers<\/a>\r\n<\/li>\r\n<li>\r\n<a href=\"#cc3635b6-5a76-4420-baeb-94c5b36a6573\">Checksums<\/a>\r\n<\/li>\r\n<li>\r\n<a href=\"#71f16826-8ff1-4a14-bc85-146da594d56f\">Order<\/a>\r\n<\/li>\r\n<li>\r\n<a href=\"#5356ff4e-b231-4d40-9404-17b0d0d882af\">Speed<\/a>\r\n<\/li>\r\n<li>\r\n<a href=\"#52953716-f4f9-4698-9680-8a5d38254f1e\">Three Sums<\/a>\r\n<\/li>\r\n<li>\r\n<a href=\"#f8a4c9a1-edd5-4ac3-9dca-bc7306b7de0f\">Toy Example<\/a>\r\n<\/li>\r\n<li>\r\n<a href=\"#8acba015-e19b-413e-8817-1e093e31ee2d\">Second Test<\/a>\r\n<\/li>\r\n<li>\r\n<a href=\"#e0e62ed1-a352-49ea-ac59-31107304d71e\">Conclusion<\/a>\r\n<\/li>\r\n<\/ul>\r\n<\/div>\r\n<h4>Ledgers<a name=\"6cc58603-9b70-4a29-bc08-fae5fe3d5db0\"><\/a>\r\n<\/h4>\r\n<p>A <i>ledger<\/i> is a list of transactions in an account. <i>Auditing<\/i> the ledger involves comparing the total of the items with the change in the account balance.<\/p>\r\n<p>If <i>v<\/i> is a vector of transaction amounts, then we need to compute<\/p>\r\n<pre>sum(v)<\/pre>\r\n<p>If this sum is equal to the change in the balance, then it is reasonable to assume that the ledger is correct. If not, the ledger must be examined line-by-line.<\/p>\r\n<h4>Checksums<a name=\"cc3635b6-5a76-4420-baeb-94c5b36a6573\"><\/a>\r\n<\/h4>\r\n<p>Have you ever used a checksum for a file transfer? One checksum is computed before the file is sent. After the file has been sent over a questionable channel, a second checksum is computed on the receiving end. If the two checksums agree, the transmission was probably okay. If not, the file must be sent again.<\/p>\r\n<h4>Order<a name=\"71f16826-8ff1-4a14-bc85-146da594d56f\"><\/a>\r\n<\/h4>\r\n<p>Floating point addition is not <i>associative<\/i>. This means<\/p>\r\n<pre>(a + b) + c<\/pre>\r\n<p>is not necessarily the same as<\/p>\r\n<pre class=\"language-matlab\">a + (b + c).\r\n<\/pre>\r\n<p>So the <i>order<\/i> of doing a computation is important.<\/p>\r\n<p>Computers are deterministic devices. If the same computation is done <i>in the same order<\/i> more than once, the results must be the same. If you get different sums from different runs on a fixed computer, then it must be because the order has been changed.<\/p>\r\n<h4>Speed<a name=\"5356ff4e-b231-4d40-9404-17b0d0d882af\"><\/a>\r\n<\/h4>\r\n<p>In recent years, we have made the built-in function <tt>sum(x)<\/tt> faster by parallelizing it. The input vector is broken into several pieces, the sum of each piece is computed separately and simultaneously, then the partial sums are combined to give the final result. If the number and size of the pieces varies from run to run, the order varies and consequently the computed sums may vary.<\/p>\r\n<h4>Three Sums<a name=\"52953716-f4f9-4698-9680-8a5d38254f1e\"><\/a>\r\n<\/h4>\r\n<p>Here are three replacements for <tt>nansum(x)<\/tt>, the version of <tt>sum(x)<\/tt> that skips over <tt>NaNs<\/tt> and <tt>infs<\/tt>.<\/p>\r\n<p>\r\n<b>simplesum<\/b>\r\n<\/p>\r\n<p>Avoid the effect of blocking in the built-in sum(x).<\/p>\r\n<pre>function s = simplesum(x)\r\n     % simplesum.  s = simplesum(x), for vector(x).\r\n     % Force sequential order for sum(x).\r\n     % Skips NaNs and infs.<\/pre>\r\n<pre>     s = 0;\r\n     for k = 1:length(x)\r\n         if isfinite(x(k))\r\n             s = s + x(k);\r\n         end\r\n     end\r\nend<\/pre>\r\n<p>\r\n<b>KahanSum<\/b>\r\n<\/p>\r\n<p>This is the Kahan summation algorithm. The sum is accumulated in two words, the more significant <tt>s<\/tt> and the correction <tt>c<\/tt>. If you were able to form <tt>s + c<\/tt> exactly, it would be more accurate than <tt>s<\/tt> alone.<\/p>\r\n<pre>function s = KahanSum(x)\r\n     % KahanSum.  s = KahanSum(x), for vector(x).\r\n     % More accurate, but slower, than sum(x).\r\n     % Skips NaNs and infs.\r\n     % https:\/\/en.wikipedia.org\/wiki\/Kahan_summation_algorithm.<\/pre>\r\n<pre>     s = 0;  % sum\r\n     c = 0;  % correction\r\n     for k = 1:length(x)\r\n         if isfinite(x(k))\r\n             y = x(k) - c;\r\n             t = s + y;\r\n             c = (t - s) - y;\r\n             s = t;\r\n         end\r\n     end\r\nend<\/pre>\r\n<p>\r\n<b>SuperSum<\/b>\r\n<\/p>\r\n<p>I gave it a pretentious name to grab attention. Use extended precision, with enough digits to hold any MATLAB double.<\/p>\r\n<pre>function s = SuperSum(x)\r\n     % SuperSum.  s = SuperSum(x), for vector(x).\r\n     % Symbolic Math Toolbox extended precision.\r\n     % Skips NaNs and infs.\r\n     %\r\n     % 632 decimal digits holds every IEEE-754 double.\r\n     % 632 = ceil(log10(realmax) - log10(eps*realmin));\r\n     %\r\n     din = digits(632);  % Remember current setting\r\n     s = double(sum(sym(x(isfinite(x)),'D')));\r\n     digits(din)         % Restore\r\nend<\/pre>\r\n<h4>Toy Example<a name=\"f8a4c9a1-edd5-4ac3-9dca-bc7306b7de0f\"><\/a>\r\n<\/h4>\r\n<p>A test case here at MathWorks, known by a French-Canadian name that translates to \"toy example\", has a vector <tt>e2<\/tt> of length 4243 and values that range from -3.3e7 to 6.8e9.<\/p>\r\n<p>When running tests involving floating point numbers it is a good idea to set the output format to <tt>hex<\/tt> so we can see every last bit.<\/p>\r\n<pre>format hex\r\nload jeuTest e2\r\nx = e2;<\/pre>\r\n<pre>[nansum(x)\r\n simplesum(x)\r\n KahanSum(x)\r\n SuperSum(x)]<\/pre>\r\n<pre>ans =\r\n   423785e8206150e2\r\n   423785e8206150e0\r\n   423785e8206150e1\r\n   423785e8206150e1<\/pre>\r\n<p>For <tt>jeuTest<\/tt>, Kahan summation gets the same result as <tt>SuperSum<\/tt>, while <tt>nansum<\/tt> and <tt>simplesum<\/tt> differ in the last bit or two.<\/p>\r\n<h4>Second Test<a name=\"8acba015-e19b-413e-8817-1e093e31ee2d\"><\/a>\r\n<\/h4>\r\n<p>The vector length is only three, but the third term completely cancels the first, and the second term rises from obscurity. In this situation, <tt>KahanSum<\/tt> is no more accurate than <tt>sum<\/tt>.<\/p>\r\n<pre>format hex\r\nx = [1 1e-14 -1]'<\/pre>\r\n<pre>[nansum(x)\r\n simplesum(x)\r\n KahanSum(x)\r\n SuperSum(x)]<\/pre>\r\n<pre>x =\r\n   3ff0000000000000\r\n   3d06849b86a12b9b\r\n   bff0000000000000<\/pre>\r\n<pre>ans =\r\n   3d06800000000000\r\n   3d06800000000000\r\n   3d06800000000000\r\n   3d06849b86a12b9b<\/pre>\r\n<h4>Conclusion<a name=\"e0e62ed1-a352-49ea-ac59-31107304d71e\"><\/a>\r\n<\/h4>\r\n<p>I will leave careful timing for another day. Let's just say that in situations like <tt>jeuTest<\/tt>, <tt>KahanSum<\/tt> is probably all you need. It is usually more accurate than <tt>sum<\/tt>, and not much slower.<\/p>\r\n<p>However, for complete reliability, use <tt>SuperSum<\/tt>. There is no substitute for the right answer.<\/p>\r\n<script language=\"JavaScript\"> <!-- \r\n    function grabCode_7bd084f1e3e84c86b0c419e2b2dcd8b5() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='7bd084f1e3e84c86b0c419e2b2dcd8b5 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 7bd084f1e3e84c86b0c419e2b2dcd8b5';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2024 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script>\r\n<p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\">\r\n<br>\r\n<a href=\"javascript:grabCode_7bd084f1e3e84c86b0c419e2b2dcd8b5()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript>\r\n<\/span><\/a>\r\n<br>\r\n<br>\r\n      Published with MATLAB&reg; R2024a<br>\r\n<\/p>\r\n<\/div>\r\n<!--\r\n7bd084f1e3e84c86b0c419e2b2dcd8b5 ##### SOURCE BEGIN #####\r\n%% SuperSum, In Defense of Floating Point Arithmetic\r\n% Floating point arithmetic doesn't get the respect it deserves.\r\n% Many people consider it mysterious, fuzzy, unpredictable.\r\n% These misgivings often occur in\r\n% discussion of vector sums.  Our provocatively named\r\n% _SuperSum_ is intended to calm these fears.\r\n\r\n%% Ledgers\r\n% A _ledger_ is a list of transactions in an account.\r\n% _Auditing_ the ledger involves comparing the total of \r\n% the items with the change in the account balance.\r\n%\r\n% If _v_ is a vector of transaction amounts, then we\r\n% need to compute\r\n%\r\n%  sum(v)\r\n% \r\n% If this sum is equal to the change in the balance, then it is\r\n% reasonable to assume that the ledger is correct.\r\n% If not, the ledger must be examined line-by-line.\r\n%\r\n%% Checksums\r\n% Have you ever used a checksum for a file transfer?\r\n% One checksum is computed before the file is sent.  After the file\r\n% has been sent over a questionable channel, a second checksum is \r\n% computed on the receiving end.  If the two checksums agree,\r\n% the transmission was probably okay.  If not, the file must be\r\n% sent again.\r\n\r\n%% Order\r\n% Floating point addition is not _associative_.  This means\r\n%\r\n%  (a + b) + c\r\n%\r\n% is not necessarily the same as\r\n%\r\n%   a + (b + c).\r\n%\r\n% So the _order_ of doing a computation is important.\r\n%\r\n% Computers are deterministic devices.  If the same\r\n% computation is done _in the same order_ more than once,\r\n% the results must be the same.\r\n% If you get different sums from different runs on a fixed\r\n% computer, then it must be because the order has been \r\n% changed.\r\n\r\n%% Speed\r\n% In recent years, we have made the built-in function |sum(x)|\r\n% faster by parallelizing it.  The input vector is broken into\r\n% several pieces, the sum of each piece is computed separately\r\n% and simultaneously, then the partial sums are combined to give\r\n% the final result.  If the number and size of the pieces varies\r\n% from run to run, the order varies and consequently the computed\r\n% sums may vary.\r\n\r\n%% Three Sums\r\n% Here are three replacements for |nansum(x)|,\r\n% the version of |sum(x)| that skips over |NaNs| and |infs|.\r\n%\r\n% *simplesum*\r\n%\r\n% Avoid the effect of blocking in the built-in sum(x).\r\n%\r\n%  function s = simplesum(x)\r\n%       % simplesum.  s = simplesum(x), for vector(x).\r\n%       % Force sequential order for sum(x).\r\n%       % Skips NaNs and infs.\r\n%\r\n%       s = 0;\r\n%       for k = 1:length(x)\r\n%           if isfinite(x(k))\r\n%               s = s + x(k);\r\n%           end\r\n%       end\r\n%  end\r\n%\r\n% *KahanSum*\r\n%\r\n% This is the Kahan summation algorithm.\r\n% The sum is accumulated in two words, the more significant\r\n% |s| and the correction |c|.\r\n% If you were able to form |s + c| exactly, it would be more\r\n% accurate than |s| alone.\r\n%\r\n%  function s = KahanSum(x)\r\n%       % KahanSum.  s = KahanSum(x), for vector(x).\r\n%       % More accurate, but slower, than sum(x).\r\n%       % Skips NaNs and infs.\r\n%       % https:\/\/en.wikipedia.org\/wiki\/Kahan_summation_algorithm.\r\n%   \r\n%       s = 0;  % sum\r\n%       c = 0;  % correction\r\n%       for k = 1:length(x)\r\n%           if isfinite(x(k))\r\n%               y = x(k) - c;\r\n%               t = s + y;\r\n%               c = (t - s) - y;\r\n%               s = t;\r\n%           end\r\n%       end\r\n%  end\r\n%\r\n% *SuperSum*\r\n%\r\n% I gave it a pretentious name to grab attention.\r\n% Use extended precision, with enough digits to hold any MATLAB double.\r\n%\r\n%  function s = SuperSum(x) \r\n%       % SuperSum.  s = SuperSum(x), for vector(x).\r\n%       % Symbolic Math Toolbox extended precision.\r\n%       % Skips NaNs and infs.\r\n%       %\r\n%       % 632 decimal digits holds every IEEE-754 double.\r\n%       % 632 = ceil(log10(realmax) - log10(eps*realmin));\r\n%       %\r\n%       din = digits(632);  % Remember current setting\r\n%       s = double(sum(sym(x(isfinite(x)),'D')));\r\n%       digits(din)         % Restore\r\n%  end\r\n\r\n%% Toy Example\r\n% A test case here at MathWorks, known by a French-Canadian name that\r\n% translates to \"toy example\", has a vector |e2|\r\n% of length 4243 and values that range from -3.3e7 to 6.8e9.\r\n%\r\n% When running tests involving floating point numbers\r\n% it is a good idea to set the output format to |hex|\r\n% so we can see every last bit.\r\n%\r\n%  format hex\r\n%  load jeuTest e2\r\n%  x = e2;\r\n%\r\n%  [nansum(x)\r\n%   simplesum(x)\r\n%   KahanSum(x)\r\n%   SuperSum(x)]\r\n%\r\n%  ans =\r\n%     423785e8206150e2\r\n%     423785e8206150e0\r\n%     423785e8206150e1\r\n%     423785e8206150e1\r\n%\r\n% For |jeuTest|,  Kahan summation gets the same result as |SuperSum|,\r\n% while |nansum| and |simplesum| differ in the last bit or two.\r\n%\r\n\r\n%% Second Test\r\n% The vector length is only three, but the third term completely\r\n% cancels the first, and the second term rises from obscurity.\r\n% In this situation, |KahanSum| is no more accurate than |sum|.\r\n% \r\n%  format hex\r\n%  x = [1 1e-14 -1]'\r\n%\r\n%  [nansum(x)\r\n%   simplesum(x)\r\n%   KahanSum(x)\r\n%   SuperSum(x)]\r\n%\r\n%  x =\r\n%     3ff0000000000000\r\n%     3d06849b86a12b9b\r\n%     bff0000000000000\r\n%\r\n%  ans =\r\n%     3d06800000000000\r\n%     3d06800000000000\r\n%     3d06800000000000\r\n%     3d06849b86a12b9b\r\n\r\n%% Conclusion\r\n% I will leave careful timing for another day.  Let's just say that  \r\n% in situations like |jeuTest|, |KahanSum| is probably all you need.\r\n% It is usually more accurate than |sum|, and not much slower.\r\n%\r\n% However, for complete reliability, use |SuperSum|.  There is no\r\n% substitute for the right answer.\r\n##### SOURCE END ##### 7bd084f1e3e84c86b0c419e2b2dcd8b5\r\n-->\r\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/cleve\/files\/super_cloud.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><!--introduction-->\r\n<p>Floating point arithmetic doesn't get the respect it deserves. Many people consider it mysterious, fuzzy, unpredictable. These misgivings often occur in discussion of vector sums. Our provocatively named <i>SuperSum<\/i> is intended to calm these fears.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2024\/06\/27\/supersum-in-defense-of-floating-point-arithmetic\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":11372,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,14,7,20],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/11345"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=11345"}],"version-history":[{"count":7,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/11345\/revisions"}],"predecessor-version":[{"id":11375,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/11345\/revisions\/11375"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media\/11372"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=11345"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=11345"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=11345"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}