{"id":1172,"date":"2015-03-02T12:00:18","date_gmt":"2015-03-02T17:00:18","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=1172"},"modified":"2015-02-17T21:20:54","modified_gmt":"2015-02-18T02:20:54","slug":"triple-precision-accumlated-inner-product","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2015\/03\/02\/triple-precision-accumlated-inner-product\/","title":{"rendered":"Triple Precision Accumlated Inner Product"},"content":{"rendered":"\r\n<div class=\"content\"><!--introduction--><p>Single and double precision are combined to facilitate a triple precision accumulated inner product.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#a715bce5-f085-466e-946f-a46caba8b681\">Iterative refinement<\/a><\/li><li><a href=\"#9fd5db06-96d9-47f6-9859-b4039f388e3e\">Example with double precision<\/a><\/li><li><a href=\"#431d886b-631b-494f-a8eb-a6b35a61d72a\">Triple precision<\/a><\/li><li><a href=\"#7b8e7cac-2165-4d23-93b7-ed0593a38af1\">dot3p<\/a><\/li><li><a href=\"#c54f50c8-7025-4d2d-ae1c-edb3e1dda6d6\">Example with triple precision<\/a><\/li><li><a href=\"#c882ae88-b60e-496a-9822-a1833be98632\">residual3p<\/a><\/li><\/ul><\/div><h4>Iterative refinement<a name=\"a715bce5-f085-466e-946f-a46caba8b681\"><\/a><\/h4><p>In my <a href=\"https:\/\/blogs.mathworks.com\/cleve\/2015\/02\/16\/iterative-refinement-for-solutions-to-linear-systems\/\">previous post on iterative refinement<\/a> I showed the need for an inner product routine that is more accurate than double precision.  This post is about such a function.<\/p><h4>Example with double precision<a name=\"9fd5db06-96d9-47f6-9859-b4039f388e3e\"><\/a><\/h4><p>The example I am going to use is contrived so that the first and third terms in the inner product exactly cancel each other, leaving the much smaller second term to arise from the ashes.<\/p><pre class=\"codeinput\">   format <span class=\"string\">long<\/span> <span class=\"string\">e<\/span>\r\n   x = [1 1\/3 1]'\r\n   y = [1 3e-9 -1]'\r\n<\/pre><pre class=\"codeoutput\">x =\r\n     1.000000000000000e+00\r\n     3.333333333333333e-01\r\n     1.000000000000000e+00\r\ny =\r\n     1.000000000000000e+00\r\n     3.000000000000000e-09\r\n    -1.000000000000000e+00\r\n<\/pre><p>Computing the inner product with a conventional MATLAB statement shows the intended difficulty.<\/p><pre class=\"codeinput\">   dot2p = x'*y\r\n<\/pre><pre class=\"codeoutput\">dot2p =\r\n     1.000000082740371e-09\r\n<\/pre><p>The result should be <tt>1.000000000000000e-09<\/tt>. We're getting only about half of the significant digits correct.<\/p><p>Of course, the problem is that the second intermediate sum is<\/p><pre class=\"codeinput\">   s2 = 1 + 1\/3*3e-9\r\n<\/pre><pre class=\"codeoutput\">s2 =\r\n     1.000000001000000e+00\r\n<\/pre><p>That's OK in decimal, but not in binary.  There is not enough room in one double precision floating point number to store the bits that are going to be needed when that leading one is cancelled by the third term in the sum.<\/p><h4>Triple precision<a name=\"431d886b-631b-494f-a8eb-a6b35a61d72a\"><\/a><\/h4><p>I do not have a full blown triple precision arithmetic package by any means. I have just enough to do this one task.  Here is the basic idea. A double precision floating point number has 14 hexadecimal digits in its fraction.  I can use <tt>single<\/tt> and <tt>double<\/tt> to break a double into 6 high order hex digits and 8 low order hex digits, like this.<\/p><pre class=\"codeinput\">   format <span class=\"string\">hex<\/span>\r\n   x = 1\/3\r\n   xhi = double(single(x))\r\n   xlo = x - xhi\r\n<\/pre><pre class=\"codeoutput\">x =\r\n   3fd5555555555555\r\nxhi =\r\n   3fd5555560000000\r\nxlo =\r\n   be45555556000000\r\n<\/pre><p>Two doubles with more than half of their low order bits equal to zero can be multiplied together with no roundoff error.  For example<\/p><pre class=\"codeinput\">   pihi = double(single(pi))\r\n   pio3hi = xhi*pihi\r\n<\/pre><pre class=\"codeoutput\">pihi =\r\n   400921fb60000000\r\npio3hi =\r\n   3ff0c1524860a920\r\n<\/pre><p>That trailing zero in <tt>pio3hi<\/tt> is an indication that the result is exact.<\/p><p>Additions are not exact, when the two numbers differ by several orders of magnitude.  This fact will eventually be the limiting factor of our inner product routine.<\/p><h4>dot3p<a name=\"7b8e7cac-2165-4d23-93b7-ed0593a38af1\"><\/a><\/h4><p>My inner production routine is both <i>accumulated<\/i>, which means it uses extra precise arithmetic, and <i>extended<\/i>, which means an extra scalar is added to the result using the extra precision. You can download this function from <a href=\"https:\/\/blogs.mathworks.com\/images\/cleve\/dot3p.m\">here<\/a>.<\/p><pre class=\"codeinput\">   dbtype <span class=\"string\">dot3p<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n1     function s = dot3p(x,y,s)\r\n2     % DOT3P s = dot3p(x,y,s)  Triple precision extended inner product.\r\n3     % s = x'*y + s for vectors x and y and scalar s.\r\n4     \r\n5        shi = double(single(s));\r\n6        slo = s - shi;\r\n7        for k = 1:length(x)\r\n8           xhi = double(single(x(k)));\r\n9           xlo = x(k) - xhi;\r\n10          yhi = double(single(y(k)));\r\n11          ylo = y(k) - yhi;\r\n12          tmp = xhi*yhi;\r\n13          zhi = double(single(tmp));\r\n14          zlo = tmp - zhi + xhi*ylo + xlo*yhi + xlo*ylo;\r\n15    \r\n16          tmp = shi + zhi;\r\n17          del = tmp - shi - zhi;\r\n18          shi = double(single(tmp));\r\n19          slo = tmp - shi + slo + zlo - del;\r\n20       end\r\n21       s = shi +  slo;\r\n<\/pre><h4>Example with triple precision<a name=\"c54f50c8-7025-4d2d-ae1c-edb3e1dda6d6\"><\/a><\/h4><p>Let's run my example with <tt>dot3p<\/tt> in the debugger and look at some intermediate results.<\/p><pre class=\"codeinput\"><span class=\"comment\">%  dbstop 16 dot3p<\/span>\r\n<span class=\"comment\">%  dbstop 20 dot3p<\/span>\r\n<span class=\"comment\">%  dot3p(x,y,0)<\/span>\r\n<\/pre><p>The variables <tt>shi<\/tt> and <tt>slo<\/tt> carry the sum in triple precision. The first time through the loop there are no roundoff errors, and <tt>shi<\/tt> and <tt>slo<\/tt> are set to <tt>1.0<\/tt> and <tt>0.0<\/tt>.<\/p><p>Let's look at the second pass through the loop.  Halfway through, at statement 16.<\/p><pre class=\"language-matlab\">K&gt;&gt; format hex\r\nK&gt;&gt; xhi,xlo,yhi,ylo,tmp,zhi,zlo\r\nxhi =\r\n   3fd5555560000000\r\nxlo =\r\n   be45555556000000\r\nyhi =\r\n   3e29c511e0000000\r\nylo =\r\n   bc7e2df108000000\r\ntmp =\r\n   3e112e0bf341b0a0\r\nzhi =\r\n   3e112e0c00000000\r\nzlo =\r\n   bc97d9296b9a0d85\r\nK&gt;&gt;\r\nK&gt;&gt; format long <span class=\"string\">e<\/span>\r\nK&gt;&gt; xhi,xlo,yhi,ylo,tmp,zhi,zlo\r\nxhi =\r\n     3.333333432674408e-01\r\nxlo =\r\n    -9.934107481068821e-09\r\nyhi =\r\n     3.000000026176508e-09\r\nylo =\r\n    -2.617650809219019e-17\r\ntmp =\r\n     1.000000038527825e-09\r\nzhi =\r\n     1.000000082740371e-09\r\nzlo =\r\n    -8.274037106125165e-17\r\n<\/pre><p>We can see that <tt>xhi<\/tt> is the first six hex digits of <tt>1\/3<\/tt> and <tt>xlo<\/tt> is the remaining digits.  Similarly, <tt>yhi<\/tt> is the first six hex digits of <tt>3e-9<\/tt> and <tt>ylo<\/tt> is the remaing digits.  <tt>zhi<\/tt> is the first six hex digits of the <tt>xhi*yhi<\/tt> and <tt>zlo<\/tt> is a crucial correction term.<\/p><p>Stopping at the end of the second pass through the loop, at statement 20.<\/p><pre class=\"language-matlab\">K&gt;&gt; format hex\r\nK&gt;&gt; tmp, del, shi, slo\r\ntmp =\r\n   3ff000000044b830\r\ndel =\r\n   0000000000000000\r\nshi =\r\n   3ff0000000000000\r\nslo =\r\n   3e112e0be826d694\r\nK&gt;&gt;\r\nK&gt;&gt; format long <span class=\"string\">e<\/span>,\r\nK&gt;&gt; tmp, del, shi, slo\r\ntmp =\r\n     1.000000001000000e+00\r\ndel =\r\n     0\r\nshi =\r\n     1\r\nslo =\r\n     9.999999999999999e-10\r\n<\/pre><p><tt>tmp<\/tt> is <tt>1.0<\/tt> plus some of the bits of <tt>1.0e-10<\/tt>. <tt>del<\/tt> is zero because it is not needed in this example. It is involved when the terms vary over an even wider range. <tt>shi<\/tt> is exactly <tt>1.0<\/tt>, which is the high order part of the evolving sum. And <tt>slo<\/tt> has become <tt>1.0e-10<\/tt> to full double precison accuracy.<\/p><p>On the third time through the loop there will be no roundoff errors, <tt>shi<\/tt> will be completely cancelled, and <tt>slo<\/tt> will bear the full responsibility for providing the final answer.<\/p><p>Of course, this example is contrived and unusual.  Ordinarily, we can expect some cancellation (otherwise there would be no need for an accumulated inner product), with the high order part losing at least a few digits and the low order part filling them in.<\/p><h4>residual3p<a name=\"c882ae88-b60e-496a-9822-a1833be98632\"><\/a><\/h4><p>With a dot product in hand, it is easy to write the residual function. Here the <i>extended<\/i> feature is essential because we expect extreme, if not total, cancellation when the right hand side is subtracted from the dot product.<\/p><pre class=\"codeinput\">   type <span class=\"string\">residual3p<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\nfunction r = residual3p(A,x,b)\r\n% RESIDUAL3p Triple precision residual, A*x - b.\r\n% r = residual3p(A,x,b) for matrix A, vectors x and b. \r\n\r\n   m = size(A,1);\r\n   r = zeros(m,1);\r\n   for k = 1:m\r\n      r(k) = dot3p(A(k,:),x,-b(k));\r\n   end\r\n\r\nend\r\n<\/pre><p>You can download this function from <a href=\"https:\/\/blogs.mathworks.com\/images\/cleve\/residual3p.m\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_d36202b4d5ba4ce2830716d9fc0b8abc() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='d36202b4d5ba4ce2830716d9fc0b8abc ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' d36202b4d5ba4ce2830716d9fc0b8abc';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2015 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_d36202b4d5ba4ce2830716d9fc0b8abc()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2014b<br><\/p><\/div><!--\r\nd36202b4d5ba4ce2830716d9fc0b8abc ##### SOURCE BEGIN #####\r\n%% Triple Precision Accumlated Inner Product\r\n% Single and double precision are combined to facilitate a triple precision\r\n% accumulated inner product.\r\n\r\n%% Iterative refinement\r\n% In my <https:\/\/blogs.mathworks.com\/cleve\/2015\/02\/16\/iterative-refinement-for-solutions-to-linear-systems\/\r\n% previous post on iterative refinement> I showed the need for an inner\r\n% product routine that is more accurate than double precision.  This post\r\n% is about such a function.\r\n\r\n%% Example with double precision\r\n% The example I am going to use is contrived so that the first and third terms\r\n% in the inner product exactly cancel each other, leaving the much smaller\r\n% second term to arise from the ashes.\r\n\r\n   format long e\r\n   x = [1 1\/3 1]'\r\n   y = [1 3e-9 -1]'\r\n\r\n%%\r\n% Computing the inner product with a conventional MATLAB statement shows\r\n% the intended difficulty.\r\n\r\n   dot2p = x'*y\r\n\r\n%%\r\n% The result should be |1.000000000000000e-09|.\r\n% We're getting only about half of the significant digits correct.\r\n\r\n%%\r\n% Of course, the problem is that the second intermediate sum is\r\n\r\n   s2 = 1 + 1\/3*3e-9\r\n\r\n%%\r\n% That's OK in decimal, but not in binary.  There is not enough room in one\r\n% double precision floating point number to store the bits that are going to\r\n% be needed when that leading one is cancelled by the third term in the sum.\r\n\r\n%% Triple precision\r\n% I do not have a full blown triple precision arithmetic package by any means.\r\n% I have just enough to do this one task.  Here is the basic idea.\r\n% A double precision floating point number has 14 hexadecimal digits\r\n% in its fraction.  I can use |single| and |double| to break a double into\r\n% 6 high order hex digits and 8 low order hex digits, like this.\r\n\r\n   format hex\r\n   x = 1\/3\r\n   xhi = double(single(x))\r\n   xlo = x - xhi\r\n\r\n%%\r\n% Two doubles with more than half of their low order bits equal to zero\r\n% can be multiplied together with no roundoff error.  For example\r\n\r\n   pihi = double(single(pi))\r\n   pio3hi = xhi*pihi\r\n\r\n%%\r\n% That trailing zero in |pio3hi| is an indication that the result is exact.\r\n\r\n%%\r\n% Additions are not exact, when the two numbers differ by several orders\r\n% of magnitude.  This fact will eventually be the limiting factor of our\r\n% inner product routine.\r\n\r\n%% dot3p\r\n% My inner production routine is both _accumulated_, which means it uses\r\n% extra precise arithmetic, and _extended_, which means an extra scalar is\r\n% added to the result using the extra precision.\r\n% You can download this function from\r\n% <https:\/\/blogs.mathworks.com\/images\/cleve\/dot3p.m here>.\r\n\r\n   dbtype dot3p\r\n\r\n%% Example with triple precision\r\n% Let's run my example with |dot3p| in the debugger and look at some\r\n% intermediate results.\r\n\r\n%  dbstop 16 dot3p\r\n%  dbstop 20 dot3p\r\n%  dot3p(x,y,0)\r\n\r\n%%\r\n% The variables |shi| and |slo| carry the sum in triple precision.\r\n% The first time through the loop there are no roundoff errors,\r\n% and |shi| and |slo| are set to |1.0| and |0.0|.\r\n\r\n%%\r\n% Let's look at the second pass through the loop.  Halfway through,\r\n% at statement 16.\r\n%\r\n%   K>> format hex\r\n%   K>> xhi,xlo,yhi,ylo,tmp,zhi,zlo\r\n%   xhi =\r\n%      3fd5555560000000\r\n%   xlo =\r\n%      be45555556000000\r\n%   yhi =\r\n%      3e29c511e0000000\r\n%   ylo =\r\n%      bc7e2df108000000\r\n%   tmp =\r\n%      3e112e0bf341b0a0\r\n%   zhi =\r\n%      3e112e0c00000000\r\n%   zlo =\r\n%      bc97d9296b9a0d85\r\n%   K>>\r\n%   K>> format long e\r\n%   K>> xhi,xlo,yhi,ylo,tmp,zhi,zlo\r\n%   xhi =\r\n%        3.333333432674408e-01\r\n%   xlo =\r\n%       -9.934107481068821e-09\r\n%   yhi =\r\n%        3.000000026176508e-09\r\n%   ylo =\r\n%       -2.617650809219019e-17\r\n%   tmp =\r\n%        1.000000038527825e-09\r\n%   zhi =\r\n%        1.000000082740371e-09\r\n%   zlo =\r\n%       -8.274037106125165e-17\r\n\r\n%%\r\n% We can see that |xhi| is the first six hex digits of |1\/3| and |xlo| is\r\n% the remaining digits.  Similarly, |yhi| is the first six hex digits of\r\n% |3e-9| and |ylo| is the remaing digits.  |zhi| is the first six hex digits\r\n% of the |xhi*yhi| and |zlo| is a crucial correction term. \r\n\r\n%%\r\n% Stopping at the end of the second pass through the loop, at statement 20.\r\n%\r\n%   K>> format hex\r\n%   K>> tmp, del, shi, slo\r\n%   tmp =\r\n%      3ff000000044b830\r\n%   del =\r\n%      0000000000000000\r\n%   shi =\r\n%      3ff0000000000000\r\n%   slo =\r\n%      3e112e0be826d694\r\n%   K>> \r\n%   K>> format long e,\r\n%   K>> tmp, del, shi, slo\r\n%   tmp =\r\n%        1.000000001000000e+00\r\n%   del =\r\n%        0\r\n%   shi =\r\n%        1\r\n%   slo =\r\n%        9.999999999999999e-10 \r\n\r\n%%\r\n% |tmp| is |1.0| plus some of the bits of |1.0e-10|.\r\n% |del| is zero because it is not needed in this example.\r\n% It is involved when the terms vary over an even wider range.\r\n% |shi| is exactly |1.0|, which is the high order part of the evolving sum.\r\n% And |slo| has become |1.0e-10| to full double precison accuracy.\r\n\r\n%%\r\n% On the third time through the loop there will be no roundoff errors,\r\n% |shi| will be completely cancelled, and |slo| will bear the full\r\n% responsibility for providing the final answer.\r\n\r\n%%\r\n% Of course, this example is contrived and unusual.  Ordinarily, we can\r\n% expect some cancellation (otherwise there would be no need for an\r\n% accumulated inner product), with the high order part losing at least\r\n% a few digits and the low order part filling them in.\r\n\r\n%% residual3p\r\n% With a dot product in hand, it is easy to write the residual function.\r\n% Here the _extended_ feature is essential because we expect extreme, if not\r\n% total, cancellation when the right hand side is subtracted from the\r\n% dot product.\r\n\r\n   type residual3p\r\n\r\n%%\r\n% You can download this function from\r\n% <https:\/\/blogs.mathworks.com\/images\/cleve\/residual3p.m here>.\r\n\r\n\r\n##### SOURCE END ##### d36202b4d5ba4ce2830716d9fc0b8abc\r\n-->","protected":false},"excerpt":{"rendered":"<!--introduction--><p>Single and double precision are combined to facilitate a triple precision accumulated inner product.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2015\/03\/02\/triple-precision-accumlated-inner-product\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,6,16,7],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/1172"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=1172"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/1172\/revisions"}],"predecessor-version":[{"id":1174,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/1172\/revisions\/1174"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=1172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=1172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=1172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}