Triple Precision Accumlated Inner Product

Single and double precision are combined to facilitate a triple precision accumulated inner product.


Iterative refinement

In my previous post on iterative refinement I showed the need for an inner product routine that is more accurate than double precision. This post is about such a function.

Example with double precision

The example I am going to use is contrived so that the first and third terms in the inner product exactly cancel each other, leaving the much smaller second term to arise from the ashes.

   format long e
   x = [1 1/3 1]'
   y = [1 3e-9 -1]'
x =
y =

Computing the inner product with a conventional MATLAB statement shows the intended difficulty.

   dot2p = x'*y
dot2p =

The result should be 1.000000000000000e-09. We're getting only about half of the significant digits correct.

Of course, the problem is that the second intermediate sum is

   s2 = 1 + 1/3*3e-9
s2 =

That's OK in decimal, but not in binary. There is not enough room in one double precision floating point number to store the bits that are going to be needed when that leading one is cancelled by the third term in the sum.

Triple precision

I do not have a full blown triple precision arithmetic package by any means. I have just enough to do this one task. Here is the basic idea. A double precision floating point number has 14 hexadecimal digits in its fraction. I can use single and double to break a double into 6 high order hex digits and 8 low order hex digits, like this.

   format hex
   x = 1/3
   xhi = double(single(x))
   xlo = x - xhi
x =
xhi =
xlo =

Two doubles with more than half of their low order bits equal to zero can be multiplied together with no roundoff error. For example

   pihi = double(single(pi))
   pio3hi = xhi*pihi
pihi =
pio3hi =

That trailing zero in pio3hi is an indication that the result is exact.

Additions are not exact, when the two numbers differ by several orders of magnitude. This fact will eventually be the limiting factor of our inner product routine.


My inner production routine is both accumulated, which means it uses extra precise arithmetic, and extended, which means an extra scalar is added to the result using the extra precision. You can download this function from here.

   dbtype dot3p
1     function s = dot3p(x,y,s)
2     % DOT3P s = dot3p(x,y,s)  Triple precision extended inner product.
3     % s = x'*y + s for vectors x and y and scalar s.
5        shi = double(single(s));
6        slo = s - shi;
7        for k = 1:length(x)
8           xhi = double(single(x(k)));
9           xlo = x(k) - xhi;
10          yhi = double(single(y(k)));
11          ylo = y(k) - yhi;
12          tmp = xhi*yhi;
13          zhi = double(single(tmp));
14          zlo = tmp - zhi + xhi*ylo + xlo*yhi + xlo*ylo;
16          tmp = shi + zhi;
17          del = tmp - shi - zhi;
18          shi = double(single(tmp));
19          slo = tmp - shi + slo + zlo - del;
20       end
21       s = shi +  slo;

Example with triple precision

Let's run my example with dot3p in the debugger and look at some intermediate results.

%  dbstop 16 dot3p
%  dbstop 20 dot3p
%  dot3p(x,y,0)

The variables shi and slo carry the sum in triple precision. The first time through the loop there are no roundoff errors, and shi and slo are set to 1.0 and 0.0.

Let's look at the second pass through the loop. Halfway through, at statement 16.

K>> format hex
K>> xhi,xlo,yhi,ylo,tmp,zhi,zlo
xhi =
xlo =
yhi =
ylo =
tmp =
zhi =
zlo =
K>> format long e
K>> xhi,xlo,yhi,ylo,tmp,zhi,zlo
xhi =
xlo =
yhi =
ylo =
tmp =
zhi =
zlo =

We can see that xhi is the first six hex digits of 1/3 and xlo is the remaining digits. Similarly, yhi is the first six hex digits of 3e-9 and ylo is the remaing digits. zhi is the first six hex digits of the xhi*yhi and zlo is a crucial correction term.

Stopping at the end of the second pass through the loop, at statement 20.

K>> format hex
K>> tmp, del, shi, slo
tmp =
del =
shi =
slo =
K>> format long e,
K>> tmp, del, shi, slo
tmp =
del =
shi =
slo =

tmp is 1.0 plus some of the bits of 1.0e-10. del is zero because it is not needed in this example. It is involved when the terms vary over an even wider range. shi is exactly 1.0, which is the high order part of the evolving sum. And slo has become 1.0e-10 to full double precison accuracy.

On the third time through the loop there will be no roundoff errors, shi will be completely cancelled, and slo will bear the full responsibility for providing the final answer.

Of course, this example is contrived and unusual. Ordinarily, we can expect some cancellation (otherwise there would be no need for an accumulated inner product), with the high order part losing at least a few digits and the low order part filling them in.


With a dot product in hand, it is easy to write the residual function. Here the extended feature is essential because we expect extreme, if not total, cancellation when the right hand side is subtracted from the dot product.

   type residual3p
function r = residual3p(A,x,b)
% RESIDUAL3p Triple precision residual, A*x - b.
% r = residual3p(A,x,b) for matrix A, vectors x and b. 

   m = size(A,1);
   r = zeros(m,1);
   for k = 1:m
      r(k) = dot3p(A(k,:),x,-b(k));


You can download this function from here.

Published with MATLAB® R2014b

  • print
  • send email


To leave a comment, please click here to sign in to your MathWorks Account or create a new one.