Single and double precision are combined to facilitate a triple precision accumulated inner product.
In my previous post on iterative refinement I showed the need for an inner product routine that is more accurate than double precision. This post is about such a function.
The example I am going to use is contrived so that the first and third terms in the inner product exactly cancel each other, leaving the much smaller second term to arise from the ashes.
format long e x = [1 1/3 1]' y = [1 3e-9 -1]'
x = 1.000000000000000e+00 3.333333333333333e-01 1.000000000000000e+00 y = 1.000000000000000e+00 3.000000000000000e-09 -1.000000000000000e+00
Computing the inner product with a conventional MATLAB statement shows the intended difficulty.
dot2p = x'*y
dot2p = 1.000000082740371e-09
The result should be 1.000000000000000e-09. We're getting only about half of the significant digits correct.
Of course, the problem is that the second intermediate sum is
s2 = 1 + 1/3*3e-9
s2 = 1.000000001000000e+00
That's OK in decimal, but not in binary. There is not enough room in one double precision floating point number to store the bits that are going to be needed when that leading one is cancelled by the third term in the sum.
I do not have a full blown triple precision arithmetic package by any means. I have just enough to do this one task. Here is the basic idea. A double precision floating point number has 14 hexadecimal digits in its fraction. I can use single and double to break a double into 6 high order hex digits and 8 low order hex digits, like this.
format hex x = 1/3 xhi = double(single(x)) xlo = x - xhi
x = 3fd5555555555555 xhi = 3fd5555560000000 xlo = be45555556000000
Two doubles with more than half of their low order bits equal to zero can be multiplied together with no roundoff error. For example
pihi = double(single(pi)) pio3hi = xhi*pihi
pihi = 400921fb60000000 pio3hi = 3ff0c1524860a920
That trailing zero in pio3hi is an indication that the result is exact.
Additions are not exact, when the two numbers differ by several orders of magnitude. This fact will eventually be the limiting factor of our inner product routine.
My inner production routine is both accumulated, which means it uses extra precise arithmetic, and extended, which means an extra scalar is added to the result using the extra precision. You can download this function from here.
1 function s = dot3p(x,y,s) 2 % DOT3P s = dot3p(x,y,s) Triple precision extended inner product. 3 % s = x'*y + s for vectors x and y and scalar s. 4 5 shi = double(single(s)); 6 slo = s - shi; 7 for k = 1:length(x) 8 xhi = double(single(x(k))); 9 xlo = x(k) - xhi; 10 yhi = double(single(y(k))); 11 ylo = y(k) - yhi; 12 tmp = xhi*yhi; 13 zhi = double(single(tmp)); 14 zlo = tmp - zhi + xhi*ylo + xlo*yhi + xlo*ylo; 15 16 tmp = shi + zhi; 17 del = tmp - shi - zhi; 18 shi = double(single(tmp)); 19 slo = tmp - shi + slo + zlo - del; 20 end 21 s = shi + slo;
Let's run my example with dot3p in the debugger and look at some intermediate results.
% dbstop 16 dot3p % dbstop 20 dot3p % dot3p(x,y,0)
The variables shi and slo carry the sum in triple precision. The first time through the loop there are no roundoff errors, and shi and slo are set to 1.0 and 0.0.
Let's look at the second pass through the loop. Halfway through, at statement 16.
K>> format hex K>> xhi,xlo,yhi,ylo,tmp,zhi,zlo xhi = 3fd5555560000000 xlo = be45555556000000 yhi = 3e29c511e0000000 ylo = bc7e2df108000000 tmp = 3e112e0bf341b0a0 zhi = 3e112e0c00000000 zlo = bc97d9296b9a0d85 K>> K>> format long e K>> xhi,xlo,yhi,ylo,tmp,zhi,zlo xhi = 3.333333432674408e-01 xlo = -9.934107481068821e-09 yhi = 3.000000026176508e-09 ylo = -2.617650809219019e-17 tmp = 1.000000038527825e-09 zhi = 1.000000082740371e-09 zlo = -8.274037106125165e-17
We can see that xhi is the first six hex digits of 1/3 and xlo is the remaining digits. Similarly, yhi is the first six hex digits of 3e-9 and ylo is the remaing digits. zhi is the first six hex digits of the xhi*yhi and zlo is a crucial correction term.
Stopping at the end of the second pass through the loop, at statement 20.
K>> format hex K>> tmp, del, shi, slo tmp = 3ff000000044b830 del = 0000000000000000 shi = 3ff0000000000000 slo = 3e112e0be826d694 K>> K>> format long e, K>> tmp, del, shi, slo tmp = 1.000000001000000e+00 del = 0 shi = 1 slo = 9.999999999999999e-10
tmp is 1.0 plus some of the bits of 1.0e-10. del is zero because it is not needed in this example. It is involved when the terms vary over an even wider range. shi is exactly 1.0, which is the high order part of the evolving sum. And slo has become 1.0e-10 to full double precison accuracy.
On the third time through the loop there will be no roundoff errors, shi will be completely cancelled, and slo will bear the full responsibility for providing the final answer.
Of course, this example is contrived and unusual. Ordinarily, we can expect some cancellation (otherwise there would be no need for an accumulated inner product), with the high order part losing at least a few digits and the low order part filling them in.
With a dot product in hand, it is easy to write the residual function. Here the extended feature is essential because we expect extreme, if not total, cancellation when the right hand side is subtracted from the dot product.
function r = residual3p(A,x,b) % RESIDUAL3p Triple precision residual, A*x - b. % r = residual3p(A,x,b) for matrix A, vectors x and b. m = size(A,1); r = zeros(m,1); for k = 1:m r(k) = dot3p(A(k,:),x,-b(k)); end end
You can download this function from here.
To leave a comment, please click here to sign in to your MathWorks Account or create a new one.