Experiments with Variable Format Half Precision
For the past month I have been working with the variable format 16-bit floating point arithmetic that I described in this post. It has been frustrating work. I have found that the limited precision and limited range of half precision make it barely usable for the kind of experiments with matrix computation that I like to do. In this post I will describe a few of these experiments.
I am not saying that half precision floating point is not useful. Organizations including Intel and Google are adopting both the IEEE 754 standard fp16 and the new bfloat16 formats for work in machine learning and image processing, link. I am just not recommending either format for general engineering and scientific computing.
Contents
vfp16
I have revised the help entry for my variable format half precision object. Here is the updated version.
doc vfp16
Here is the list of functions currently available. As you can see, this is far from a complete set. The workhorse functions mldivide (backslash), lu, qr and svd are available, but I have not yet done anything about complex arithmetic, so eig is not on the list.
methods(vfp16)
Methods for class vfp16: abs fix lu plus single tiny binary ge max power size tril ctranspose gt minus precision sqrt triu diag hex mldivide qr subsasgn uint disp horzcat mrdivide realmax subsindex uminus display isfinite mtimes realmin subsref vertcat double le ne rem sum vfp16 eps log2 norm round svd eq lt packed sign times
vfp16 Anatomy
This is a repeat of the chart I had in my previous post, except now I am using the vfp16 object itself to compute the parameters.
- p is the precision, the number of bits in the fraction.
- eps is the distance from 1 to the next larger vfp16 number.
- realmax is the largest vfp16 number.
- realmin is the smallest normalized vfp16 number.
- tiny is the smallest subnormal vfp16 number.
disp(' p eps realmax realmin tiny') for p = 1:15 vfp16('precision',p) x = vfp16(1); y = [eps(x) realmax(x) realmin(x) tiny(x)]; fprintf('%4d %12.4g %12.4g %12.4g %12.4g\n',p,double(y)) end
p eps realmax realmin tiny 1 0.5 Inf 0 0 2 0.25 Inf 0 0 3 0.125 Inf 0 0 4 0.0625 1.742e+308 2.225e-308 1.391e-309 5 0.03125 1.32e+154 2.983e-154 9.323e-156 6 0.01563 1.149e+77 3.454e-77 5.398e-79 7 0.007813 3.39e+38 1.175e-38 9.184e-41 8 0.003906 1.841e+19 2.168e-19 8.47e-22 9 0.001953 4.291e+09 9.313e-10 1.819e-12 10 0.0009766 6.55e+04 6.104e-05 5.96e-08 11 0.0004883 255.9 0.01563 7.629e-06 12 0.0002441 16 0.25 6.104e-05 13 0.0001221 4 1 0.0001221 14 0 2 Inf 0.0001221 15 0 4 -0 0.0001221
For precision equal to 1, 2 or 3 there are 12 or more bits devoted to the exponent and so realmax, realmin and tiny cannot be expressed in MATLAB's conventional double precision arithmetic This effectively means that vfp16 cannot take advantage of the extended range.
On the other hand, for precision equal to 11 or greater, eps is less than realmin, so underflow rather than roundoff dominates accuracy. Especially if gradual underflow is off, computation with these precisions behaves badly. Consequently I keep p in the range 4 <= p <= 10.
Gradual Underflow
Before we had the IEEE standard, code that was portable across different architectures had to discover machine accuracy. One way this was done involved the following computation. Let's use the standard fp16 which has p = 10 and 'subnormals' turned 'on'.
format short vfp16('fp16') a = vfp16(4)/vfp16(3) b = a-1 c = b+b+b e = 1-c
a = 1.3330 b = 0.3330 c = 0.9990 e = 9.7656e-04
With exact arithmetic, b would be 1/3, c would be 1, and e would be zero. But, unless we have a ternary computer, there must be a roundoff error in the division by 3. That's the only roundoff error. So b is not exactly 1/3, c is not exactly 1, and e is equal to the spacing of floating numbers near, but larger than, 1.0. In other words, e is equal to eps(1).
Let's do that again with precision equal to 11 and gradual underflow turned off.
vfp16('precision',11,'subnormals','off') a = vfp16(4)/vfp16(3) b = a-1 c = b+b+b e = 1-c
a = 1.3335 b = 0.3335 c = 1.0005 e = 0
Now the division by 3 rounds up instead of down and c is slightly larger than one, but the computation of e underflows and is flushed to zero. This is a simple example, but it demonstrates the importance of gradual underflow when we have a limited exponent range.
Fused Multiply Add
Fused multiply add makes only a modest difference in the roundoff error in the residual R from LU decomposition of a random 10-by-10 matrix, as measured by number of nonzeros in R.
vfp16('subnormals','on') form = {'fp16','bfloat16'}; fma = {'off','on'}; rng(17) for j = 1:2 vfp16(form{j}) A = vfp16(randn(10,10)); for k = 1:2 vfp16('fma',fma{k}) [L,U,p] = lu(A); R = L*U - A(p,:); subplot(2,2,2*(j-1)+k) spy(double(R)) title([form{j} ', fma ' fma{k}]) end end vfp16('subnormals','on')
QR with Column Pivoting, Frank Matrix
The Frank matrix $F_n$ is an interesting test matrix. Here is $F_6$.
F6 = gallery('frank',6)
F6 = 6 5 4 3 2 1 5 5 4 3 2 1 0 4 4 3 2 1 0 0 3 3 2 1 0 0 0 2 2 1 0 0 0 0 1 1
We are going to compute the QR decomposition with column pivoting and look at $r_n$, the last diagonal element of $R$. It turns out that $F_n$ is close to a matrix of rank $n-1$ and this is forecast by $r_n$ decaying to zero exponentially as $n$ increases. Here is the $R$ from the QR decomposition (with column pivoting) of $F_6$; you can see the beginnings of exponential smallness in the last diagonal element.
[Q,R,E] = qr(F6); R
R = -8.1240 -6.7700 -5.1698 -6.8931 -3.4466 -1.7233 0 -3.8944 0.5136 0.6847 0.3424 0.1712 0 0 3.6068 2.4986 2.7742 1.3871 0 0 0 -1.6652 0.5547 0.2773 0 0 0 0 1.0001 1.0000 0 0 0 0 0 0.0053
When the QR decomposition is computed by our half precision arithmetic this exponential decay cannot be tracked for long. In the following logarithmic plot the black line is $r_n$. The other lines are the values obtained with vfp16 arithmetic of varying precision. Although $r_{15}$ is less than $10^{-10}$, none of the other values is less than $10^{-4}$.
The broken lines are caused by exact zeros which cannot be plotted on a logarithmic plot.
Rn = zeros(1,15); rn = zeros(10,15); for n = 1:15 A = gallery('frank',n); [~,R,E] = qr(A); Rn(n) = R(n,n); for p = 4:10 vfp16('precision',p) X = vfp16(A); [~,r,e] = qrp(X); rn(p,n) = r(n,n); end end xqrp_print xqrp_plot
n 2 3 4 ... 13 14 15 Rn 4.47e-01 2.43e-01 -9.41e-02 ... -1.41e-09 1.09e-10 -7.82e-12 rn p 4 -0.4375 -0.2813 0.0547 ... -0.0547 -0.1875 -0.1406 5 -0.4844 -0.2969 0.0977 ... -0.0332 0.0000 0.0332 6 -0.4297 -0.2266 0.1074 ... -0.0308 0.0154 -0.0308 7 -0.4453 -0.2402 0.1016 ... 0.0000 0.0078 -0.0078 8 -0.4434 -0.2402 0.0879 ... -0.0118 0.0000 0.0000 9 -0.4473 -0.2412 0.0879 ... -0.0020 0.0000 -0.0039 10 -0.4468 -0.2429 0.0952 ... -0.0039 -0.0039 0.0000
SVD, Hilbert Matrix
Now for two players familiar to readers of this blog, the Hilbert matrix and the Singular Value Decomposition. Here is $H_5$.
format rat
H5 = hilb(5)
H5 = 1 1/2 1/3 1/4 1/5 1/2 1/3 1/4 1/5 1/6 1/3 1/4 1/5 1/6 1/7 1/4 1/5 1/6 1/7 1/8 1/5 1/6 1/7 1/8 1/9
The matrix is symmetric, positive definite, so its singular values are equal to its eigenvalues. See how the singular values $H_6$ are decaying rapidly.
format short g H6 = hilb(6); sigma6 = svd(H6)
sigma6 = 1.6189 0.24236 0.016322 0.00061575 1.2571e-05 1.0828e-07
This experiment is like the QR of the Frank matrix, except the logarithmic plot of the smallest singular value is almost perfectly linear and has a steeper slope. You can see that the red line from the IEEE standard with p = 10 does the best job of tracking $\sigma_n$.
sn = zeros(10,13); for n = 2:13 A = hilb(n); S = svd(A); Sn(n) = S(n); for p = 4:10 vfp16('precision',p) X = vfp16(A); s = svd(X); sn(p,n) = s(n); end end xsvd_hilbert_print xsvd_hilbert_plot
n 2 3 4 ... 11 12 13 Sn 6.57e-02 2.69e-03 9.67e-05 ... 3.40e-15 1.11e-16 3.79e-18 sn p 4 6.99e-02 4.81e-03 1.89e-03 ... 4.44e-04 2.11e-04 3.13e-04 5 6.91e-02 1.90e-03 2.24e-03 ... 2.20e-04 2.05e-04 5.45e-05 6 6.56e-02 2.36e-03 0.00e+00 ... 1.18e-04 2.44e-05 1.33e-04 7 6.72e-02 3.09e-03 9.89e-05 ... 1.22e-04 1.70e-05 6.97e-05 8 6.46e-02 2.79e-03 1.74e-04 ... 4.46e-05 2.36e-05 2.43e-05 9 6.59e-02 2.91e-03 3.12e-05 ... 8.70e-06 6.78e-06 1.27e-05 10 6.55e-02 2.61e-03 3.67e-05 ... 9.33e-07 8.26e-06 3.45e-06
SVD, Parter Matrix
Now for something different. I discussed what the gallery calls the Parter matrix five years ago. Here is $P_5$.
format rat P5 = gallery('parter',5)
P5 = 2 -2 -2/3 -2/5 -2/7 2/3 2 -2 -2/3 -2/5 2/5 2/3 2 -2 -2/3 2/7 2/5 2/3 2 -2 2/9 2/7 2/5 2/3 2
Most of the singular value are converging to $\pi$. I'm not sure where the smallest singular value is going, but it is certainly not going to zero exponentially. Now the computations with vfp16 with any precision in the range 4 <= p <= 10 does a good job.
Sn = zeros(1,13); sn = zeros(10,13); for n = 2:13 A = gallery('parter',n); S = svd (A); Sn(n) = S(n); for p = 4:10 vfp16('precision',p) X = vfp16(A); s = svd(X); sn(p,n) = s(n); end end xsvd_parter_print xsvd_parter_plot
n 2 3 4 ... 11 12 13 Sn 1.7370 1.6053 1.5217 ... 1.2814 1.2640 1.2485 sn p 4 1.7430 1.5907 1.4846 ... 1.3287 1.3204 1.3339 5 1.7430 1.5502 1.4985 ... 1.2493 1.2373 1.2223 6 1.7227 1.5997 1.5093 ... 1.2799 1.2582 1.2521 7 1.7327 1.6050 1.5283 ... 1.2874 1.2639 1.2488 8 1.7407 1.6038 1.5229 ... 1.2843 1.2641 1.2478 9 1.7380 1.6040 1.5216 ... 1.2799 1.2628 1.2474 10 1.7361 1.6051 1.5206 ... 1.2822 1.2646 1.2479
Cleve's Laboratory
I will update Cleve_s Laboratory in the MATLAB Central File Exchange to Version 4.30 to include my changes to @vfp16.
- Category:
- Eigenvalues,
- Matrices,
- Numerical Analysis,
- Precision
Comments
To leave a comment, please click here to sign in to your MathWorks Account or create a new one.