Experiments with Variable Format Half Precision

Posted by Cleve Moler,

For the past month I have been working with the variable format 16-bit floating point arithmetic that I described in this post. It has been frustrating work. I have found that the limited precision and limited range of half precision make it barely usable for the kind of experiments with matrix computation that I like to do. In this post I will describe a few of these experiments.

I am not saying that half precision floating point is not useful. Organizations including Intel and Google are adopting both the IEEE 754 standard fp16 and the new bfloat16 formats for work in machine learning and image processing, link. I am just not recommending either format for general engineering and scientific computing.

Contents

vfp16

I have revised the help entry for my variable format half precision object. Here is the updated version.

doc vfp16

Here is the list of functions currently available. As you can see, this is far from a complete set. The workhorse functions mldivide (backslash), lu, qr and svd are available, but I have not yet done anything about complex arithmetic, so eig is not on the list.

   methods(vfp16)
Methods for class vfp16:

abs         fix         lu          plus        single      tiny        
binary      ge          max         power       size        tril        
ctranspose  gt          minus       precision   sqrt        triu        
diag        hex         mldivide    qr          subsasgn    uint        
disp        horzcat     mrdivide    realmax     subsindex   uminus      
display     isfinite    mtimes      realmin     subsref     vertcat     
double      le          ne          rem         sum         vfp16       
eps         log2        norm        round       svd         
eq          lt          packed      sign        times       

vfp16 Anatomy

This is a repeat of the chart I had in my previous post, except now I am using the vfp16 object itself to compute the parameters.

  • p is the precision, the number of bits in the fraction.
  • eps is the distance from 1 to the next larger vfp16 number.
  • realmax is the largest vfp16 number.
  • realmin is the smallest normalized vfp16 number.
  • tiny is the smallest subnormal vfp16 number.
    disp('   p          eps      realmax      realmin         tiny')
    for p = 1:15
        vfp16('precision',p)
        x = vfp16(1);
        y = [eps(x) realmax(x) realmin(x) tiny(x)];
        fprintf('%4d %12.4g %12.4g %12.4g %12.4g\n',p,double(y))
    end
   p          eps      realmax      realmin         tiny
   1          0.5          Inf            0            0
   2         0.25          Inf            0            0
   3        0.125          Inf            0            0
   4       0.0625   1.742e+308   2.225e-308   1.391e-309
   5      0.03125    1.32e+154   2.983e-154   9.323e-156
   6      0.01563    1.149e+77    3.454e-77    5.398e-79
   7     0.007813     3.39e+38    1.175e-38    9.184e-41
   8     0.003906    1.841e+19    2.168e-19     8.47e-22
   9     0.001953    4.291e+09    9.313e-10    1.819e-12
  10    0.0009766     6.55e+04    6.104e-05     5.96e-08
  11    0.0004883        255.9      0.01563    7.629e-06
  12    0.0002441           16         0.25    6.104e-05
  13    0.0001221            4            1    0.0001221
  14            0            2          Inf    0.0001221
  15            0            4           -0    0.0001221

For precision equal to 1, 2 or 3 there are 12 or more bits devoted to the exponent and so realmax, realmin and tiny cannot be expressed in MATLAB's conventional double precision arithmetic This effectively means that vfp16 cannot take advantage of the extended range.

On the other hand, for precision equal to 11 or greater, eps is less than realmin, so underflow rather than roundoff dominates accuracy. Especially if gradual underflow is off, computation with these precisions behaves badly. Consequently I keep p in the range 4 <= p <= 10.

Gradual Underflow

Before we had the IEEE standard, code that was portable across different architectures had to discover machine accuracy. One way this was done involved the following computation. Let's use the standard fp16 which has p = 10 and 'subnormals' turned 'on'.

   format short
   vfp16('fp16')
   a = vfp16(4)/vfp16(3)
   b = a-1
   c = b+b+b
   e = 1-c
 
a = 
 
    1.3330
 
 
b = 
 
    0.3330
 
 
c = 
 
    0.9990
 
 
e = 
 
   9.7656e-04
 

With exact arithmetic, b would be 1/3, c would be 1, and e would be zero. But, unless we have a ternary computer, there must be a roundoff error in the division by 3. That's the only roundoff error. So b is not exactly 1/3, c is not exactly 1, and e is equal to the spacing of floating numbers near, but larger than, 1.0. In other words, e is equal to eps(1).

Let's do that again with precision equal to 11 and gradual underflow turned off.

   vfp16('precision',11,'subnormals','off')
   a = vfp16(4)/vfp16(3)
   b = a-1
   c = b+b+b
   e = 1-c
 
a = 
 
    1.3335
 
 
b = 
 
    0.3335
 
 
c = 
 
    1.0005
 
 
e = 
 
     0
 

Now the division by 3 rounds up instead of down and c is slightly larger than one, but the computation of e underflows and is flushed to zero. This is a simple example, but it demonstrates the importance of gradual underflow when we have a limited exponent range.

Fused Multiply Add

Fused multiply add makes only a modest difference in the roundoff error in the residual R from LU decomposition of a random 10-by-10 matrix, as measured by number of nonzeros in R.

    vfp16('subnormals','on')
    form = {'fp16','bfloat16'};
    fma = {'off','on'};
    rng(17)
    for j = 1:2
        vfp16(form{j})
        A = vfp16(randn(10,10));
        for k = 1:2
            vfp16('fma',fma{k})

            [L,U,p] = lu(A);
            R = L*U - A(p,:);

            subplot(2,2,2*(j-1)+k)
            spy(double(R))
            title([form{j} ', fma ' fma{k}])
        end
    end
    vfp16('subnormals','on')

QR with Column Pivoting, Frank Matrix

The Frank matrix $F_n$ is an interesting test matrix. Here is $F_6$.

    F6 = gallery('frank',6)
F6 =
     6     5     4     3     2     1
     5     5     4     3     2     1
     0     4     4     3     2     1
     0     0     3     3     2     1
     0     0     0     2     2     1
     0     0     0     0     1     1

We are going to compute the QR decomposition with column pivoting and look at $r_n$, the last diagonal element of $R$. It turns out that $F_n$ is close to a matrix of rank $n-1$ and this is forecast by $r_n$ decaying to zero exponentially as $n$ increases. Here is the $R$ from the QR decomposition (with column pivoting) of $F_6$; you can see the beginnings of exponential smallness in the last diagonal element.

    [Q,R,E] = qr(F6);
    R
R =
   -8.1240   -6.7700   -5.1698   -6.8931   -3.4466   -1.7233
         0   -3.8944    0.5136    0.6847    0.3424    0.1712
         0         0    3.6068    2.4986    2.7742    1.3871
         0         0         0   -1.6652    0.5547    0.2773
         0         0         0         0    1.0001    1.0000
         0         0         0         0         0    0.0053

When the QR decomposition is computed by our half precision arithmetic this exponential decay cannot be tracked for long. In the following logarithmic plot the black line is $r_n$. The other lines are the values obtained with vfp16 arithmetic of varying precision. Although $r_{15}$ is less than $10^{-10}$, none of the other values is less than $10^{-4}$.

The broken lines are caused by exact zeros which cannot be plotted on a logarithmic plot.

    Rn = zeros(1,15);
    rn = zeros(10,15);
    for n = 1:15

        A = gallery('frank',n);
        [~,R,E] = qr(A);
        Rn(n) = R(n,n);

        for p = 4:10
            vfp16('precision',p)
            X = vfp16(A);
            [~,r,e] = qrp(X);
            rn(p,n) = r(n,n);
        end
    end

    xqrp_print
    xqrp_plot
  n     2         3         4    ...     13        14        15

 Rn 4.47e-01  2.43e-01 -9.41e-02 ... -1.41e-09  1.09e-10 -7.82e-12

    rn
  p
  4 -0.4375   -0.2813    0.0547  ...  -0.0547   -0.1875   -0.1406
  5 -0.4844   -0.2969    0.0977  ...  -0.0332    0.0000    0.0332
  6 -0.4297   -0.2266    0.1074  ...  -0.0308    0.0154   -0.0308
  7 -0.4453   -0.2402    0.1016  ...   0.0000    0.0078   -0.0078
  8 -0.4434   -0.2402    0.0879  ...  -0.0118    0.0000    0.0000
  9 -0.4473   -0.2412    0.0879  ...  -0.0020    0.0000   -0.0039
 10 -0.4468   -0.2429    0.0952  ...  -0.0039   -0.0039    0.0000

SVD, Hilbert Matrix

Now for two players familiar to readers of this blog, the Hilbert matrix and the Singular Value Decomposition. Here is $H_5$.

    format rat
    H5 = hilb(5)
H5 =
       1              1/2            1/3            1/4            1/5     
       1/2            1/3            1/4            1/5            1/6     
       1/3            1/4            1/5            1/6            1/7     
       1/4            1/5            1/6            1/7            1/8     
       1/5            1/6            1/7            1/8            1/9     

The matrix is symmetric, positive definite, so its singular values are equal to its eigenvalues. See how the singular values $H_6$ are decaying rapidly.

    format short g
    H6 = hilb(6);
    sigma6 = svd(H6)
sigma6 =
       1.6189
      0.24236
     0.016322
   0.00061575
   1.2571e-05
   1.0828e-07

This experiment is like the QR of the Frank matrix, except the logarithmic plot of the smallest singular value is almost perfectly linear and has a steeper slope. You can see that the red line from the IEEE standard with p = 10 does the best job of tracking $\sigma_n$.

    sn = zeros(10,13);
    for n = 2:13

        A = hilb(n);
        S = svd(A);
        Sn(n) = S(n);

        for p = 4:10
            vfp16('precision',p)
            X = vfp16(A);
            s = svd(X);
            sn(p,n) = s(n);
        end
    end

    xsvd_hilbert_print
    xsvd_hilbert_plot
  n       2         3         4    ...     11        12        13

 Sn  6.57e-02  2.69e-03  9.67e-05  ... 3.40e-15  1.11e-16  3.79e-18

    sn
  p
  4  6.99e-02  4.81e-03  1.89e-03  ... 4.44e-04  2.11e-04  3.13e-04
  5  6.91e-02  1.90e-03  2.24e-03  ... 2.20e-04  2.05e-04  5.45e-05
  6  6.56e-02  2.36e-03  0.00e+00  ... 1.18e-04  2.44e-05  1.33e-04
  7  6.72e-02  3.09e-03  9.89e-05  ... 1.22e-04  1.70e-05  6.97e-05
  8  6.46e-02  2.79e-03  1.74e-04  ... 4.46e-05  2.36e-05  2.43e-05
  9  6.59e-02  2.91e-03  3.12e-05  ... 8.70e-06  6.78e-06  1.27e-05
 10  6.55e-02  2.61e-03  3.67e-05  ... 9.33e-07  8.26e-06  3.45e-06

SVD, Parter Matrix

Now for something different. I discussed what the gallery calls the Parter matrix five years ago. Here is $P_5$.

    format rat
    P5 = gallery('parter',5)
P5 =
       2             -2             -2/3           -2/5           -2/7     
       2/3            2             -2             -2/3           -2/5     
       2/5            2/3            2             -2             -2/3     
       2/7            2/5            2/3            2             -2       
       2/9            2/7            2/5            2/3            2       

Most of the singular value are converging to $\pi$. I'm not sure where the smallest singular value is going, but it is certainly not going to zero exponentially. Now the computations with vfp16 with any precision in the range 4 <= p <= 10 does a good job.

    Sn = zeros(1,13);
    sn = zeros(10,13);
    for n = 2:13

        A = gallery('parter',n);
        S = svd (A);
        Sn(n) = S(n);

        for p = 4:10
            vfp16('precision',p)
            X = vfp16(A);
            s = svd(X);
            sn(p,n) = s(n);
        end
    end

    xsvd_parter_print
    xsvd_parter_plot
  n       2         3         4    ...     11        12        13

 Sn    1.7370    1.6053    1.5217  ...   1.2814    1.2640    1.2485

    sn
  p
  4    1.7430    1.5907    1.4846  ...   1.3287    1.3204    1.3339
  5    1.7430    1.5502    1.4985  ...   1.2493    1.2373    1.2223
  6    1.7227    1.5997    1.5093  ...   1.2799    1.2582    1.2521
  7    1.7327    1.6050    1.5283  ...   1.2874    1.2639    1.2488
  8    1.7407    1.6038    1.5229  ...   1.2843    1.2641    1.2478
  9    1.7380    1.6040    1.5216  ...   1.2799    1.2628    1.2474
 10    1.7361    1.6051    1.5206  ...   1.2822    1.2646    1.2479

Cleve's Laboratory

I will update Cleve_s Laboratory in the MATLAB Central File Exchange to Version 4.30 to include my changes to @vfp16.


Get the MATLAB code

Published with MATLAB® R2018b

21 views (last 30 days)  | |

Comments

To leave a comment, please click here to sign in to your MathWorks Account or create a new one.