{"id":2918,"date":"2017-12-20T16:26:14","date_gmt":"2017-12-20T21:26:14","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=2918"},"modified":"2017-12-20T16:26:48","modified_gmt":"2017-12-20T21:26:48","slug":"bug-in-half-precision-floating-point-object","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2017\/12\/20\/bug-in-half-precision-floating-point-object\/","title":{"rendered":"Bug in Half-Precision Floating Point Object"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p><a href=\"https:\/\/blogs.mathworks.com\/cleve\/2017\/05\/08\">My post on May 8<\/a> was about \"half-precision\" and \"quarter-precision\" arithmetic.  I also added code for objects <tt>fp16<\/tt> and <tt>fp8<\/tt> to <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/59085-cleve-laboratory\">Cleve's Laboratory<\/a>.  A few days ago I heard from Pierre Blanchard and my good friend Nick Higham at the University of Manchester about a serious bug in the constructors for those objects.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#2f23d356-a278-4abb-ae44-8f72b3e50754\">The Bug<\/a><\/li><li><a href=\"#c7bb46ab-0ebe-4c24-804f-f2482db5d65d\">The Fix<\/a><\/li><li><a href=\"#b2c2b088-309a-4313-ab70-d32ed1ddc9f5\">Cleve's Laboratory<\/a><\/li><li><a href=\"#fa618890-6610-453d-8b28-c018a6874b2e\">Thanks<\/a><\/li><\/ul><\/div><h4>The Bug<a name=\"2f23d356-a278-4abb-ae44-8f72b3e50754\"><\/a><\/h4><p>Let<\/p><pre class=\"codeinput\">    format <span class=\"string\">long<\/span>\r\n    e = eps(fp16(1))\r\n<\/pre><pre class=\"codeoutput\"> \r\ne = \r\n \r\n     9.765625000000000e-04\r\n \r\n<\/pre><p>The value of <tt>e<\/tt> is<\/p><pre class=\"codeinput\">    1\/1024\r\n<\/pre><pre class=\"codeoutput\">ans =\r\n     9.765625000000000e-04\r\n<\/pre><p>This is the relative precision of half-precision floating point numbers, which is the spacing of half-precision numbers in the interval between 1 and 2.  So in binary the next number after 1 is<\/p><pre class=\"codeinput\">    disp(binary(1+e))\r\n<\/pre><pre class=\"codeoutput\">0 01111 0000000001\r\n<\/pre><p>And the last number before 2 is<\/p><pre class=\"codeinput\">    disp(binary(2-e))\r\n<\/pre><pre class=\"codeoutput\">0 01111 1111111111\r\n<\/pre><p>The three fields displayed are the sign, which is one bit, the exponent, which has five bits and the fraction, which has ten bits.<\/p><p>So far, so good.  The bug shows up when I try to convert any number between <tt>2-e<\/tt> and <tt>2<\/tt> to half-precision.  There aren't any more half-precision numbers between those limits.  The values in the lower half of the interval should round down to <tt>2-e<\/tt> and the values in the upper half should round up to <tt>2<\/tt>.  The round-to-even convention says that the midpoint, <tt>2-e\/2<\/tt>, should round to <tt>2<\/tt>.<\/p><p>But I wasn't careful about how I did the rounding. I just used the MATLAB <tt>round<\/tt> function, which doesn't follow the round-to-even convention.  Worse yet, I didn't check to see if the fraction rounded up into the exponent field.  I tried to do everything in one statement.<\/p><pre class=\"codeinput\">   dbtype <span class=\"string\">oldfp16<\/span> <span class=\"string\">48:49<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n48                    u = bitxor(uint16(round(1024*f)), ...\r\n49                        bitshift(uint16(e+15),10));\r\n<\/pre><p>For values between <tt>2-e\/2<\/tt> and <tt>2<\/tt>,  the <tt>round(1024*f)<\/tt> is <tt>1024<\/tt>, which requires eleven bits.  The <tt>bitxor<\/tt> then clobbers the exponent field. I won't show the result here.  If you have the May half-precision object on your machine, give it a try.<\/p><p>This doesn't just happen for values a little bit less than 2, it happens close to any power of 2.<\/p><h4>The Fix<a name=\"c7bb46ab-0ebe-4c24-804f-f2482db5d65d\"><\/a><\/h4><p>We need a proper round-to-nearest-even function.<\/p><pre class=\"codeinput\">   dbtype <span class=\"string\">fp16<\/span> <span class=\"string\">31<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n31            rndevn = @(s) round(s-(rem(s,2)==0.5));\r\n<\/pre><p>Then don't try to do it all at once.<\/p><pre class=\"codeinput\">   dbtype <span class=\"string\">fp16<\/span> <span class=\"string\">50:56<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n50                    % normal\r\n51                    t = uint16(rndevn(1024*f));\r\n52                    if t==1024\r\n53                        t = uint16(0);\r\n54                        e = e+1;\r\n55                    end\r\n56                    u = bitxor(t, bitshift(uint16(e+15),10));\r\n<\/pre><p>It turns out that the branch for denormals is OK, once <tt>round<\/tt> is replaced by <tt>rndeven<\/tt>.  The exponent for denormals is all zeros, so when the fraction encroaches it produces the correct result.<\/p><p>A similar fix is required for the quarter-precision constructor, <tt>fp8<\/tt>.<\/p><h4>Cleve's Laboratory<a name=\"b2c2b088-309a-4313-ab70-d32ed1ddc9f5\"><\/a><\/h4><p>I am updating the code on the <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/59085-cleve-laboratory\">MATLAB Central File Exchange<\/a>.  Only <tt>@fp16\/fp16<\/tt> and <tt>@fp8\/fp8<\/tt> are affected.  (Give me a few days to complete the update process.)<\/p><h4>Thanks<a name=\"fa618890-6610-453d-8b28-c018a6874b2e\"><\/a><\/h4><p>Thanks to Pierre and Nick.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_8d4defc65bb14c19941eaea060051666() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='8d4defc65bb14c19941eaea060051666 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 8d4defc65bb14c19941eaea060051666';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2017 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_8d4defc65bb14c19941eaea060051666()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2017a<br><\/p><\/div><!--\r\n8d4defc65bb14c19941eaea060051666 ##### SOURCE BEGIN #####\r\n%% Bug in Half-Precision Floating Point Object\r\n% <https:\/\/blogs.mathworks.com\/cleve\/2017\/05\/08\r\n% My post on May 8> was about \"half-precision\" and \"quarter-precision\"\r\n% arithmetic.  I also added code for objects |fp16| and |fp8| to\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/59085-cleve-laboratory\r\n% Cleve's Laboratory>.  A few days ago I heard from Pierre Blanchard and my\r\n% good friend Nick Higham at the University of Manchester about a serious\r\n% bug in the constructors for those objects.\r\n\r\n%% The Bug\r\n% Let\r\n\r\n    format long\r\n    e = eps(fp16(1))\r\n    \r\n%%    \r\n% The value of |e| is\r\n\r\n    1\/1024\r\n\r\n%%\r\n% This is the relative precision of half-precision floating point\r\n% numbers, which is the spacing of half-precision numbers in the interval\r\n% between 1 and 2.  So in binary the next number after 1 is\r\n\r\n    disp(binary(1+e))\r\n    \r\n%%\r\n% And the last number before 2 is\r\n\r\n    disp(binary(2-e))\r\n    \r\n%%\r\n% The three fields displayed are the sign, which is one bit, the exponent,\r\n% which has five bits and the fraction, which has ten bits.\r\n\r\n%%\r\n% So far, so good.  The bug shows up when I try to convert any\r\n% number between |2-e| and |2| to half-precision.  There aren't any more\r\n% half-precision numbers between those limits.  The values in the lower\r\n% half of the interval should round down to |2-e| and the values in the\r\n% upper half should round up to |2|.  The round-to-even convention\r\n% says that the midpoint, |2-e\/2|, should round to |2|.\r\n\r\n%%\r\n% But I wasn't careful about how I did the rounding.\r\n% I just used the MATLAB |round| function, which doesn't follow the\r\n% round-to-even convention.  Worse yet, I didn't check to see if the\r\n% fraction rounded up into the exponent field.  I tried to do\r\n% everything in one statement.\r\n\r\n   dbtype oldfp16 48:49\r\n   \r\n%%\r\n% For values between |2-e\/2| and |2|,  the |round(1024*f)| is |1024|, which\r\n% requires eleven bits.  The |bitxor| then clobbers the exponent field.\r\n% I won't show the result here.  If you have the May half-precision object\r\n% on your machine, give it a try.\r\n\r\n%%\r\n% This doesn't just happen for values a little bit less than 2, it \r\n% happens close to any power of 2. \r\n\r\n%% The Fix\r\n% We need a proper round-to-nearest-even function.\r\n\r\n   dbtype fp16 31\r\n   \r\n%%\r\n% Then don't try to do it all at once.\r\n\r\n   dbtype fp16 50:56\r\n   \r\n%%\r\n% It turns out that the branch for denormals is OK, once |round| is\r\n% replaced by |rndeven|.  The exponent for denormals is all zeros, so\r\n% when the fraction encroaches it produces the correct result.\r\n\r\n%%\r\n% A similar fix is required for the quarter-precision constructor, |fp8|.\r\n\r\n%% Cleve's Laboratory\r\n% I am updating the code on the\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/59085-cleve-laboratory\r\n% MATLAB Central File Exchange>.  Only |@fp16\/fp16| and |@fp8\/fp8| are\r\n% affected.  (Give me a few days to complete the update process.)\r\n\r\n%% Thanks\r\n% Thanks to Pierre and Nick.\r\n\r\n##### SOURCE END ##### 8d4defc65bb14c19941eaea060051666\r\n-->","protected":false},"excerpt":{"rendered":"<!--introduction--><p><a href=\"https:\/\/blogs.mathworks.com\/cleve\/2017\/05\/08\">My post on May 8<\/a> was about \"half-precision\" and \"quarter-precision\" arithmetic.  I also added code for objects <tt>fp16<\/tt> and <tt>fp8<\/tt> to <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/59085-cleve-laboratory\">Cleve's Laboratory<\/a>.  A few days ago I heard from Pierre Blanchard and my good friend Nick Higham at the University of Manchester about a serious bug in the constructors for those objects.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2017\/12\/20\/bug-in-half-precision-floating-point-object\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,7],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/2918"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=2918"}],"version-history":[{"count":1,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/2918\/revisions"}],"predecessor-version":[{"id":2920,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/2918\/revisions\/2920"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=2918"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=2918"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=2918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}