{"id":589,"date":"2013-04-29T12:00:49","date_gmt":"2013-04-29T17:00:49","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=589"},"modified":"2013-05-02T09:56:42","modified_gmt":"2013-05-02T14:56:42","slug":"pentium-division-bug","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2013\/04\/29\/pentium-division-bug\/","title":{"rendered":"Pentium Division Bug Revisited"},"content":{"rendered":"\r\n<div class=\"content\"><!--introduction--><p>The Pentium division bug episode in the fall of 1994 was a defining moment for the MathWorks, for the Internet, for Intel Corporation, and for me personally.  In this blog I am reprinting the article that I wrote for the Winter 1995 issue of MATLAB News and Notes and for SIAM News.  In my next blog I want to discuss the episode's impact.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#a46fa584-a8d4-4639-9567-076d1e81b389\">A Tale of Two Numbers (Reprinted from MATLAB News and Notes, Winter 1995)<\/a><\/li><li><a href=\"#34552650-affd-4c89-ba72-1167094318c6\">Thomas Nicely<\/a><\/li><li><a href=\"#9a1a7494-2828-47d2-b7c1-10e5d0155efb\">Alex Wolfe<\/a><\/li><li><a href=\"#a3e80685-b94c-4527-955b-7a904b127178\">Terje Mathisen<\/a><\/li><li><a href=\"#d55a83b9-dc33-4a74-93f5-d10fb65fbc04\">Tim Coe<\/a><\/li><li><a href=\"#5f737c9c-31a5-48fb-8935-54c509115b09\">Press Coverage<\/a><\/li><li><a href=\"#1baa927c-85e7-4f40-a1f3-4096b18ee8aa\">Peter Tang<\/a><\/li><li><a href=\"#c7388acb-ae3c-496c-912d-e71c58a4d8d8\">SRT Algorithm<\/a><\/li><li><a href=\"#d0ff0ffb-3973-49a7-b1ac-0e62b14ddc9b\">Our Workaround<\/a><\/li><li><a href=\"#5037fa8b-05f7-4bdb-818f-126e38a119cb\">Pentium-Safe Division<\/a><\/li><\/ul><\/div><h4>A Tale of Two Numbers (Reprinted from MATLAB News and Notes, Winter 1995)<a name=\"a46fa584-a8d4-4639-9567-076d1e81b389\"><\/a><\/h4><p>This is the tale of two numbers, and how they found their way over the Internet to the world's front pages on Thanksgiving day, embarrassing the world's premier computer chip manufacturer.<\/p><h4>Thomas Nicely<a name=\"34552650-affd-4c89-ba72-1167094318c6\"><\/a><\/h4><p>The first number is a twelve digit integer<\/p><p>$$ p  =  824633702441 $$<\/p><p>Thomas Nicely of Lynchburg College in Virginia is interested in this number because both <i>p<\/i> and <i>p+2<\/i> are prime. Two consecutive odd numbers that are both prime are called <i>twin primes<\/i>. Nicely's research involves the distribution of twin primes and, in particular, the sum of their reciprocals,<\/p><p>$$ S = 1\/5 + 1\/7 + ... + 1\/29 + 1\/31 + ... + 1\/p + 1\/(p+2) + ... $$<\/p><p>It is known that this infinite sum has a finite value, but nobody knows what that value is.  For over a year, Nicely has been carrying out various computations involving twin, triple and quadruple primes. One of his objectives has been to demonstrate that today's PCs are just as effective as supercomputers for this kind of research. He was using half a dozen different computers in his work until March, when he added a machine based on Intel Corporation's Pentium chip to this collection.<\/p><p>In June, Nicely noticed inconsistencies among several different quantities he was computing.  He traced it to the values he was getting for $1\/p$ and $1\/(p+2)$.  Although he was using double precision, they were accurate only to roughly single precision. He blamed at first on his own programs, then the compiler and operating system, then the bus and motherboard in his machine. By late October though, he was convinced that the difficulty was in the chip itself.<\/p><h4>Alex Wolfe<a name=\"9a1a7494-2828-47d2-b7c1-10e5d0155efb\"><\/a><\/h4><p>On October 30, in an e-mail message to several other people who had access to Pentium systems, Nicely said that he thought there was a bug in the processor's floating point arithmetic unit.  One of the recipients of Nicely's memo posted it on the CompuServe network. Alex Wolfe, a reporter for the EE Times, spotted the posting and forwarded it to Terje Mathisen.<\/p><h4>Terje Mathisen<a name=\"a3e80685-b94c-4527-955b-7a904b127178\"><\/a><\/h4><p>Terje Mathisen, a computer jock at Norsk Hydro in Oslo, Norway, has published articles on programming the Pentium, and posted notes on the Internet about the accuracy of its transcendental functions. Within hours of receiving Wolfe's query, Mathisen confirmed Nicely's example, wrote a short, assembly language test program, and, on November 3, initiated a chain of Internet postings in newsgroup <tt>comp.sys.intel<\/tt> about the FDIV bug.  (FDIV is the floating point divide instruction on the Pentium.)  A day later, Germany's Andreas Kaiser posted a list of two dozen numbers whose reciprocals are computed to only single precision accuracy on the Pentium.<\/p><h4>Tim Coe<a name=\"d55a83b9-dc33-4a74-93f5-d10fb65fbc04\"><\/a><\/h4><p>Tim Coe is an engineer at Vitesse Semiconductor in southern California. He has designed floating point arithmetic units himself and saw in Kaiser's list of erroneous reciprocals clues to how other designers had tackled the same task.  He surmised, correctly it turns out, that the Pentium's division instruction employs a technique known to chip designers as a radix 4 SLT algorithm.  This produces two bits of quotient per machine cycle and thereby makes the Pentium twice as fast at division as previous Intel chips running at the same clock rate.<\/p><p>Coe created a model that explained all the errors Kaiser had reported in reciprocals.  He also realized that division operations involving numerators other than one potentially had even larger relative errors. His model lead him to a pair of seven digit integers whose ratio<\/p><p>$$ c  =  4195835 \/ 3145727 $$<\/p><p>appeared to be an instance of the worst case error.   Coe did his analysis without actually using a Pentium -- he doesn't own one. To verify his prediction, he had to bundle his year-and-a-half old daughter into his car, drive down to a local computer store, and borrow a demonstration machine.<\/p><p>This true value of Coe's ratio is<\/p><p>$$ c  =  1.33382044... $$<\/p><p>But computed on a Pentium, it is<\/p><p>$$ c  =  1.33373906... $$<\/p><p>The computed quotient is accurate to only 14 bits.  The relative error is $6.1 \\cdot 10^{-5}$.  This is worse than single precision roundoff error and over ten orders of magnitude worse than what we expect from double precision computation.  The error doesn't occur very often, but when it does, it can be very significant.  We are faced with a very small probability of making a very big error.<\/p><h4>Press Coverage<a name=\"5f737c9c-31a5-48fb-8935-54c509115b09\"><\/a><\/h4><p>I first learned of the FDIV bug a few days before Coe's revelation from an electronic mailing list maintained by David Hough; at that point I began to follow <tt>comp.sys.intel<\/tt>.  At first, I was curious but not alarmed. It seemed to be some kind of single- versus double-precision problem which, although annoying, is not all that uncommon.  But Coe's discovery, together with a couple of customer calls to The MathWorks tech support, raised my level of interest considerably.<\/p><p>On November 15, I posted a summary of what I knew then to both the Intel and the MATLAB newsgroups, using Nicely's prime and Coe's ratio as examples. I also pointed out that the divisor in both cases is a little less than 3 times a power of two:<\/p><p>$$ 824633702441  =  3 \\cdot 2^{38} - 18391 $$<\/p><p>and<\/p><p>$$ 3145727  =  3 \\cdot 2^{20} - 1 $$<\/p><p>By this time, the Net had become hyperactive and my posting was redistributed widely.  A week later, reporters for major newspapers and news services had Xeroxed copies of faxed copies of printouts of my posting.<\/p><p>The story began to get popular press coverage when Net activists called their local newspapers and TV stations.  I was interviewed by CNN on Tuesday, the 22nd, and spent most of the next day on the telephone with other reporters.  My picture was in the <i>New York Times<\/i> and the <i>San Jose Mercury News<\/i> on Thursday which happened to be Thanksgiving. The <i>Times<\/i> story included a sidebar, titled \"Close, But Not Close Enough\", which used Coe's ratio to illustrate the problem. The story was front-page news in the <i>Boston Globe<\/i>, with the headline \"Sorry, Wrong Number.\" The <i>Globe<\/i>'s sidebar demonstrated the error by evaluating Coe's ratio in a spreadsheet.<\/p><p>In the month since I learned of Nicely's prime and Coe's ratio they have certainly earned their place in the Great Numbers Hall of Fame.<\/p><p>One challenging aspect of this has been how to explaing to the press how big the error is. The focus of attention has been on what we numerical analysts call the residual,<\/p><p>$$ r  =  4195835 - (4195835 \/ 3145727) (3145727) $$<\/p><p>With exact computation, <i>r<\/i> would be zero, but on the Pentium it is 256. To most people, 256 seems like a pretty big error.  But when compared to the input data, of course, it doesn't seem so large.  This gets us into relative error and significant digits -- terms that are not encountered in everyday journalistic prose.  There was even confusion on the part of some reporters about where to starting counting \"decimal digits\". Not everybody got the details right.  The <i>New York Times<\/i> was the only publication I saw that thought to show the actual value of the erroneous quotient.  The British publication, <i>The Economist<\/i>, had the good sense to describe the error as 0.006%.  In retrospect, perhaps the best description of the error is \"about 61 parts per million.\"<\/p><p>Since double precision floating point computation is vital to MATLAB, and since Pentium-based machines account for a significant portion of new MATLAB usage, we decided early on to provide a release of our software that works around the bug.  My first thought was to modify our source code so that every division operation<\/p><pre>   |z = x\/y|<\/pre><p>was followed by a residual calculation<\/p><pre>   |r = x - y*z|<\/pre><p>When the residual is too large, the division could be redone in software using Newton's method.  We abandoned both aspects of this approach -- we don't compute the residual, and we don't do software emulation.<\/p><h4>Peter Tang<a name=\"1baa927c-85e7-4f40-a1f3-4096b18ee8aa\"><\/a><\/h4><p>Coe, Mathisen and I are now working with Peter Tang of Argonne Laboratory (on sabbatical in Hong Kong) and a group of hardware and software engineers from Intel, who are in California and Oregon.  We have never met face-to-face. Our collaboration is all by e-mail and telephone.  (11 am in Massachusetts is 8 am in California and Oregon, 5 pm in Oslo and 1 am in Hong Kong.) We are developing, implementing, testing and proving the correctness of software that will work around the FDIV bug and related bugs in the Pentium's on-chip tangent, arctangent and remainder instructions.  We will offer the software to compiler vendors, commercial software developers and individuals folks via the Net.  We hope this software will be used by anyone doing floating point arithmetic on a Pentium who is unable to replace the chip.<\/p><h4>SRT Algorithm<a name=\"c7388acb-ae3c-496c-912d-e71c58a4d8d8\"><\/a><\/h4><p>The Pentium's division woes can be traced to five missing entries in a lookup table that is actually part of the chip's circuitry.  The radix-4 SRT algorithm is a variation of the familiar long division technique we all earned in school.  Each step of the algorithm takes one machine cycle and appends another two-bit digit to the quotient. Both the quotient and the remainder are represented in a redundant fashion as the difference between two numbers.  In this way, the next digit of the quotient can be obtained by table lookup, with several high-order bits from both the divisor and remainder used as indices into the table. The table is not rectangular; a triangular portion of it is eliminated by constraints on the possible indices.<\/p><p>When the algorithm was implemented in silicon, five entries on the boundary of this triangular portion were dropped.  These missing entries (each of which should have a value of +2) effectively mean that, for certain combinations of bits developed during the division process, the chips make an error while updating the remainder.<\/p><h4>Our Workaround<a name=\"d0ff0ffb-3973-49a7-b1ac-0e62b14ddc9b\"><\/a><\/h4><p>The key to our workaround is the fact the chip does a perfectly good job with division almost all the time.  It is simply a question of avoiding the operands that don't work very well,  which our software does with a quick test of the divisor before each attempted FDIV.  The absence of dangerous bit patterns indicates that the FDIV can be done safely. The presence of one of the patterns does not guarantee that the error will occur, just a signal that it might.  In this case, scaling both numerator and denominator by 15\/16 takes the divisor out of the unsafe region and insures that the subsequent division will be fully accurate.<\/p><p>With this approach, it is not necessary to test the magnitude of the residual resulting from a division.  It is known <i>a priori<\/i> that all divisions will produce fully accurate results.  If desired, an additional test can compare the result of scaled and unscaled divisions and count the number of errors that would occur on an unmodified Pentium.  We will offer this test in MATLAB, but it can be turned off for maximum execution speed.<\/p><p>The regions to be avoided can be characterized by examining the high order bits of the fraction in the IEEE floating point representation. Floating point numbers in any of the three available precisions can be thought of as real numbers of the form<\/p><p>$$ \\pm (1 + f ) \\cdot 2^e $$<\/p><p>where <i>e<\/i> is an integer in the range determined by under- and overflow and <i>f<\/i>  is a rational number in the half open interval [0, 1). The eight high order bits of <i>f<\/i> divide this interval into $2^8$ = 256 subintervals.  Five of these subintervals contain all the at-risk divisors. The hexadecimal representations of the leading bits of the five subintervals are<\/p><pre>  1F, 4F, 7F, AF, and DF<\/pre><p>(As I write this article, Coe, Tang and some of the Intel people are still working on a proof of correctness.  The ultimate version of the algorithm may well use more than eight bits, but the idea will be the same.)<\/p><p>A picture of the floating point numbers between any two powers-of-two looks something like this<\/p><pre>  [____()______()______()______()______()____]\r\n           1F      4F      7F      AF      DF<\/pre><p>The length of each of the five subintervals is 1\/256 of the overall length. The result of any division involving a divisor outside these subintervals will be accurate.  Only a small fraction of divisions involving divisors inside the subintervals produce erroneous results, but it is easiest, and quickest, to avoid the subintervals altogether.  Scaling the numerator and denominator by 15\/16 shifts it to a safe region.  (I had originally proposed scaling by 3\/4, but this takes the 7Fthird subinterval into the 1F subinterval.)<\/p><p>For example, the denominator in Coe's ratio is<\/p><p>$$ 3145727  =  1.49999952316284 \\cdot 2^{21} $$<\/p><p>The hexadecimal representation is printed as <tt>4147FFFF80000000<\/tt>. So <i>e<\/i>= 21  and <i>f<\/i> = $2^{-1} - 2^{-21}$, and the number is close to the right end point of the 7F subinterval.  The 17 consecutive high order 1 bits in <i>f<\/i> conspire with the bits in Coe's numerator to produce an instance of worst-case error.<\/p><p>The Pentium, like most other microprocessors, saves floating point numbers in memory with either a 32-bit single precision format, or a 64-bit double precision format.  In the floating point arithmetic unit itself, however, an 80-bit extended-precision format is used for the calculations. Thus, if an at-risk denominator is encountered in either single or double precision, it can be scaled by 15\/16 and extended precision can be used for the division.  Then, when the resulting quotient is rounded back to the working precision, it will yield exactly the same result, down to the last bit, as would be produced by a chip without the bug.<\/p><h4>Pentium-Safe Division<a name=\"5037fa8b-05f7-4bdb-818f-126e38a119cb\"><\/a><\/h4><p>The entire algorithm can be summarized in a few lines of MATLAB pseudo-code. This doesn't actually work, because MATLAB doesn't do bit-level operations on floating point numbers.  Moreover, we do not make the distinction between the 64- and 80-bit representations that ensure full accuracy when scaling is required.  But I hope this explanation has given readers an idea of the thinking that went into the workaround.<\/p><p>The algorithm is:<\/p><pre>  function z = fdiv(x,y)\r\n    if at_risk(y)\r\n       x = (15\/16)*x;\r\n       y = (15\/16)*y;\r\n    end\r\n    z = x\/y;\r\n  end<\/pre><p>The safety filter is:<\/p><pre>  function a = at_risk(y)\r\n     f = and(hex(y),'000FF00000000000');\r\n     a = any(f == ['1F'; '4F'; '7F'; 'AF'; 'DF'])\r\n  end<\/pre><script language=\"JavaScript\"> <!-- \r\n    function grabCode_73d6d2af8c2d416b9d49cf67828d0cf2() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='73d6d2af8c2d416b9d49cf67828d0cf2 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 73d6d2af8c2d416b9d49cf67828d0cf2';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2013 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_73d6d2af8c2d416b9d49cf67828d0cf2()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2012b<br><\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2012b<br><\/p><\/div><!--\r\n73d6d2af8c2d416b9d49cf67828d0cf2 ##### SOURCE BEGIN #####\r\n%% Pentium Division Bug Revisited\r\n% The Pentium division bug episode in the fall of 1994 was a defining\r\n% moment for the MathWorks, for the Internet, for Intel Corporation,\r\n% and for me personally.  In this blog I am reprinting the\r\n% article that I wrote for the Winter 1995 issue of MATLAB News and Notes \r\n% and for SIAM News.  In my next blog I want to discuss the episode's\r\n% impact.\r\n%\r\n\r\n%% A Tale of Two Numbers (Reprinted from MATLAB News and Notes, Winter 1995)\r\n% \r\n% This is the tale of two numbers, and how they found their way over the\r\n% Internet to the world's front pages on Thanksgiving day, embarrassing the\r\n% world's premier computer chip manufacturer.\r\n% \r\n\r\n%% Thomas Nicely\r\n% The first number is a twelve digit integer\r\n% \r\n% $$ p  =  824633702441 $$\r\n% \r\n% Thomas Nicely of Lynchburg College in Virginia is interested in this number\r\n% because both _p_ and _p+2_ are prime.\r\n% Two consecutive odd numbers that are both prime are called _twin primes_.\r\n% Nicely's research involves the distribution of twin primes and,\r\n% in particular, the sum of their reciprocals,\r\n% \r\n% $$ S = 1\/5 + 1\/7 + ... + 1\/29 + 1\/31 + ... + 1\/p + 1\/(p+2) + ... $$\r\n% \r\n% It is known that this infinite sum has a finite value, but nobody knows\r\n% what that value is.  For over a year, Nicely has been carrying out various\r\n% computations involving twin, triple and quadruple primes.\r\n% One of his objectives has been to demonstrate that today's PCs are just as\r\n% effective as supercomputers for this kind of research.\r\n% He was using half a dozen different computers in his work until March,\r\n% when he added a machine based on Intel Corporation's Pentium chip\r\n% to this collection.\r\n% \r\n% In June, Nicely noticed inconsistencies among several different quantities\r\n% he was computing.  He traced it to the values he was getting for\r\n% $1\/p$ and $1\/(p+2)$.  Although he was using double precision,\r\n% they were accurate only to roughly single precision.\r\n% He blamed at first on his own programs, then the compiler and operating\r\n% system, then the bus and motherboard in his machine.\r\n% By late October though, he was convinced that the difficulty was in\r\n% the chip itself.\r\n\r\n%% Alex Wolfe\r\n% On October 30, in an e-mail message to several other people who had access\r\n% to Pentium systems, Nicely said that he thought there was a bug in the\r\n% processor's floating point arithmetic unit.  One of the recipients of\r\n% Nicely's memo posted it on the CompuServe network.\r\n% Alex Wolfe, a reporter for the EE Times, spotted the posting and\r\n% forwarded it to Terje Mathisen.\r\n\r\n%% Terje Mathisen \r\n% Terje Mathisen, a computer jock at Norsk Hydro in Oslo, Norway,\r\n% has published articles on programming the Pentium, and posted notes\r\n% on the Internet about the accuracy of its transcendental functions.\r\n% Within hours of receiving Wolfe's query, Mathisen confirmed Nicely's\r\n% example, wrote a short, assembly language test program, and, on November 3,\r\n% initiated a chain of Internet postings in newsgroup |comp.sys.intel|\r\n% about the FDIV bug.  (FDIV is the floating point divide instruction\r\n% on the Pentium.)  A day later, Germany's Andreas Kaiser posted a list\r\n% of two dozen numbers whose reciprocals are computed to only single\r\n% precision accuracy on the Pentium.\r\n\r\n%% Tim Coe \r\n% Tim Coe is an engineer at Vitesse Semiconductor in southern California.\r\n% He has designed floating point arithmetic units himself and saw in Kaiser's\r\n% list of erroneous reciprocals clues to how other designers had tackled\r\n% the same task.  He surmised, correctly it turns out, that the Pentium's\r\n% division instruction employs a technique known to chip designers as a\r\n% radix 4 SLT algorithm.  This produces two bits of quotient per machine\r\n% cycle and thereby makes the Pentium twice as fast at division as previous\r\n% Intel chips running at the same clock rate.\r\n% \r\n% Coe created a model that explained all the errors Kaiser had reported in\r\n% reciprocals.  He also realized that division operations involving\r\n% numerators other than one potentially had even larger relative errors.\r\n% His model lead him to a pair of seven digit integers whose ratio\r\n% \r\n% $$ c  =  4195835 \/ 3145727 $$\r\n% \r\n% appeared to be an instance of the worst case error.   Coe did his analysis\r\n% without actually using a Pentium REPLACE_WITH_DASH_DASH he doesn't own one.\r\n% To verify his prediction, he had to bundle his year-and-a-half old daughter\r\n% into his car, drive down to a local computer store, and borrow a\r\n% demonstration machine.\r\n% \r\n% This true value of Coe's ratio is\r\n% \r\n% $$ c  =  1.33382044... $$\r\n% \r\n% But computed on a Pentium, it is\r\n% \r\n% $$ c  =  1.33373906... $$\r\n% \r\n% The computed quotient is accurate to only 14 bits.  The relative error\r\n% is $6.1 \\cdot 10^{-5}$.  This is worse than single precision roundoff error\r\n% and over ten orders of magnitude worse than what we expect from double\r\n% precision computation.  The error doesn't occur very often, but when it does,\r\n% it can be very significant.  We are faced with a very small probability\r\n% of making a very big error.\r\n\r\n%% Press Coverage\r\n% I first learned of the FDIV bug a few days before Coe's revelation from\r\n% an electronic mailing list maintained by David Hough; at that point I\r\n% began to follow |comp.sys.intel|.  At first, I was curious but not alarmed.\r\n% It seemed to be some kind of single- versus double-precision problem\r\n% which, although annoying, is not all that uncommon.  But Coe's discovery,\r\n% together with a couple of customer calls to The MathWorks tech support,\r\n% raised my level of interest considerably.  \r\n% \r\n% On November 15, I posted a summary of what I knew then to both the Intel\r\n% and the MATLAB newsgroups, using Nicely's prime and Coe's ratio as examples.\r\n% I also pointed out that the divisor in both cases is a little less than\r\n% 3 times a power of two:\r\n% \r\n% $$ 824633702441  =  3 \\cdot 2^{38} - 18391 $$\r\n% \r\n% and\r\n% \r\n% $$ 3145727  =  3 \\cdot 2^{20} - 1 $$\r\n% \r\n% By this time, the Net had become hyperactive and my posting was\r\n% redistributed widely.  A week later, reporters for major newspapers and\r\n% news services had Xeroxed copies of faxed copies of printouts of my posting.\r\n% \r\n% The story began to get popular press coverage when Net activists called\r\n% their local newspapers and TV stations.  I was interviewed by CNN on\r\n% Tuesday, the 22nd, and spent most of the next day on the telephone with\r\n% other reporters.  My picture was in the _New York Times_ and the\r\n% _San Jose Mercury News_ on Thursday which happened to be Thanksgiving.\r\n% The _Times_ story included a sidebar, titled \"Close, But Not Close Enough\",\r\n% which used Coe's ratio to illustrate the problem. The story was front-page\r\n% news in the _Boston Globe_, with the headline \"Sorry, Wrong Number.\"\r\n% The _Globe_'s sidebar demonstrated the error by evaluating Coe's ratio\r\n% in a spreadsheet.\r\n% \r\n% In the month since I learned of Nicely's prime and Coe's ratio they have\r\n% certainly earned their place in the Great Numbers Hall of Fame.\r\n% \r\n% One challenging aspect of this has been how to explaing to the press how big\r\n% the error is. The focus of attention has been on what we numerical analysts\r\n% call the residual,\r\n% \r\n% $$ r  =  4195835 - (4195835 \/ 3145727) (3145727) $$\r\n% \r\n% With exact computation, _r_ would be zero, but on the Pentium it is 256.\r\n% To most people, 256 seems like a pretty big error.  But when compared\r\n% to the input data, of course, it doesn't seem so large.  This gets us\r\n% into relative error and significant digits REPLACE_WITH_DASH_DASH terms that are not encountered\r\n% in everyday journalistic prose.  There was even confusion on the part of\r\n% some reporters about where to starting counting \"decimal digits\".\r\n% Not everybody got the details right.  The _New York Times_ was the only\r\n% publication I saw that thought to show the actual value of the erroneous\r\n% quotient.  The British publication, _The Economist_, had the good sense\r\n% to describe the error as 0.006%.  In retrospect, perhaps the best\r\n% description of the error is \"about 61 parts per million.\" \r\n% \r\n% Since double precision floating point computation is vital to MATLAB,\r\n% and since Pentium-based machines account for a significant portion of new\r\n% MATLAB usage, we decided early on to provide a release of our software\r\n% that works around the bug.  My first thought was to modify our source code\r\n% so that every division operation\r\n% \r\n%     |z = x\/y|\r\n% \r\n% was followed by a residual calculation\r\n% \r\n%     |r = x - y*z|\r\n% \r\n% When the residual is too large, the division could be redone in software\r\n% using Newton's method.  We abandoned both aspects of this approach REPLACE_WITH_DASH_DASH\r\n% we don't compute the residual, and we don't do software emulation.\r\n\r\n%% Peter Tang\r\n% Coe, Mathisen and I are now working with Peter Tang of Argonne Laboratory\r\n% (on sabbatical in Hong Kong) and a group of hardware and software engineers\r\n% from Intel, who are in California and Oregon.  We have never met face-to-face.\r\n% Our collaboration is all by e-mail and telephone.  (11 am in Massachusetts\r\n% is 8 am in California and Oregon, 5 pm in Oslo and 1 am in Hong Kong.)\r\n% We are developing, implementing, testing and proving the correctness of\r\n% software that will work around the FDIV bug and related bugs in the Pentium's\r\n% on-chip tangent, arctangent and remainder instructions.  We will offer the\r\n% software to compiler vendors, commercial software developers and individuals\r\n% folks via the Net.  We hope this software will be used by anyone doing\r\n% floating point arithmetic on a Pentium who is unable to replace the chip.\r\n\r\n%% SRT Algorithm\r\n% The Pentium's division woes can be traced to five missing entries in a\r\n% lookup table that is actually part of the chip's circuitry.  The radix-4\r\n% SRT algorithm is a variation of the familiar long division technique we all\r\n% earned in school.  Each step of the algorithm takes one machine cycle and\r\n% appends another two-bit digit to the quotient.\r\n% Both the quotient and the remainder are represented in a redundant fashion\r\n% as the difference between two numbers.  In this way, the next digit of the\r\n% quotient can be obtained by table lookup, with several high-order bits\r\n% from both the divisor and remainder used as indices into the table.\r\n% The table is not rectangular; a triangular portion of it is eliminated\r\n% by constraints on the possible indices.\r\n% \r\n% When the algorithm was implemented in silicon, five entries on the boundary\r\n% of this triangular portion were dropped.  These missing entries (each of\r\n% which should have a value of +2) effectively mean that, for certain\r\n% combinations of bits developed during the division process, the chips make\r\n% an error while updating the remainder.\r\n\r\n%% Our Workaround\r\n% The key to our workaround is the fact the chip does a perfectly good job\r\n% with division almost all the time.  It is simply a question of avoiding\r\n% the operands that don't work very well,  which our software does with a\r\n% quick test of the divisor before each attempted FDIV.  The absence of\r\n% dangerous bit patterns indicates that the FDIV can be done safely.\r\n% The presence of one of the patterns does not guarantee that the error will\r\n% occur, just a signal that it might.  In this case, scaling both numerator\r\n% and denominator by 15\/16 takes the divisor out of the unsafe region and\r\n% insures that the subsequent division will be fully accurate.\r\n% \r\n% With this approach, it is not necessary to test the magnitude of the\r\n% residual resulting from a division.  It is known _a priori_ that all\r\n% divisions will produce fully accurate results.  If desired, an additional\r\n% test can compare the result of scaled and unscaled divisions and count the\r\n% number of errors that would occur on an unmodified Pentium.  We will offer\r\n% this test in MATLAB, but it can be turned off for maximum execution speed.\r\n% \r\n% The regions to be avoided can be characterized by examining the high order\r\n% bits of the fraction in the IEEE floating point representation.\r\n% Floating point numbers in any of the three available precisions can be\r\n% thought of as real numbers of the form\r\n% \r\n% $$ \\pm (1 + f ) \\cdot 2^e $$\r\n% \r\n% where _e_ is an integer in the range determined by under- and overflow and\r\n% _f_  is a rational number in the half open interval [0, 1).\r\n% The eight high order bits of _f_ divide this interval into $2^8$ = 256\r\n% subintervals.  Five of these subintervals contain all the at-risk divisors.\r\n% The hexadecimal representations of the leading bits of the five subintervals\r\n% are\r\n% \r\n%    1F, 4F, 7F, AF, and DF\r\n% \r\n% (As I write this article, Coe, Tang and some of the Intel people are still\r\n% working on a proof of correctness.  The ultimate version of the algorithm\r\n% may well use more than eight bits, but the idea will be the same.)\r\n% \r\n% A picture of the floating point numbers between any two powers-of-two looks\r\n% something like this\r\n% \r\n%    [____()______()______()______()______()____]\r\n%             1F      4F      7F      AF      DF    \r\n% \r\n% The length of each of the five subintervals is 1\/256 of the overall length.\r\n% The result of any division involving a divisor outside these subintervals\r\n% will be accurate.  Only a small fraction of divisions involving divisors\r\n% inside the subintervals produce erroneous results, but it is easiest,\r\n% and quickest, to avoid the subintervals altogether.  Scaling the numerator\r\n% and denominator by 15\/16 shifts it to a safe region.  (I had originally\r\n% proposed scaling by 3\/4, but this takes the 7Fthird subinterval\r\n% into the 1F subinterval.)\r\n% \r\n% For example, the denominator in Coe's ratio is\r\n% \r\n% $$ 3145727  =  1.49999952316284 \\cdot 2^{21} $$\r\n% \r\n% The hexadecimal representation is printed as |4147FFFF80000000|.\r\n% So _e_= 21  and _f_ = $2^{-1} - 2^{-21}$, and the number is\r\n% close to the right end point of the 7F subinterval.  The 17 consecutive\r\n% high order 1 bits in _f_ conspire with the bits in Coe's numerator to\r\n% produce an instance of worst-case error.\r\n% \r\n% The Pentium, like most other microprocessors, saves floating point numbers\r\n% in memory with either a 32-bit single precision format, or a 64-bit double\r\n% precision format.  In the floating point arithmetic unit itself, however,\r\n% an 80-bit extended-precision format is used for the calculations.\r\n% Thus, if an at-risk denominator is encountered in either single or double\r\n% precision, it can be scaled by 15\/16 and extended precision can be used\r\n% for the division.  Then, when the resulting quotient is rounded back to\r\n% the working precision, it will yield exactly the same result, down to the\r\n% last bit, as would be produced by a chip without the bug.\r\n\r\n%% Pentium-Safe Division\r\n% The entire algorithm can be summarized in a few lines of MATLAB pseudo-code.\r\n% This doesn't actually work, because MATLAB doesn't do bit-level operations\r\n% on floating point numbers.  Moreover, we do not make the distinction between\r\n% the 64- and 80-bit representations that ensure full accuracy when scaling\r\n% is required.  But I hope this explanation has given readers an idea of the \r\n% thinking that went into the workaround.\r\n% \r\n\r\n%%\r\n% The algorithm is:\r\n%\r\n%    function z = fdiv(x,y)\r\n%      if at_risk(y)\r\n%         x = (15\/16)*x;\r\n%         y = (15\/16)*y;\r\n%      end\r\n%      z = x\/y;\r\n%    end\r\n\r\n%%\r\n% The safety filter is:\r\n%\r\n%    function a = at_risk(y)\r\n%       f = and(hex(y),'000FF00000000000');\r\n%       a = any(f == ['1F'; '4F'; '7F'; 'AF'; 'DF']) \r\n%    end\r\n\r\n##### SOURCE END ##### 73d6d2af8c2d416b9d49cf67828d0cf2\r\n-->","protected":false},"excerpt":{"rendered":"<!--introduction--><p>The Pentium division bug episode in the fall of 1994 was a defining moment for the MathWorks, for the Internet, for Intel Corporation, and for me personally.  In this blog I am reprinting the article that I wrote for the Winter 1995 issue of MATLAB News and Notes and for SIAM News.  In my next blog I want to discuss the episode's impact.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2013\/04\/29\/pentium-division-bug\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/589"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=589"}],"version-history":[{"count":13,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/589\/revisions"}],"predecessor-version":[{"id":621,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/589\/revisions\/621"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=589"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=589"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=589"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}