<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: A Glimpse into Floating-Point Accuracy</title>
	<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/</link>
	<description>Loren Shure  works on design of the MATLAB language at &#60;a href="http://www.mathworks.com/"&#62;The MathWorks&#60;/a&#62;. She writes here about once a week on MATLAB programming and related topics. &#60;br&#62;&#60;br&#62;&#60;a href="/images/loren-full.jpg"&#62;&#60;img src="/images/loren.jpg"&#62;&#60;/a&#62;</description>
	<pubDate>Mon, 23 Nov 2009 00:49:00 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
		<item>
		<title>By: arda</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30058</link>
		<dc:creator>arda</dc:creator>
		<pubDate>Mon, 23 Feb 2009 20:21:09 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30058</guid>
		<description>Steve,
Thanks for the help. As a footnote, I investigated the trigonometric errors for both correct and incorrect values. It turns out that the correct values are the coincidence (not the incorrect ones) :)</description>
		<content:encoded><![CDATA[<p>Steve,<br />
Thanks for the help. As a footnote, I investigated the trigonometric errors for both correct and incorrect values. It turns out that the correct values are the coincidence (not the incorrect ones) :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve L</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30056</link>
		<dc:creator>Steve L</dc:creator>
		<pubDate>Mon, 23 Feb 2009 14:14:40 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30056</guid>
		<description>arda,

The coefficient C3 used by FDLIBM's implementation of COS is NOT exactly 1/factorial(8).  See this Cleve's Corner article for a discussion of how the FDLIBM implementation of SIN was designed; the implementation of COS was designed in much the same way.

http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter02_cleve.html

Quoting from that article:

&lt;i&gt;The six coefficients are close to, but not exactly equal to, the power series coefficients 1/3!, 1/5!, …, 1/13! . They minimize the maximum relative error, &#124; (sin() - p())/sin() &#124;, over the interval. Six terms are enough to make this approximation error less than 2^(-52), which is the roundoff error involved when all the terms, and the sum, are less than one.&lt;/i&gt;

That article is titled "The Tetragamma Function and Numerical Craftsmanship" -- and there's definitely some craftsmanship involved in the implementation of the trig functions.</description>
		<content:encoded><![CDATA[<p>arda,</p>
<p>The coefficient C3 used by FDLIBM&#8217;s implementation of COS is NOT exactly 1/factorial(8).  See this Cleve&#8217;s Corner article for a discussion of how the FDLIBM implementation of SIN was designed; the implementation of COS was designed in much the same way.</p>
<p><a href="http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter02_cleve.html" rel="nofollow">http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter02_cleve.html</a></p>
<p>Quoting from that article:</p>
<p><i>The six coefficients are close to, but not exactly equal to, the power series coefficients 1/3!, 1/5!, …, 1/13! . They minimize the maximum relative error, | (sin() - p())/sin() |, over the interval. Six terms are enough to make this approximation error less than 2^(-52), which is the roundoff error involved when all the terms, and the sum, are less than one.</i></p>
<p>That article is titled &#8220;The Tetragamma Function and Numerical Craftsmanship&#8221; &#8212; and there&#8217;s definitely some craftsmanship involved in the implementation of the trig functions.</p>
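<p>For a sense of scale, here is a rough sketch (the variable names are only illustrative, and the C3 value is the one quoted from k_cos.c in arda&#8217;s comment) that measures the gap between C3 and 1/factorial(8) in units of the floating point spacing at C3:</p>
<pre><code>
% C3 as quoted from k_cos.c, and the power series coefficient 1/8!
C3 = 2.48015872894767294178e-05;
taylorCoeff = 1/factorial(8);

% Gap between the two, in units of the spacing of doubles near C3
gapInUlps = abs(taylorCoeff - C3) / eps(C3);
</code></pre>
<p>Rounding 1/8! to the nearest double would move it by at most half a spacing, but gapInUlps comes out in the millions, so the difference looks like a deliberate design choice rather than roundoff.</p>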
]]></content:encoded>
	</item>
	<item>
		<title>By: arda</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30054</link>
		<dc:creator>arda</dc:creator>
		<pubDate>Mon, 23 Feb 2009 10:07:53 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30054</guid>
		<description>Thanks for the reply,
I'm interested in cos() errors in Matlab and wanted to see the major error. I know these errors are mostly due to floating point, and they all look like floating point errors:
&#62;&#62; x=0:1/2^15:0.3;
&#62;&#62; plot(x,cos(x).^2+sin(x).^2-1,'.')

But I do want to see the exact reason for every number and then maybe come up with a solution. I also found some interesting things:

Take a look at the cos() function which is said to be used in Matlab. I have cross-checked and made sure that this is the code Matlab uses (http://www.netlib.org/fdlibm/k_cos.c).
Inside the code the constants are given as, for example, C3=2.48015872894767294178e-05, and this is supposed to be 1/8! (http://en.wikipedia.org/wiki/Cosine). However, the difference is:
&#62;&#62; 1/factorial(8)-2.48015872894767294178e-05
ans =
  1.2111e-014

These are clearly different numbers! And what is more interesting is that more errors appear when I try to use the exact factorial values, even with a higher-order series.</description>
		<content:encoded><![CDATA[<p>Thanks for the reply,<br />
I&#8217;m interested in cos() errors in Matlab and wanted to see the major error. I know these errors are mostly due to floating point, and they all look like floating point errors:<br />
&gt;&gt; x=0:1/2^15:0.3;<br />
&gt;&gt; plot(x,cos(x).^2+sin(x).^2-1,'.')</p>
<p>But I do want to see the exact reason for every number and then maybe come up with a solution. I also found some interesting things:</p>
<p>Take a look at the cos() function which is said to be used in Matlab. I have cross-checked and made sure that this is the code Matlab uses (http://www.netlib.org/fdlibm/k_cos.c).<br />
Inside the code the constants are given as, for example, C3=2.48015872894767294178e-05, and this is supposed to be 1/8! (http://en.wikipedia.org/wiki/Cosine). However, the difference is:<br />
&gt;&gt; 1/factorial(8)-2.48015872894767294178e-05<br />
ans =<br />
  1.2111e-014</p>
<p>These are clearly different numbers! And what is more interesting is that more errors appear when I try to use the exact factorial values, even with a higher-order series.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve L</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30053</link>
		<dc:creator>Steve L</dc:creator>
		<pubDate>Fri, 20 Feb 2009 14:17:01 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30053</guid>
		<description>arda,

EPS is not the min length to point 1.  It is the distance from 1 to the _next largest_ floating point number.  You're correct that the next _largest_ number after 1 is 1+eps(1).  However, the next _smallest_ number before 1 is not 1-eps(1) but is actually 1-(eps(1)/2).

You can see this by looking at the hexadecimal representation of these two numbers and one.  To do this, use the HEX option for the FORMAT command.  The first three hex digits are the sign and the exponent of the number, the last thirteen are the mantissa.

&lt;pre&gt;&lt;code&gt;
% Tell MATLAB to display the numbers in hex format
format hex

% Look at the number 1
one = 1
 
% This is the next largest number
% The last bit of the mantissa has been incremented
nextLargest = one+eps(one)
 
% This is the next smallest number
% We decremented the last bit of the mantissa
% which required borrowing from the exponent
nextSmallest = one - (eps(one)/2)

% Is nextSmallest actually less than 1?
isSmaller = nextSmallest &#60; one
 
% Reset the format back to the default
format
&lt;/code&gt;&lt;/pre&gt;

Cleve's newsletter article, linked at the end of Loren's blog posting, describes the hexadecimal format displayed by FORMAT HEX.</description>
		<content:encoded><![CDATA[<p>arda,</p>
<p>EPS is not the min length to point 1.  It is the distance from 1 to the _next largest_ floating point number.  You&#8217;re correct that the next _largest_ number after 1 is 1+eps(1).  However, the next _smallest_ number before 1 is not 1-eps(1) but is actually 1-(eps(1)/2).</p>
<p>You can see this by looking at the hexadecimal representation of these two numbers and one.  To do this, use the HEX option for the FORMAT command.  The first three hex digits are the sign and the exponent of the number, the last thirteen are the mantissa.</p>
<pre><code>
% Tell MATLAB to display the numbers in hex format
format hex

% Look at the number 1
one = 1

% This is the next largest number
% The last bit of the mantissa has been incremented
nextLargest = one+eps(one)

% This is the next smallest number
% We decremented the last bit of the mantissa
% which required borrowing from the exponent
nextSmallest = one - (eps(one)/2)

% Is nextSmallest actually less than 1?
isSmaller = nextSmallest &lt; one

% Reset the format back to the default
format
</code></pre>
<p>Cleve&#8217;s newsletter article, linked at the end of Loren&#8217;s blog posting, describes the hexadecimal format displayed by FORMAT HEX.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Tearle</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30052</link>
		<dc:creator>Matt Tearle</dc:creator>
		<pubDate>Fri, 20 Feb 2009 14:16:17 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30052</guid>
		<description>Addendum: having played with Cleve's floatgui, I see that my answer is basically correct.  From his book: "within each binary interval 2^e \leq x \leq 2^{e+1}, the numbers are equally spaced with an increment of 2^{e-t}" (t = number of bits to store the mantissa).  So numbers in [1,2] are separated by eps, but those in [1/2,1] are separated by eps/2.</description>
		<content:encoded><![CDATA[<p>Addendum: having played with Cleve&#8217;s floatgui, I see that my answer is basically correct.  From his book: &#8220;within each binary interval 2^e \leq x \leq 2^{e+1}, the numbers are equally spaced with an increment of 2^{e-t}&#8221; (t = number of bits to store the mantissa).  So numbers in [1,2] are separated by eps, but those in [1/2,1] are separated by eps/2.</p>
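<p>A minimal sketch of that spacing in MATLAB (the variable names are just for illustration; EPS called with an argument returns the spacing at that value):</p>
<pre><code>
% Spacing of doubles in [1,2) and in [1/2,1)
gapAbove = eps(1);      % 2^-52
gapBelow = eps(0.5);    % 2^-53
ratio = gapBelow / gapAbove;

% The nearest double below 1 sits eps(1)/2 away from it
prev = 1 - gapBelow;
isHalfEps = (1 - prev) == eps(1)/2;
</code></pre>
<p>Here ratio is 0.5 and isHalfEps is true, which matches the eps/2 that shows up when subtracting from 1.</p>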
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Tearle</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30051</link>
		<dc:creator>Matt Tearle</dc:creator>
		<pubDate>Fri, 20 Feb 2009 14:03:00 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30051</guid>
		<description>@arda: I've encountered this before and I *think* the answer is related to how fp numbers are distributed: the distance between adjacent exactly representable fp numbers gets bigger as the numbers get bigger.  OK, so eps(1) is defined as the distance between 1 and the next (i.e. higher) fp number.  Since you're doing subtraction, I think you're getting the distance between 1 and the previous (i.e. lower) fp number, which, apparently, is eps/2.

Have a look at floatgui.m from Cleve Moler's NCM book: http://www.mathworks.com/moler/ncmfilelist.html</description>
		<content:encoded><![CDATA[<p>@arda: I&#8217;ve encountered this before and I *think* the answer is related to how fp numbers are distributed: the distance between adjacent exactly representable fp numbers gets bigger as the numbers get bigger.  OK, so eps(1) is defined as the distance between 1 and the next (i.e. higher) fp number.  Since you&#8217;re doing subtraction, I think you&#8217;re getting the distance between 1 and the previous (i.e. lower) fp number, which, apparently, is eps/2.</p>
<p>Have a look at floatgui.m from Cleve Moler&#8217;s NCM book: <a href="http://www.mathworks.com/moler/ncmfilelist.html" rel="nofollow">http://www.mathworks.com/moler/ncmfilelist.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: arda</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30050</link>
		<dc:creator>arda</dc:creator>
		<pubDate>Fri, 20 Feb 2009 12:04:38 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30050</guid>
		<description>Loren,
Thanks for the guide, but I am confused by the following result:


&#62;&#62; 1-(cos(1/8)^2+sin(1/8)^2)
ans =
  1.1102e-016
&#62;&#62; ans/eps(1)
ans =
    0.5000

Eps is supposed to be the min length to point 1. How come half of it appears?</description>
		<content:encoded><![CDATA[<p>Loren,<br />
Thanks for the guide, but I am confused by the following result:</p>
<p>&gt;&gt; 1-(cos(1/8)^2+sin(1/8)^2)<br />
ans =<br />
  1.1102e-016<br />
&gt;&gt; ans/eps(1)<br />
ans =<br />
    0.5000</p>
<p>Eps is supposed to be the min length to point 1. How come half of it appears?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Loren</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30049</link>
		<dc:creator>Loren</dc:creator>
		<pubDate>Thu, 19 Feb 2009 15:39:56 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30049</guid>
		<description>Banu-

We could rearchitect MATLAB to do that, and it would slow everything down intolerably.  In addition, who's to say that 100*eps is the right cutoff?

--Loren</description>
		<content:encoded><![CDATA[<p>Banu-</p>
<p>We could rearchitect MATLAB to do that, and it would slow everything down intolerably.  In addition, who&#8217;s to say that 100*eps is the right cutoff?</p>
<p>&#8211;Loren</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Banu</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30048</link>
		<dc:creator>Banu</dc:creator>
		<pubDate>Thu, 19 Feb 2009 15:26:14 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-30048</guid>
		<description>Hi,
This may sound a little amateurish, but here is what I don't understand:
I understand the eps concept perfectly, but since we get this error most of the time, why can't we work with, say, 14 significant digits to be sure? I mean, we could round off the 15th digit in all calculations, and that would ensure the correct answer. Is there a way to do that in Matlab?</description>
		<content:encoded><![CDATA[<p>Hi,<br />
This may sound a little amateurish, but here is what I don&#8217;t understand:<br />
I understand the eps concept perfectly, but since we get this error most of the time, why can&#8217;t we work with, say, 14 significant digits to be sure? I mean, we could round off the 15th digit in all calculations, and that would ensure the correct answer. Is there a way to do that in Matlab?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-29626</link>
		<dc:creator>John</dc:creator>
		<pubDate>Mon, 28 Jul 2008 21:08:33 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/#comment-29626</guid>
		<description>Thanks Loren and Mike. I'm amazed at the great and friendly support on my problem.

So far I've decided to take the numeric workaround by John D'Errico and Roger Stafford. My algorithm will greatly benefit from using straight numeric arithmetic, plus their workaround is compatible with the required bit-wise operations.

I'll check the fi toolbox, no doubt ;)

Thanks for your time, it's much appreciated.</description>
		<content:encoded><![CDATA[<p>Thanks Loren and Mike. I&#8217;m amazed at the great and friendly support on my problem.</p>
<p>So far I&#8217;ve decided to take the numeric workaround by John D&#8217;Errico and Roger Stafford. My algorithm will greatly benefit from using straight numeric arithmetic, plus their workaround is compatible with the required bit-wise operations.</p>
<p>I&#8217;ll check the fi toolbox, no doubt ;)</p>
<p>Thanks for your time, it&#8217;s much appreciated.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
