{"id":75,"date":"2009-12-02T02:59:06","date_gmt":"2009-12-02T02:59:06","guid":{"rendered":"https:\/\/blogs.mathworks.com\/seth\/2009\/12\/02\/floating-point-numbers\/"},"modified":"2009-12-07T18:56:47","modified_gmt":"2009-12-07T18:56:47","slug":"floating-point-numbers","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/simulink\/2009\/12\/02\/floating-point-numbers\/","title":{"rendered":"Floating-Point Numbers"},"content":{"rendered":"<link type=\"text\/css\" rel=\"stylesheet\" href=\"\/images\/seth\/seth.css\">\r\n\r\n<p>Numeric simulation is all about the numbers.  In a previous post,\r\nI talked about <a\r\nhref=\"https:\/\/blogs.mathworks.com\/seth\/2008\/12\/14\/representing-numbers-integers-and-fixed-point\/\">integer\r\nand fixed-point number representations<\/a>.  These numbers are especially\r\nuseful for discrete simulation and embedded systems.  For continuous dynamic\r\nsystems, signals are not discrete values but continuously changing\r\nfunctions of time, and floating-point numbers provide the range and flexibility\r\nneeded to store these results.  In this post, I will review\r\nthe fundamentals of floating-point numbers.<\/p>\r\n\r\n<p><strong>Sign, Exponent, Fraction<\/strong><\/p>\r\n\r\n<p>Floating-point numbers extend the idea of a fixed-point\r\nnumber by storing an exponent with each value, so the position of the binary point\r\ncan change from one number to the next.  
A normalized floating-point number has three fields: a sign\r\nbit, an exponent, and a fraction.<\/p>\r\n\r\n<table class=MsoTableGrid border=1 cellspacing=0 cellpadding=0\r\n style='border-collapse:collapse;border:none'>\r\n <tr style='height:12.1pt'>\r\n  <td width=45 valign=top style='width:33.55pt;border:solid black 1.0pt;\r\n  padding:0in 5.4pt 0in 5.4pt;height:12.1pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>sign<\/p>\r\n  <\/td>\r\n  <td width=88 valign=top style='width:66.3pt;border:solid black 1.0pt;\r\n  border-left:none;padding:0in 5.4pt 0in 5.4pt;height:12.1pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>exponent (e)<\/p>\r\n  <\/td>\r\n  <td width=162 valign=top style='width:121.85pt;border:solid black 1.0pt;\r\n  border-left:none;padding:0in 5.4pt 0in 5.4pt;height:12.1pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>fraction (f)<\/p>\r\n  <\/td>\r\n <\/tr>\r\n<\/table>\r\n\r\n\r\n\r\n<p>The fraction represents values in the range 0&#8804;f&lt;1.  The\r\nexponent scales the number by a power of two.  The spacing between adjacent\r\nfloating-point numbers depends on the number of fraction bits and on the\r\nmagnitude of the number represented: for large exponents the spacing is large,\r\nand for small exponents it is small.  This gap between adjacent representable\r\nnumbers is called epsilon, or eps.  In MATLAB, eps(x) returns the spacing at x,\r\nand eps by itself returns the spacing at 1.  When a calculation produces a\r\nnumber that falls into one of these gaps between floating-point\r\nrepresentations, the result is rounded to a nearby representable number.  
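<\/p>

<p>The growth of the spacing with magnitude, and the rounding it causes, can be checked directly. This is a minimal Python sketch rather than MATLAB code; it assumes Python 3.9 or later, where math.ulp(x) returns the gap between |x| and the next larger double, the same quantity MATLAB reports as eps(x). The fractions module exposes the exact value a double stores, so the rounding error can be measured without further rounding. The variable names are illustrative only.<\/p>

```python
import math
from fractions import Fraction

# The gap between adjacent doubles grows with magnitude.
# math.ulp(x) (Python 3.9+) plays the role of MATLAB's eps(x).
for x in (1.0, 1000.0, 1.0e15, 1.0e300):
    print(f'spacing at {x:g}: {math.ulp(x):g}')

# MATLAB's eps with no argument is the spacing at 1.0, which is 2**-52.
print(math.ulp(1.0) == 2.0 ** -52)   # True

# 0.1 falls between two doubles, so storing it rounds to the nearer one.
# Fraction(0.1) is the exact value of the stored double.
stored = Fraction(0.1)
rounding_error = abs(stored - Fraction(1, 10))

# Error measured in units of the local spacing (at most 0.5 when rounding to nearest).
print(float(rounding_error / Fraction(math.ulp(0.1))))   # 0.4

# Each operation rounds again: 0.1 + 0.2 lands one spacing above the
# double nearest to 0.3.
total = 0.1 + 0.2
print(total == 0.3)                        # False
print(abs(total - 0.3) == math.ulp(0.3))   # True
```

<p>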
This rounding introduces an error on the order of eps; with IEEE\r\nround-to-nearest, the default mode, a single rounding error is at most half the gap\r\nbetween the two representable numbers surrounding the exact result.<\/p>\r\n\r\n<p><a\r\nhref=\"https:\/\/www.mathworks.com\/company\/aboutus\/founders\/clevemoler.html\">Cleve\r\nMoler<\/a> wrote a great article titled <a\r\nhref=\"https:\/\/www.mathworks.com\/content\/dam\/mathworks\/mathworks-dot-com\/company\/newsletters\/news_notes\/pdf\/Fall96Cleve.pdf\">Floating\r\nPoints<\/a>.  It explains how floating point works and gives some\r\nof the historical context for the IEEE 754 standard and its double-precision format.  In the article,\r\nhe describes a toy floating-point system consisting of one sign bit, a three-bit\r\nexponent, and a three-bit fraction.<\/p>\r\n\r\n<table border=1 cellspacing=0 cellpadding=0\r\n style='border-collapse:collapse;border:none'>\r\n <tr style='height:13.5pt'>\r\n  <td width=41 valign=top style='width:30.8pt;border:solid black 1.0pt;\r\n  padding:0in 5.4pt 0in 5.4pt;height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>sign<\/p>\r\n  <\/td>\r\n  <td width=123 colspan=3 valign=top style='width:92.45pt;border:solid black 1.0pt;\r\n  border-left:none;padding:0in 5.4pt 0in 5.4pt;height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>exponent (e)<\/p>\r\n  <\/td>\r\n  <td width=123 colspan=3 valign=top style='width:92.45pt;border:solid black 1.0pt;\r\n  border-left:none;padding:0in 5.4pt 0in 5.4pt;height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>fraction (f)<\/p>\r\n  <\/td>\r\n <\/tr>\r\n <tr style='height:13.5pt'>\r\n  <td width=41 valign=top style='width:30.8pt;border:solid black 1.0pt;\r\n  border-top:none;padding:0in 5.4pt 0in 5.4pt;height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>+\/-<\/p>\r\n  <\/td>\r\n  <td width=41 valign=top style='width:30.8pt;border-top:none;border-left:none;\r\n  border-bottom:solid black 1.0pt;border-right:solid black 
1.0pt;padding:0in 5.4pt 0in 5.4pt;\r\n  height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>+\/-<\/p>\r\n  <\/td>\r\n  <td width=41 valign=top style='width:30.8pt;border-top:none;border-left:none;\r\n  border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:0in 5.4pt 0in 5.4pt;\r\n  height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>2<sup>1<\/sup><\/p>\r\n  <\/td>\r\n  <td width=41 valign=top style='width:30.8pt;border-top:none;border-left:none;\r\n  border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:0in 5.4pt 0in 5.4pt;\r\n  height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>2<sup>0<\/sup><\/p>\r\n  <\/td>\r\n  <td width=41 valign=top style='width:30.8pt;border-top:none;border-left:none;\r\n  border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:0in 5.4pt 0in 5.4pt;\r\n  height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>2<sup>-1<\/sup><\/p>\r\n  <\/td>\r\n  <td width=41 valign=top style='width:30.8pt;border-top:none;border-left:none;\r\n  border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:0in 5.4pt 0in 5.4pt;\r\n  height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>2<sup>-2<\/sup><\/p>\r\n  <\/td>\r\n  <td width=41 valign=top style='width:30.8pt;border-top:none;border-left:none;\r\n  border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:0in 5.4pt 0in 5.4pt;\r\n  height:13.5pt'>\r\n  <p style='margin-bottom:0in;margin-bottom:.0001pt;line-height:\r\n  normal'>2<sup>-3<\/sup><\/p>\r\n  <\/td>\r\n <\/tr>\r\n<\/table>\r\n\r\n\r\n<p>The exponent can hold integer values between -4 and 3.  The\r\nfraction holds values of 0 to &#8542; with a &#8539; spacing.  
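<\/p>

<p>To make the toy system concrete, here is a minimal Python sketch (an illustration of our own, not code from Cleve's article) that lists every positive normalized value. It assumes a three-bit fraction, exponents from -4 to 3, and the normalized-value formula stated next. Exact fractions from Python's fractions module keep the arithmetic free of rounding. The names FRACTION_BITS, EXPONENTS, fraction_values, positives, and gaps are hypothetical; zero, negative values, and any special values are left out.<\/p>

```python
from fractions import Fraction

# Hypothetical model of the toy system: 3 fraction bits, exponents -4..3,
# and normalized values x = (1 + f) * 2**e. Only positive values are listed.
FRACTION_BITS = 3
EXPONENTS = range(-4, 4)   # -4, -3, ..., 3

# f takes the values 0, 1/8, 2/8, ..., 7/8.
fraction_values = [Fraction(k, 2 ** FRACTION_BITS) for k in range(2 ** FRACTION_BITS)]

positives = sorted((1 + f) * Fraction(2) ** e
                   for e in EXPONENTS for f in fraction_values)

print(len(positives))               # 64
print(positives[0], positives[-1])  # 1/16 15

# Distinct gaps between neighboring values: one per exponent, 2**(e - 3),
# doubling from 1/128 up to 1.
gaps = sorted({b - a for a, b in zip(positives, positives[1:])})
print(' '.join(str(g) for g in gaps))   # 1/128 1/64 1/32 1/16 1/8 1/4 1/2 1
```

<p>Each exponent covers the interval between two consecutive powers of two, and the gap between neighbors doubles from one interval to the next, which is the pattern the graphic from Cleve's article illustrates.<\/p>

<p>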
The value of a\r\nnormalized floating-point number is:<\/p>\r\n\r\n<p>x = \u00b1 (1 + f) \u00d7 2<sup>e<\/sup><\/p>\r\n\r\n<p>The following graphic from Cleve\u2019s article illustrates the\r\nspacing between floating-point numbers in this toy system.<\/p>\r\n\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2009Q4\/floating_pointCleve.png\" alt=\"A toy floating-point number system showing the spacing between numbers.\"> <\/p>\r\n\r\n<p>This is my mental image when I think about floating-point\r\nnumbers and issues of precision in floating-point calculations.<\/p>\r\n\r\n<p><strong>Resources<\/strong><\/p>\r\n\r\n<p>There are many great sources of knowledge about floating-point\r\nnumbers on the web, and everyone seems to have a favorite reference.  My\r\nfavorite is from Cleve, but here are some more resources to check out for\r\nyourself.<\/p>\r\n\r\n<p><a\r\nhref=\"https:\/\/www.mathworks.com\/content\/dam\/mathworks\/mathworks-dot-com\/company\/newsletters\/news_notes\/pdf\/Fall96Cleve.pdf\">Floating\r\nPoints<\/a> by Cleve Moler<\/p>\r\n\r\n<p><a\r\ntitle=\"http:\/\/docs.sun.com\/source\/806-3568\/ncg_goldberg.html (link no longer works)\">What Every\r\nComputer Scientist Should Know About Floating-Point Arithmetic<\/a> by David\r\nGoldberg<\/p>\r\n\r\n<p><a\r\ntitle=\"http:\/\/uvu.freshsources.com\/decimals.pdf (link no longer works)\">Where\r\nDid All My Decimals Go?<\/a> by Chuck Allison<\/p>\r\n\r\n<p><strong>Now it\u2019s your turn<\/strong><\/p>\r\n\r\n<p>Can you think of an example of an embedded system that needs\r\nto represent numbers over the full range of normalized double-precision values,\r\nfrom 2.2251e-308 (realmin) to 1.7977e+308 (realmax)?  What\r\nresource do you turn to when you have questions about floating-point numbers? \r\nLeave a <a href=\"https:\/\/blogs.mathworks.com\/seth\/?p=75&amp;#comment\">comment\r\nhere<\/a> and share it.<\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>\r\n\r\nNumeric simulation is all about the numbers.  
In a previous post,\r\nI talked about integer\r\nand fixed-point number representations.  These numbers are especially\r\nuseful for discrete simulation... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/simulink\/2009\/12\/02\/floating-point-numbers\/\">read more >><\/a><\/p>","protected":false},"author":40,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[67,76],"tags":[114,458],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/75"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/comments?post=75"}],"version-history":[{"count":0,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/75\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/media?parent=75"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/categories?post=75"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/tags?post=75"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}