{"id":454,"date":"2012-06-01T02:46:57","date_gmt":"2012-06-01T07:46:57","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=454"},"modified":"2012-06-01T02:46:57","modified_gmt":"2012-06-01T07:46:57","slug":"benfords-law-what-are-the-odds-that-the-first-digit-is-a-1-2","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2012\/06\/01\/benfords-law-what-are-the-odds-that-the-first-digit-is-a-1-2\/","title":{"rendered":"Benford&#8217;s Law &#8211; What are the odds that the first digit is a &#8216;1&#8217;?"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p>I'd like to introduce this week's guest blogger Sam Mirsky.  Sam is an Application Engineer here at MathWorks who focuses\r\n         on real-time testing applications using Simulink.  However, in this post he will talk about a non-intuitive characteristic\r\n         of large data sets, and test the idea with a data set which ships with MATLAB.\r\n      <\/p>\r\n      <p>In a large set of data, it seems that the probability of individual numbers starting with 1 would be the same as any other\r\n         digit.  However, this is not true.  There is a much higher probability that the first digit is a 1.\r\n      <\/p>\r\n      <p>Since the first significant digit is not zero, the intuitive probability of a number starting with 1 (or any other digit)\r\n         would be 1\/9 = 11%.  
According to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Benford%27s_law\">Wikipedia<\/a>: \"The first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency,\r\n         to the point where 9 as a first digit occurs less than 5% of the time.\"\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#1\">Load Data<\/a><\/li>\r\n         <li><a href=\"#2\">Find digit statistics<\/a><\/li>\r\n         <li><a href=\"#3\">Plot results<\/a><\/li>\r\n         <li><a href=\"#4\">How this is used<\/a><\/li>\r\n         <li><a href=\"#5\">How would you use MATLAB to calculate these statistics?<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Load Data<a name=\"1\"><\/a><\/h3>\r\n   <p>Let us test this with a data set which ships with MATLAB:  quake.mat. This is a data set with accelerometer data from an earthquake\r\n      in California.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">load <span style=\"color: #A020F0\">quake<\/span><\/pre><h3>Find digit statistics<a name=\"2\"><\/a><\/h3><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">stat(1:9) = 0;\r\n<span style=\"color: #0000FF\">for<\/span> i = 1:length(v)\r\n    string = sprintf(<span style=\"color: #A020F0\">'%0.5e'<\/span>, abs(v(i)));\r\n    firstDigit = str2double(string(1));\r\n    <span style=\"color: #0000FF\">switch<\/span> firstDigit\r\n        <span style=\"color: #0000FF\">case<\/span> 1\r\n            stat(1) = stat(1) +1;\r\n        <span style=\"color: #0000FF\">case<\/span> 2\r\n            stat(2) = stat(2) +1;\r\n        <span style=\"color: #0000FF\">case<\/span> 3\r\n            stat(3) = stat(3) +1;\r\n        <span style=\"color: #0000FF\">case<\/span> 4\r\n            stat(4) = stat(4) +1;\r\n        <span style=\"color: #0000FF\">case<\/span> 5\r\n            stat(5) = stat(5) +1;\r\n        <span 
style=\"color: #0000FF\">case<\/span> 6\r\n            stat(6) = stat(6) +1;\r\n        <span style=\"color: #0000FF\">case<\/span> 7\r\n            stat(7) = stat(7) +1;\r\n        <span style=\"color: #0000FF\">case<\/span> 8\r\n            stat(8) = stat(8) +1;\r\n        <span style=\"color: #0000FF\">case<\/span> 9\r\n            stat(9) = stat(9) +1;\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><h3>Plot results<a name=\"3\"><\/a><\/h3><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">statPercent = stat \/ sum(v ~= 0);  <span style=\"color: #228B22\">%only use non-zero numbers for stats<\/span>\r\nbar(statPercent);\r\ngrid <span style=\"color: #A020F0\">on<\/span>;\r\nxlabel(<span style=\"color: #A020F0\">'First digit'<\/span>);\r\nylabel(<span style=\"color: #A020F0\">'Percent'<\/span>);<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/454\/Benford_LorenBlog_01.png\"> <h3>How this is used<a name=\"4\"><\/a><\/h3>\r\n   <p>This is one test used to check whether a data set is genuine or fabricated.  For example, the numbers collected from\r\n      a federal income tax return should also follow Benford's Law.\r\n   <\/p>\r\n   <h3>How would you use MATLAB to calculate these statistics?<a name=\"5\"><\/a><\/h3>\r\n   <p>As is typical with MATLAB, there are many ways to derive the same answer:<\/p>\r\n   <div>\r\n      <ul>\r\n         <li>What MATLAB commands would you use to analyze the first digit of numbers in a data set?<\/li>\r\n         <li>Does Benford's Law apply to a data set you have (or not)?  
Show us your results <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=454#respond\">here<\/a>.\r\n         <\/li>\r\n      <\/ul>\r\n   <\/div><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_e0c963f5b1b042f3b62e250780ad1f07() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='e0c963f5b1b042f3b62e250780ad1f07 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' e0c963f5b1b042f3b62e250780ad1f07';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Sam Mirsky';\r\n        copyright = 'Copyright 2012 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_e0c963f5b1b042f3b62e250780ad1f07()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.14<br><\/p>\r\n<\/div>\r\n<!--\r\ne0c963f5b1b042f3b62e250780ad1f07 ##### SOURCE BEGIN #####\r\n%% Benford's Law - What are the odds that the first digit is a '1'?\r\n% I'd like to introduce this week's guest blogger Sam Mirsky.  Sam is an\r\n% Application Engineer here at MathWorks who focuses on real-time testing\r\n% applications using Simulink.  However, in this post he will talk about a\r\n% non-intuitive characteristic of large data sets, and test the idea with a\r\n% data set which ships with MATLAB.\r\n%\r\n% In a large set of data, it seems that the probability of individual\r\n% numbers starting with 1 would be the same as any other digit.  However,\r\n% this is not true.  
There is a much higher probability that the first digit\r\n% is a 1.\r\n%\r\n% Since the first significant digit is not zero, the intuitive\r\n% probability of a number starting with 1 (or any other digit) would be\r\n% 1\/9 = 11%.  According to <http:\/\/en.wikipedia.org\/wiki\/Benford%27s_law Wikipedia>:\r\n% \"The first digit is 1 about 30% of the time, and larger digits occur as \r\n% the leading digit with lower and lower frequency, to the point where 9 \r\n% as a first digit occurs less than 5% of the time.\"\r\n\r\n%% Load Data\r\n% Let us test this with a data set which ships with MATLAB:  quake.mat.\r\n% This is a data set with accelerometer data from an earthquake in\r\n% California.\r\nload quake\r\n\r\n%% Find digit statistics\r\nstat(1:9) = 0;\r\nfor i = 1:length(v)\r\n    string = sprintf('%0.5e', abs(v(i)));\r\n    firstDigit = str2double(string(1));\r\n    switch firstDigit\r\n        case 1\r\n            stat(1) = stat(1) +1;\r\n        case 2\r\n            stat(2) = stat(2) +1;\r\n        case 3\r\n            stat(3) = stat(3) +1;\r\n        case 4\r\n            stat(4) = stat(4) +1;\r\n        case 5\r\n            stat(5) = stat(5) +1;\r\n        case 6\r\n            stat(6) = stat(6) +1;\r\n        case 7\r\n            stat(7) = stat(7) +1;\r\n        case 8\r\n            stat(8) = stat(8) +1;\r\n        case 9\r\n            stat(9) = stat(9) +1;\r\n    end\r\nend\r\n\r\n%% Plot results\r\nstatPercent = stat \/ sum(v ~= 0);  %only use non-zero numbers for stats\r\nbar(statPercent);\r\ngrid on;\r\nxlabel('First digit');\r\nylabel('Percent');\r\n\r\n%% How this is used\r\n% This is one test used to check whether a data set is genuine or\r\n% fabricated.  
For example, the numbers collected from a federal\r\n% income tax return should also follow Benford's Law.\r\n%\r\n%% How would you use MATLAB to calculate these statistics?\r\n% As is typical with MATLAB, there are many ways to derive the same answer:\r\n%\r\n% * What MATLAB commands would you use to analyze the first digit of numbers in a data set?\r\n% * Does Benford's Law apply to a data set you have (or not)?  Show us your\r\n% results <https:\/\/blogs.mathworks.com\/loren\/?p=454#respond here>.\r\n\r\n##### SOURCE END ##### e0c963f5b1b042f3b62e250780ad1f07\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      I'd like to introduce this week's guest blogger Sam Mirsky.  Sam is an Application Engineer here at MathWorks who focuses\r\n         on real-time testing applications using Simulink. ... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2012\/06\/01\/benfords-law-what-are-the-odds-that-the-first-digit-is-a-1-2\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[33],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/454"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=454"}],"version-history":[{"count":8,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/454\/revisions"}],"predecessor-version":[{"id":462,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/454\/revisions\/462"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/w
p\/v2\/media?parent=454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}