Doug's MATLAB Video Tutorials

April 8th, 2010

Puzzler: Benford’s law

I had a question come in recently from Denmark about wanting to implement Benford’s law.

I first heard about this law on the Radiolab episode about Numbers. Radiolab is one of my favorite podcasts, highly recommended!

This law basically says that

“in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 almost one third of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than one time in twenty. This distribution of first digits arises whenever a set of values has logarithms that are distributed uniformly, as is approximately the case with many measurements of real-world values.”
(wikipedia)

Apparently this law can be applied to measurements of natural phenomenon like coastal windfarm energy output, numbers of geese migrating, and various areas of waterflow if they follow the above described distribution.

Where this law becomes more than a novelty is looking at accounting expense reports. Don’t try and sneak those tickets to see the OB play or the Mighty Boosh in among your travel expenses (even if you are about to leave the company!) When people are making up fake expenses, they will choose numbers from a normal distributions. However, they should be choosing them with more ones, and less nines as the leading digit!

Anyways, I found it surprisingly challenging to extract the leading non-zero digit from a number. Remember, the number can be a decimal, negative or both. I am always amazed at the elegant solutions people come up with for simple problems like this.

SAfe to say that if you post your function in the comments, the most elegant (whatever that means at the time I am looking them over…) get’s some MATLAB swag.

0.001345 –> 1

3452.3 –> 3

-582.3 –> 5

etc…

21 Responses to “Puzzler: Benford’s law”

  1. Mark Patterson replied on :
    f = @(x) floor(abs(x) ./ 10.^(floor(log10(abs(x))))) ;

    The denominator normalizes x by dividing by the highest power of 10 that is less than x. This yeilds a number between 1 and 9.9999999. Then we just use floor to get the leading digit. To handle negative numbers, we just use abs(x) instead of x. This will break down for x=0, but then I’m not sure what the best return value is in that case.

  2. Karthik replied on :

    Guess this works, but has four function calls at least:

    floor(abs(x./10.^floor(log10(abs(x)))))
    
  3. the cyclist replied on :

    This was the best I could come up with, during a quick lunchtime diversion. I would NOT classify this as elegant, but at least it sets a bar:

    >> d = abs(fix(x./(10.^floor(log10(abs(x))))))

  4. the cyclist replied on :

    lol … I guess I was 8 minutes too slow (but I swear it was independent)

  5. Doug Schwarz replied on :
    s = sprintf('%.16e',abs(x));
    d = s(1); % or d = s(1) - '0';
    
  6. the cyclist replied on :

    Doug’s is pretty interesting. Won’t work on vector x, though.

  7. MATLABician replied on :
    % for scalar x
    x_str = num2str(x);
    x_str(find(x_str > '0', 1))
    
  8. Sven replied on :

    I was thinking along Doug’s lines… but was too late.
    Anyway, this one is vectorable.

    function leadingDigits = extractLeadingDigit(x)
    str = num2str(abs(x(:)),’%.16e’);
    leadingDigits = reshape(str2num(str(:,1)),size(x));
    end

  9. Daniel Armyr replied on :

    I see solutions have allready been presented. I wrote something quite similar to Mark Pattersons solution in my first edition of cloudPlot. Knowing the first digit of your ranges is critical to drawing pretty axis with appropriate limits and grid sizes.

  10. Joe Kirk replied on :

    My first thought was:

    str = num2str(x);
    firstDigit = str(regexp(str,'[1-9]','once'))
    

    … but alas, it does not work for vectors.

  11. Joe Kirk replied on :

    However, Doug’s could be vectorized as follows…

    str = sprintf('%1.16e',abs(x))';
    firstDigit = reshape(str2num(str(1:23:end)),size(x))
    
  12. Kenneth Eaton replied on :

    Here’s are vectorized solution using REGEXP:

    numberStrings = cellstr(num2str(x(:)));
    firstIndices = regexp(numberStrings,'[1-9]','once');
    firstNumbers = cellfun(@(s,i) s(i),numberStrings,firstIndices);
    

    One other interesting tidbit: the string-based solutions will give the right answer for x = eps(realmin), while the log-based solutions give Inf.

  13. Kenneth Eaton replied on :

    Oops! Forgot to include “,16″ in the call to NUM2STR.

  14. Zane replied on :

    Keep values numerical instead of string-based.
    —————-

    function first_dig_input(x)
    y=abs(x);
    if y < 1            % for input < 1
        while y  10       % for input > 10
        while y > 10
            y=y./10;
        end
    else
    end
    soln=floor(y);
    sprintf('The first digit of %f is %d.',x,soln)
    
  15. Zane replied on :

    ‘<’ wordpress error fixed
    —————

    function first_dig_input(x)
    y=abs(x);
    if y < 1
        while y < 1
            y=10*y;
        end
    elseif y > 10
        while y > 10
            y=y./10;
        end
    else
    end
    soln=floor(y);
    sprintf('The first digit of %f is %d.',x,soln)
    end
    
  16. B replied on :

    The decimal part can be extracted with mod( X, 1 ), so:

    >> X = [ 0.001345 3452.3 -582.3 ];
    >> D = floor( 10.^mod( log10( abs(X) ), 1 ) )
    
    D =
    
         1     3     5
    
  17. B replied on :
    >> X = 5000;
    >> D = floor( 10.^mod( log10( abs(X) ), 1 ) )
    
    D =
    
         4
    

    Welcome to sick sad world. Second version, less pretty, more functional:

    >> X = [ 0.001345 3452.3 -582.3 5000 ];
    >> D = floor( round( 10.^( 1 + mod( log10( abs(X) ), 1 ) ) ) / 10 )
    
    D =
    
         1     3     5     5
    
  18. B replied on :

    My second solution is worth than the first.

  19. dhull replied on :

    @Mark Patterson,

    All the solutions were good, yours was the first, so send me your t-shirt size and address. Will get something out to you soon.

    Thanks all!
    Doug

  20. Sam Mirsky replied on :

    Doug,

    I learned of Benford’s Law years ago and was facintated by it. It has been used to find fake tax returns among other things. The history of the discovery of this is also fascinating, and dates back to (paper) books of logarithm values used by scientists before calculators (or MATLAB) was available.

    To test the law, I used the script below on real earthquake data which is shipped with MATLAB. I feel there are much more elegant and efficient methods to program this, but this was my first thought.

    Besides the earthquake data, I also tried it with other shipping data: ‘seamount’, ‘penny’ (2-D), and ‘wind’ (3-D)

    %% Load Data
    load quake
    v = e;
    vdim = length(v);
    
    %% Find digit statistics
    stat(1:9) = 0;
    for i = 1:vdim
        s = sprintf('%0.5e', abs(v(i)));
        v2(i) = str2num(s(1,1));
        switch v2(i)
            case 1
                stat(1) = stat(1) +1;
            case 2
                stat(2) = stat(2) +1;
            case 3
                stat(3) = stat(3) +1;
            case 4
                stat(4) = stat(4) +1;
            case 5
                stat(5) = stat(5) +1;
            case 6
                stat(6) = stat(6) +1;
            case 7
                stat(7) = stat(7) +1;
            case 8
                stat(8) = stat(8) +1;
            case 9
                stat(9) = stat(9) +1;
        end
    end
    
    %% Plot results
    statn = stat / vdim;
    bar(statn);
    grid on;
    
  21. dhull replied on :

    @Sam,

    Interesting. There is a weird spike in the ’5′ bin, but it fits pretty well otherwise!

    Doug

Leave a Reply

Wrap code fragments inside <pre> tags, like this:

<pre class="code">
a = magic(3);
sum(a)
</pre>

If you have a "<" character in your code, either follow it with a space or replace it with "&lt;" (including the semicolon).


MathWorks

Doug Hull is a proud MathWorker who is on a mission to help you with MATLAB.

Doug's picture

These postings are the author's and don't necessarily represent the opinions of The MathWorks.