Round, With Ties to Even

We are considering a MATLAB Enhancement Request to support options for round(x) when x is exactly halfway between two integers.

Contents

Classic round

The Classic MATLAB calculator was written in Fortran forty years ago, before IEEE 754 and before MathWorks. It had only 71 functions and it was not easy to add more. But one of those functions was ROUND. The help entry had only one sentence.

   ROUND   ROUND(X) rounds the elements of X to the nearest integers.

That doesn't say what happens when there is a tie. The code for ROUND was essentially this one line, which relied on FLOOR.

   ROUND(X) = SIGN(X)*FLOOR(ABS(X) + 0.5)

In particular:

   ROUND(0.5) = 1.0
   ROUND(1.5) = 2.0.

When MathWorks began producing the modern versions of MATLAB in 1984, we retained this simple definition for round.

roundoff

The function round can be used to clean up after roundoff.

   format long e
   H = hilb(5)
   X = inv(H)
   X = round(X)
   XH = X*H
   XH = round(XH)
H =

  Columns 1 through 3

     1.000000000000000e+00     5.000000000000000e-01     3.333333333333333e-01
     5.000000000000000e-01     3.333333333333333e-01     2.500000000000000e-01
     3.333333333333333e-01     2.500000000000000e-01     2.000000000000000e-01
     2.500000000000000e-01     2.000000000000000e-01     1.666666666666667e-01
     2.000000000000000e-01     1.666666666666667e-01     1.428571428571428e-01

  Columns 4 through 5

     2.500000000000000e-01     2.000000000000000e-01
     2.000000000000000e-01     1.666666666666667e-01
     1.666666666666667e-01     1.428571428571428e-01
     1.428571428571428e-01     1.250000000000000e-01
     1.250000000000000e-01     1.111111111111111e-01


X =

  Columns 1 through 3

     2.499999999998526e+01    -2.999999999997500e+02     1.049999999998995e+03
    -2.999999999997500e+02     4.799999999995864e+03    -1.889999999998367e+04
     1.049999999998995e+03    -1.889999999998367e+04     7.937999999993641e+04
    -1.399999999998561e+03     2.687999999997693e+04    -1.175999999999111e+05
     6.299999999993241e+02    -1.259999999998929e+04     5.669999999995910e+04

  Columns 4 through 5

    -1.399999999998561e+03     6.299999999993241e+02
     2.687999999997693e+04    -1.259999999998929e+04
    -1.175999999999111e+05     5.669999999995910e+04
     1.791999999998769e+05    -8.819999999994377e+04
    -8.819999999994377e+04     4.409999999997448e+04


X =

          25        -300        1050       -1400         630
        -300        4800      -18900       26880      -12600
        1050      -18900       79380     -117600       56700
       -1400       26880     -117600      179200      -88200
         630      -12600       56700      -88200       44100


XH =

  Columns 1 through 3

     1.000000000000000e+00                         0                         0
                         0     1.000000000000000e+00                         0
                         0                         0     1.000000000000000e+00
                         0                         0    -3.637978807091713e-12
                         0                         0                         0

  Columns 4 through 5

                         0                         0
                         0                         0
                         0                         0
     1.000000000000000e+00                         0
                         0     1.000000000000000e+00


XH =

     1     0     0     0     0
     0     1     0     0     0
     0     0     1     0     0
     0     0     0     1     0
     0     0     0     0     1

flints

round works on an array element-wise, mapping all elements to flints, floating point numbers whose values are integers. All values less than one-half are mapped to zero. All values larger than flintmax are already flints, so they are mapped to themselves. And all values between one-half and flintmax, except those exactly halfway between two flints, are mapped to the nearest flints. Pretty much everybody agrees to all of that.

The only leeway, and the primary reason for the Enhancement Request and this blog post, is the behavior of round for values exactly halfway between two flints.

round

I suggest that we add a tie-breaker option to the built-in function round. If you don't use this ties option, round will continue to function as it always has. And if you never encounter a tie, all of these options give the same results.

The most important option is 'even'. There is a statistical argument to be made in its favor. If you never encounter a tie, the mean of expected changes made by rounding data is zero. With the 'even' option', that is still true even if you do encounter ties.

The 'up' option is the default and the traditional MATLAB behavior. The 'odd' and 'down' options together provide a mirror image of the 'even' and 'up' pair.

   type Round
function r = round(x,ties)
% r = round(x) rounds the elements of x to the nearest integers.
% Elements halfway between integers are rounded away from zero.
%
% r = round(x,'even') rounds to integers, ties to even.
% r = round(x,'odd') rounds to integers, ties to odd.
% r = round(x,'down') rounds to integers, ties towards zero.
% r = round(x,'up') rounds to integers, ties away from zero (default).
% 
   a = abs(x) + 0.5;
   r = floor(a);
   if nargin == 2
       switch ties
           case 'even'
               m = (r == a) & (mod(r,2) == 1); 
           case 'odd'
               m = (r == a) & (mod(r,2) == 0);
           case 'down'
               m = (r == a);
           case 'up'
               m = [];
       end
       r(m) = r(m) - 1;
   end
   r = sign(x).*r;
end

test round

   x = (0.5:1:4.5)';
   xRound(x)
     x    round  even  down   odd 

   0.500     1     0     0     1
   1.500     2     2     1     1
   2.500     3     2     2     3
   3.500     4     4     3     3
   4.500     5     4     4     5

cities

Download the spread sheet available at www.census.gov/cities and use the Import Wizard to select the last column. This gives us the population of 788 US cities in 2019.

    load cities.mat cities census

Round the data to nearest 1,000.

    rcensus = 1000*Round(census/1000);

Here are the five largest cities, their population, and that population rounded.

    disp(cities(1:5))
    disp([census(1:5) rcensus(1:5)])
    "New York city, New York"
    "Los Angeles city, California"
    "Chicago city, Illinois"
    "Houston city, Texas"
    "Phoenix city, Arizona"

     8336817     8337000
     3979576     3980000
     2693976     2694000
     2320268     2320000
     1680992     1681000

And here are the five smallest,

    disp(cities(end-4:end))
    disp([census(end-4:end) rcensus(end-4:end)])
    "Lakewood city, Ohio"
    "Troy city, New York"
    "Saginaw city, Michigan"
    "Niagara Falls city, New York"
    "Charleston city, West Virginia"

       49678       50000
       49154       49000
       48115       48000
       47720       48000
       46536       47000

Here is the histogram of the changes made by rounding. Despite its ragged appearance, this is uniform.

    delta = census - rcensus;
    histogram(delta,40)

About half of the 788 rounded down and about half rounded up. Only one city reported a 2019 population that was already a multiple of 1,000.

   disp([nnz(delta < 0), nnz(delta == 0), nnz(delta > 0)])
   396     1   391

   k = find(delta == 0)
   disp(cities(k))
   disp([census(k) rcensus(k)])
k =

    69

Anchorage municipality, Alaska
      288000      288000

How many cities reported population ties, i.e. populations halfway between multiples of 1,000?

   k = find(mod(census,1000)==500)
   disp(cities(k))
   disp([census(k) rcensus(k)])
k =

   131
   189
   757

    "Tallahassee city, Florida"
    "Roseville city, California"
    "Burien city, Washington"

      194500      195000
      141500      142000
       51500       52000

Only one city out of the 788 reported a tie and a population that rounded to an odd multiple of 1,000, there by triggering the 'even' option.

   ecensus = 1000*Round(census/1000,'even');
   k = find(ecensus ~= rcensus)
   disp(cities(k))
   disp([census(k) rcensus(k)  ecensus(k)])
k =

   131

Tallahassee city, Florida
      194500      195000      194000

IEEE 754

The IEEE 754 rounding modes are easily confused with the round function. IEEE 754 is about hardware; round is software. Floating point arithmetic operations are taking place at an accuracy scale that is 16 orders of magnitude finer and at a rate that is many orders of magnitude faster. Some of the considerations are similar, but the setting and the details are very different.

Which?

Should MATLAB have something like round(x,'even')? It's really a matter of taste. I happen to like the way we've always done things. To others, the statistical properties of round-ties-to-even might be relevant.

But I think the most important argument in favor of the Enhancement Request is providing compatibility with other mathematical software that doesn't offer a choice.




Published with MATLAB® R2020a

|
  • print

コメント

コメントを残すには、ここ をクリックして MathWorks アカウントにサインインするか新しい MathWorks アカウントを作成します。