# Arithmetic Associativity – Not So Fast 7

Posted by **Loren Shure**,

Arithmetic is associative, right? Well, in the world of paper and pencil, where you can often do calculations exactly, that can be true. However, in the computing world, where real numbers can't always be represented exactly because of working with finite precision datatypes, it turns out that you can't depend on the arithmetic to behave the way you were taught in grade school.

### Contents

#### Let's Do Some Math

Suppose I want to check the following:

$$ \sqrt {2} = 2/\sqrt {2} $$

I can do this analytically using the Symbolic Math Toolbox.

symsqrt2 = sqrt(sym(2)); shouldBeZero = symsqrt2 - 2/symsqrt2

shouldBeZero = 0

Now let's perform the same calculation numerically.

mightBeZero = sqrt(2) - 2/sqrt(2)

mightBeZero = 2.2204e-16

What is happening here is that we are seeing the influence of the accuracy of floating point numbers and calculations with them. I discussed this in an earlier post as well.

#### Let's Try Another Example

Now let's try something a little different. First, let's find out what the value of `eps` is for %sqrt {2}%. This should be the smallest (in magnitude) floating point number which, when added to %sqrt {2}%, produces a number different than %sqrt {2}%.

sqrt2eps = eps(sqrt(2))

sqrt2eps = 2.2204e-16

Next, we want a number smaller in magnitude than this to play with. I'll use half its value.

halfsqrt2eps = sqrt2eps/2

halfsqrt2eps = 1.1102e-16

And now let's calculate the following expressions, symbolically and numerically.

$$expr1 = \sqrt{2} - \sqrt{2} + halfsqrt2eps$$

$$expr2 = (\sqrt{2} - \sqrt{2}) + halfsqrt2eps$$

$$expr3 = \sqrt{2} + (-\sqrt{2} + halfsqrt2eps)$$

First we do them all symbolically.

expr1 = symsqrt2 - symsqrt2 + sym(sqrt2eps)/2 expr2 = (symsqrt2 - symsqrt2) + sym(sqrt2eps)/2 expr3 = symsqrt2 + (-symsqrt2 + sym(sqrt2eps)/2) double(expr1)

expr1 = 1/9007199254740992 expr2 = 1/9007199254740992 expr3 = 1/9007199254740992 ans = 1.1102e-16

Symbolic results are all the same and return half the value of `eps`.

Now we'll calculate the same expressions numerically.

expr1 = sqrt(2) - sqrt(2) + halfsqrt2eps expr2 = (sqrt(2) - sqrt(2)) + halfsqrt2eps expr3 = sqrt(2) + (-sqrt(2) + halfsqrt2eps)

expr1 = 1.1102e-16 expr2 = 1.1102e-16 expr3 = 2.2204e-16

So what's going on here? As I stated earlier, this example illustrates that floating point arithmetic is not associative the way symbolic arithmetic is. There's on reason to get upset about this. But it is worth understanding. And it might well be worth rewriting a computation occasionally, especially if you are trying to compute a very small difference between two large numbers.

#### Have You Rewritten Expressions to Get Better Accuracy?

Have you found yourself in a situation where you needed to rewrite how to calculate a numeric result (like here, by different groupings) to ensure you got a more accurate solution. Let me know about it here.

Get the MATLAB code

Published with MATLAB® R2013b

### Note

Comments are closed.

## 7 CommentsOldest to Newest

**1**of 7

Hello Loren,

thanks for this short overview on the issue. Incidentally I came along the same issue in distributive law:

Since

(22/1000-17/1000) – (22-17)/1000 = -2.602085213965211e-018

one always has to be careful as discussed. But for me, the issue was quite more involved. Consider the situation, where you are building vectors out of the calculation results:

[0:0.0005:(22/1000-17/1000)], [0:0.0005:(22-17)/1000]

From this you see, that the results do have different lengths:

ans =

0 0.0005 0.0010 0.0015 0.0020 0.0025 0.0030 0.0035 0.0040 0.0045

ans =

0 0.0005 0.0010 0.0015 0.0020 0.0025 0.0030 0.0035 0.0040 0.0045 **0.0050**

That is not to nice and finally, when putting to an outer loop like

for m = 18:22

array1(m-17,:) = [0:0.0005:(m-(m-2))/1000]

array2(m-17,:) = [0:0.0005:(m/1000-(m-2)/1000)]

end

you end up with an unexpected error

Subscripted assignment dimension mismatch.

@ m = 20 since array2 so far has 4 columns but suddenly gets 5 columns, the number of columns for array1.

This might be / is very annoying.

To conclude: I currently guarantee my precision by using

round((m/1000-(m-2)/1000)*1000)/1000

but would appreciate better solutions.

Thanks for your very nice and helpful blog.

Alexander

**2**of 7

Hello Loren,

me again. It is strange that the preview of the comment is so well formatted but the final, published version looks so poor.

Sorry for that, but where is the nice format gone?

Alexander

**3**of 7

Alexander,

I usually rewrite my loop counter differently if length is important. Something more like:

xi = xmin + (xmax-xmin)*(1:totalNum)/totalNum

Then I know the number of elements for sure.

–Loren

**4**of 7

Alexander-

You should see “?” mark next to “Write your comment:” which shows the markup hints that we had before. Then you could more fully format your comments.

–Loren

**5**of 7

expr3 = sqrt(2) + (-sqrt(2) + halfsqrt2eps)

Shouldn’t the above expression evaluate to zero instead of 2.2204e-16 as a number smaller the eps (sqrt(2) is being added to sqrt(2) ?

Thanks

**6**of 7

Rounding modes can be set differently, but generally it rounds nearest, except with a twist for handling ties. Normally if you were rounding, say, 3.5, to a whole number, you would round up to 4. Since 3.5 is halfway between 3 and 4, rounding up at the halfway point is arbitrary, and the twist I spoke of is that we usually round floating point numbers to the nearest float in an “unbiased” way by rounding ties to the nearest float with the least significant bit turned off. That means that about half the ties will be rounded up and about half the time rounded down. If you have a list of consecutive floating point numbers and add half of epsilon for each one to each one, the results will alternate between being unaffected and rounding to the next higher float. I’m not sure how to format answers here, but this illustrates the point in MATLAB with the 5 consecutive floating point numbers starting with 1.

>> x = 1; >> x = [x,x(end)+eps(x(end))]; >> x = [x,x(end)+eps(x(end))]; >> x = [x,x(end)+eps(x(end))]; >> x = [x,x(end)+eps(x(end))]; >> diff(x) > 0 ans = 1 1 1 1 >> x1 = x + eps(x)/2; >> x1 - x ans = 0 2.2204e-16 0 2.2204e-16 0

**7**of 7

I think Sanjan was concerned about something else, Mike, namely the definition of the function eps(x). If you read the help entry, you will see that eps(x) is the distance from x to the next larger floating point number. It is not the smallest number that affects x in floating point addition. By definition, the true mathematical value of (-sqrt(2)+eps(sqrt(2))/2) falls half way between two floating point numbers. So now Mike’s observations about rounding enter the picture and we find that this quantity is not rounded to -sqrt(2), but rather to -sqrt(2)+eps(sqrt(2)). Look at the quantities with format hex.

## Recent Comments