SuperSum, In Defense of Floating Point Arithmetic
Floating point arithmetic doesn't get the respect it deserves. Many people consider it mysterious, fuzzy, unpredictable. These misgivings often occur in discussion of vector sums. Our provocatively named SuperSum is intended to calm these fears.
Contents
Ledgers
A ledger is a list of transactions in an account. Auditing the ledger involves comparing the total of the items with the change in the account balance.
If v is a vector of transaction amounts, then we need to compute
sum(v)
If this sum is equal to the change in the balance, then it is reasonable to assume that the ledger is correct. If not, the ledger must be examined line-by-line.
Checksums
Have you ever used a checksum for a file transfer? One checksum is computed before the file is sent. After the file has been sent over a questionable channel, a second checksum is computed on the receiving end. If the two checksums agree, the transmission was probably okay. If not, the file must be sent again.
Order
Floating point addition is not associative. This means
(a + b) + c
is not necessarily the same as
a + (b + c).
So the order of doing a computation is important.
Computers are deterministic devices. If the same computation is done in the same order more than once, the results must be the same. If you get different sums from different runs on a fixed computer, then it must be because the order has been changed.
Speed
In recent years, we have made the built-in function sum(x) faster by parallelizing it. The input vector is broken into several pieces, the sum of each piece is computed separately and simultaneously, then the partial sums are combined to give the final result. If the number and size of the pieces varies from run to run, the order varies and consequently the computed sums may vary.
Three Sums
Here are three replacements for nansum(x), the version of sum(x) that skips over NaNs and infs.
simplesum
Avoid the effect of blocking in the built-in sum(x).
function s = simplesum(x) % simplesum. s = simplesum(x), for vector(x). % Force sequential order for sum(x). % Skips NaNs and infs.
s = 0; for k = 1:length(x) if isfinite(x(k)) s = s + x(k); end end end
KahanSum
This is the Kahan summation algorithm. The sum is accumulated in two words, the more significant s and the correction c. If you were able to form s + c exactly, it would be more accurate than s alone.
function s = KahanSum(x) % KahanSum. s = KahanSum(x), for vector(x). % More accurate, but slower, than sum(x). % Skips NaNs and infs. % https://en.wikipedia.org/wiki/Kahan_summation_algorithm.
s = 0; % sum c = 0; % correction for k = 1:length(x) if isfinite(x(k)) y = x(k) - c; t = s + y; c = (t - s) - y; s = t; end end end
SuperSum
I gave it a pretentious name to grab attention. Use extended precision, with enough digits to hold any MATLAB double.
function s = SuperSum(x) % SuperSum. s = SuperSum(x), for vector(x). % Symbolic Math Toolbox extended precision. % Skips NaNs and infs. % % 632 decimal digits holds every IEEE-754 double. % 632 = ceil(log10(realmax) - log10(eps*realmin)); % din = digits(632); % Remember current setting s = double(sum(sym(x(isfinite(x)),'D'))); digits(din) % Restore end
Toy Example
A test case here at MathWorks, known by a French-Canadian name that translates to "toy example", has a vector e2 of length 4243 and values that range from -3.3e7 to 6.8e9.
When running tests involving floating point numbers it is a good idea to set the output format to hex so we can see every last bit.
format hex load jeuTest e2 x = e2;
[nansum(x) simplesum(x) KahanSum(x) SuperSum(x)]
ans = 423785e8206150e2 423785e8206150e0 423785e8206150e1 423785e8206150e1
For jeuTest, Kahan summation gets the same result as SuperSum, while nansum and simplesum differ in the last bit or two.
Second Test
The vector length is only three, but the third term completely cancels the first, and the second term rises from obscurity. In this situation, KahanSum is no more accurate than sum.
format hex x = [1 1e-14 -1]'
[nansum(x) simplesum(x) KahanSum(x) SuperSum(x)]
x = 3ff0000000000000 3d06849b86a12b9b bff0000000000000
ans = 3d06800000000000 3d06800000000000 3d06800000000000 3d06849b86a12b9b
Conclusion
I will leave careful timing for another day. Let's just say that in situations like jeuTest, KahanSum is probably all you need. It is usually more accurate than sum, and not much slower.
However, for complete reliability, use SuperSum. There is no substitute for the right answer.
- Category:
- Numerical Analysis,
- Performance,
- Precision,
- Symbolic
Comments
To leave a comment, please click here to sign in to your MathWorks Account or create a new one.