Practical Advice for People on the Leading Edge

Understanding Tolerances in Ordinary Differential Equation Solvers

This is a guest blog post by Michael Hosea, a numerical analyst at MathWorks. He works on MATLAB Coder and on MATLAB’s ODE and integral solvers.
MATLAB's ODE solvers have tolerances that the user can change. Users are often reluctant to set these tolerances, perhaps because they think they are only for power users, and they are reluctant to alter “factory settings.” The sentiment is understandable. Tolerances do occur in many different contexts in MATLAB, and some of them are that sort of thing. For example, you probably won't need or want to change the default tolerance in the rank function. But as we're going to see, that's not the situation with ODE solvers. It may surprise you to hear that we don't actually know how to set your tolerances in an ODE Solver.
Most of what follows will be geared towards understanding how to do exactly that, but the concepts apply much more broadly. The application is direct with integral, ismembertol, and uniquetol, to name a few, despite differences in how tolerances are supplied and used. Even more generally, the principles apply in unit testing when one has exact or reference values to compare with results from software being tested.
Of course we did have reasons for the default tolerances used in the ODE solvers. We wanted the default tolerances to request enough accuracy for plotting, and it probably doesn't take a lot of accuracy to do that. Also, we're using the same defaults for all the solvers. Consequently, they needed to work with low-order and high-order solvers alike. Loose tolerances tend to work well enough with high-order solvers. On the other hand, I don't know if you've ever tried using tight tolerances with low-order solvers, but patience is virtue. Barring enough of that, ctrl-c is your friend. The resulting default tolerances are rather loose. Unfortunately, we don't know the context of the problem you are trying to solve. You might want more accuracy than these default tolerances are requesting, particularly if you are using one of the higher order methods. So let's talk about about the tolerances and how to go about setting them for your purposes.
Fortunately, no great understanding of the software is needed to set tolerances in a reasonable way, only an understanding of the problem you are trying to solve and what these tolerances mean. We should be able to set reasonable tolerances if we can answer the following questions about the problem we are trying to solve:
  1. How small of a value do we consider negligible?
  2. How precise is the data that defines the problem?
  3. What percentage error is small enough to ignore? Or, put differently, how many significant digits of accuracy do we need?
Our ODE solvers allow you to set two different tolerances. One is the absolute tolerance and the other relative tolerance. These two work together. Absolute tolerance is the simpler of the two, so let's start there.

Absolute Tolerance

We say an approximate result y is within an absolute tolerance A of an exact or reference value y0 if
Here we assume A0. Subtracting y0 from each term gives us
which we can write succinctly as
When y0 is the exact value we are trying to approximate with y, we call the quantity on the left the absolute error.


Before we go too far, let's just point out the obvious. We don't usually know y0. If we did, there wouldn't be much point in doing any work to approximate it unless we were just testing the code. Codes like integral and the ODE solvers compute an estimate of the absolute error and use that in place of |y-y0|. Sometimes it is an approximate bound on the error rather than an estimate per se. Exactly how this is done isn't important to our discussion here except to say that if things are going well, this estimate should be about the same size as the actual error.
So, we're done, right? An absolute tolerance should be able to get the job of describing the accuracy we need done no matter what we're doing. Well, yes and no. Suppose you're solving a problem where the correct answer is, say, 10 meters, and you'd like an answer to be correct to within ±0.5mm. So, for that, you'd set your absolute tolerance to A=0.0005m. Notice that the absolute tolerance in a physical context has units, in this case meters because y and y0 are in meters.
So we write some code that defines
AbsoluteTolerance = 0.0005;
and we keep the units in our head. We're used to doing that.
But what if the solution to our problem isn't just a scalar value, rather an array of values representing different kinds of quantities, maybe lengths, areas, volumes, mass, temperature, you name it. Assuming the software allows you to set a different absolute tolerance for each element of a solution array in the first place, you'd need a pretty good idea of the size of the solution element to decide ahead of time what the absolute tolerance on each element should be. Probably we can't just take a number like 0.0005 and think it's going to be appropriate for everything.
Moreover, how did we decide that ±0.5mm was reasonable in the first place? For something 10m long, that's rather precise for many practical purposes, often enough even for something only 1m long, but it isn't for something 0.001m long. In that case it covers a substantial difference percentage-wise. Sometimes it's not so much the absolute error that concerns us, rather the error in a "percentage" sense.
And wouldn't it be convenient to have a way of describing the accuracy we want using a dimensionless tolerance? It would be nice to have something that automatically adjusts itself for different scaling, something that we don't have to change if we merely restate the same problem using different units of measurement. After all, it shouldn't be any harder or easier to compute something to 0.5mm than 0.0005m; that's the same distance.

Relative Tolerance

You're probably familiar with the concept of significant digits or significant figures when reporting data. The basic idea is that if you see the measurement of some kind, say x=1.5mm, it corresponds to an unspecified exact quantity that has (hopefully) been correctly rounded to 1.5mm. This means that
since that is the interval of numbers that rounds to 1.5mm. Expressed in the parlance of absolute error, we're saying that the absolute error
That formulation includes the upper bound of 1.55mm, but we needn't quibble about that insofar as tolerances are ultimately used with error estimates, anyway. If we had instead written x=1.500mm, we would be implying that
hence that
Here again we have, without apology, added in the upper end point. In the first case we say that we have 2 significant digits, and the latter 4.
Note that 0.05 can be written as 5×10-2, and 0.0005 as 5×10-4. The exponents seem to indicate the number of significant digits. Unfortunately, the conjecture doesn't hold up. To see this, let's just change units of measurement. That shouldn't change the precision of anything. Switching from millimeters to microns, we have in the first case
and in the second,
1499.5μxexact<1500.5μ .
And again, in the language of absolute error, we have
respectively. The "absolute tolerances" on the right are 5×10+1μ and 5×10-1μ. Now the exponents don't seem to bear any relationship to the number of significant digits.
The problem is clearly scaling. Because absolute error has units of measurement, merely changing units of measurement rescales the data. So how can we express the idea that we want a certain number of digits to be correct? We need something that isn't sensitive to scaling. Enter the idea of relative error:


All we've done here is divide the absolute errory by |y0|. Obviously we are assuming that y00. More on that later.
Note that relative error is just the percentage difference formula with absolute values (and without the final multiplication by 100% to convert to percent).

If you multiply a relative tolerance by 100%, you get the percentage error that the tolerance allows.

This observation may help later when we discuss how to choose the relative tolerance.
A hopeful sign is that the relative error is now dimensionless because the numerator and denominator have the same units. Obviously any scaling of y cancels out. Let's use R from here on out to denote a relative error tolerance. So we want
Clearing the fraction, we get
That's usually how we use relative error in software, since it avoids the division.
Let's try this out on an example. Suppose y0=1234500. Looks like 5 significant digits there. Let's try R=5×10-5. Plugging in, this becomes
Which means that
If we were to change the units on y, we would also scale those end points by the same factor as y itself is changed, just as if we had multiplied the inequality through by the scaling factor.
Unfortunately, those end points don't round to 1234500 when expressed with 5 significant digits. The interval is wider than we want it to be if we wanted to guarantee 5 significant digits correct. If, instead, we tighten it up to R=5×10-6and repeat the exercise, we end up with
Now it's tighter than we needed, but those endpoints do round to 1234500 written to 5 significant digits. This generalizes:.

A relative tolerance for at least n significant digits is


You can use this relative tolerance when you don't know what y0 is or if you want the same formula to work when y0 varies.
Just as an aside, the interval we would have wanted to see was
You might be curious what relative tolerance gives this. If you're not the least bit curious, skip to the next section, because we won't be using this later.
Define a preliminary tolerance R=5×10-(n+1) as above. Then the relative tolerance that gives the loosest bounds that require n correct significant digits is
The exponent there is just the fractional part of c. In our example the tolerance works out to be about 8.1 times larger than R. This should be no surprise since we already knew from our experimentation that R was too tight and10R too loose. There's nothing very nice or memorable there, and we can only construct this "ideal" relative tolerance if we know y0, so we will not make further use of this fact, but if you were setting up a unit test comparing software output to a known correct result, you could make use of it.

What if y0=0?

The definition of relative error has a serious problem when the reference value y0 is zero. The requirement
becomes impossible to meet. As previously mentioned, we normally avoid the division, anyway, and evaluate
That avoids the division by zero, but it hardly helps very much when y0=0 because the test is only satisfied when y=y0, i.e. when the absolute error is zero.
It might be helpful to think about this as a limiting case. Consider a sequence of problems where the exact solution is not zero but is smaller and smaller in magnitude. Let's say the exact solution is y0=12345×10-K for increasing values of K.You might imagine that we're rewriting the same physical quantity in larger and larger units of measurement as K increases.
Leading zeros to the right of the decimal point aren't "significant" in the sense of significant digits, but obviously they determine the magnitude of the number. Relative error control is fine for all these values as long as all those leading zeros are computationally free to obtain. That might sound unlikely, but it actually could be the case if this were only a matter of scaling, say when switching units from millimeters to light years.
In the limiting case of y0=0, however, relative error control assumes that all those zeros are "insignificant", computationally free to obtain. Unfortunately, when y0=0 those leading zeros are all the digits that there are!
It still could be the case.that they are free. For example, if everything in the problem is zero, then the intermediate computations are probably going to yield zero at every stage. Or it might happen if there is symmetry that results in perfect cancellation, e.g. integrating an odd function over the interval [-1,1] and exact zero is obtained not because the intermediate calculations are exact, rather because the intermediate results cancel out perfectly, i.e., are equal and opposite, regardless of how inexact they might be.
These are common enough scenarios, but they are not always the case. Generally speaking, it is not easier to compute the ideal result y=0 when y0=0 than it is to compute the ideal result y=1.234567890 when y0=1.234567890. Just as it is usually impractical to require that the absolute error is zero, so it is impractical to impose any relative tolerance when y0=0. As convenient as relative error control is, we have a hole at zero that we need to fill somehow.

Combining Absolute and Relative Error Control

Absolute error is easy to understand but can be difficult to use when the problem contains values of different scales, and we need know something about the magnitude of the solution values before we have computed them. Controlling the error in the relative sense instead rectifies these limitations of absolute error control, but it doesn't work in any practical way when the desired solution happens to be very close to zero.
Fortunately, y0=0 is a value that doesn't have any scaling to worry about. Theoretically, then, absolute error should work fine at and very near zero, while relative error works everywhere else. So let's splice them together. Historically this has been accomplished in more than one way, but more commonly today it is done like this:
or in MATLAB code
abs(y - y0) <= max(AbsoluteTolerance,RelativeTolerance*abs(y0))
As previously mentioned, we don't generally know y0, and the left-hand side will be an estimate of the absolute error obtained somehow or other. If we also assume that the value of y that we have computed at least has the right magnitude, we can substitute it for y0 in that expression. So in software the test will take the form
errorEstimate <= max(AbsoluteTolerance,RelativeTolerance*abs(y))
Here errorEstimate is an estimate of the absolute error, or possibly an approximate upper bound on it. Using max, not min, ensures that we choose the least restrictive of the two tolerances. This is important because the hole we are trying to fill occurs because relative error control becomes too restrictive near y0=0.
Perhaps it is interesting to observe where the switchover from using the absolute tolerance to using the relative tolerance occurs. It occurs when AbsoluteTolerance=RelativeToleranceabs(y). In other words, it occurs when
For example, the default tolerances for the integral function are AbsoluteTolerance = 1e-10 and RelativeTolerance = 1e-6. Consequently when we estimate that a quantity is less than 1e-10/1e-6 = 1e-4 in magnitude, we use AbsoluteTolerance and control the error in the absolute sense, otherwise we use RelativeTolerance and control it in the relative sense.
Not all error control strategies in numerical software have been formulated that way, but they can have similar effects. Often only one tolerance was accepted, and in that case one could adapt the above strategy to
errorEstimate <= tol*max(1,abs(y))
This effectively makes the two tolerances the same, so that absolute error control is used for abs(y)1and relative error control for abs(y)1. Since error control is not an exact affair to begin with, and we have even substituted y for y0, addition can be used instead of the max function:
errorEstimate <= tol*(abs(y) + 1)
If abs(y) is very small, the factor on the right is just a little larger than 1, hence the test is essentially comparing the estimate of the absolute error with the tolerance. That's absolute error control. If, on the other hand, abs(y) is very large, the addition of 1 matters but little, and so the effect is relative error control.

Floating Point Limitations

Floating point arithmetic has limitations that may affect the tolerances we choose, so let's review what they are. MATLAB provides the functions eps, realmin, and realmax that tell you about limits in the floating point type.
Double precision:
Single precision
We used the eps function above without numeric inputs, but writing just eps with no inputs is the same as writing eps(1), and writing eps("single") is the same as writing eps(single(1)). In what follows we'll stick to the double precision case, but the same principles apply to single precision. The documentation says:
"eps(X) is the positive distance from abs(X) to the next larger in magnitude floating point number of the same precision as X."
Let's unpack this in case you've never thought about it. We use floating point numbers to model the real numbers, but there are infinitely many real values and only a finite number of floating point values. In double precision there are 252evenly spaced numbers in the interval 1x<2. This goes for any consecutive powers of 2, i.e. there are 252 evenly spaced double precision floating point numbers in the interval 2nx<2n+1, so long as -1022n1023. In MATLAB you also have that many in the interval 0x<2-1022. which is the same as 0x<realmin.
In every case, the very next larger floating point number after x0 is x+eps(x), so the spacing between the floating point numbers in the interval 2nx<2n+1 is eps(2n). The value of eps(x) is the same throughout the interval. For example:
n = 3;
x = linspace(2^n,2^(n+1),5)
x = 1×5
8 10 12 14 16
ans = 1×5
10-14 ×
0.1776 0.1776 0.1776 0.1776 0.3553
There are no floating point numbers between, say, 10 and 10+eps(10). If you try to compute the midpoint between a=x and b=x+eps(x), c=(a+b)2, the result will be either c=x or c=x+eps(x).
OK. So what's the point again? We're talking about floating point limitations on tolerances. With extremely tight tolerances, we will, in effect, be asking the code to return one of a small set of discrete values. For example, if we chooose an absolute tolerance of eps when y0=2, then by definition, we are asking for a result y such that
The floating point number 2-eps is less than 2, but it turns out to be the one floating point number immediately preceding 2. At the upper end point, however, 2+eps=2 in floating point, so the range above is just another way of saying
Some "range", that! We have allowed for only two possible values. Similarly, for a relative tolerance of eps, we get from applying the definition that
which turns out to be satisfied by just 4 possible values of y:
The upshot is that we will need to be careful when setting our tolerances so that we are asking for something reasonable.
We will always want to choose Reps. Even in testing scenarios where you have y0 in hand and you're just comparing outputs, you will probably want to use at least a small multiple of eps. The ODE and integral solvers will not allow anything smaller than 100eps.
With absolute tolerances things are more complicated because we must take account of the scale of values. An absolute tolerance can be smaller than eps when the solution values will be much smaller than 1. It can also be larger than 1. Consider a problem where the exact solution is 1 meter and anything smaller than 10 microns is negligible. If you were solving the problem using meters as your unit of measurement, then you'd set your absolute tolerance to 0.00001 meters because that's 10 microns. But now rescale the problem to use microns as your unit of measurement instead of meters. Now the exact solution is 1e6 microns, and your absolute tolerance would be 10 microns. Conversely, if you were to solve the problem in petameters, the exact answer is now 1e-15 petameters, and your absolute tolerance would be 1e-20 petameters. We're just changing units to illustrate the point. Your problem will have probably have units that are natural for the problem, and you'll need to deal with that as it comes. Whereas the value of eps places a natural limit on the relative tolerance, the realmin and realmax values present limitations on what you can use for an absolute tolerance. As with eps and the relative tolerance, you should avoid these extremes. Rescale the problem if you find yourself wanting to set an absolute tolerance anywhere near realmin or realmax.

How to set Tolerances

But how does one go about actually choosing tolerances for a computation that solves some real world problem? To answer that, we just need to answer a few questions.

1. How small of a value do we consider "negligible"?

How small is small enough that it is nothing for all practical intents and purposes? This might be different for different pieces of the solution, but it provides a reasonable value for the absolute tolerance. If the solution to your problem contains a weight for some powder that is 1.23456g, and your scale is only accurate to 0.1g, there's not much you can do with the extra 0.03456g in the result even if those numbers are correct. What we have there is 1.2g plus a negligible quantity. It makes sense to use tolerances that only ask for accuracy to, say, 0.01g. Suppose we did that and got 1.23654 instead. Now 0.00654 of that is "noise", but it's the same 1.2g when rounded to the 0.1g that our scale can actually measure.
Recall that an absolute tolerance is not limited by eps. It has the same units as the solution, and it can be freely set to whatever value makes sense, whether 1e-40 or 1e+40 depends on how the problem is scaled. Sometimes you might even be allowed to make it exactly zero, but that's a bold move unless you know the solution isn't close to zero.

2. How precise is the data that defines the problem?

If there are only a few significant digits of precision in the data that defines the problem to be solved, it's not likely that a more accurate solution will be more useful than one computed to about the same accuracy as the problem itself is specified. Of course, it's not really that simple. It may be that solution curves are very sensitive or very insensitive to variations in different problem data. It usually is not clear how the error in the problem specification manifests in the solution. Nevertheless, we are typically computing an approximate solution to an approximate problem, and that has implications. Given that we're starting with some error, how much are we willing to pay for extra digits of accuracy from the solver? The answer is probably "not a lot", and yet these extra digits will most likely cost us additional time and maybe computing resources to obtain. If the numerical data that defines the problem is only accurate to, say, 5 significant digits, we might reasonably decide to ask the solver for no more than 5 or 6 digits of accuracy when solving it. If 5 digits of accuracy seems OK to us, we could set R=510-6 or R=510-7 for good measure.

3. What percentage error is small enough to ignore? or How many significant digits of accuracy do we need?

If you said that a 0.01% error is small enough to ignore, then a reasonable relative tolerance would be 0.0001, as long as that isn't much beyond the precision of the data that defines the problem. If you prefer to think in terms of significant digits, and need, say, 5 significant digits, then a reasonable relative tolerance might be 5×10-6.
Then combine this with your answer to the previous question. Since there is some hand-waving about how the solution reacts to errors in the problem specification, one can make equally good arguments for picking the larger or the smaller of the two. If accuracy is especially important to you, use the smaller.
This is just a way of setting "reasonable" tolerance values. It provides a baseline of sorts. Nothing prevents you from choosing tighter or looser tolerances, and there can be good reasons to go with tighter or looser tolerances. For one thing, tolerances are based on error estimates, and in some cases not even error estimates for the final solution values. In ODE solvers, for example, they are used "locally", step by step. This can end up limiting the error in the final solution values in different ways to different degrees, so for some "insurance", slightly tighter tolerances can be a good thing. On the other hand, run times tend to go up with tighter tolerances. Memory requirements may also increase. If the solution changes but little and time is of the essence, looser tolerances might be justified, provided you are comfortable with the potential for larger errors. If the computation is occuring in a real-time system, for example, you would want to experiment with tolerances to see how the run-time and solution values react to tighter and looser tolerances.


Let's take a look at a first order system of ordinary differential equations.
This example has been constructed so that we know the exact solution.
function yp = f(t,y,param)
yp = zeros(size(y));
yp(1) = 2*t*(1 - y(2)) + 1;
yp(2) = 2*t*(y(1) - t - param);
function y = exactSolution(t,param)
y = zeros(2,1);
y(1) = cos(t.*t) + t + param;
y(2) = sin(t.*t) + 1;
And let's use this little comparison function I wrote so that we can easily see the accuracy of the results.
function compare(y,yExact)
fprintf(1," y(1) = %14.9f, y(2) = %14.9f\n",y(1),y(2));
fprintf(1,"exact y(1) = %14.9f, exact y(2) = %14.9f\n",yExact(1),yExact(2));
absoluteError = abs(y - yExact);
relativeError = absoluteError./abs(yExact);
fprintf(1,"absolute error y(1) = %5.0e, absolute error y(2) = %5.0e\n",absoluteError(1),absoluteError(2));
fprintf(1,"relative error y(1) = %5.0e, relative error y(2) = %5.0e\n\n",relativeError(1),relativeError(2));
Let's define this specific problem using as many digits as double precision allows. This will represent a problem that is specified exactly.
param = 1.234567890123456;
y0 = exactSolution(0,param); % "exact" initial value
D = ode(ODEFcn=@f,Parameters=param,InitialValue=y0);
t1 = 5; % Solve for the solution at this time point for comparisons.
yExact = exactSolution(t1,param); % The exact solution to our differential equation at t1.
% Problem specified to full precision. Using default tolerances.
sol = solve(D,t1);
y(1) = 7.227997543, y(2) = 0.876442186 exact y(1) = 7.225770702, exact y(2) = 0.867648250 absolute error y(1) = 2e-03, absolute error y(2) = 9e-03 relative error y(1) = 3e-04, relative error y(2) = 1e-02
The default tolerances are not tight, and it shows in the accuracy we get back. We got 3 digits correct in y(1) and just 1 digit correct in y(2). Let's suppose we want 8 digits. So we'll set the relative tolerance to 5e-9.
We've just made up this problem, but what amounts to a "negligible amount" in a system like this could in theory be different for y(1) than y(2). A little known fact is that the MATLAB ODE solvers support having different absolute tolerances for different solution components. This problem doesn't really depend on it, but just to show how it's done, assume that for y(1) we think a negligible amount is 1e-10 and for y(2), 1e-11.
% Problem specified to full precision. Asking for about 8 significant
% digits of accuracy.
D.RelativeTolerance = 5e-9;
D.AbsoluteTolerance = [1e-10,1e-11];
sol = solve(D,t1);
y(1) = 7.225770703, y(2) = 0.867648251 exact y(1) = 7.225770702, exact y(2) = 0.867648250 absolute error y(1) = 5e-10, absolute error y(2) = 1e-09 relative error y(1) = 7e-11, relative error y(2) = 1e-09
That's more like it. We got 9 digits correct in y(1) and 8 in y(2). But this is with the problem data at full precision. Your problem may have some measurements that aren't exact. Let's simulate that by tweaking some of the problem data while leaving the exact solution unchanged. Replacing the parameter and initial value with "approximate" values:
% Problem specified with 6 significant digits. Still asking for 8.
% significant digits.
param = 1.23457;
D.InitialValue = [param + 1; 1];
sol = solve(D,t1);
y(1) = 7.225772794, y(2) = 0.867647972 exact y(1) = 7.225770702, exact y(2) = 0.867648250 absolute error y(1) = 2e-06, absolute error y(2) = 3e-07 relative error y(1) = 3e-07, relative error y(2) = 3e-07
Maybe we solved the approximate problem just as accurately as before, but comparing to the "true" solution from the full precision problem, we only got 6 digits correct in y(1) and 5 digits correct in y(2). This is why you might want to let the precision of the data in the problem moderate your tolerance choices. Reducing our request to 6 digits, we get.
% Problem specified with 6 significant digits, now asking for 6
% significant digits.
D.RelativeTolerance = 5e-7;
D.AbsoluteTolerance = 1e-9;
sol = solve(D,t1);
y(1) = 7.225770368, y(2) = 0.867649499 exact y(1) = 7.225770702, exact y(2) = 0.867648250 absolute error y(1) = 3e-07, absolute error y(2) = 1e-06 relative error y(1) = 5e-08, relative error y(2) = 1e-06
This time we got 6 digits correct in y(1). We only got 5 digits correct in y(2), though, so maybe we should have set the relative tolerance at 5e-8. It's not a bad idea to ask for 7 digits when you need 6.

When Failures Occur

A failure related to tolerances usually occurs when the problem is too difficult to solve to the requested accuracy. If that is the case, then one must either use an algorithm that is able to solve that problem more easily, or one must relax the tolerances, or both. Failing that, the problem itself will have to be reformulated.
Sometimes, paradoxically, one must tighten the tolerances to succeed. There is an example of this here:
The tolerances for ode15s are specified in that example because the integration fails with the default tolerances, which are looser.
How is that even possible, you ask? Well, when you're driving a car on a mountain road, you need to keep the car on the road, and that's going to require a practical minimum amount of accuracy. If your tolerances are too loose and you drive off the cliff, then you won't get where you're going. How much accuracy you need in a mathematical problem is less clear than how to keep a car on the road, but the point is that if you get a failure, and if loosening the tolerances doesn't help, try tightening them. What have you got to lose? It might work. Probably the worst that's going to happen is that it fails quicker.
Now get out there and set those tolerances!
  • print


To leave a comment, please click here to sign in to your MathWorks Account or create a new one.