Zeroin, Part 2: Brent’s Version
Richard Brent's improvements to Dekker's zeroin algorithm, published in 1971, made it faster, safer in floating point arithmetic, and guaranteed not to fail.
At each iteration, the choice is made from four possible steps:
Here is a somewhat cluttered plot that shows most of these possibilities. The three values $a$, $b$, and $c$ are plotted in black. The function
$$ f(x) = x^3 - 3x - 2 $$
is plotted in light grey to emphasize that the algorithm so far has only evaluated it three times to give the three values shown by the circles.
Contents
Richard Brent
Richard Brent was a graduate student in computer science at Stanford in 1968-71. He wrote a Ph. D. thesis under George Forsythe's direction titled Algorithms for finding zeros and extrema of functions without calculating derivatives. I happened to be a visiting professor at Stanford and was on Brent's committee. Forsythe was editor of an important series of books in computer science published by Prentice-Hall. He was so impressed with Brent's thesis that he published it almost unchanged in the series in 1973 with the title Algorithms for Minimization without Derivatives. Three important algorithms were described in Brent's book. The algorithm for finding a zero of a function of one real variable, which is a successor to Dekker's zeroin, is still in use in MATLAB today. This is the algorithm that I want to write about here. The other two algorithms, one for minimizing a function of one real variable and one for minimizing a function of several real variables are no longer in wide spread use.Weakness of Zeroin
The version of zeroin that was presented by Dekker and that I described in Part 1 of this series is elegant and concise, but not foolproof. It has trouble with multiple roots. A simple example is provided by $$ f(x) = x^3 $$ If we start with a symmetric interval spanning the origin, the initial secant step hits the zero exactly, but the stopping criterion fails to recognize the good luck. The code takes a minimal step, which in this case is eps*realmin, the smallest possible denormal normal floating point number. It will continue essentially forever, taking impossibly small steps.f = @(x)x.^3;
zeroin(f,-1,1)
1 initial -1.000000000000000e+00 -1.000000000000000e+00 2 initial 1.000000000000000e+00 1.000000000000000e+00 3 secant 0.000000000000000e+00 0.000000000000000e+00 4 minimal -4.940656458412465e-324 -0.000000000000000e+00 5 minimal -9.881312916824931e-324 -0.000000000000000e+00 6 minimal -1.482196937523740e-323 -0.000000000000000e+00 7 minimal -1.976262583364986e-323 -0.000000000000000e+00 8 minimal -2.470328229206233e-323 -0.000000000000000e+00 . . . 12341 minimal -6.095781938389300e-320 -0.000000000000000e+00 12342 minimal -6.096276004035141e-320 -0.000000000000000e+00 12343 minimal -6.096770069680982e-320 -0.000000000000000e+00 12344 minimal -6.097264135326824e-320 -0.000000000000000e+00 . . .A starting interval that is not symmetric about the origin does not work any better. We don't hit the zero exactly, so we take some small secant steps for a while. But eventually we get bogged down repeatedly taking the minimal step, and just making roundoff in roundoff.
zeroin(f,-.5,1)
1 initial -5.000000000000000e-01 -1.250000000000000e-01 2 initial 1.000000000000000e+00 1.000000000000000e+00 3 secant -3.333333333333334e-01 -3.703703703703705e-02 4 secant -2.631578947368422e-01 -1.822423093745445e-02 5 secant -1.951779563719863e-01 -7.435193904825083e-03 6 secant -1.483300289942098e-01 -3.263527261310824e-03 7 secant -1.116805310361257e-01 -1.392940003647091e-03 8 secant -8.438934127192455e-02 -6.009838348927866e-04 . . . 124 secant -5.752411316818776e-16 -1.903486478002247e-46 125 minimal -5.752411316818775e-16 -1.903486478002246e-46 126 secant -4.143774132574865e-16 -7.115218233323205e-47 127 minimal -4.143774132574865e-16 -7.115218233323202e-47 128 minimal -4.143774132574864e-16 -7.115218233323200e-47 129 minimal -4.143774132574864e-16 -7.115218233323197e-47 130 minimal -4.143774132574863e-16 -7.115218233323194e-47 131 minimal -4.143774132574863e-16 -7.115218233323192e-47 132 minimal -4.143774132574862e-16 -7.115218233323189e-47 . . .
Two improvements
To avoid these difficulties, Brent made two important improvements to Dekker's algorithm. One is a check to see if an exact zero has accidentally been found. Another is to introduce the variable e in the code below that keeps track of the step size ratios in two successive steps. This variable is involved in the choice about whether to use bisection. As a result Brent's algorithm never takes more than twice as many steps as bisection alone. Let's see how these improvements affect our example in fzero.opt = optimset('display','iter')Starting with an interval that is centered about the origin, the exact zero results in an early termination.
fzero(f,[-1,1],opt)
Func-count x f(x) Procedure 2 1 1 initial 3 0 0 bisection
Zero found in the interval [-1, 1] ans = 0Starting with an interval that is not centered about the origin, the iteration terminates in a reasonable number of steps.
fzero(f,[-.5,1],opt)
Func-count x f(x) Procedure 2 -0.5 -0.125 initial 3 -0.333333 -0.037037 interpolation 4 -0.265664 -0.0187499 interpolation 5 -0.197929 -0.007754 interpolation 6 -0.197929 -0.007754 bisection 7 -0.133649 -0.00238724 interpolation . . . 151 -4.51944e-16 -9.2311e-47 interpolation 152 -7.85458e-18 -4.84584e-52 interpolation 153 -7.85458e-18 -4.84584e-52 bisection 154 -7.85458e-18 -4.84584e-52 interpolation
Zero found in the interval [-0.5, 1] ans = -7.854580142952130e-18
Muller's method
During the course of the iterations we often have three distinct values of $x$, namely $a$, $b$, and $c$, and corresponding function values, $f(a)$, $f(b)$, and $f(c)$. We could interpolate these three values with a quadratic in $x$, find the roots of this quadratic, and take the root that is closest to our best approximation so far to be the next approximation to the zero of $f(x)$. This is Muller's method. The trouble is the roots of the quadratic might be complex. This would be an advantage if we were looking for complex solutions, and Muller's method is often used in this situation. But we don't want them here and Brent chose to do something else.Inverse quadratic interpolation
Instead of a quadratic in $x$, we can interpolate the three points with a quadratic function in $y$. That's a ``sideways'' parabola, $P(y)$, determined by the interpolation conditions $$ a = P(f(a)), \ b = P(f(b)), \ c = P(f(c)) $$ This parabola always intersects the $x$-axis, which is $y = 0$. So $x = P(0)$ is the next iterate. This method is known as Inverse Quadratic Interpolation, abbreviated IQI. Here is a naive implementation that illustrates the idea. It uses polyinterp, taken from Numerical Computing with MATLAB type iqi
function x = iqi(f,a,b,c) while abs(c-b) > eps(abs(b)) x = polyinterp([f(a),f(b),f(c)],[a,b,c],0); disp(x) a = b; b = c; c = x; end endOne difficulty with this simple code is that polynomial interpolation requires the abscissae, which in this case are $f(a)$, $f(b)$, and $f(c)$, to be distinct. We have no guarantee that they are. For example, if we try to compute $\sqrt{2}$ using $f(x) = x^2 - 2$ and start with $a = -2, b = 0, c = 2$, we are starting with $f(a) = f(c)$ and the first step is undefined. If we start near this singular situation, say with $a = -2.001, b = 0, c = 1.999$, the next iterate is near $x = 500$. So IQI converges faster than secant once it is near the zero, but its global behavior is unpredictable. It needs to be combined with bisection to have a reliable algorithm.
Brent's algorithm
Like Dekker's original version, Brent's version of zeroin starts with an interval $[a,b]$ on which the function $f(x)$ changes sign. The objective is to reduce the interval to a tiny subinterval on which the function still changes sign. The variables $a$, $b$, and $c$ play the same role:- $b$ is the best zero so far, in the sense that $f(b)$ is the smallest value of $f(x)$ so far.
- $a$ is the previous value of $b$, so $a$ and $b$ produce the secant.
- $c$ and $b$ bracket the sign change, so $b$ and $c$ provide the midpoint.
- If $a \ne c$ , try inverse quadratic interpolation.
- If $a = c$, try secant interpolation.
- If the interpolation step is near the endpoint, or outside the interval, use bisection.
- If the step is smaller than the tolerance, use the tolerance.
zeroin_plot(1,2.4)The green line is the secant determined by $a$ and $b$. A secant step would go to the green $\times$ labeled $t$, but that's outside the interval so zeroin would not go there. Actually Brent's zeroin would not even be considering the secant in this situation because $a$ and $c$ are distinct. The blue curve is the quadratic in $y$ that goes through the three circles. It crosses the $x$ -axis at the blue $\times$ labeled $q$, so that would be the IQI step. However, in this case $q$ is more than three-quarters of the way from $b$ to $c$, so Brent would choose bisection instead. The red $\times$ labeled $m$ is the bisection point halfway between $b$ and $c$. With a value of 2.025 it happens to be really close to the zero at 2. Lucky shot.
Fzero
Brent included Algol and Fortran implementations of this algorithm in his book. Forsythe, Mike Malcolm, and I made the Fortran program the basis for the zero finding chapter of Computer Methods for Mathematical Computations. The code is still available from netlib. We named the MATLAB version fzero. Here is the textbook version, fzerotx, from Numerical Computing with MATLAB. This is not all of the MATLAB fzero. In my next blog post I will deal with the need to know an interval on which the function changes sign. type fzerotx
function b = fzerotx(F,ab,varargin) %FZEROTX Textbook version of FZERO. % x = fzerotx(F,[a,b]) tries to find a zero of F(x) between a and b. % F(a) and F(b) must have opposite signs. fzerotx returns one % end point of a small subinterval of [a,b] where F changes sign. % Arguments beyond the first two, fzerotx(F,[a,b],p1,p2,...), % are passed on, F(x,p1,p2,..). % % Examples: % fzerotx(@sin,[1,4]) % F = @(x) sin(x); fzerotx(F,[1,4]) % Copyright 2014 Cleve Moler % Copyright 2014 The MathWorks, Inc. % Initialize. a = ab(1); b = ab(2); fa = F(a,varargin{:}); fb = F(b,varargin{:}); if sign(fa) == sign(fb) error('Function must change sign on the interval') end c = a; fc = fa; d = b - c; e = d; % Main loop, exit from middle of the loop while fb ~= 0 % The three current points, a, b, and c, satisfy: % f(x) changes sign between a and b. % abs(f(b)) <= abs(f(a)). % c = previous b, so c might = a. % The next point is chosen from % Bisection point, (a+b)/2. % Secant point determined by b and c. % Inverse quadratic interpolation point determined % by a, b, and c if they are distinct. if sign(fa) == sign(fb) a = c; fa = fc; d = b - c; e = d; end if abs(fa) < abs(fb) c = b; b = a; a = c; fc = fb; fb = fa; fa = fc; end % Convergence test and possible exit m = 0.5*(a - b); tol = 2.0*eps*max(abs(b),1.0); if (abs(m) <= tol) | (fb == 0.0) break end % Choose bisection or interpolation if (abs(e) < tol) | (abs(fc) <= abs(fb)) % Bisection d = m; e = m; else % Interpolation s = fb/fc; if (a == c) % Linear interpolation (secant) p = 2.0*m*s; q = 1.0 - s; else % Inverse quadratic interpolation q = fc/fa; r = fb/fa; p = s*(2.0*m*q*(q - r) - (b - c)*(r - 1.0)); q = (q - 1.0)*(r - 1.0)*(s - 1.0); end; if p > 0, q = -q; else p = -p; end; % Is interpolated point acceptable if (2.0*p < 3.0*m*q - abs(tol*q)) & (p < abs(0.5*e*q)) e = d; d = p/q; else d = m; e = m; end; end % Next point c = b; fc = fb; if abs(d) > tol b = b + d; else b = b - sign(b-a)*tol; end fb = F(b,varargin{:}); end
댓글
댓글을 남기려면 링크 를 클릭하여 MathWorks 계정에 로그인하거나 계정을 새로 만드십시오.