Zeroin, Part 2: Brent’s Version

Posted by Cleve Moler, October 26, 2015

12 views (last 30 days) | 0 Likes | 2 comments

Richard Brent's improvements to Dekker's zeroin algorithm, published in 1971, made it faster, safer in floating point arithmetic, and guaranteed not to fail.

Richard Brent
Weakness of Zeroin
Two improvements
Muller's method
Inverse quadratic interpolation
Brent's algorithm
Fzero
References

Richard Brent

Richard Brent was a graduate student in computer science at Stanford in 1968-71. He wrote a Ph. D. thesis under George Forsythe's direction titled Algorithms for finding zeros and extrema of functions without calculating derivatives. I happened to be a visiting professor at Stanford and was on Brent's committee. Forsythe was editor of an important series of books in computer science published by Prentice-Hall. He was so impressed with Brent's thesis that he published it almost unchanged in the series in 1973 with the title Algorithms for Minimization without Derivatives. Three important algorithms were described in Brent's book. The algorithm for finding a zero of a function of one real variable, which is a successor to Dekker's zeroin, is still in use in MATLAB today. This is the algorithm that I want to write about here. The other two algorithms, one for minimizing a function of one real variable and one for minimizing a function of several real variables are no longer in wide spread use.

Weakness of Zeroin

The version of zeroin that was presented by Dekker and that I described in Part 1 of this series is elegant and concise, but not foolproof. It has trouble with multiple roots. A simple example is provided by $$ f(x) = x^3 $$ If we start with a symmetric interval spanning the origin, the initial secant step hits the zero exactly, but the stopping criterion fails to recognize the good luck. The code takes a minimal step, which in this case is eps*realmin, the smallest possible denormal normal floating point number. It will continue essentially forever, taking impossibly small steps.

f =
    @(x)x.^3;

zeroin(f,-1,1)

     1  initial  -1.000000000000000e+00  -1.000000000000000e+00
     2  initial   1.000000000000000e+00   1.000000000000000e+00
     3  secant    0.000000000000000e+00   0.000000000000000e+00
     4  minimal -4.940656458412465e-324  -0.000000000000000e+00
     5  minimal -9.881312916824931e-324  -0.000000000000000e+00
     6  minimal -1.482196937523740e-323  -0.000000000000000e+00
     7  minimal -1.976262583364986e-323  -0.000000000000000e+00
     8  minimal -2.470328229206233e-323  -0.000000000000000e+00
         .
         .
         .
 12341  minimal -6.095781938389300e-320  -0.000000000000000e+00
 12342  minimal -6.096276004035141e-320  -0.000000000000000e+00
 12343  minimal -6.096770069680982e-320  -0.000000000000000e+00
 12344  minimal -6.097264135326824e-320  -0.000000000000000e+00
         .
         .
         .

A starting interval that is not symmetric about the origin does not work any better. We don't hit the zero exactly, so we take some small secant steps for a while. But eventually we get bogged down repeatedly taking the minimal step, and just making roundoff in roundoff.

zeroin(f,-.5,1)

     1  initial  -5.000000000000000e-01  -1.250000000000000e-01
     2  initial   1.000000000000000e+00   1.000000000000000e+00
     3  secant   -3.333333333333334e-01  -3.703703703703705e-02
     4  secant   -2.631578947368422e-01  -1.822423093745445e-02
     5  secant   -1.951779563719863e-01  -7.435193904825083e-03
     6  secant   -1.483300289942098e-01  -3.263527261310824e-03
     7  secant   -1.116805310361257e-01  -1.392940003647091e-03
     8  secant   -8.438934127192455e-02  -6.009838348927866e-04
         .
         .
         .
   124  secant   -5.752411316818776e-16  -1.903486478002247e-46
   125  minimal  -5.752411316818775e-16  -1.903486478002246e-46
   126  secant   -4.143774132574865e-16  -7.115218233323205e-47
   127  minimal  -4.143774132574865e-16  -7.115218233323202e-47
   128  minimal  -4.143774132574864e-16  -7.115218233323200e-47
   129  minimal  -4.143774132574864e-16  -7.115218233323197e-47
   130  minimal  -4.143774132574863e-16  -7.115218233323194e-47
   131  minimal  -4.143774132574863e-16  -7.115218233323192e-47
   132  minimal  -4.143774132574862e-16  -7.115218233323189e-47
         .
         .
         .

Two improvements

To avoid these difficulties, Brent made two important improvements to Dekker's algorithm. One is a check to see if an exact zero has accidentally been found. Another is to introduce the variable e in the code below that keeps track of the step size ratios in two successive steps. This variable is involved in the choice about whether to use bisection. As a result Brent's algorithm never takes more than twice as many steps as bisection alone. Let's see how these improvements affect our example in fzero.

opt = optimset('display','iter')

Starting with an interval that is centered about the origin, the exact zero results in an early termination.

fzero(f,[-1,1],opt)

  Func-count    x          f(x)             Procedure
     2               1             1        initial
     3               0             0        bisection

Zero found in the interval [-1, 1]
ans =
    0

Starting with an interval that is not centered about the origin, the iteration terminates in a reasonable number of steps.

fzero(f,[-.5,1],opt)

  Func-count    x          f(x)             Procedure
     2            -0.5        -0.125        initial
     3       -0.333333     -0.037037        interpolation
     4       -0.265664    -0.0187499        interpolation
     5       -0.197929     -0.007754        interpolation
     6       -0.197929     -0.007754        bisection
     7       -0.133649   -0.00238724        interpolation
         .
         .
         .
   151    -4.51944e-16   -9.2311e-47        interpolation
   152    -7.85458e-18  -4.84584e-52        interpolation
   153    -7.85458e-18  -4.84584e-52        bisection
   154    -7.85458e-18  -4.84584e-52        interpolation

Zero found in the interval [-0.5, 1]
ans =
    -7.854580142952130e-18

Muller's method

During the course of the iterations we often have three distinct values of $x$, namely $a$, $b$, and $c$, and corresponding function values, $f(a)$, $f(b)$, and $f(c)$. We could interpolate these three values with a quadratic in $x$, find the roots of this quadratic, and take the root that is closest to our best approximation so far to be the next approximation to the zero of $f(x)$. This is Muller's method. The trouble is the roots of the quadratic might be complex. This would be an advantage if we were looking for complex solutions, and Muller's method is often used in this situation. But we don't want them here and Brent chose to do something else.

Inverse quadratic interpolation

Instead of a quadratic in $x$, we can interpolate the three points with a quadratic function in $y$. That's a ``sideways'' parabola, $P(y)$, determined by the interpolation conditions $$ a = P(f(a)), \ b = P(f(b)), \ c = P(f(c)) $$ This parabola always intersects the $x$-axis, which is $y = 0$. So $x = P(0)$ is the next iterate. This method is known as Inverse Quadratic Interpolation, abbreviated IQI. Here is a naive implementation that illustrates the idea. It uses polyinterp, taken from Numerical Computing with MATLAB

   type iqi

  function x = iqi(f,a,b,c)
     while abs(c-b) > eps(abs(b))
        x = polyinterp([f(a),f(b),f(c)],[a,b,c],0);
        disp(x)
        a = b;
        b = c;
        c = x;
     end
  end

One difficulty with this simple code is that polynomial interpolation requires the abscissae, which in this case are $f(a)$, $f(b)$, and $f(c)$, to be distinct. We have no guarantee that they are. For example, if we try to compute $\sqrt{2}$ using $f(x) = x^2 - 2$ and start with $a = -2, b = 0, c = 2$, we are starting with $f(a) = f(c)$ and the first step is undefined. If we start near this singular situation, say with $a = -2.001, b = 0, c = 1.999$, the next iterate is near $x = 500$. So IQI converges faster than secant once it is near the zero, but its global behavior is unpredictable. It needs to be combined with bisection to have a reliable algorithm.

Brent's algorithm

Like Dekker's original version, Brent's version of zeroin starts with an interval $[a,b]$ on which the function $f(x)$ changes sign. The objective is to reduce the interval to a tiny subinterval on which the function still changes sign. The variables $a$, $b$, and $c$ play the same role:

$b$ is the best zero so far, in the sense that $f(b)$ is the smallest value of $f(x)$ so far.
$a$ is the previous value of $b$, so $a$ and $b$ produce the secant.
$c$ and $b$ bracket the sign change, so $b$ and $c$ provide the midpoint.

At each iteration, the choice is made from four possible steps:

If $a \ne c$ , try inverse quadratic interpolation.
If $a = c$, try secant interpolation.
If the interpolation step is near the endpoint, or outside the interval, use bisection.
If the step is smaller than the tolerance, use the tolerance.

Here is a somewhat cluttered plot that shows most of these possibilities. The three values $a$, $b$, and $c$ are plotted in black. The function $$ f(x) = x^3 - 3x - 2 $$ is plotted in light grey to emphasize that the algorithm so far has only evaluated it three times to give the three values shown by the circles.

   zeroin_plot(1,2.4)

The green line is the secant determined by $a$ and $b$. A secant step would go to the green $\times$ labeled $t$, but that's outside the interval so zeroin would not go there. Actually Brent's zeroin would not even be considering the secant in this situation because $a$ and $c$ are distinct. The blue curve is the quadratic in $y$ that goes through the three circles. It crosses the $x$ -axis at the blue $\times$ labeled $q$, so that would be the IQI step. However, in this case $q$ is more than three-quarters of the way from $b$ to $c$, so Brent would choose bisection instead. The red $\times$ labeled $m$ is the bisection point halfway between $b$ and $c$. With a value of 2.025 it happens to be really close to the zero at 2. Lucky shot.

Fzero

Brent included Algol and Fortran implementations of this algorithm in his book. Forsythe, Mike Malcolm, and I made the Fortran program the basis for the zero finding chapter of Computer Methods for Mathematical Computations. The code is still available from netlib. We named the MATLAB version fzero. Here is the textbook version, fzerotx, from Numerical Computing with MATLAB. This is not all of the MATLAB fzero. In my next blog post I will deal with the need to know an interval on which the function changes sign.

   type fzerotx

function b = fzerotx(F,ab,varargin)
%FZEROTX  Textbook version of FZERO.
%   x = fzerotx(F,[a,b]) tries to find a zero of F(x) between a and b.
%   F(a) and F(b) must have opposite signs.  fzerotx returns one 
%   end point of a small subinterval of [a,b] where F changes sign.
%   Arguments beyond the first two, fzerotx(F,[a,b],p1,p2,...),
%   are passed on, F(x,p1,p2,..).
%
%   Examples:
%      fzerotx(@sin,[1,4])
%      F = @(x) sin(x); fzerotx(F,[1,4])

%   Copyright 2014 Cleve Moler
%   Copyright 2014 The MathWorks, Inc.

% Initialize.
a = ab(1);
b = ab(2);
fa = F(a,varargin{:});
fb = F(b,varargin{:});
if sign(fa) == sign(fb)
   error('Function must change sign on the interval')
end
c = a;
fc = fa;
d = b - c;
e = d;

% Main loop, exit from middle of the loop
while fb ~= 0

   % The three current points, a, b, and c, satisfy:
   %    f(x) changes sign between a and b.
   %    abs(f(b)) <= abs(f(a)).
   %    c = previous b, so c might = a.
   % The next point is chosen from
   %    Bisection point, (a+b)/2.
   %    Secant point determined by b and c.
   %    Inverse quadratic interpolation point determined
   %    by a, b, and c if they are distinct.

   if sign(fa) == sign(fb)
      a = c;  fa = fc;
      d = b - c;  e = d;
   end
   if abs(fa) < abs(fb)
      c = b;    b = a;    a = c;
      fc = fb;  fb = fa;  fa = fc;
   end
   
   % Convergence test and possible exit
   m = 0.5*(a - b);
   tol = 2.0*eps*max(abs(b),1.0);
   if (abs(m) <= tol) | (fb == 0.0)
      break
   end
   
   % Choose bisection or interpolation
   if (abs(e) < tol) | (abs(fc) <= abs(fb))
      % Bisection
      d = m;
      e = m;
   else
      % Interpolation
      s = fb/fc;
      if (a == c)
         % Linear interpolation (secant)
         p = 2.0*m*s;
         q = 1.0 - s;
      else
         % Inverse quadratic interpolation
         q = fc/fa;
         r = fb/fa;
         p = s*(2.0*m*q*(q - r) - (b - c)*(r - 1.0));
         q = (q - 1.0)*(r - 1.0)*(s - 1.0);
      end;
      if p > 0, q = -q; else p = -p; end;
      % Is interpolated point acceptable
      if (2.0*p < 3.0*m*q - abs(tol*q)) & (p < abs(0.5*e*q))
         e = d;
         d = p/q;
      else
         d = m;
         e = m;
      end;
   end
   
   % Next point
   c = b;
   fc = fb;
   if abs(d) > tol
      b = b + d;
   else
      b = b - sign(b-a)*tol;
   end
   fb = F(b,varargin{:});
end

References

Richard P. Brent, "An algorithm with guaranteed convergence for finding a zero of a function," Computer Journal 14, 422-25, 1971. Richard P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, New Jersey, 1973, 195 pp. Reprinted by Dover, New York, 2002 and 2013. <http://maths-people.anu.edu.au/~brent/pub/pub011.html> George Forsythe, Michael Malcolm, and Cleve Moler, Computer Methods for Mathematical Computations, Prentice-Hall, 259pp, 1977.

Published with MATLAB® R2015a