{"id":3845,"date":"2020-10-06T09:03:37","date_gmt":"2020-10-06T13:03:37","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=3845"},"modified":"2020-10-06T09:03:37","modified_gmt":"2020-10-06T13:03:37","slug":"automatic-differentiation-in-optimization-toolbox","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2020\/10\/06\/automatic-differentiation-in-optimization-toolbox\/","title":{"rendered":"Automatic Differentiation in Optimization Toolbox&#x2122;"},"content":{"rendered":"\r\n<div class=\"content\"><!--introduction--><p>This column is written by Alan Weiss, the writer for Optimization Toolbox documentation. Take it away, Alan.<\/p><p>Hi, folks. You may know that solving an optimization problem, meaning finding a point where a function is minimized, is easier when you have the gradient of the function. This is easy to understand: the gradient points uphill, so if you travel in the opposite direction, you generally reach a minimum. Optimization Toolbox algorithms are more sophisticated than this simple downhill strategy, yet they, too, benefit from a gradient.<\/p><p>How do you give a gradient to a solver along with the function? Until recently, you had to calculate the gradient as a separate output, with all the pain and possibility of error that entails. However, with R2020b, the problem-based approach uses automatic differentiation for the calculation of problem gradients for general nonlinear optimization problems. I will explain what all of those words mean. 
In a nutshell, as long as your function is composed of elementary functions such as polynomials, trigonometric functions, and exponentials, Optimization Toolbox calculates and uses the gradients of your functions automatically, with no effort on your part.<\/p><p>The \"general nonlinear\" phrase means that automatic differentiation applies to problems that <tt>fmincon<\/tt> and <tt>fminunc<\/tt> solve, which are general constrained or unconstrained minimization, as opposed to linear programming or least-squares or other problem types that call other specialized solvers.<\/p><p>Automatic differentiation, also called AD, is a type of symbolic derivative that transforms a function into code that calculates the function values and derivative values at particular points. This process is transparent; you do not have to write any special code to use AD. Actually, as you'll see later, you have to specify some name-value pairs in order not to have the solver use AD.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#13bfb8d2-2b4c-403f-942e-e17218a96af2\">Problem-Based Optimization<\/a><\/li><li><a href=\"#eb922bd7-3174-40f1-8924-22a59934be44\">Effect of Automatic Differentiation<\/a><\/li><li><a href=\"#d31ccf10-d596-4636-b9d3-610fdd28e544\">What Is Automatic Differentiation?<\/a><\/li><li><a href=\"#0b09d582-ee83-4353-abb4-7f99c29abdc2\">What Good is Automatic Differentiation?<\/a><\/li><li><a href=\"#05576894-8fae-4043-a791-23baec17781f\">What About Unsupported Operations?<\/a><\/li><li><a href=\"#c1b46c0d-9271-4f1e-bdd8-75cdd0894341\">Final Thoughts<\/a><\/li><\/ul><\/div><h4>Problem-Based Optimization<a name=\"13bfb8d2-2b4c-403f-942e-e17218a96af2\"><\/a><\/h4><p>The problem-based approach to optimization is to write your problem in terms of optimization variables and expressions. 
For example, to minimize the test function ${\\rm fun}(x,y) = 100(y-x^2)^2 + (1-x)^2$ inside the unit disk $x^2+y^2\\le 1$, you first create optimization variables.<\/p><pre class=\"codeinput\">x = optimvar(<span class=\"string\">'x'<\/span>);\r\ny = optimvar(<span class=\"string\">'y'<\/span>);\r\n<\/pre><p>You then create optimization expressions using these variables.<\/p><pre class=\"codeinput\">fun = 100*(y - x^2)^2 + (1 - x)^2;\r\nunitdisk = x^2 + y^2 &lt;= 1;\r\n<\/pre><p>Create an optimization problem with these expressions in the appropriate problem fields.<\/p><pre class=\"codeinput\">prob = optimproblem(<span class=\"string\">\"Objective\"<\/span>,fun,<span class=\"string\">\"Constraints\"<\/span>,unitdisk);\r\n<\/pre><p>Solve the problem by calling <tt>solve<\/tt>, starting from x = 0, y = 0.<\/p><pre class=\"codeinput\">x0.x = 0;\r\nx0.y = 0;\r\nsol = solve(prob,x0)\r\n<\/pre><pre class=\"codeoutput\">\r\nSolving problem using fmincon.\r\n\r\nLocal minimum found that satisfies the constraints.\r\n\r\nOptimization completed because the objective function is non-decreasing in \r\nfeasible directions, to within the value of the optimality tolerance,\r\nand constraints are satisfied to within the value of the constraint tolerance.\r\n\r\nsol = \r\n  struct with fields:\r\n\r\n    x: 0.7864\r\n    y: 0.6177\r\n<\/pre><p>The <tt>solve<\/tt> function calls <tt>fmincon<\/tt> to solve the problem. In fact, <tt>solve<\/tt> uses AD to speed the solution process. Let's examine the solution process in more detail to see this in action. 
But first, plot the logarithm of one plus the objective function on the unit disk, and plot a red circle at the solution.<\/p><pre class=\"codeinput\">[R,TH] = ndgrid(linspace(0,1,100),linspace(0,2*pi,200));\r\n[X,Y] = pol2cart(TH,R);\r\nsurf(X,Y,log(1+100*(Y - X.^2).^2 + (1 - X).^2),<span class=\"string\">'EdgeColor'<\/span>,<span class=\"string\">'none'<\/span>)\r\ncolorbar\r\nview(0,90)\r\naxis <span class=\"string\">equal<\/span>\r\nhold <span class=\"string\">on<\/span>\r\nplot3(sol.x,sol.y,1,<span class=\"string\">'ro'<\/span>,<span class=\"string\">'MarkerSize'<\/span>,10)\r\nhold <span class=\"string\">off<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/LorenBlogPostAD_01.png\" alt=\"\"> <h4>Effect of Automatic Differentiation<a name=\"eb922bd7-3174-40f1-8924-22a59934be44\"><\/a><\/h4><p>To examine the solution process in more detail, solve the problem again, this time requesting more <tt>solve<\/tt> outputs. Examine the number of iterations and function evaluations that the solver takes.<\/p><pre class=\"codeinput\">[sol,fval,exitflag,output] = solve(prob,x0);\r\nfprintf(<span class=\"string\">'fmincon takes %g iterations and %g function evaluations.\\n'<\/span>,<span class=\"keyword\">...<\/span>\r\n    output.iterations,output.funcCount)\r\n<\/pre><pre class=\"codeoutput\">\r\nSolving problem using fmincon.\r\n\r\nLocal minimum found that satisfies the constraints.\r\n\r\nOptimization completed because the objective function is non-decreasing in \r\nfeasible directions, to within the value of the optimality tolerance,\r\nand constraints are satisfied to within the value of the constraint tolerance.\r\n\r\nfmincon takes 24 iterations and 34 function evaluations.\r\n<\/pre><p>The output structure shows that the solver takes 24 iterations and 34 function counts. 
Run the problem again, this time enforcing that the solver does not use AD.<\/p><pre class=\"codeinput\">[sol2,fval2,exitflag2,output2] = solve(prob,x0,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'ObjectiveDerivative'<\/span>,<span class=\"string\">\"finite-differences\"<\/span>,<span class=\"string\">'ConstraintDerivative'<\/span>,<span class=\"string\">\"finite-differences\"<\/span>);\r\nfprintf(<span class=\"string\">'fmincon takes %g iterations and %g function evaluations.\\n'<\/span>,<span class=\"keyword\">...<\/span>\r\n    output2.iterations,output2.funcCount)\r\nplot([1 2],[output.funcCount output2.funcCount],<span class=\"string\">'r-'<\/span>,<span class=\"keyword\">...<\/span>\r\n    [1 2],[output.funcCount output2.funcCount],<span class=\"string\">'ro'<\/span>)\r\nylabel(<span class=\"string\">'Function Count'<\/span>)\r\nxlim([0.8 2.2])\r\nylim([0 90])\r\nlegend(<span class=\"string\">'Function Count (lower is better)'<\/span>,<span class=\"string\">'Location'<\/span>,<span class=\"string\">'northwest'<\/span>)\r\nax = gca;\r\nax.XTick = [1,2];\r\nax.XTickLabel = {<span class=\"string\">'With AD'<\/span>,<span class=\"string\">'Without AD'<\/span>};\r\n<\/pre><pre class=\"codeoutput\">\r\nSolving problem using fmincon.\r\n\r\nLocal minimum found that satisfies the constraints.\r\n\r\nOptimization completed because the objective function is non-decreasing in \r\nfeasible directions, to within the value of the optimality tolerance,\r\nand constraints are satisfied to within the value of the constraint tolerance.\r\n\r\nfmincon takes 24 iterations and 84 function evaluations.\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/LorenBlogPostAD_02.png\" alt=\"\"> <p>This time the solver takes 84 function counts, not 34. 
The reason for this difference is automatic differentiation.<\/p><p>The solutions are nearly the same with or without AD:<\/p><pre class=\"codeinput\">fprintf(<span class=\"string\">'The norm of solution differences is %g.\\n'<\/span>,norm([sol.x,sol.y] - [sol2.x,sol2.y]))\r\n<\/pre><pre class=\"codeoutput\">The norm of solution differences is 1.80128e-09.\r\n<\/pre><h4>What Is Automatic Differentiation?<a name=\"d31ccf10-d596-4636-b9d3-610fdd28e544\"><\/a><\/h4><p>AD is similar to symbolic differentiation: each function is essentially differentiated symbolically, and the result is turned into code that MATLAB runs to compute derivatives. One way to see the resulting code is to use the <tt>prob2struct<\/tt> function. Let's try that on <tt>prob<\/tt>.<\/p><pre class=\"codeinput\">problem = prob2struct(prob);\r\nproblem.objective\r\n<\/pre><pre class=\"codeoutput\">ans =\r\n  function_handle with value:\r\n    @generatedObjective\r\n<\/pre><p><tt>prob2struct<\/tt> creates a function file named <tt>generatedObjective.m<\/tt>. This function includes an automatically-generated gradient. 
View the contents of the function file.<\/p><pre class=\"language-matlab\">\r\n<span class=\"keyword\">function<\/span> [obj, grad] = generatedObjective(inputVariables)\r\n<span class=\"comment\">%generatedObjective Compute objective function value and gradient<\/span>\r\n<span class=\"comment\">%<\/span>\r\n<span class=\"comment\">%   OBJ = generatedObjective(INPUTVARIABLES) computes the objective value<\/span>\r\n<span class=\"comment\">%   OBJ at the point INPUTVARIABLES.<\/span>\r\n<span class=\"comment\">%<\/span>\r\n<span class=\"comment\">%   [OBJ, GRAD] = generatedObjective(INPUTVARIABLES) additionally computes<\/span>\r\n<span class=\"comment\">%   the objective gradient value GRAD at the current point.<\/span>\r\n<span class=\"comment\">%<\/span>\r\n<span class=\"comment\">%   Auto-generated by prob2struct on 06-Oct-2020 09:01:35<\/span>\r\n\r\n<span class=\"comment\">%% Variable indices.<\/span>\r\nxidx = 1;\r\nyidx = 2;\r\n\r\n<span class=\"comment\">%% Map solver-based variables to problem-based.<\/span>\r\nx = inputVariables(xidx);\r\ny = inputVariables(yidx);\r\n\r\n<span class=\"comment\">%% Compute objective function.<\/span>\r\narg1 = (y - x.^2);\r\narg2 = 100;\r\narg3 = arg1.^2;\r\narg4 = (1 - x);\r\nobj = ((arg2 .* arg3) + arg4.^2);\r\n\r\n<span class=\"comment\">%% Compute objective gradient.<\/span>\r\n<span class=\"keyword\">if<\/span> nargout &gt; 1\r\n    arg5 = 1;\r\n    arg6 = zeros([2, 1]);\r\n    arg6(xidx,:) = (-(arg5.*2.*(arg4(:)))) + ((-((arg5.*arg2(:)).*2.*(arg1(:)))).*2.*(x(:)));\r\n    arg6(yidx,:) = ((arg5.*arg2(:)).*2.*(arg1(:)));\r\n    grad = arg6(:);\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<\/pre><p>While this code might not be clear to you, you can compare the AD gradient calculation to a symbolic expression to see that they are the same:<\/p><p>$\\nabla({\\rm fun}) = [-400(y-x^2)x - 2(1-x);\\ 200(y-x^2)]$<\/p><p>The way that <tt>solve<\/tt> and <tt>prob2struct<\/tt> convert 
optimization expressions into code is essentially the same way that calculus students learn, taking each part of an expression and applying rules of differentiation. The details of the process of calculating the gradient are explained in <a href=\"https:\/\/www.mathworks.com\/help\/optim\/ug\/autodiff-background.html\">Automatic Differentiation Background<\/a>, which describes the \"forward\" and \"backward\" process used by most AD software. Currently, Optimization Toolbox uses only \"backward\" AD.<\/p><p>To use these rules of differentiation, the software has to have differentiation rules for each function in the objective or constraint functions. The list of supported operators includes polynomials, trigonometric and exponential functions and their inverses, along with multiplication and addition and their inverses. See <a href=\"https:\/\/www.mathworks.com\/help\/optim\/ug\/supported-operations-on-optimization-variables-expressions.html\">Supported Operations on Optimization Variables and Expressions<\/a>.<\/p><h4>What Good is Automatic Differentiation?<a name=\"0b09d582-ee83-4353-abb4-7f99c29abdc2\"><\/a><\/h4><p>AD lowers the number of function evaluations the solver takes. Without AD, nonlinear solvers estimate gradients by finite differences, such as $(f(x+\\delta e_1) - f(x))\/\\delta ,$ where $e_1$ is the unit vector (1,0,...,0). The solver evaluates <i>n<\/i> finite differences of this form by default, where <i>n<\/i> is the number of problem variables. For problems with a large number of variables, this process requires a large number of function evaluations.<\/p><p>With AD and supported functions, solvers do not need to take finite difference steps, so the derivative estimation process takes fewer function evaluations and is more accurate.<\/p><p>This is not to say that AD always speeds a solver. For complicated expressions, evaluating the automatic derivatives can be even more time-consuming than evaluating finite differences. 
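<\/p><p>To make the finite-difference cost concrete, here is a small sketch (for illustration only; this is not the solver's internal code) of the forward finite-difference gradient estimate described above, applied to the example objective. The point <tt>v0<\/tt> and the step size <tt>delta<\/tt> are arbitrary choices for this sketch.<\/p><pre class=\"codeinput\">f = @(v)100*(v(2) - v(1)^2)^2 + (1 - v(1))^2;  <span class=\"comment\">% objective as a function of [x;y]<\/span>\r\nv0 = [0.5; 0.5];    <span class=\"comment\">% evaluation point (arbitrary)<\/span>\r\ndelta = sqrt(eps);  <span class=\"comment\">% typical forward-difference step<\/span>\r\nf0 = f(v0);         <span class=\"comment\">% one base evaluation<\/span>\r\ng = zeros(2,1);\r\n<span class=\"keyword\">for<\/span> k = 1:2          <span class=\"comment\">% one extra evaluation per variable<\/span>\r\n    e = zeros(2,1);\r\n    e(k) = delta;\r\n    g(k) = (f(v0 + e) - f0)\/delta;\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>Each gradient estimate costs <i>n<\/i> + 1 objective evaluations, which is where the extra function counts come from.<\/p><p>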
Generally, AD is faster than finite differences when the problem has a large number of variables and is sparse. AD is slower when the problem has few variables and has complicated functions.<\/p><h4>What About Unsupported Operations?<a name=\"05576894-8fae-4043-a791-23baec17781f\"><\/a><\/h4><p>So far I have talked about automatic differentiation for supported operations. What if you have a black-box function, one for which the underlying code might not even be in MATLAB? Or what if you simply have a function that is not supported for problem-based optimization, such as a Bessel function? In order to include such functions in the problem-based approach, you convert the function to an optimization expression using the <tt>fcn2optimexpr<\/tt> function. For example, to use the <tt>besselj<\/tt> function,<\/p><pre class=\"codeinput\">fun2 = fcn2optimexpr(@(x,y)besselj(1,x^2 + y^2),x,y);\r\n<\/pre><p><tt>fcn2optimexpr<\/tt> allows you to use unsupported operations in the problem-based approach. However, <tt>fcn2optimexpr<\/tt> does not support AD. So, when you use <tt>fcn2optimexpr<\/tt>, solving the resulting problem uses finite differences to estimate gradients of objective or nonlinear constraint functions. For more information, see <a href=\"https:\/\/www.mathworks.com\/help\/optim\/ug\/derivatives-problem-based.html\">Supply Derivatives in Problem-Based Workflow<\/a>.<\/p><p>Currently, AD does not support higher-order derivatives. In other words, you cannot generate code for a second or third derivative automatically. You get first-order derivatives (gradients) only.<\/p><h4>Final Thoughts<a name=\"c1b46c0d-9271-4f1e-bdd8-75cdd0894341\"><\/a><\/h4><p>AD is useful for increased speed and reliability in solving optimization problems that are composed solely of supported functions. 
However, in some cases it does not increase speed, and currently AD is not available for nonlinear least-squares or equation-solving problems.<\/p><p>In my opinion, the most useful feature of AD is that it is utterly transparent to use in problem-based optimization. Beginning in R2020b, AD applies automatically, with no effort on your part. Let us know if you find it useful in solving your optimization problems by posting your comment <a href=\"https:\/\/blogs.mathworks.com\/loren?p=3845#respond\">here<\/a>.<\/p><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\">Published with MATLAB&reg; R2020b<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/LorenBlogPostAD_02.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>This column is written by Alan Weiss, the writer for Optimization Toolbox documentation. Take it away, Alan.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2020\/10\/06\/automatic-differentiation-in-optimization-toolbox\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,60],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/3845"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=3845"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/3845\/revisions"}],"predecessor-version":[{"id":3849,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/3845\/revisions\/3849"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=3845"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=3845"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=3845"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}