{"id":10964,"date":"2024-01-12T11:35:32","date_gmt":"2024-01-12T16:35:32","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=10964"},"modified":"2024-01-12T12:42:39","modified_gmt":"2024-01-12T17:42:39","slug":"exponential-fitting-separable-least-squares-and-quahogs","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2024\/01\/12\/exponential-fitting-separable-least-squares-and-quahogs\/","title":{"rendered":"Exponential Fitting, Separable Least Squares, and Quahogs"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>We have been investigating a recent bug report about <tt>fitnlm<\/tt>, the Statistics and Machine Learning Toolbox function for robust fitting of nonlinear models.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#e8d5954f-4051-48dd-9888-3729cc8d6313\">Quahogs<\/a><\/li><li><a href=\"#1fd80451-cc63-4160-8fcc-b1a1448c7382\">Acidification<\/a><\/li><li><a href=\"#faab8af3-c8dd-48bf-8621-b7fa14ade2fb\">Separable Least Squares<\/a><\/li><li><a href=\"#8afd06c3-7aca-4df8-b636-b7d660a3b6ca\">Centering Data<\/a><\/li><li><a href=\"#b8434e53-14a6-412e-97d9-39485ad02c07\">Exponential Fitting<\/a><\/li><li><a href=\"#e6bd554c-66fa-4ebc-b3d4-2cd14229bed0\">Results<\/a><\/li><li><a href=\"#dd10ff5a-44a4-4ad7-9ddb-7cb103a943eb\">Software<\/a><\/li><li><a href=\"#7e09ac26-71d6-47be-aa47-13da26bc9755\">Thanks<\/a><\/li><\/ul><\/div><h4>Quahogs<a name=\"e8d5954f-4051-48dd-9888-3729cc8d6313\"><\/a><\/h4><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/cleve\/files\/quahogs.png\" alt=\"\"> <\/p><p>The bug report comes from Greg Pelletier, an independent research scientist and biogeochemical modeler in Olympia, Washington. Greg has been studying the vulnerability of sensitive marine organisms to increases in ocean acidification.  One of the most important of these organisms is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hard_clam\">Mercenaria mercenaria<\/a>, the hard clam.<\/p><p>Especially here in New England, hard clams are known by their traditional Native American name, <i>quahog<\/i>. They have a well-deserved reputation for making excellent clam chowder.<\/p><h4>Acidification<a name=\"1fd80451-cc63-4160-8fcc-b1a1448c7382\"><\/a><\/h4><p>We are all aware of increasing levels of carbon dioxide in the earth's atmosphere.  We may not be as aware of the effect this increase has on the health of the earth's oceans.  According to <a href=\"https:\/\/www.noaa.gov\/education\/resource-collections\/ocean-coasts\/ocean-acidification\">NOAA<\/a>, the ocean absorbs about 30% of the atmospheric carbon dioxide.<\/p><p>A definitive and controversial 2009 paper by Justin Ries and colleagues, then at the Woods Hole Oceanographic Institution, is \"Marine calcifiers exhibit mixed responses to CO2-induced ocean acidification\", <a href=\"https:\/\/doi.org\/10.1130\/G30210A.1\">https:\/\/doi.org\/10.1130\/G30210A.1<\/a>. The hard clam example in Greg's bug report comes from figure 1K in the Ries et al. paper.<\/p><p>The independent variable in experiments is the ratio of alkalinity of sea water to the concentration of dissolved inorganic carbon. The dependent variable is the calcification rate, which compares how fast the organism builds its shells to how fast the shells are dissolving.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/cleve\/files\/Ries_et_al.png\" alt=\"\"> <\/p><h4>Separable Least Squares<a name=\"faab8af3-c8dd-48bf-8621-b7fa14ade2fb\"><\/a><\/h4><p>The model chosen by Ries at al. is<\/p><p>$$ y \\approx \\beta_1 + \\beta_2 e^{\\lambda t} $$<\/p><p>where $t$ is the ratio of alkalinity to dissolved carbon and $y$ is the calcification rate. The data have only four distinct values of $t$, with several observations of $y$ at each value.<\/p><p>The parameters $\\beta_1$, $\\beta_2$ and $\\lambda$ are determined by least squares curve fit.  This is a <i>separable least squares<\/i> problem. For any given value of $\\lambda$, the parameters $\\beta_1$ and $\\beta_2$ occur linearly and the least squares solution can be obtained by MATLAB's backslash.<\/p><p>Gene Golub and Victor Pereyra described separable least squares in 1973 and proposed solving it by a <i>variable projection algorithm<\/i>. Since 1973 a number of people, including Pereyra, Linda Kaufman, Fred Krogh, John Bolstadt and David Gay, have contributed to the development of a series of Fortran programs named <tt>varpro<\/tt>. In 2011, Dianne O'Leary and Burt Rust created a MATLAB version of <tt>varpro<\/tt>.  Their report, <a href=\"https:\/\/www.cs.umd.edu\/~oleary\/software\/varpro\/\">https:\/\/www.cs.umd.edu\/~oleary\/software\/varpro\/<\/a>, is a good background source, as well as documentation for <tt>varpro.m<\/tt>.<\/p><p>I have a section on separable least squares, and an example, <tt>expfitdemo<\/tt>, in NCM, <a href=\"https:\/\/www.mathworks.com\/content\/dam\/mathworks\/mathworks-dot-com\/moler\/leastsquares.pdf\">Numerical Computing with MATLAB<\/a>. I have modified <tt>expfitdemo<\/tt> to work on Greg's quahogs problem.<\/p><h4>Centering Data<a name=\"8afd06c3-7aca-4df8-b636-b7d660a3b6ca\"><\/a><\/h4><p>It turns out that the problem Greg encountered can be traced to the fact that the data are not centered.  The given values of $t$ are all positive.  This causes <tt>fitnlm<\/tt> to print a warning message and attempt to rectify the situation by changing the degrees of freedom from 22 to 23, but this only makes the situation worse. (We should take another look at the portion of <tt>fitnlm<\/tt> that adjusts the degrees of freedom.)<\/p><p>It is always a good idea in curve fitting to center the data with something like<\/p><pre>  t = t - mean(t)<\/pre><p>The values of $y$ are already pretty well centered. Rescaling $y$ with<\/p><pre>  y = 10000*y<\/pre><p>makes interpretation of results easier.<\/p><h4>Exponential Fitting<a name=\"b8434e53-14a6-412e-97d9-39485ad02c07\"><\/a><\/h4><p>With the data centered and scaled, we have three different ways of tackling Greg's problem.  All three methods agree on the results they compute.<\/p><div><ul><li><tt>fitnlm<\/tt>. Treats all parameters as if they were nonlinear.      Computes statistical quantities such as R-squared and      RMS Error.<\/li><\/ul><\/div><div><ul><li><tt>varpro<\/tt>. Venerable software history.      Only one nonlinear parameter for the quahogs problem.      Delivers additional statistical quantities in <tt>Regression<\/tt> structure.<\/li><\/ul><\/div><div><ul><li><tt>quahogsfit<\/tt>. Textbook separable least squares code.      Modification for the quahogs problem of <tt>expfitdemo<\/tt> from NCM.      Only one nonlinear parameter.  No statistics.<\/li><\/ul><\/div><h4>Results<a name=\"e6bd554c-66fa-4ebc-b3d4-2cd14229bed0\"><\/a><\/h4><div><ul><li><tt>fitnlm<\/tt><\/li><\/ul><\/div><pre>Nonlinear regression model:\r\n    y ~ param1 + param2*exp(param3*xval)<\/pre><pre>Estimated Coefficients:\r\n              Estimate      SE        tStat       pValue\r\n              ________    _______    _______    __________<\/pre><pre>    param1     0.69536     0.1657     4.1964    0.00037344\r\n    param2    -0.26482    0.19909    -1.3302       0.19709\r\n    param3     -22.218     8.1494    -2.7263      0.012327<\/pre><pre>Number of observations: 25, Error degrees of freedom: 22\r\nRoot Mean Squared Error: 0.307\r\nR-Squared: 0.828,  Adjusted R-Squared 0.813\r\nF-statistic vs. constant model: 53, p-value = 3.86e-09<\/pre><div><ul><li><tt>varpro<\/tt><\/li><\/ul><\/div><pre>Linear Parameters:\r\n  0.695367   -0.264837\r\nNonlinear Parameters:\r\n -22.217495<\/pre><pre>Norm         of weighted residual  =   1.438935\r\nNorm         of data vector        =   3.545820\r\nExpected error of observations     =   0.306782\r\nCoefficient of determination       =   0.828145<\/pre><pre>Regression.t_ratio\r\n    4.1962\r\n   -1.3301\r\n   -2.7264<\/pre><pre>Regression.std_param\r\n    0.1657\r\n    0.1991\r\n    8.1490<\/pre><div><ul><li><tt>quahogsfit<\/tt><\/li><\/ul><\/div><pre>lambda =\r\n  -22.2180\r\ncondX =\r\n    4.3882\r\nbeta =\r\n    0.6954\r\n   -0.2648\r\nnormres =\r\n    1.4389<\/pre><p><tt>quahogsfit<\/tt> produces this plot, which can be compared with figure 1K from Ries et al, reproduced above.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/cleve\/files\/quahogsfit.png\" alt=\"\"> <\/p><h4>Software<a name=\"dd10ff5a-44a4-4ad7-9ddb-7cb103a943eb\"><\/a><\/h4><p>The codes for this post are available here <a href=\"https:\/\/blogs.mathworks.com\/cleve\/files\/quahogs_driver.m\">quahogs_driver.m<\/a> and here <a href=\"https:\/\/blogs.mathworks.com\/cleve\/files\/varpro.m\">varpro.m<\/a>.<\/p><h4>Thanks<a name=\"7e09ac26-71d6-47be-aa47-13da26bc9755\"><\/a><\/h4><p>Thanks to Greg Pelletier for the bug report and to Tom Lane for his statistical expertise.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_0a292e90c48540ba9ef9da619b32e8e7() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='0a292e90c48540ba9ef9da619b32e8e7 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 0a292e90c48540ba9ef9da619b32e8e7';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2024 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_0a292e90c48540ba9ef9da619b32e8e7()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2023a<br><\/p><\/div><!--\r\n0a292e90c48540ba9ef9da619b32e8e7 ##### SOURCE BEGIN #####\r\n%% Exponential Fitting, Separable Least Squares, and Quahogs\r\n% We have been investigating a recent bug report about |fitnlm|,\r\n% the Statistics and Machine Learning Toolbox function\r\n% for robust fitting of nonlinear models.\r\n\r\n%% Quahogs\r\n% <<quahogs.png>>\r\n%\r\n% The bug report comes from Greg Pelletier, an independent research\r\n% scientist and biogeochemical modeler in Olympia, Washington.\r\n% Greg has been studying the \r\n% vulnerability of sensitive marine organisms to increases in ocean\r\n% acidification.  One of the most important of these organisms is\r\n% <https:\/\/en.wikipedia.org\/wiki\/Hard_clam Mercenaria mercenaria>,\r\n% the hard clam.\r\n% \r\n% Especially here in New England, hard clams are known\r\n% by their traditional Native American name, _quahog_.\r\n% They have a well-deserved reputation for making excellent\r\n% clam chowder.\r\n\r\n%% Acidification\r\n% We are all aware of increasing levels of carbon dioxide in the earth's\r\n% atmosphere.  We may not be as aware of the effect this increase has\r\n% on the health of the earth's oceans.  According to\r\n% <https:\/\/www.noaa.gov\/education\/resource-collections\/ocean-coasts\/ocean-acidification\r\n% NOAA>, the ocean absorbs about 30% of the atmospheric carbon dioxide. \r\n%\r\n% A definitive and controversial 2009 paper\r\n% by Justin Ries and colleagues,\r\n% then at the Woods Hole Oceanographic Institution, is\r\n% \"Marine calcifiers exhibit mixed responses to CO2-induced ocean\r\n% acidification\", <https:\/\/doi.org\/10.1130\/G30210A.1>.\r\n% The hard clam example in Greg's bug report comes from \r\n% figure 1K in the Ries et al. paper.\r\n%\r\n% The independent variable in experiments is the\r\n% ratio of alkalinity of sea water\r\n% to the concentration of dissolved inorganic carbon.\r\n% The dependent variable is the calcification rate, which compares\r\n% how fast the organism builds its shells \r\n% to how fast the shells are dissolving.\r\n%\r\n% <<Ries_et_al.png>>\r\n%\r\n%% Separable Least Squares\r\n% The model chosen by Ries at al. is\r\n%\r\n% $$ y \\approx \\beta_1 + \\beta_2 e^{\\lambda t} $$\r\n%\r\n% where $t$ is the ratio of alkalinity to dissolved carbon and $y$ is the \r\n% calcification rate. The data have only four distinct values of $t$, with \r\n% several observations of $y$ at each value.\r\n%\r\n% The parameters $\\beta_1$, $\\beta_2$ and $\\lambda$ are determined by\r\n% least squares curve fit.  This is a _separable least squares_ problem.\r\n% For any given value of $\\lambda$, the parameters $\\beta_1$ and $\\beta_2$\r\n% occur linearly and the least squares solution can be obtained by \r\n% MATLAB's backslash.\r\n% \r\n% Gene Golub and Victor Pereyra described separable least squares in 1973\r\n% and proposed solving it by a _variable projection algorithm_.\r\n% Since 1973 a number of people, including Pereyra, Linda Kaufman,\r\n% Fred Krogh, John Bolstadt and David Gay, have contributed to the\r\n% development of a series of Fortran programs named |varpro|.\r\n% In 2011, Dianne O'Leary and Burt Rust created a MATLAB version of\r\n% |varpro|.  Their report,\r\n% <https:\/\/www.cs.umd.edu\/~oleary\/software\/varpro\/>, is a good background\r\n% source, as well as documentation for |varpro.m|.\r\n%\r\n% I have a section on separable least squares, and an example,\r\n% |expfitdemo|, in NCM,\r\n% <https:\/\/www.mathworks.com\/content\/dam\/mathworks\/mathworks-dot-com\/moler\/leastsquares.pdf\r\n% Numerical Computing with MATLAB>. I have modified |expfitdemo| to\r\n% work on Greg's quahogs problem.\r\n\r\n%% Centering Data\r\n% It turns out that the problem Greg encountered can be traced to the \r\n% fact that the data are not centered.  The given values of $t$ are all\r\n% positive.  This causes |fitnlm| to print a warning message and attempt\r\n% to rectify the situation by changing the degrees of freedom  \r\n% from 22 to 23, but this only makes the situation worse.\r\n% (We should take another look at the portion of |fitnlm| that adjusts \r\n% the degrees of freedom.)\r\n%\r\n% It is always a good idea in curve fitting to center the \r\n% data with something like\r\n%\r\n%    t = t - mean(t)\r\n%\r\n% The values of $y$ are already pretty well centered.\r\n% Rescaling $y$ with\r\n%\r\n%    y = 10000*y\r\n%\r\n% makes interpretation of results easier.\r\n\r\n%% Exponential Fitting\r\n% With the data centered and scaled, we have three different \r\n% ways of tackling Greg's problem.  All three methods\r\n% agree on the results they compute.\r\n% \r\n% * |fitnlm|. Treats all parameters as if they were nonlinear.\r\n%      Computes statistical quantities such as R-squared and\r\n%      RMS Error.\r\n% \r\n% * |varpro|. Venerable software history.\r\n%      Only one nonlinear parameter for the quahogs problem.\r\n%      Delivers additional statistical quantities in |Regression| structure.\r\n%  \r\n% * |quahogsfit|. Textbook separable least squares code.\r\n%      Modification for the quahogs problem of |expfitdemo| from NCM.\r\n%      Only one nonlinear parameter.  No statistics.\r\n%\r\n\r\n%% Results\r\n% * |fitnlm| \r\n%\r\n%  Nonlinear regression model:\r\n%      y ~ param1 + param2*exp(param3*xval)\r\n%  \r\n%  Estimated Coefficients:\r\n%                Estimate      SE        tStat       pValue  \r\n%                ________    _______    _______    __________\r\n%  \r\n%      param1     0.69536     0.1657     4.1964    0.00037344\r\n%      param2    -0.26482    0.19909    -1.3302       0.19709\r\n%      param3     -22.218     8.1494    -2.7263      0.012327\r\n%  \r\n%  \r\n%  Number of observations: 25, Error degrees of freedom: 22\r\n%  Root Mean Squared Error: 0.307\r\n%  R-Squared: 0.828,  Adjusted R-Squared 0.813\r\n%  F-statistic vs. constant model: 53, p-value = 3.86e-09\r\n%\r\n% * |varpro|\r\n%   \r\n%  Linear Parameters:\r\n%    0.695367   -0.264837\r\n%  Nonlinear Parameters:\r\n%   -22.217495\r\n%   \r\n%  Norm         of weighted residual  =   1.438935\r\n%  Norm         of data vector        =   3.545820\r\n%  Expected error of observations     =   0.306782\r\n%  Coefficient of determination       =   0.828145\r\n%\r\n%  Regression.t_ratio\r\n%      4.1962\r\n%     -1.3301\r\n%     -2.7264\r\n%\r\n%  Regression.std_param\r\n%      0.1657\r\n%      0.1991\r\n%      8.1490\r\n%\r\n% * |quahogsfit|\r\n%\r\n%  lambda =\r\n%    -22.2180\r\n%  condX =\r\n%      4.3882\r\n%  beta =\r\n%      0.6954\r\n%     -0.2648\r\n%  normres =\r\n%      1.4389\r\n%\r\n% |quahogsfit| produces this plot, which can be compared with\r\n% figure 1K from Ries et al, reproduced above.\r\n%\r\n% <<quahogsfit.png>>\r\n%\r\n\r\n%% Software\r\n% The codes for this post are available here\r\n% <https:\/\/blogs.mathworks.com\/cleve\/files\/quahogs_driver.m  quahogs_driver.m>\r\n% and here \r\n% <https:\/\/blogs.mathworks.com\/cleve\/files\/varpro.m  varpro.m>.\r\n\r\n%% Thanks\r\n% Thanks to Greg Pelletier for the bug report and\r\n% to Tom Lane for his statistical expertise.\r\n\r\n##### SOURCE END ##### 0a292e90c48540ba9ef9da619b32e8e7\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/cleve\/files\/quahogsfit.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><!--introduction--><p>We have been investigating a recent bug report about <tt>fitnlm<\/tt>, the Statistics and Machine Learning Toolbox function for robust fitting of nonlinear models.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2024\/01\/12\/exponential-fitting-separable-least-squares-and-quahogs\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":10955,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[29,5,23,16],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/10964"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=10964"}],"version-history":[{"count":3,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/10964\/revisions"}],"predecessor-version":[{"id":10970,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/10964\/revisions\/10970"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media\/10955"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=10964"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=10964"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=10964"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}