{"id":261,"date":"2011-01-13T11:11:19","date_gmt":"2011-01-13T11:11:19","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/2011\/01\/13\/data-driven-fitting\/"},"modified":"2018-01-08T14:45:48","modified_gmt":"2018-01-08T19:45:48","slug":"data-driven-fitting","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2011\/01\/13\/data-driven-fitting\/","title":{"rendered":"Data Driven Fitting"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p><i>This week Richard Willey from technical marketing will be guest blogging about nonparametric fitting.<\/i><\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#4\">Generate some data to work with<\/a><\/li>\r\n         <li><a href=\"#6\">Nonlinear Regression example<\/a><\/li>\r\n         <li><a href=\"#7\">Nonparametric Fitting Example<\/a><\/li>\r\n         <li><a href=\"#9\">Evaluate the accuracy of our LOWESS fit<\/a><\/li>\r\n         <li><a href=\"#11\">Special Guest Appearance by bootstrp<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p>This week's blog is motivated by a common question on <a>MATLAB Central<\/a>. Consider the following problem:\r\n   <\/p>\r\n   <div>\r\n      <ul>\r\n         <li>You have a data set consisting of two variables:  X and Y<\/li>\r\n         <li>You need to generate a model that describes the relationship between the two variables<\/li>\r\n         <li>The only thing that you have to work with is the data.  You don't have any first principles information about the \"true\" relationship\r\n            between your variables\r\n         <\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p>How can you generate a curve fit that best describes the relationship between X and Y?  
This question traditionally shows\r\n      up in two different guises:\r\n   <\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Many people ask about solutions for \"automatic fitting\".  (How can I produce a curve fit without a parametric equation?)<\/li>\r\n         <li>The second variant involves questions about curve fitting using high order polynomials.  The user doesn't have enough \"First\r\n            Principles\" knowledge about the system to specify a model, but does know that a 9th order polynomial can describe a very complicated\r\n            system.\r\n         <\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p>In this week's blog, I'm going to show you a better way to solve this same problem using a combination of localized regression\r\n      and cross validation.\r\n   <\/p>\r\n   <div>\r\n      <ul>\r\n         <li>You can access localized regression using the <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2010b\/toolbox\/curvefit\/smooth.html\">smooth<\/a> function in <a href=\"https:\/\/www.mathworks.com\/products\/curvefitting\/\">Curve Fitting Toolbox<\/a><\/li>\r\n         <li>Cross validation is available using the <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2010b\/toolbox\/stats\/cvpartition.html\">cvpartition<\/a> function in <a href=\"https:\/\/www.mathworks.com\/products\/statistics\/\">Statistics Toolbox<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Generate some data to work with<a name=\"4\"><\/a><\/h3>\r\n   <p>To use a more visual approach, suppose you had the following set of data points:<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">s = RandStream(<span style=\"color: #A020F0\">'mt19937ar'<\/span>,<span style=\"color: #A020F0\">'seed'<\/span>,1971);\r\nRandStream.setDefaultStream(s);\r\n\r\nX = linspace(1,10,100);\r\nX = X';\r\n\r\n<span style=\"color: #228B22\">% Specify the parameters for a second order Fourier series<\/span>\r\n\r\nw = .6067;\r\na0 = 
1.6345;\r\na1 = -.6235;\r\nb1 = -1.3501;\r\na2 = -1.1622;\r\nb2 = -.9443;\r\n\r\n<span style=\"color: #228B22\">% Fourier2 is the true (unknown) relationship between X and Y<\/span>\r\nY = a0 + a1*cos(X*w) + b1*sin(X*w) + a2*cos(2*X*w) + b2*sin(2*X*w);\r\n\r\n<span style=\"color: #228B22\">% Add in a noise vector<\/span>\r\n\r\nK = max(Y) - min(Y);\r\nnoisy = Y +  .2*K*randn(100,1);\r\n\r\n<span style=\"color: #228B22\">% Generate a scatterplot<\/span>\r\nscatter(X,noisy,<span style=\"color: #A020F0\">'k'<\/span>);\r\nL2 = legend(<span style=\"color: #A020F0\">'Noisy Data Sample'<\/span>, 2);\r\nsnapnow<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/261\/RWLOWESS_01.png\"> <h3>Nonlinear Regression example<a name=\"6\"><\/a><\/h3>\r\n   <p>If you knew that this data was generated with a second order Fourier series, you could use nonlinear regression to model Y = f(X).<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">foo = fit(X, noisy, <span style=\"color: #A020F0\">'fourier2'<\/span>)\r\n\r\n<span style=\"color: #228B22\">% Plot the results<\/span>\r\n\r\nhold <span style=\"color: #A020F0\">on<\/span>\r\nplot(foo)\r\nL3 = legend(<span style=\"color: #A020F0\">'Noisy Data Sample'<\/span>,<span style=\"color: #A020F0\">'Nonlinear Regression'<\/span>, 2);\r\nhold <span style=\"color: #A020F0\">off<\/span>\r\nsnapnow<\/pre><pre style=\"font-style:oblique\">foo = \r\n     General model Fourier2:\r\n     foo(x) =  a0 + a1*cos(x*w) + b1*sin(x*w) + \r\n               a2*cos(2*x*w) + b2*sin(2*x*w)\r\n     Coefficients (with 95% confidence bounds):\r\n       a0 =       1.734  (1.446, 2.021)\r\n       a1 =     -0.1998  (-1.065, 0.6655)\r\n       b1 =      -1.413  (-1.68, -1.146)\r\n       a2 =     -0.7688  (-1.752, 0.2142)\r\n       b2 =      -1.317  (-1.867, -0.7668)\r\n       w =      0.6334  (0.5802, 0.6866)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" 
src=\"https:\/\/blogs.mathworks.com\/images\/loren\/261\/RWLOWESS_02.png\"> <h3>Nonparametric Fitting Example<a name=\"7\"><\/a><\/h3>\r\n   <p>All very nice and easy...  We were able to specify a regression model and plot our curve using a couple of lines of code.  The\r\n      coefficients that were estimated by our regression model are a close match to the ones that we originally specified.\r\n   <\/p>\r\n   <p>Here's the rub:  we cheated.  When we created our regression model, we specified that the true relationship between X and\r\n      Y should be modeled using a second order Fourier series.  What if this information wasn't available to us?\r\n   <\/p>\r\n   <p>This next piece of code is a bit more complicated; however, it will solve the same problem using nothing but the raw data\r\n      and some CPU cycles.\r\n   <\/p>\r\n   <p>We're going to use a smoothing algorithm called LOWESS (locally weighted scatterplot smoothing) to generate our curve fit.  The\r\n      accuracy of a LOWESS fit depends on specifying the \"right\" value for the smoothing parameter.  We're going to generate 99 different\r\n      LOWESS models, using smoothing parameters between zero and one, and see which value generates the most accurate model.\r\n   <\/p>\r\n   <p>This type of brute force technique often runs into problems with \"overfitting\".  The LOWESS curve that minimizes the \"sum\r\n      of the squared errors\" fits both the signal and the noise component of the data set.\r\n   <\/p>\r\n   <p>To protect against overfitting, we're going to use a technique called cross validation.  We're going to divide the data set\r\n      into different training sets and test sets.  
We'll generate our predictive model using the data in the training set, and then\r\n      measure the accuracy of the model using the data in the test set.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">num = 99;\r\nspans = linspace(.01,.99,num);\r\nsse = zeros(size(spans));\r\ncp = cvpartition(100,<span style=\"color: #A020F0\">'k'<\/span>,10);\r\n\r\n<span style=\"color: #0000FF\">for<\/span> j=1:length(spans),\r\n    f = @(train,test) norm(test(:,2) - mylowess(train,test(:,1),spans(j)))^2;\r\n    sse(j) = sum(crossval(f,[X,noisy],<span style=\"color: #A020F0\">'partition'<\/span>,cp));\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\n[minsse,minj] = min(sse);\r\nspan = spans(minj);<\/pre><p><tt>mylowess<\/tt> is a function that I wrote that estimates values for a lowess smooth.  I've included a copy of <tt>mylowess.m<\/tt> for you to use.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">type <span style=\"color: #A020F0\">mylowess.m<\/span><\/pre><pre style=\"font-style:oblique\">\r\nfunction ys=mylowess(xy,xs,span)\r\n%MYLOWESS Lowess smoothing, preserving x values\r\n%   YS=MYLOWESS(XY,XS) returns the smoothed version of the x\/y data in the\r\n%   two-column matrix XY, but evaluates the smooth at XS and returns the\r\n%   smoothed values in YS.  Any values outside the range of XY are taken to\r\n%   be equal to the closest values.\r\n\r\nif nargin&lt;3 || isempty(span)\r\n    span = .3;\r\nend\r\n\r\n% Sort and get smoothed version of xy data\r\nxy = sortrows(xy);\r\nx1 = xy(:,1);\r\ny1 = xy(:,2);\r\nys1 = smooth(x1,y1,span,'loess');\r\n\r\n% Remove repeats so we can interpolate\r\nt = diff(x1)==0;\r\nx1(t)=[]; ys1(t) = [];\r\n\r\n% Interpolate to evaluate this at the xs values\r\nys = interp1(x1,ys1,xs,'linear',NaN);\r\n\r\n% Some of the original points may have x values outside the range of the\r\n% resampled data.  
Those are now NaN because we could not interpolate them.\r\n% Replace NaN by the closest smoothed value.  This amounts to extending the\r\n% smooth curve using a horizontal line.\r\nif any(isnan(ys))\r\n    ys(xs&lt;x1(1)) = ys1(1);\r\n    ys(xs&gt;x1(end)) = ys1(end);\r\nend\r\n\r\n<\/pre><h3>Evaluate the accuracy of our LOWESS fit<a name=\"9\"><\/a><\/h3>\r\n   <p>Next, let's create a plot that shows:<\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Our original second order Fourier Series<\/li>\r\n         <li>The noisy sample from this curve<\/li>\r\n         <li>The nonlinear regression model<\/li>\r\n         <li>The LOWESS model<\/li>\r\n      <\/ul>\r\n   <\/div><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">plot(X,Y, <span style=\"color: #A020F0\">'k'<\/span>);\r\nhold <span style=\"color: #A020F0\">on<\/span>\r\nscatter(X,noisy, <span style=\"color: #A020F0\">'k'<\/span>);\r\nplot(foo, <span style=\"color: #A020F0\">'r'<\/span>)\r\n\r\nx = linspace(min(X),max(X));\r\nline(x,mylowess([X,noisy],x,span),<span style=\"color: #A020F0\">'color'<\/span>,<span style=\"color: #A020F0\">'b'<\/span>,<span style=\"color: #A020F0\">'linestyle'<\/span>,<span style=\"color: #A020F0\">'-'<\/span>, <span style=\"color: #A020F0\">'linewidth'<\/span>,2)\r\nlegend(<span style=\"color: #A020F0\">'Clean Data'<\/span>, <span style=\"color: #A020F0\">'Noisy Sample'<\/span>, <span style=\"color: #A020F0\">'Nonlinear Regression'<\/span>, <span style=\"color: #A020F0\">'LOWESS'<\/span>, 2)\r\nhold <span style=\"color: #A020F0\">off<\/span>\r\nsnapnow<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/261\/RWLOWESS_03.png\"> <p>As you can see, the LOWESS curve is doing a great job capturing the dynamics of the system.  
We were able to generate a LOWESS\r\n      fit that is almost as accurate as the nonlinear regression without the need to specify that this system should be modeled\r\n      as a second order Fourier series.\r\n   <\/p>\r\n   <h3>Special Guest Appearance by bootstrp<a name=\"11\"><\/a><\/h3>\r\n   <p>Here's a bonus for those of you who stuck around to the end.  This next bit of code generates confidence intervals for the\r\n      LOWESS fit using a \"paired bootstrap\".\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">scatter(X, noisy)\r\n\r\nf = @(xy) mylowess(xy,X,span);\r\nyboot2 = bootstrp(1000,f,[X,noisy])';\r\nmeanloess = mean(yboot2,2);\r\nh1 = line(X, meanloess,<span style=\"color: #A020F0\">'color'<\/span>,<span style=\"color: #A020F0\">'k'<\/span>,<span style=\"color: #A020F0\">'linestyle'<\/span>,<span style=\"color: #A020F0\">'-'<\/span>,<span style=\"color: #A020F0\">'linewidth'<\/span>,2);\r\n\r\nstdloess = std(yboot2,0,2);\r\nh2 = line(X, meanloess+2*stdloess,<span style=\"color: #A020F0\">'color'<\/span>,<span style=\"color: #A020F0\">'r'<\/span>,<span style=\"color: #A020F0\">'linestyle'<\/span>,<span style=\"color: #A020F0\">'--'<\/span>,<span style=\"color: #A020F0\">'linewidth'<\/span>,2);\r\nh3 = line(X, meanloess-2*stdloess,<span style=\"color: #A020F0\">'color'<\/span>,<span style=\"color: #A020F0\">'r'<\/span>,<span style=\"color: #A020F0\">'linestyle'<\/span>,<span style=\"color: #A020F0\">'--'<\/span>,<span style=\"color: #A020F0\">'linewidth'<\/span>,2);\r\n\r\nL5 = legend(<span style=\"color: #A020F0\">'Localized Regression'<\/span>,<span style=\"color: #A020F0\">'Confidence Intervals'<\/span>,2);\r\nsnapnow<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/261\/RWLOWESS_04.png\"> <p>Can anyone provide a good intuitive explanation of how the bootstrap works? 
For extra credit:<\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Can you convert this example from a \"paired bootstrap\" to a \"residual bootstrap\"?<\/li>\r\n         <li>Do you see any advantages or limitations to the different types of bootstraps?<\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p>Post your answers <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=261#respond\">here<\/a>.\r\n   <\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_8cfc5c33409142f6bab064dbe1d0774a() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='8cfc5c33409142f6bab064dbe1d0774a ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 8cfc5c33409142f6bab064dbe1d0774a';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Richard Willey';\r\n        copyright = 'Copyright 2010 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_8cfc5c33409142f6bab064dbe1d0774a()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.11<br><\/p>\r\n<\/div>\r\n<!--\r\n8cfc5c33409142f6bab064dbe1d0774a ##### SOURCE BEGIN #####\r\n%% Data Driven Fitting  \r\n% _This week Richard Willey from technical marketing will be guest blogging\r\n% about nonparametric fitting._\r\n\r\n%%\r\n% This week's blog is motivated by a common question on\r\n% <http:\/\/ MATLAB Central>.\r\n% Consider the following problem:\r\n%\r\n% * You have a data set consisting of two variables:  X and Y\r\n% * You need to generate a model that describes the relationship\r\n% between the two variables\r\n% * The only thing that you have to work with is the data.  
You don't have\r\n% any first principles information about the \"true\" relationship between\r\n% your variables\r\n\r\n%%\r\n% How can you generate a curve fit that best describes the relationship \r\n% between X and Y?  This question traditionally shows up in two \r\n% different guises:\r\n% \r\n% * Many people ask about solutions for \"automatic fitting\".  (How can I\r\n% produce a curve fit without a parametric equation?)\r\n% * The second variant involves questions about curve fitting using \r\n% high order polynomials.  The user doesn't have enough \"First Principles\" \r\n% knowledge about the system to specify a model, but does know that a \r\n% 9th order polynomial can describe a very complicated system.\r\n% \r\n\r\n%%\r\n% In this week's blog, I'm going to show you a better way to solve this \r\n% same problem using a combination of localized regression and \r\n% cross validation.\r\n% \r\n% * You can access localized regression using the <https:\/\/www.mathworks.com\/help\/releases\/R2010b\/toolbox\/curvefit\/smooth.html smooth> function in\r\n% <https:\/\/www.mathworks.com\/products\/curvefitting\/ Curve Fitting Toolbox>\r\n% * Cross validation is available using the <https:\/\/www.mathworks.com\/help\/releases\/R2010b\/toolbox\/stats\/cvpartition.html cvpartition> function in\r\n% <https:\/\/www.mathworks.com\/products\/statistics\/ Statistics Toolbox>\r\n%\r\n\r\n%% Generate some data to work with\r\n\r\n%%\r\n% \r\n% To use a more visual approach, suppose you had the following set of data points:\r\n% \r\n\r\ns = RandStream('mt19937ar','seed',1971);\r\nRandStream.setDefaultStream(s);\r\n\r\nX = linspace(1,10,100); \r\nX = X';\r\n\r\n% Specify the parameters for a second order Fourier series\r\n\r\nw = .6067;\r\na0 = 1.6345;\r\na1 = -.6235;\r\nb1 = -1.3501;\r\na2 = -1.1622;\r\nb2 = -.9443;\r\n\r\n% Fourier2 is the true (unknown) relationship between X and Y\r\nY = a0 + a1*cos(X*w) + b1*sin(X*w) + a2*cos(2*X*w) + b2*sin(2*X*w);\r\n\r\n% Add 
in a noise vector\r\n\r\nK = max(Y) - min(Y);\r\nnoisy = Y +  .2*K*randn(100,1);\r\n\r\n% Generate a scatterplot\r\nscatter(X,noisy,'k');\r\nL2 = legend('Noisy Data Sample', 2);\r\nsnapnow\r\n%% Nonlinear Regression example\r\n% \r\n% If you knew that this data was generated with a second order Fourier\r\n% series, you could use nonlinear regression to model Y = f(X).\r\n%\r\n\r\nfoo = fit(X, noisy, 'fourier2')\r\n\r\n% Plot the results\r\n\r\nhold on\r\nplot(foo)\r\nL3 = legend('Noisy Data Sample','Nonlinear Regression', 2);\r\nhold off\r\nsnapnow\r\n%% Nonparametric Fitting Example\r\n% All very nice and easy...  We were able to specify a regression model\r\n% and plot our curve using a couple of lines of code.  The coefficients that\r\n% were estimated by our regression model are a close match to the ones\r\n% that we originally specified.\r\n%\r\n% Here's the rub:  we cheated.  When we created our regression model, we\r\n% specified that the true relationship between X and Y should be modeled\r\n% using a second order Fourier series.  What if this information wasn't\r\n% available to us?\r\n%  \r\n% This next piece of code is a bit more complicated; however, it will\r\n% solve the same problem using nothing but the raw data and some CPU\r\n% cycles.\r\n%\r\n% We're going to use a smoothing algorithm called LOWESS (locally\r\n% weighted scatterplot smoothing) to generate our curve fit.  The accuracy of a\r\n% LOWESS fit depends on specifying the \"right\" value for the smoothing \r\n% parameter.  We're going to generate 99 different LOWESS models, using \r\n% smoothing parameters between zero and one, and see which value generates \r\n% the most accurate model.\r\n%\r\n% This type of brute force technique often runs into problems with \r\n% \"overfitting\".  The LOWESS curve that minimizes the \"sum of the \r\n% squared errors\" fits both the signal and the noise component of \r\n% the data set.  
\r\n%\r\n% To protect against overfitting, we're going to use a technique called \r\n% cross validation.  We're going to divide the data set into different\r\n% training sets and test sets.  We'll generate our predictive model using\r\n% the data in the training set, and then measure the accuracy of the model\r\n% using the data in the test set.\r\nnum = 99;\r\nspans = linspace(.01,.99,num);\r\nsse = zeros(size(spans));\r\ncp = cvpartition(100,'k',10);\r\n\r\nfor j=1:length(spans),\r\n    f = @(train,test) norm(test(:,2) - mylowess(train,test(:,1),spans(j)))^2;\r\n    sse(j) = sum(crossval(f,[X,noisy],'partition',cp));\r\nend\r\n\r\n[minsse,minj] = min(sse);\r\nspan = spans(minj);\r\n\r\n%%\r\n% |mylowess| is a function that I wrote that estimates values for a lowess\r\n% smooth.  I've included a copy of |mylowess.m| for you to use.\r\n%\r\n\r\ntype mylowess.m\r\n\r\n%%  Evaluate the accuracy of our LOWESS fit\r\n% Next, let's create a plot that shows\r\n%\r\n% # Our original second order Fourier Series\r\n% # The noisy sample from this curve\r\n% # The nonlinear regression model\r\n% # The LOWESS model\r\nplot(X,Y, 'k');\r\nhold on\r\nscatter(X,noisy, 'k');\r\nplot(foo, 'r')\r\n\r\nx = linspace(min(X),max(X));\r\nline(x,mylowess([X,noisy],x,span),'color','b','linestyle','-', 'linewidth',2)\r\nlegend('Clean Data', 'Noisy Sample', 'Nonlinear Regression', 'LOWESS', 2)\r\nhold off\r\nsnapnow\r\n%%\r\n% As you can see, the LOWESS curve is doing a great job capturing the\r\n% dynamics of the system.  We were able to generate a LOWESS fit that is\r\n% almost as accurate as the nonlinear regression without the need to\r\n% specify that this system should be modeled as a second order Fourier\r\n% series.\r\n\r\n%% Special Guest Appearance by bootstrp\r\n% Here's a bonus for those of you who stuck around to the end.  This next\r\n% bit of code generates confidence intervals for the LOWESS fit using a \r\n% \"paired bootstrap\".  
\r\nscatter(X, noisy)\r\n \r\nf = @(xy) mylowess(xy,X,span);\r\nyboot2 = bootstrp(1000,f,[X,noisy])';\r\nmeanloess = mean(yboot2,2);\r\nh1 = line(X, meanloess,'color','k','linestyle','-','linewidth',2);\r\n \r\nstdloess = std(yboot2,0,2);\r\nh2 = line(X, meanloess+2*stdloess,'color','r','linestyle','REPLACE_WITH_DASH_DASH','linewidth',2);\r\nh3 = line(X, meanloess-2*stdloess,'color','r','linestyle','REPLACE_WITH_DASH_DASH','linewidth',2);\r\n \r\nL5 = legend('Localized Regression','Confidence Intervals',2);\r\nsnapnow\r\n%%\r\n% Can anyone provide a good intuitive explanation of how the bootstrap works?\r\n% For extra credit:\r\n%\r\n% * Can you convert this example from a \"paired bootstrap\" to a \"residual bootstrap\"?\r\n% * Do you see any advantages or limitations to the different types of\r\n% bootstraps?\r\n%\r\n% Post your answers <https:\/\/blogs.mathworks.com\/loren\/?p=261#respond here>.\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n##### SOURCE END ##### 8cfc5c33409142f6bab064dbe1d0774a\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      This week Richard Willey from technical marketing will be guest blogging about nonparametric fitting.\r\n   \r\n   Contents\r\n   \r\n      \r\n         Generate some data... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2011\/01\/13\/data-driven-fitting\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[39,23,43],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/261"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=261"}],"version-history":[{"count":1,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/261\/revisions"}],"predecessor-version":[{"id":2534,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/261\/revisions\/2534"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=261"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=261"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}