{"id":935,"date":"2014-01-14T07:00:09","date_gmt":"2014-01-14T12:00:09","guid":{"rendered":"https:\/\/blogs.mathworks.com\/steve\/?p=935"},"modified":"2019-11-01T09:44:27","modified_gmt":"2019-11-01T13:44:27","slug":"automating-data-extraction-3","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/steve\/2014\/01\/14\/automating-data-extraction-3\/","title":{"rendered":"Automating the extraction of real data from an image of the data &#8211; part 3"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p><i>I'd like to welcome back my fellow MATLAB Central blogger Brett Shoelson for the last in a three-part series on extracting curve values from a plot. You can find Brett over at the <a href=\"https:\/\/blogs.mathworks.com\/pick\/\">File Exchange Pick of the Week blog<\/a>, or you can check out his many <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/911\">File Exchange contributions<\/a>. -Steve<\/i><\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#182e35e3-7128-4d82-a0d4-08ed9f692a18\">Quick recap<\/a><\/li><li><a href=\"#b6e931bf-65c4-4819-a177-734c1db58035\">Final comment<\/a><\/li><li><a href=\"#37baa2a6-f94d-4b92-a6bd-849424d6f7f4\">The complete series<\/a><\/li><\/ul><\/div><h4>Quick recap<a name=\"182e35e3-7128-4d82-a0d4-08ed9f692a18\"><\/a><\/h4><p>If you've followed Steve's blog for the past couple weeks, you'll know that Steve has graciously allowed me to demonstrate how one might extract real data from a graphical depiction of the data. In this final post in this three-part series, I will use the coordinates I extracted from the curve of interest to fit a predictive model to those data. 
Recall that I showed two approaches to determine the <i>x<\/i>- and <i>y<\/i>-coordinates of the efficiency curve:<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/pick\/files\/PointExtractionApproaches.png\" alt=\"\"> <\/p><p>First, load the variables <tt>xs<\/tt>, <tt>ys<\/tt>, and <tt>bb<\/tt>, which were computed in the previous post.<\/p><pre class=\"codeinput\">tempfile = [tempname <span class=\"string\">'.mat'<\/span>];\r\nurl = <span class=\"string\">'https:\/\/blogs.mathworks.com\/images\/steve\/2013\/curve.mat'<\/span>;\r\nurlwrite(url,tempfile);\r\ns = load(tempfile);\r\nxs = s.xs;\r\nys = s.ys;\r\nbb = s.bb;\r\ndelete(tempfile);\r\n<\/pre><p>Clearly, the results obtained using <tt>regionprops<\/tt> were better than those I got using <tt>bwtraceboundary<\/tt>. We may as well use those values for our curve fit. Before we do that, though, it will be useful to transform the data to account for the fact that the extracted coordinates are in units of pixels, rather than units of flow rate and efficiency. I do that manually here:<\/p><pre class=\"codeinput\">flowLims = [0 240];\r\nefficiencyLims = [0 100];\r\n\r\n<span class=\"comment\">% Scale data to specified limits for curvefit<\/span>\r\n<span class=\"comment\">% Scale xs:<\/span>\r\nxs  = (xs-bb(1))\/bb(3)*diff(flowLims)+flowLims(1);\r\n<span class=\"comment\">% Scale ys:<\/span>\r\nys = (bb(2)+bb(4)-ys)\/bb(4); <span class=\"comment\">% Efficiencies as fractions (0 to 1), not percentages<\/span>\r\n<\/pre><p>(By the way, if you ever need to fit data to anything other than a very simple polynomial, and if you don't have the <a href=\"https:\/\/www.mathworks.com\/help\/curvefit\/index.html\">Curve Fitting Toolbox<\/a>, I hope this motivates you to get it; it is an <i>excellent<\/i> tool that will make your life much easier!)<\/p><p>You can fit to just about any type of equation that makes sense. (You can even provide a custom equation, if you have one in mind.) 
I can, for instance, get a reasonably good robust \"Bisquare\" fit using a two-term exponential. Or, since I just want to <i>represent<\/i> my data, and not force them to a specific form, I could fit them to a smoothing spline.<\/p><pre class=\"language-matlab\">cftool(xs,ys)\r\n<\/pre><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/cftoolImg.png\" alt=\"\"> <\/p><p>When you've interactively selected your fit options, you can readily tell MATLAB to generate code, with which you can apply the same routine to subsequent images (data sets).<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/pick\/files\/cftoolImg2.png\" alt=\"\"> <\/p><p>Here's the code that <tt>cftool<\/tt> told me to use:<\/p><pre class=\"codeinput\">[xData, yData] = prepareCurveData( xs, ys );\r\n\r\n<span class=\"comment\">% Set up fittype and options.<\/span>\r\nft = fittype( <span class=\"string\">'smoothingspline'<\/span> );\r\nopts = fitoptions( <span class=\"string\">'Method'<\/span>, <span class=\"string\">'SmoothingSpline'<\/span> );\r\nopts.SmoothingParam = 0.004;\r\n\r\n<span class=\"comment\">% Fit model to data.<\/span>\r\n[fitresult, gof] = fit( xData, yData, ft, opts );\r\n\r\n<span class=\"comment\">% Plot fit with data.<\/span>\r\nfigure( <span class=\"string\">'Name'<\/span>, <span class=\"string\">'untitled fit 1'<\/span> );\r\nh = plot( fitresult, xData, yData );\r\nlegend( h, <span class=\"string\">'ys vs. 
xs'<\/span>, <span class=\"string\">'untitled fit 1'<\/span>, <span class=\"string\">'Location'<\/span>, <span class=\"string\">'NorthEast'<\/span> );\r\n<span class=\"comment\">% Label axes<\/span>\r\nxlabel( <span class=\"string\">'xs'<\/span> );\r\nylabel( <span class=\"string\">'ys'<\/span> );\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/extractEfficiencyWriteup3_01.png\" alt=\"\"> <p>We could, for illustration, calculate the efficiency of the pump when the flow rate is 100 m<sup>3<\/sup>\/hr:<\/p><pre class=\"codeinput\">flow100 = fitresult(100)\r\n<\/pre><pre class=\"codeoutput\">\r\nflow100 =\r\n\r\n    0.5997\r\n\r\n<\/pre><p>If you refer to the <a href=\"https:\/\/blogs.mathworks.com\/pick\/files\/OriginalChart.png\">original image<\/a>, you'll see that this value, about 60%, perfectly reflects the efficiency at the specified flow rate!<\/p><h4>Final comment<a name=\"b6e931bf-65c4-4819-a177-734c1db58035\"><\/a><\/h4><p>I don't really have a good sense of how well this exact approach will work for subsequent images. I tried to make the code fairly general, but the <i>automatability<\/i> of the problem depends markedly on the ability to exploit similarities in the data set (images). 
Therein lies the art of image processing!<\/p><h4>The complete series<a name=\"37baa2a6-f94d-4b92-a6bd-849424d6f7f4\"><\/a><\/h4><div><ul><li><a href=\"https:\/\/blogs.mathworks.com\/steve\/2013\/12\/31\/automating-data-extraction-1\/\">Part 1<\/a><\/li><li><a href=\"https:\/\/blogs.mathworks.com\/steve\/2014\/01\/07\/automating-data-extraction-2\/\">Part 2<\/a><\/li><li><a href=\"https:\/\/blogs.mathworks.com\/steve\/2014\/01\/14\/automating-data-extraction-3\/\">Part 3<\/a><\/li><\/ul><\/div><script language=\"JavaScript\"> <!-- \r\n    function grabCode_3190097ba37d4177bec2db3f8409ba98() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='3190097ba37d4177bec2db3f8409ba98 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 3190097ba37d4177bec2db3f8409ba98';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2013 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_3190097ba37d4177bec2db3f8409ba98()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2013b<br><\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2013b<br><\/p><\/div><!--\r\n3190097ba37d4177bec2db3f8409ba98 ##### SOURCE BEGIN #####\r\n%% Automating the extraction of real data from an image of the data, Part 3\r\n%\r\n% _I'd like to welcome back my fellow MATLAB Central blogger Brett Shoelson\r\n% for the last in a three-part series on extracting curve values from a\r\n% plot. You can find Brett over at the <https:\/\/blogs.mathworks.com\/pick\/ \r\n% File Exchange Pick of the Week blog>, or you \r\n% can check out his many <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/911 \r\n% File Exchange contributions>. -Steve_\r\n%\r\n%% Quick recap\r\n% If you've followed Steve's blog for the past couple weeks, you'll know\r\n% that Steve has graciously allowed me to demonstrate how one might extract\r\n% real data from a graphical depiction of the data. 
In this final post in\r\n% this three-part series, I will use the coordinates I extracted from the\r\n% curve of interest to fit a predictive model to those data. Recall that I\r\n% showed two approaches to determine the _x_- and _y_-coordinates of the\r\n% efficiency curve:\r\n\r\n%%\r\n% \r\n% <<https:\/\/blogs.mathworks.com\/pick\/files\/PointExtractionApproaches.png>>\r\n% \r\n\r\n%%\r\n% First, load the variables |xs|, |ys|, and |bb|, which were computed in the\r\n% <https:\/\/blogs.mathworks.com\/steve\/2014\/01\/07\/automating-data-extraction-2\/\r\n% previous post>.\r\ntempfile = [tempname '.mat'];\r\nurl = 'https:\/\/blogs.mathworks.com\/images\/steve\/2013\/curve.mat';\r\nurlwrite(url,tempfile);\r\ns = load(tempfile);\r\nxs = s.xs;\r\nys = s.ys;\r\nbb = s.bb;\r\ndelete(tempfile);\r\n\r\n%%\r\n% Clearly, the results obtained using |regionprops| were better than those\r\n% I got using |bwtraceboundary|. We may as well use those values for our\r\n% curve fit. Before we do that, though, it will be useful to transform the\r\n% data to account for the fact that the extracted coordinates are in units\r\n% of pixels, rather than units of flow rate and efficiency. I do that manually here:\r\nflowLims = [0 240];\r\nefficiencyLims = [0 100];\r\n   \r\n% Scale data to specified limits for curvefit\r\n% Scale xs:\r\nxs  = (xs-bb(1))\/bb(3)*diff(flowLims)+flowLims(1);\r\n% Scale ys:\r\nys = (bb(2)+bb(4)-ys)\/bb(4); % Efficiencies as fractions (0 to 1), not percentages\r\n\r\n%%\r\n% (By the way, if you ever need to fit data to anything other than a very\r\n% simple polynomial, and if you don't have the\r\n% <https:\/\/www.mathworks.com\/help\/curvefit\/index.html Curve Fitting Toolbox>,\r\n% I hope this motivates you to get it; it is an _excellent_\r\n% tool that will make your life much easier!)\r\n%\r\n%%\r\n% You can fit to just about any type of equation that makes sense. (You can\r\n% even provide a custom equation, if you have one in mind.) 
I can, for\r\n% instance, get a reasonably good robust \"Bisquare\" fit using a two-term\r\n% exponential. Or, since I just want to _represent_ my data, and not force\r\n% them to a specific form, I could fit them to a smoothing spline.\r\n%\r\n%   cftool(xs,ys)\r\n%\r\n% <<https:\/\/blogs.mathworks.com\/images\/steve\/2013\/cftoolImg.png>>\r\n\r\n%%\r\n% When you've interactively selected your fit options, you can readily tell\r\n% MATLAB to generate code, with which you can apply the same routine to\r\n% subsequent images (data sets).\r\n\r\n%%\r\n% <<https:\/\/blogs.mathworks.com\/pick\/files\/cftoolImg2.png>>\r\n% \r\n%\r\n% Here's the code that |cftool| told me to use:\r\n\r\n[xData, yData] = prepareCurveData( xs, ys );\r\n\r\n% Set up fittype and options.\r\nft = fittype( 'smoothingspline' );\r\nopts = fitoptions( 'Method', 'SmoothingSpline' );\r\nopts.SmoothingParam = 0.004;\r\n\r\n% Fit model to data.\r\n[fitresult, gof] = fit( xData, yData, ft, opts );\r\n\r\n% Plot fit with data.\r\nfigure( 'Name', 'untitled fit 1' );\r\nh = plot( fitresult, xData, yData );\r\nlegend( h, 'ys vs. xs', 'untitled fit 1', 'Location', 'NorthEast' );\r\n% Label axes\r\nxlabel( 'xs' );\r\nylabel( 'ys' );\r\ngrid on\r\n\r\n%%\r\n% We could, for illustration, calculate the efficiency of the pump when\r\n% the flow rate is 100 m^3\/hr:\r\n\r\n%%\r\nflow100 = fitresult(100)\r\n\r\n%%\r\n% If you refer to the\r\n% <https:\/\/blogs.mathworks.com\/pick\/files\/OriginalChart.png original image>, \r\n% you'll see that that value perfectly reflects the efficiency at the \r\n% specified flow rate!\r\n\r\n%% Final comment\r\n% I don't really have a good sense for how well this exact approach will\r\n% work for subsequent images. I tried to make the code fairly general, but\r\n% the _automatability_ of the problem depends markedly on the ability to\r\n% exploit similarities in the data set (images). 
Therein lies the art of\r\n% image processing!\r\n\r\n%% The complete series\r\n%\r\n% * <https:\/\/blogs.mathworks.com\/steve\/2013\/12\/31\/automating-data-extraction-1\/ Part 1> \r\n% * <https:\/\/blogs.mathworks.com\/steve\/2014\/01\/07\/automating-data-extraction-2\/ Part 2>\r\n% * <https:\/\/blogs.mathworks.com\/steve\/2014\/01\/14\/automating-data-extraction-3\/ Part 3>\r\n##### SOURCE END ##### 3190097ba37d4177bec2db3f8409ba98\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/steve\/2013\/extractEfficiencyWriteup3_01.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p><i>I'd like to welcome back my fellow MATLAB Central blogger Brett Shoelson for the last in a three-part series on extracting curve values from a plot. You can find Brett over at the <a href=\"https:\/\/blogs.mathworks.com\/pick\/\">File Exchange Pick of the Week blog<\/a>, or you can check out his many <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/911\">File Exchange contributions<\/a>. -Steve<\/i>... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/steve\/2014\/01\/14\/automating-data-extraction-3\/\">read more >><\/a><\/p>","protected":false},"author":42,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[1059,733,695,725,1065,1063,1067,1061,70,92,362,68,729,731,94,96],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/935"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/comments?post=935"}],"version-history":[{"count":8,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/935\/revisions"}],"predecessor-version":[{"id":2305,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/935\/revisions\/2305"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media?parent=935"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/categories?post=935"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/tags?post=935"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}