{"id":2586,"date":"2010-08-20T14:04:16","date_gmt":"2010-08-20T14:04:16","guid":{"rendered":"https:\/\/blogs.mathworks.com\/pick\/2010\/08\/20\/get-html-table-data-into-matlab\/"},"modified":"2010-08-20T14:04:16","modified_gmt":"2010-08-20T14:04:16","slug":"get-html-table-data-into-matlab","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/pick\/2010\/08\/20\/get-html-table-data-into-matlab\/","title":{"rendered":"Get HTML Table Data into MATLAB"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p><a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/5021\">Bob<\/a>'s pick this week is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/22465-get-html-table-data-into-matlab\">Get HTML Table Data into MATLAB<\/a> by <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/22572\">Jeremy Barry<\/a>.\r\n      <\/p>\r\n   <\/introduction>\r\n   <p>Lots of web pages have tabulated data. Suppose you want to grab some data to analyze. You could select text, copy to your\r\n      clipboard, and import to MATLAB. But how would you do that automatically?\r\n   <\/p>\r\n   <p>Take this stock snapshot from the example script Jeremy included to illustrate the 3-step process.<\/p>\r\n   <p>1. Navigate to web page with desired data.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">web(<span style=\"color: #A020F0\">'http:\/\/finance.yahoo.com\/q\/ks?s=GOOG'<\/span>), pause(10)<\/pre><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/pick_gethtmldata1.png\"> <\/p>\r\n   <p>2. 
Parse the data.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">getTableFromWeb, pause(10)<\/pre><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/get_html_data.png\"> <\/p>\r\n   <p>3. Extract the desired region of interest.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">myTableData = getTableFromWeb(8)<\/pre><pre style=\"font-style:oblique\">myTableData = \r\n    'Market Cap (intraday)5:'              '149.14B'\r\n    'Enterprise Value (Aug 19, 2010)3:'    '123.61B'\r\n    'Trailing P\/E (ttm, intraday):'        '20.32'  \r\n    'Forward P\/E (fye Dec 31, 2011)1:'     '14.98'  \r\n    'PEG Ratio (5 yr expected):'           '1.10'   \r\n    'Price\/Sales (ttm):'                   '5.86'   \r\n    'Price\/Book (mrq):'                    '3.78'   \r\n    'Enterprise Value\/Revenue (ttm)3:'     '4.72'   \r\n    'Enterprise Value\/EBITDA (ttm)3:'      '11.42'  \r\n<\/pre><p>If you noticed the MATLAB icons in the web browser, those indicate different parsed regions that you could select. If you\r\n      click one of them from MATLAB's Browser, that's exactly what it will do, albeit manually of course. If you want to know which\r\n      number corresponds to each data region, watch the status bar for the callback link when you hover over each one with your\r\n      mouse. 
(If you noticed my screenshots are not perfectly synchronized with my published results then you might be very picky.\r\n      :)\r\n   <\/p>\r\n   <p><a href=\"https:\/\/blogs.mathworks.com\/pick\/?p=2586#respond\">Comments?<\/a><\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_d26e5299b58740b786d0284184fd1650() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='d26e5299b58740b786d0284184fd1650 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' d26e5299b58740b786d0284184fd1650';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Robert Bemis';\r\n        copyright = 'Copyright 2010 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_d26e5299b58740b786d0284184fd1650()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.10<br><\/p>\r\n<\/div>\r\n<!--\r\nd26e5299b58740b786d0284184fd1650 ##### SOURCE BEGIN #####\r\n%%\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/5021 Bob>'s \r\n% pick this week is \r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/22465-get-html-table-data-into-matlab Get HTML Table Data into MATLAB> \r\n% by \r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/22572 Jeremy Barry>.\r\n%%\r\n% Lots of web pages have tabulated data. Suppose you want to grab some data\r\n% to analyze. You could select text, copy to your clipboard, and import to\r\n% MATLAB. 
But how would you do that automatically?\r\n%% \r\n% Take this stock snapshot from the example script Jeremy included to\r\n% illustrate the 3-step process.  \r\n%%\r\n% 1. Navigate to web page with desired data.\r\nweb('http:\/\/finance.yahoo.com\/q\/ks?s=GOOG'), pause(10)\r\n%%\r\n% <<https:\/\/blogs.mathworks.com\/images\/pick\/pick_gethtmldata1.png>>\r\n%%\r\n% 2. Parse the data.\r\ngetTableFromWeb, pause(10)\r\n%%\r\n% <<https:\/\/blogs.mathworks.com\/images\/pick\/get_html_data.png>>\r\n%%\r\n% 3. Extract the desired region of interest.\r\nmyTableData = getTableFromWeb(8)\r\n%%\r\n% If you noticed the MATLAB icons in the web browser, those indicate\r\n% different parsed regions that you could select. If you click one of them\r\n% from MATLAB's Browser that's exactly what it will do albeit manually of\r\n% course. If you want to know which number corresponds to each data region,\r\n% watch the status bar for the callback link when you hover over each one\r\n% with your mouse. (If you noticed my screenshots are not perfectly\r\n% synchronized with my published results then you might be very picky. :)\r\n%%\r\n% <https:\/\/blogs.mathworks.com\/pick\/?p=2586#respond Comments?>\r\n\r\n##### SOURCE END ##### d26e5299b58740b786d0284184fd1650\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      Bob's pick this week is Get HTML Table Data into MATLAB by Jeremy Barry.\r\n      \r\n   \r\n   Lots of web pages have tabulated data. Suppose you want to grab some data to... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/pick\/2010\/08\/20\/get-html-table-data-into-matlab\/\">read more >><\/a><\/p>","protected":false},"author":46,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/2586"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/users\/46"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/comments?post=2586"}],"version-history":[{"count":0,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/2586\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/media?parent=2586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/categories?post=2586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/tags?post=2586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}