{"id":2241,"date":"2008-06-06T10:47:40","date_gmt":"2008-06-06T15:47:40","guid":{"rendered":"https:\/\/blogs.mathworks.com\/pick\/2008\/06\/06\/reading-formatted-text\/"},"modified":"2016-05-11T10:16:43","modified_gmt":"2016-05-11T14:16:43","slug":"reading-formatted-text","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/pick\/2008\/06\/06\/reading-formatted-text\/","title":{"rendered":"Reading Formatted Text"},"content":{"rendered":"<div class=\"content\">\r\n\r\nJiro's pick this week is <a title=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/loadFile.do?objectId=16075&amp;objectType=file (link no longer works)\">TEXTSCANTOOL<\/a> by our very own Stuart McGarrity.\r\n\r\n&nbsp;\r\n<h3>Contents<\/h3>\r\n<div>\r\n<ul>\r\n\t<li><a href=\"#2\">My Data File<\/a><\/li>\r\n\t<li><a href=\"#3\">Header Lines<\/a><\/li>\r\n\t<li><a href=\"#4\">Data Types<\/a><\/li>\r\n\t<li><a href=\"#5\">Import and Generate Code<\/a><\/li>\r\n\t<li><a href=\"#6\">Video Tutorial<\/a><\/li>\r\n\t<li><a href=\"#7\">Comments<\/a><\/li>\r\n<\/ul>\r\n<\/div>\r\nIn many of my projects, reading in the data files is often the first step. I utilize various methods, ranging from double-clicking on the data file to using high-level import functions (such as <tt>xlsread<\/tt> and <tt>load<\/tt>) to using low-level functions (such as <tt>textscan<\/tt> and <tt>fread<\/tt>). The more unconventional the data format is, the more I rely on low-level functions.\r\n<h3>My Data File<a name=\"2\"><\/a><\/h3>\r\nLet's take a look at this particular data file:\r\n\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/textscantool_sc0.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nHave you ever had to deal with this type of format - comma-separated file, arbitrary number of header lines, a row with label\r\nnames, and a mix of numeric and text data? 
I have, quite often.\r\n\r\nStuart's <tt>textscantool<\/tt> lets you bring this data in easily by working in conjunction with MATLAB's <tt>textscan<\/tt> function. It provides a nice graphical interface to quickly parse through a formatted ASCII file and construct an automated\r\nimport function for reading similar files.\r\n<h3>Header Lines<a name=\"3\"><\/a><\/h3>\r\nThe tool takes you through a sequence of steps to import a file. First, you can indicate how many header lines there are and\r\nwhich row will be used for the header names:\r\n\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/textscantool_sc1.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n<h3>Data Types<a name=\"4\"><\/a><\/h3>\r\nNext, you can individually specify the data types of the columns:\r\n\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/textscantool_sc2.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n<h3>Import and Generate Code<a name=\"5\"><\/a><\/h3>\r\nFinally, you can specify how to bring it in (array, cell, etc.) and how many rows to import. This means that you can import\r\njust a portion of a large file.\r\n\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/textscantool_sc3.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n\r\nThen click \"Import Data\" and off you go! Want to automate this process? Just click on \"Generate Code\", and you have a reusable\r\nfunction!\r\n\r\n<img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/textscantool_sc4.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/>\r\n<h3>Video Tutorial<a name=\"6\"><\/a><\/h3>\r\nWhat makes this entry complete is the video tutorial that Stuart includes with his function. And yes, he's the voice of many\r\nof our shipping tutorial videos.\r\n<h3>Comments<a name=\"7\"><\/a><\/h3>\r\nMATLAB provides numerous functions for importing files. 
Tell us <a href=\"https:\/\/blogs.mathworks.com\/pick\/?p=2241#respond\">here<\/a> how you use these functions to deal with your specific data files.\r\n\r\n<script>\/\/ <![CDATA[\r\nfunction grabCode_efdcac42812c450b83893addda2434a8() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='efdcac42812c450b83893addda2434a8 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' efdcac42812c450b83893addda2434a8';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Jiro Doke';\r\n        copyright = 'Copyright 2008 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('\r\n\r\n<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\r\n\r\n\r\n\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }\r\n\/\/ ]]><\/script>\r\n<p style=\"text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;\">\r\n<a><span style=\"font-size: x-small; font-style: italic;\">Get\r\nthe MATLAB code\r\n<noscript>(requires JavaScript)<\/noscript><\/span><\/a>\r\n\r\nPublished with MATLAB\u00ae 7.6<\/p>\r\n\r\n<\/div>\r\n<!--\r\nefdcac42812c450b83893addda2434a8 ##### SOURCE BEGIN #####\r\n%%\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/loadAuthor.do?objectId=1094142&objectType=author % Jiro>'s pick this week is <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/loadFile.do?objectId=16075&objectType=file % TEXTSCANTOOL> by our very own <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/loadAuthor.do?objectType=author&objectId=126174 % Stuart McGarrity>.\r\n\r\n%%\r\n% In many of my projects, reading in the data files is often the first\r\n% step. 
I utilize various <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/techdoc\/matlab_prog\/f5-4931.html % methods>, ranging from double-clicking on the data\r\n% file to using high-level import functions (such as |xlsread| and |load|)\r\n% to using low-level functions (such as |textscan| and |fread|). The more\r\n% unconventional the data format is, the more I rely on low-level\r\n% functions.\r\n\r\n%% My Data File\r\n% Let's take a look at this particular data file:\r\n%\r\n% <<textscantool_sc0.png>>\r\n%\r\n% Have you ever had to deal with this type of format - comma-separated\r\n% file, arbitrary number of header lines, a row with label names, and a mix\r\n% of numeric and text data? I have, quite often.\r\n%\r\n% Stuart's |textscantool| allows you to easily bring this data in, by\r\n% working in conjunction with MATLAB's |textscan| function. It provides a\r\n% nice graphical interface to quickly parse through a formatted ascii file\r\n% and construct an automated import function for reading similar files.\r\n\r\n%% Header Lines\r\n% The tool takes you through a sequence of steps to import a file. First,\r\n% you can indicate how many header lines there are and which row will be\r\n% used for the header names:\r\n%\r\n% <<textscantool_sc1.png>>\r\n\r\n%% Data Types\r\n% Next, you can individually specify the data types of the columns:\r\n%\r\n% <<textscantool_sc2.png>>\r\n\r\n%% Import and Generate Code\r\n% Finally, you can specify how to bring it in (array, cell, etc) and how\r\n% many rows to import. This means that you can import a single portion of a\r\n% large file.\r\n%\r\n% <<textscantool_sc3.png>>\r\n%\r\n% And you click \"Import Data\" and off you go! Want to automate this\r\n% process? Just click on \"Generate Code\", and you have a reusable function!\r\n%\r\n% <<textscantool_sc4.png>>\r\n\r\n%% Video Tutorial\r\n% What makes this entry complete is the video tutorial that Stuart includes\r\n% with his function. 
And yes, he's the voice of many of our shipping\r\n% <https:\/\/www.mathworks.com\/products\/matlab\/demos.html tutorial videos>.\r\n\r\n%% Comments\r\n%\r\n% MATLAB provides numerous functions for importing files. Tell us\r\n% <https:\/\/blogs.mathworks.com\/pick\/?p=2241#respond here>\r\n% how you use these functions to deal with your specific data files.\r\n##### SOURCE END ##### efdcac42812c450b83893addda2434a8\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n\r\nJiro's pick this week is TEXTSCANTOOL by our very own Stuart McGarrity.\r\n\r\n&nbsp;\r\nContents\r\n\r\n\r\n\tMy Data File\r\n\tHeader Lines\r\n\tData Types\r\n\tImport and Generate Code\r\n\tVideo... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/pick\/2008\/06\/06\/reading-formatted-text\/\">read more >><\/a><\/p>","protected":false},"author":35,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/2241"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/comments?post=2241"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/2241\/revisions"}],"predecessor-version":[{"id":7148,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/2241\/revisions\/7148"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/media?parent=2241"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/categories?post=2241"},{"taxonomy":"post_tag","embedda
ble":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/tags?post=2241"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}