{"id":487,"date":"2012-02-28T07:00:07","date_gmt":"2012-02-28T12:00:07","guid":{"rendered":"https:\/\/blogs.mathworks.com\/steve\/?p=487"},"modified":"2019-10-31T14:42:02","modified_gmt":"2019-10-31T18:42:02","slug":"writing-a-file-reader-in-matlab","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/steve\/2012\/02\/28\/writing-a-file-reader-in-matlab\/","title":{"rendered":"Writing a file reader in MATLAB"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p><i>I'd like to introduce guest blogger Jeff Mather. Jeff started his MathWorks career as an application support engineer. Then,\r\n            very many moons ago, I hired Jeff as a software developer to work on image and scientific file formats. Jeff's now on the\r\n            Image Processing Toolbox development team, but as you'll see he still thinks about file formats from time to time. I first\r\n            saw Jeff's comments below <a title=\"http:\/\/jeffmatherphotography.com\/dispatches\/2012\/02\/writing-a-file-reader-in-matlab\/ (link no longer works)\">posted on his personal blog<\/a>, and I asked him if I could share them here. Thanks, Jeff!<\/i><\/p>\r\n      <p>A colleague recently asked me to help him read an NRRD file in MATLAB. MATLAB reads a whole bunch of image and scientific\r\n         data formats right out of the box, but not NRRD. This format is often used to store 3D volumes of radiology data and, like FITS,\r\n         consists of a text header of key-value pairs followed by a (usually binary) data payload. 
Because I spent the better\r\n         part of ten years writing file parsers full-time, it didn't take me long to create a <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/34653-nrrd-format-file-reader\">.nrrd file reader for MATLAB<\/a>.\r\n      <\/p>\r\n      <p>I'm kind of proud of this little reader for its simplicity, and it shows off a lot of the power of MATLAB. In fewer than 200\r\n         lines of well-structured code, I was able to implement a robust file reader. Here are a few features it uses that anyone writing\r\n         their own file reader in MATLAB might also take advantage of:\r\n      <\/p>\r\n      <p><b>assert<\/b> - Stop writing <tt>if<\/tt> blocks that exist only to check whether everything is okay and throw an error if it isn't.\r\n      <\/p><pre> fid = fopen(filename, 'rb');\r\n assert(fid &gt; 0, 'Could not open file.');<\/pre><\/introduction>\r\n   <p style=\"margin-top: 10px;\">The same pattern works for validating the parsed header:<\/p><pre> assert(isfield(meta, 'sizes') &amp;&amp; ...\r\n        isfield(meta, 'dimension') &amp;&amp; ...\r\n        isfield(meta, 'encoding') &amp;&amp; ...\r\n        isfield(meta, 'endian'), ...\r\n        'Missing required metadata fields.')<\/pre><p style=\"margin-top: 10px;\"><b>onCleanup<\/b> - Why worry about trying to remember to clean up resources? Let the <tt>onCleanup<\/tt> class take care of it for you. 
Construct one\r\n      of these objects with an anonymous function that closes your file handle. MATLAB runs that function when the object goes out of scope,\r\n      whether because of an error or at the normal end of the function.\r\n   <\/p><pre> cleaner = onCleanup(@() fclose(fid));<\/pre><p style=\"margin-top: 10px;\"><b>regexp<\/b> - Use MATLAB's regular expression engine to handle complicated text parsing for you.\r\n   <\/p><pre> theLine = fgetl(fid);\r\n\r\n % \"fieldname:= value\" or \"fieldname: value\" or \"fieldname:value\"\r\n % Split at the first colon (and optional '=') plus any whitespace after it.\r\n parsedLine = regexp(theLine, ':=?\\s*', 'split', 'once');<\/pre><p style=\"margin-top: 10px;\"><b>Dynamic structure field indexing<\/b> - If you have a string that's a legal MATLAB identifier, there's no need to write complicated logic just to use it as a field\r\n      name in a structure. Simply use the <tt>.(string)<\/tt> construct.\r\n   <\/p><pre> field = lower(parsedLine{1});\r\n value = parsedLine{2};\r\n\r\n field(isspace(field)) = '';  % Remove embedded spaces.\r\n meta(1).(field) = value;<\/pre><p style=\"margin-top: 10px;\"><b>Using temporary files to decompress data<\/b> - The NRRD format supports storing the image data as raw bytes, human-readable text, or GZIP-compressed byte streams. When\r\n      a file contains compressed or encapsulated data that MATLAB already has a reader for, it's easiest to\r\n      write the data to a temporary file and pass that file to the supported reader. 
Consider the <tt>readData()<\/tt> subfunction, which handles three different encodings. For GZIP data it decompresses the payload to a temporary file and then calls itself recursively to read the result as raw bytes:\r\n   <\/p><pre> function data = readData(fidIn, meta, datatype)\r\n\r\n switch (meta.encoding)\r\n  case {'raw'}\r\n   % Read the rest of the file, keeping the on-disk numeric class.\r\n   data = fread(fidIn, inf, [datatype '=&gt;' datatype]);\r\n\r\n  case {'gzip', 'gz'}\r\n   % Copy the compressed payload into a temporary .gz file.\r\n   tmpBase = tempname();\r\n   tmpFile = [tmpBase '.gz'];\r\n   fidTmp = fopen(tmpFile, 'wb');\r\n   assert(fidTmp &gt; 0, ...\r\n      'Could not open temporary file for GZIP decompression')\r\n\r\n   tmp = fread(fidIn, inf, 'uint8=&gt;uint8');\r\n   fwrite(fidTmp, tmp, 'uint8');\r\n   fclose(fidTmp);\r\n\r\n   % gunzip writes the decompressed file (tmpBase) next to tmpFile.\r\n   gunzip(tmpFile)\r\n   delete(tmpFile)\r\n\r\n   fidTmp = fopen(tmpBase, 'rb');\r\n   assert(fidTmp &gt; 0, 'Could not open decompressed temporary file')\r\n   cleaner = onCleanup(@() fclose(fidTmp));\r\n\r\n   % The decompressed bytes are raw data, so recurse.\r\n   meta.encoding = 'raw';\r\n   data = readData(fidTmp, meta, datatype);\r\n\r\n  case {'txt', 'text', 'ascii'}\r\n   data = fscanf(fidIn, '%f');\r\n   data = cast(data, datatype);\r\n\r\n  otherwise\r\n   assert(false, 'Unsupported encoding')\r\n end<\/pre><p style=\"margin-top: 10px;\"><b>swapbytes<\/b> - Like many formats, NRRD supports little-endian and big-endian byte ordering. 
The <tt>swapbytes<\/tt> function makes it dead simple to change endianness, and the <tt>computer<\/tt> function tells you the byte order of the machine MATLAB is running on, so you can decide whether swapping is necessary.\r\n      Here's the pattern, which uses the \"endian\" metadata value read from the .nrrd file:\r\n   <\/p><pre> function data = adjustEndian(data, meta)\r\n\r\n % The third output of computer is 'L' (little-endian) or 'B' (big-endian).\r\n [~,~,endian] = computer();\r\n\r\n needToSwap = (isequal(endian, 'B') &amp;&amp; ...\r\n                isequal(lower(meta.endian), 'little')) || ...\r\n              (isequal(endian, 'L') &amp;&amp; ...\r\n               isequal(lower(meta.endian), 'big'));\r\n\r\n if (needToSwap)\r\n     data = swapbytes(data);\r\n end<\/pre><p style=\"margin-top: 10px;\">Happy coding!<\/p>\r\n   <p>- Jeff Mather<\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_a646faa87c724b33aacf453ff512f801() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='a646faa87c724b33aacf453ff512f801 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' a646faa87c724b33aacf453ff512f801';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Jeff Mather';\r\n        copyright = 'Copyright 2012 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_a646faa87c724b33aacf453ff512f801()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.13<br><\/p>\r\n<\/div>\r\n<!--\r\na646faa87c724b33aacf453ff512f801 ##### SOURCE BEGIN #####\r\n%%\r\n% _I'd like to introduce guest blogger Jeff Mather. Jeff started his\r\n% MathWorks career as an application support engineer. Then, very many\r\n% moons ago, I hired Jeff as a software developer to work on image and\r\n% scientific file formats. Jeff's now on the Image Processing Toolbox\r\n% development team, but as you'll see he still thinks about file\r\n% formats from time to time. 
I first saw Jeff's comments below\r\n% <http:\/\/jeffmatherphotography.com\/dispatches\/2012\/02\/writing-a-file-reader-in-matlab\/\r\n% posted on his personal blog>, and I asked him if I could share them here.\r\n% Thanks, Jeff!_\r\n%\r\n% A colleague recently asked me to help him read an NRRD file in MATLAB.\r\n% MATLAB reads a whole bunch of image and scientific data formats right\r\n% out of the box, but not NRRD. This format is often used to store 3D\r\n% volumes of radiology data and, like FITS, consists of a text header of\r\n% key-value pairs followed by a (usually binary) data payload. Because I\r\n% spent the better part of ten years writing file parsers full-time, it\r\n% didn't take me long to create a\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/34653-nrrd-format-file-reader\r\n% .nrrd file reader for MATLAB>.\r\n% \r\n% I'm kind of proud of this little reader for its simplicity, and it shows\r\n% off a lot of the power of MATLAB. In fewer than 200 lines of\r\n% well-structured code, I was able to implement a robust file reader. Here\r\n% are a few features it uses that anyone writing their own file reader in\r\n% MATLAB might also take advantage of:\r\n% \r\n% *assert* - Stop writing |if| blocks that exist only to check whether\r\n% everything is okay and throw an error if it isn't.\r\n% \r\n%   fid = fopen(filename, 'rb'); \r\n%   assert(fid > 0, 'Could not open file.');\r\n%   \r\n%%\r\n% The same pattern works for validating the parsed header:\r\n% \r\n%   assert(isfield(meta, 'sizes') && ...\r\n%          isfield(meta, 'dimension') && ... \r\n%          isfield(meta, 'encoding') && ...\r\n%          isfield(meta, 'endian'), ... \r\n%          'Missing required metadata fields.')\r\n%    \r\n%% \r\n%\r\n% *onCleanup* - Why worry about trying to remember to clean up resources?\r\n% Let the |onCleanup| class take care of it for you. 
Construct one of these\r\n% objects with an anonymous function that closes your file handle. MATLAB\r\n% runs that function when the object goes out of scope, whether because of\r\n% an error or at the normal end of the function.\r\n% \r\n%   cleaner = onCleanup(@() fclose(fid));\r\n%   \r\n%% \r\n% *regexp* - Use MATLAB's regular expression engine to handle complicated\r\n% text parsing for you.\r\n% \r\n%   theLine = fgetl(fid);\r\n%   \r\n%   % \"fieldname:= value\" or \"fieldname: value\" or \"fieldname:value\"\r\n%   % Split at the first colon (and optional '=') plus any whitespace after it.\r\n%   parsedLine = regexp(theLine, ':=?\\s*', 'split', 'once');\r\n%   \r\n%% \r\n% *Dynamic structure field indexing* - If you have a string that's a legal\r\n% MATLAB identifier, there's no need to write complicated logic just to use\r\n% it as a field name in a structure. Simply use the |.(string)| construct.\r\n% \r\n%   field = lower(parsedLine{1}); \r\n%   value = parsedLine{2};\r\n%   \r\n%   field(isspace(field)) = '';  % Remove embedded spaces. \r\n%   meta(1).(field) = value;\r\n%   \r\n%% \r\n% *Using temporary files to decompress data* - The NRRD format supports\r\n% storing the image data as raw bytes, human-readable text, or\r\n% GZIP-compressed byte streams. When a file contains compressed or\r\n% encapsulated data that MATLAB already has a reader for, it's easiest to\r\n% write the data to a temporary file and pass that file to the supported\r\n% reader. 
Consider the |readData()| subfunction, which handles three different\r\n% encodings. For GZIP data it decompresses the payload to a temporary file\r\n% and then calls itself recursively to read the result as raw bytes:\r\n% \r\n%   function data = readData(fidIn, meta, datatype)\r\n%   \r\n%   switch (meta.encoding)\r\n%    case {'raw'}\r\n%     % Read the rest of the file, keeping the on-disk numeric class.\r\n%     data = fread(fidIn, inf, [datatype '=>' datatype]);\r\n%   \r\n%    case {'gzip', 'gz'}\r\n%     % Copy the compressed payload into a temporary .gz file.\r\n%     tmpBase = tempname(); \r\n%     tmpFile = [tmpBase '.gz']; \r\n%     fidTmp = fopen(tmpFile, 'wb'); \r\n%     assert(fidTmp > 0, ...\r\n%        'Could not open temporary file for GZIP decompression')\r\n%   \r\n%     tmp = fread(fidIn, inf, 'uint8=>uint8'); \r\n%     fwrite(fidTmp, tmp, 'uint8');\r\n%     fclose(fidTmp);\r\n%   \r\n%     % gunzip writes the decompressed file (tmpBase) next to tmpFile.\r\n%     gunzip(tmpFile)\r\n%     delete(tmpFile)\r\n%   \r\n%     fidTmp = fopen(tmpBase, 'rb'); \r\n%     assert(fidTmp > 0, 'Could not open decompressed temporary file')\r\n%     cleaner = onCleanup(@() fclose(fidTmp));\r\n%   \r\n%     % The decompressed bytes are raw data, so recurse.\r\n%     meta.encoding = 'raw'; \r\n%     data = readData(fidTmp, meta, datatype);\r\n%   \r\n%    case {'txt', 'text', 'ascii'}\r\n%     data = fscanf(fidIn, '%f'); \r\n%     data = cast(data, datatype);\r\n%   \r\n%    otherwise\r\n%     assert(false, 'Unsupported encoding')\r\n%   end\r\n%   \r\n%% \r\n% *swapbytes* - Like many formats, NRRD supports little-endian and big-endian\r\n% byte ordering. The |swapbytes| function makes it dead simple to change\r\n% endianness, and the |computer| function tells you the byte order of the\r\n% machine MATLAB is running on, so you can decide whether swapping is\r\n% necessary. 
Here's the pattern, which uses the \"endian\"\r\n% metadata value read from the .nrrd file:\r\n% \r\n%   function data = adjustEndian(data, meta)\r\n%   \r\n%   [~,~,endian] = computer();\r\n%   \r\n%   needToSwap = (isequal(endian, 'B') && ...\r\n%                  isequal(lower(meta.endian), 'little')) || ...\r\n%                (isequal(endian, 'L') && ...\r\n%                 isequal(lower(meta.endian), 'big'));\r\n%   \r\n%   if (needToSwap)\r\n%       data = swapbytes(data);\r\n%   end\r\n%   \r\n%% \r\n% Happy coding!\r\n%\r\n% - Jeff Mather\r\n##### SOURCE END ##### a646faa87c724b33aacf453ff512f801\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      I'd like to introduce guest blogger Jeff Mather. Jeff started his MathWorks career as an application support engineer. Then,\r\n            very many moons ago, I hired Jeff as a software... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/steve\/2012\/02\/28\/writing-a-file-reader-in-matlab\/\">read more 
>><\/a><\/p>","protected":false},"author":42,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[615,859,861,681,847,677,679,857,853,855,346,843,851,849,845,623,607,729],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/487"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/users\/42"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/comments?post=487"}],"version-history":[{"count":11,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/487\/revisions"}],"predecessor-version":[{"id":3781,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/posts\/487\/revisions\/3781"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/media?parent=487"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/categories?post=487"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/steve\/wp-json\/wp\/v2\/tags?post=487"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}