Writing a file reader in MATLAB
I'd like to introduce guest blogger Jeff Mather. Jeff started his MathWorks career as an application support engineer. Then, very many moons ago, I hired Jeff as a software developer to work on image and scientific file formats. Jeff's now on the Image Processing Toolbox development team, but as you'll see he still thinks about file formats from time to time. I first saw Jeff's comments below posted on his personal blog, and I asked him if I could share them here. Thanks, Jeff!
A colleague recently asked me to help him read an NRRD file in MATLAB, which supports reading a whole bunch of image and scientific data formats right out-of-the-box but not NRRD. This format stores 3D volumes of radiology data and (like FITS) contains a text header containing key-value pairs followed by a binary payload. Having written file parsers full-time for the better part of ten years, it didn't take too long for me to create a .nrrd file reader for MATLAB.
I'm kind of proud of this little feature for its simplicity, and it shows a lot of the power of MATLAB. In fewer than 200 lines of well-structured code, I was able to implement a robust file reader. Here are a few features it uses that anyone creating their own file reader in MATLAB might also try to take advantage of:
assert - Stop writing if blocks that only exist to check whether everything is okay and error if it isn't.
fid = fopen(filename, 'rb'); assert(fid > 0, 'Could not open file.');
And . . .
assert(isfield(meta, 'sizes') && ... isfield(meta, 'dimension') && ... isfield(meta, 'encoding') && ... isfield(meta, 'endian'), ... 'Missing required metadata fields.')
onCleanup - Why worry about trying to remember to clean up resources? Let the onCleanup class take care of it for you. Construct one of these objects by giving it an anonymous function that closes your file handle when the object goes out of scope—whether from an error or at the end of the function.
cleaner = onCleanup(@() fclose(fid));
regexp - Use MATLAB's regular expression engine to handle complicated text parsing for you.
theLine = fgetl(fid);
% "fieldname:= value" or "fieldname: value" or "fieldname:value" parsedLine = regexp(theLine, ':=?\s*', 'split', 'once');
Dynamic structure field indexing - If you have a string that's a legal MATLAB identifier, there's no need to write complicated logic just to use it as a field name in a structure. Simply use the .(string) construct.
field = lower(parsedLine{1}); value = parsedLine{2};
field(isspace(field)) = ''; % Remove embedded spaces. meta(1).(field) = value;
Using temporary files to decompress data - The NRRD format supports storing the image data as raw bytes, human readable text, or GZIP-compressed byte streams. When a file contains compressed or encapsulated data and MATLAB has a file reader capable of handling that, it's easiest just to write the data to a temporary file and use the supported reader. Consider the readData() subfunction that recursively handles three different kinds of encoding:
function data = readData(fidIn, meta, datatype)
switch (meta.encoding) case {'raw'}
data = fread(fidIn, inf, [datatype '=>' datatype]);
case {'gzip', 'gz'}
tmpBase = tempname(); tmpFile = [tmpBase '.gz']; fidTmp = fopen(tmpFile, 'wb'); assert(fidTmp > 3, ... 'Could not open temporary file for GZIP decompression')
tmp = fread(fidIn, inf, 'uint8=>uint8'); fwrite(fidTmp, tmp, 'uint8'); fclose(fidTmp);
gunzip(tmpFile)
fidTmp = fopen(tmpBase, 'rb'); cleaner = onCleanup(@() fclose(fidTmp));
meta.encoding = 'raw'; data = readData(fidTmp, meta, datatype);
case {'txt', 'text', 'ascii'}
data = fscanf(fidIn, '%f'); data = cast(data, datatype);
otherwise assert(false, 'Unsupported encoding') end
swapbytes - Like many formats, NRRD supports little-endian and big-endian byte ordering. The swapbytes function makes it dead simple to change endianness, and the computer function helps you determine whether swapping is necessary. Here's the pattern, which uses the "endian" metadata value read from the .nrrd file:
function data = adjustEndian(data, meta)
[~,~,endian] = computer();
needToSwap = (isequal(endian, 'B') && ... isequal(lower(meta.endian), 'little')) || ... (isequal(endian, 'L') && ... isequal(lower(meta.endian), 'big'));
if (needToSwap) data = swapbytes(data); end
Happy coding!
- Jeff Mather
Comments
To leave a comment, please click here to sign in to your MathWorks Account or create a new one.