Steve on Image Processing

Under the hood of imread

Posted by Steve Eddins

I'm going to play a small trick on you today. Try reading in this JPEG file using imread:

url = 'http://blogs.mathworks.com/images/steve/2014/peppers.jpg';
rgb = imread(url);
imshow(rgb)

So what's the trick? Well, look more closely at this file using imfinfo:

info = imfinfo(url)
info = 

                  Filename: 'http://blogs.mathworks.com/images/steve/2014/...'
               FileModDate: '11-Jul-2014 14:46:33'
                  FileSize: 287677
                    Format: 'png'
             FormatVersion: []
                     Width: 512
                    Height: 384
                  BitDepth: 24
                 ColorType: 'truecolor'
           FormatSignature: [137 80 78 71 13 10 26 10]
                  Colormap: []
                 Histogram: []
             InterlaceType: 'none'
              Transparency: 'none'
    SimpleTransparencyData: []
           BackgroundColor: []
           RenderingIntent: []
            Chromaticities: []
                     Gamma: []
               XResolution: []
               YResolution: []
            ResolutionUnit: []
                   XOffset: []
                   YOffset: []
                OffsetUnit: []
           SignificantBits: []
              ImageModTime: '16 Jul 2002 16:46:41 +0000'
                     Title: []
                    Author: []
               Description: 'Zesty peppers'
                 Copyright: 'Copyright The MathWorks, Inc.'
              CreationTime: []
                  Software: []
                Disclaimer: []
                   Warning: []
                    Source: []
                   Comment: []
                 OtherText: []

See it yet? No?

Look at the Format field:

info.Format
ans =

png

The function imfinfo is claiming that this JPEG file is really a PNG file, which is a completely different image file format!

So what's going on here? Is this a JPEG file or not?

This trick question is really just an excuse to peek under the hood of imread to see how an interesting piece of it works. (Well, it's interesting to me, at least.)

Before opening the hood, though, let's try reading one more file. And notice that this filename has an extension that has nothing to do with any particular image format.

url2 = 'http://blogs.mathworks.com/images/steve/2014/peppers.fruit_study_2014_Jul_11';

rgb2 = imread(url2);
imshow(rgb2)

OK, so imread can successfully read this image file even without an extension indicating its format.
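If you'd like to reproduce both experiments without depending on the URLs above, here is a minimal sketch that builds its own misnamed files. The file names, the random test image, and the '.fruit_study' extension are assumptions made up for illustration; only imwrite, imfinfo, imread, tempname, and delete are real functions.

```matlab
% Create a small truecolor test image (random pixels, for illustration only).
rgb = uint8(randi([0 255], 8, 8, 3));

% Hypothetical file names: one with a misleading extension, and one with an
% extension that matches no image format.
misnamed  = [tempname '.jpg'];
unmatched = [tempname '.fruit_study'];

% Passing the format explicitly makes imwrite ignore the extension.
imwrite(rgb, misnamed,  'png');
imwrite(rgb, unmatched, 'png');

% imfinfo reports the format it detected in the file, not the extension.
info = imfinfo(misnamed);
disp(info.Format)

% imread reads both files without being told the format.
a = imread(misnamed);
b = imread(unmatched);
disp(isequal(a, rgb) && isequal(b, rgb))

% Clean up the scratch files.
delete(misnamed);
delete(unmatched);
```

Because PNG is lossless, the pixels read back should match the pixels written, which makes the round trip easy to check.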

If you're curious, a lot of the imread code that makes all this work is available for you to look at in your installation of MATLAB. (If, on the other hand, you're not curious, then this would be a good time to go over and read Cleve's blog instead.) For example, you can view the source code for imread in the MATLAB Editor by typing edit imread. (Please don't modify the code, though!)

Here's a fragment of code near the top:

if (isempty(fmt_s))
    % The format was not specified explicitly.
    ... snip ...
    % Try to determine the file type.
    [format, fmt_s] = imftype(filename);

Hmm, what's that function imftype?

which imftype
'imftype' not found.

It doesn't appear to exist!

It does exist, but it happens to be a private function. The function which will find it if you tell which to look a little harder.

which -all imftype
/Applications/MATLAB_R2014a.app/toolbox/matlab/imagesci/private/imftype.m  % Private to imagesci

Even though you can't directly call this function (that's what private means here), you can still look at it in the MATLAB Editor by typing edit private/imftype.

Here's some code from near the beginning of imftype.

idx = find(filename == '.');
if (~isempty(idx))
    extension = lower(filename(idx(end)+1:end));
else
    extension = '';
end
% Try to get useful imformation from the extension.
if (~isempty(extension))
    % Look up the extension in the file format registry.
    fmt_s = imformats(extension);
    if (~isempty(fmt_s))
        if (~isempty(fmt_s.isa))
            % Call the ISA function for this format.
            tf = feval(fmt_s.isa, filename);
            if (tf)
                % The file is of that format.  Return the ext field.
                format = fmt_s.ext{1};
                return;
            end
        end
    end
end

In English: If the filename has an extension on it, use the imformats function to get a function that can test to see whether the file really has that format. If the test passes, return that format.

So what's that new function imformats in the middle there? Well, you can call this one directly. Try it.

s = imformats
s = 

1x19 struct array with fields:

    ext
    isa
    info
    read
    write
    alpha
    description

That output is not very readable. If we were designing this today, we'd probably make imformats return a table. Fortunately we've got an easy way to convert a struct array into a table!

t = struct2table(s)
t = 

       ext          isa         info          read         write      alpha
    __________    _______    ___________    _________    _________    _____

    {1x1 cell}    @isbmp     @imbmpinfo     @readbmp     @writebmp    0    
    {1x1 cell}    @iscur     @imcurinfo     @readcur     ''           1    
    {1x2 cell}    @isfits    @imfitsinfo    @readfits    ''           0    
    {1x1 cell}    @isgif     @imgifinfo     @readgif     @writegif    0    
    {1x1 cell}    @ishdf     @imhdfinfo     @readhdf     @writehdf    0    
    {1x1 cell}    @isico     @imicoinfo     @readico     ''           1    
    {1x2 cell}    @isjp2     @imjp2info     @readjp2     @writej2c    0    
    {1x1 cell}    @isjp2     @imjp2info     @readjp2     @writejp2    0    
    {1x2 cell}    @isjp2     @imjp2info     @readjp2     ''           0    
    {1x2 cell}    @isjpg     @imjpginfo     @readjpg     @writejpg    0    
    {1x1 cell}    @ispbm     @impnminfo     @readpnm     @writepnm    0    
    {1x1 cell}    @ispcx     @impcxinfo     @readpcx     @writepcx    0    
    {1x1 cell}    @ispgm     @impnminfo     @readpnm     @writepnm    0    
    {1x1 cell}    @ispng     @impnginfo     @readpng     @writepng    1    
    {1x1 cell}    @ispnm     @impnminfo     @readpnm     @writepnm    0    
    {1x1 cell}    @isppm     @impnminfo     @readpnm     @writepnm    0    
    {1x1 cell}    @isras     @imrasinfo     @readras     @writeras    1    
    {1x2 cell}    @istif     @imtifinfo     @readtif     @writetif    0    
    {1x1 cell}    @isxwd     @imxwdinfo     @readxwd     @writexwd    0    


               description            
    __________________________________

    'Windows Bitmap'                  
    'Windows Cursor resources'        
    'Flexible Image Transport System' 
    'Graphics Interchange Format'     
    'Hierarchical Data Format'        
    'Windows Icon resources'          
    'JPEG 2000 (raw codestream)'      
    'JPEG 2000 (Part 1)'              
    'JPEG 2000 (Part 2)'              
    'Joint Photographic Experts Group'
    'Portable Bitmap'                 
    'Windows Paintbrush'              
    'Portable Graymap'                
    'Portable Network Graphics'       
    'Portable Any Map'                
    'Portable Pixmap'                 
    'Sun Raster'                      
    'Tagged Image File Format'        
    'X Window Dump'                   

Now we've gotten to some interesting stuff! This table represents the guts of how imread, imfinfo, and imwrite know how to deal with the many different image file formats supported.
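As a small illustration of working with the registry directly, here is a sketch that lists the formats with an empty write field, that is, the ones that can be read but not written. The variable names are my own, and the list you get may differ from the table above depending on your MATLAB release.

```matlab
s = imformats;

% A format can be written only if its write field is non-empty.
canWrite = ~cellfun(@isempty, {s.write});
readOnly = s(~canWrite);

% Print the extensions and description of each read-only format.
for k = 1:numel(readOnly)
    fprintf('%-8s %s\n', strjoin(readOnly(k).ext, ','), readOnly(k).description);
end
```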

If you pass an extension to imformats, it looks through file formats it knows about to see if it matches a standard one.

imformats('jpg')
ans = 

            ext: {'jpg'  'jpeg'}
            isa: @isjpg
           info: @imjpginfo
           read: @readjpg
          write: @writejpg
          alpha: 0
    description: 'Joint Photographic Experts Group'
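Here is a short sketch of the two lookup outcomes that matter for the rest of the story: an alternate extension that maps to the same registry entry, and an extension that matches nothing. The made-up extension is only an illustration.

```matlab
% Both extensions listed in the ext field lead to the same entry.
a = imformats('jpg');
b = imformats('jpeg');
disp(isequal(a.ext, b.ext))

% An extension that no format claims produces an empty result.
c = imformats('fruit_study_2014_jul_11');
disp(isempty(c))
```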

Let's go back to the original file peppers.jpg and consider what happens in the code we've seen so far.

1. We did not specify the format explicitly (with a 2nd argument) when we called imread, so imread called imftype to determine the format type.

2. The function imftype found an extension ('.jpg') at the end of the filename, so it asked the imformats function about the extension, and imformats returned a set of function handles useful for doing things with JPEG files. One of the function handles, @isjpg, tests to see whether a file is a JPEG file or not.

To be completely truthful, @isjpg just does a quick check based only on the first few bytes of the file. Look at the code by typing edit private/isjpg. Here are the key lines.

fid = fopen(filename, 'r', 'ieee-le');
assert(fid ~= -1, message('MATLAB:imagesci:validate:fileOpen', filename));
sig = fread(fid, 2, 'uint8');
fclose(fid);
tf = isequal(sig, [255; 216]);

OK, I have now taught you enough so that you can thoroughly confuse imread if you really want to. But don't you already have enough hobbies?

3. In this case, the file wasn't actually a JPEG file, so the function handle @isjpg returned 0 (false).
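To see how little the two-byte check looks at, here is a sketch that applies the same comparison to two scratch files. The files and their names are hypothetical; one begins with the two bytes the check looks for, and the other begins with the PNG signature shown in the imfinfo output earlier.

```matlab
% Hypothetical scratch files holding only a few header bytes each.
jpegLike = [tempname '.bin'];
pngLike  = [tempname '.bin'];

fid = fopen(jpegLike, 'w');
fwrite(fid, [255 216 0 0], 'uint8');               % starts with 255, 216
fclose(fid);

fid = fopen(pngLike, 'w');
fwrite(fid, [137 80 78 71 13 10 26 10], 'uint8');  % PNG signature bytes
fclose(fid);

% Apply the same two-byte comparison to each file.
files = {jpegLike, pngLike};
tf = false(1, 2);
for k = 1:2
    fid = fopen(files{k}, 'r', 'ieee-le');
    sig = fread(fid, 2, 'uint8');
    fclose(fid);
    tf(k) = isequal(sig, [255; 216]);
    delete(files{k});
end
disp(tf)
```

The first file passes the check even though it is not a usable JPEG, and the second fails it, which is what happened with the misnamed peppers file.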

That brings us to the rest of the excitement in imftype.

% Get all formats from the registry.
fmt_s = imformats;
% Look through each of the possible formats.
for p = 1:length(fmt_s)
    % Call each ISA function until the format is found.
    if (~isempty(fmt_s(p).isa))
        tf = feval(fmt_s(p).isa, filename);
        if (tf)
            % The file is of that format.  Return the ext field.
            format = fmt_s(p).ext{1};
            fmt_s = fmt_s(p);
            return
        end
    else
        warning(message('MATLAB:imagesci:imftype:missingIsaFunction'));
    end
end

In English: For every image file format we know about, run the corresponding isa function handle on the file. If one of the isa functions returns true, then return the corresponding set of information from imformats.
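The same "try each format until one matches" idea can be sketched with a tiny, made-up registry. This is not the imftype implementation: the real code calls each format's isa function, whereas this sketch assumes a hypothetical table of leading signature bytes for just three formats.

```matlab
% Hypothetical mini-registry: format name and leading signature bytes.
registry = struct( ...
    'name',      {'jpg', 'png', 'gif'}, ...
    'signature', {[255 216], [137 80 78 71 13 10 26 10], double('GIF8')});

% Write PNG data under a name whose extension matches no format.
fname = [tempname '.fruit_study'];
imwrite(uint8(zeros(4, 4, 3)), fname, 'png');

% Read the first few bytes of the file.
fid = fopen(fname, 'r');
header = fread(fid, 8, 'uint8')';
fclose(fid);
delete(fname);

% Try each registry entry in turn until one signature matches.
detected = '';
for p = 1:numel(registry)
    sig = registry(p).signature;
    if numel(header) >= numel(sig) && isequal(header(1:numel(sig)), sig)
        detected = registry(p).name;
        break
    end
end
disp(detected)
```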

Back to the story for our misnamed image file peppers.jpg. The @isjpg function handle returned false for it. So imftype then tried the isa function handles for every image file format. One of them, @ispng, returned 1 (true). That information was passed back up to imread, which then read the file successfully as a PNG, which was the file's true format type.
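For contrast, here is a sketch of what happens when you do specify the format with a second argument, so that the detection step is skipped. The file name is hypothetical. I have not listed an expected result for the wrong-format call; the try/catch simply reports whatever your release does.

```matlab
% Write PNG data under a misleading .jpg name (hypothetical file).
rgb = uint8(randi([0 255], 8, 8, 3));
misnamed = [tempname '.jpg'];
imwrite(rgb, misnamed, 'png');

% Naming the true format explicitly reads the file as a PNG.
asPng = imread(misnamed, 'png');
disp(isequal(asPng, rgb))

% Naming the wrong format bypasses the fallback search described above.
try
    imread(misnamed, 'jpg');
    disp('read succeeded')
catch err
    disp(err.identifier)
end

delete(misnamed);
```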

Finally, here's what happened for the image file peppers.fruit_study_2014_Jul_11. When imftype passed the lowercased extension 'fruit_study_2014_jul_11' to imformats, no such image format extension was found, so imformats returned empty. That caused imftype to go into the code that simply tried every image format it knew about, which again worked when it got to PNG.

Phew! That's the story of the effort imread makes to read your images in correctly.

For the three of you that are still reading along, I'll send a t-shirt to the first one to post a convincingly complete explanation of this line of code from above:

extension = lower(filename(idx(end)+1:end));



Published with MATLAB® R2014a

14 Comments

extension = lower(filename(idx(end)+1:end));

This line indexes into filename starting just past the last location in idx (which holds the position of every ‘.’ in the string). ‘end’ is used in case there are multiple ‘.’ characters in the string. The starting index is 1 + the location of the last ‘.’, presuming that it marks the end of the file name just before the file extension, and the range runs to the end of the string. Finally, the result is lower-cased to account for a user supplying upper-case characters, which would not arise in a file extension. Ultimately, this line attempts to extract whatever is located at the end of the string, assuming that the file extension follows the last ‘.’ and should be lower-cased.

Steve,

I really enjoyed this blog post. I think it’s a great example of how to account for the variability in user input.

Hey there Steve, I read it :)

The extension is grabbed from one character beyond the last period to the end of the filename.

Cheers,
Sven.

John—Works for me! We do find that some programs write out image files using all-caps for the format extension, such as myfile.JPG. That’s another reason for the call to lower.

Oh, and to ensure completeness, the extension is forced to lowercase since a .JPG file is really just a .jpg file. The idx variable is an index into every period character in the filename :)

Steve-

I thought you said a T-shirt, not a tie :-)

FYI, I do think fileparts was already in MATLAB then but I am not positive. It definitely existed well before the VMS platform went away.

–Loren

Using |fileparts| is probably better than the manual approach using |find|. Both take two statements to generate the lower-case file extension, so there’s no benefit to choosing one over the other from a simplistic “lines of code” point of view. On the other hand, the tilde notation to ignore function return values was not available in a public release until R2009b, so if you’re targeting prior releases you have to use some variation of the opaque construction

  [ext, ext, ext] = fileparts(filename);

I still do that in much of my code, albeit with a Code Analyzer message suppression token at the end.

Still, you *could* make the |find|-based solution slightly more palatable by calling |find| as

  idx = find(filename == '.', 1, 'last');

That way the extension extraction becomes

  ext = lower(filename(idx + 1 : end))

which removes one level of subscripting.

Oh, and I forgot the particular semantics of |fileparts|’s extension return value. The extension string always begins with a dot (period), so if you use |fileparts| the complete construction becomes something like

  [~, ~, ext] = fileparts(filename);
  if ~ isempty(ext),
     ext = lower(ext(2 : end));
  end
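The two constructions can be compared side by side with a short sketch. The file names below are made up for illustration, and the comparison treats any two empty results as equal, since the two approaches are not guaranteed to return empties of the same size.

```matlab
names = {'peppers.jpg', 'MYFILE.JPG', 'archive.tar.gz', 'noextension'};
agree = false(size(names));
for k = 1:numel(names)
    filename = names{k};

    % find-based approach, limited to the last period
    idx = find(filename == '.', 1, 'last');
    if isempty(idx)
        viaFind = '';
    else
        viaFind = lower(filename(idx + 1 : end));
    end

    % fileparts-based approach, dropping the leading period
    [~, ~, ext] = fileparts(filename);
    if ~isempty(ext)
        ext = lower(ext(2 : end));
    end

    % Two empty results count as agreeing, whatever their sizes.
    agree(k) = (isempty(viaFind) && isempty(ext)) || strcmp(viaFind, ext);
    fprintf('%-16s -> ''%s''\n', filename, viaFind);
end
disp(all(agree))
```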

These postings are the author's and don't necessarily represent the opinions of MathWorks.