Loren on the Art of MATLAB

September 27th, 2006

Evolution of the Function isfield

Today I want to talk a bit about the function isfield and some of the history of its implementation. isfield has been built into MATLAB for a few releases now. We felt it was important to do this after talking to some customers who had struct arrays with a large number (tens of thousands) of fields and found the performance too slow. Let's see how isfield used to look as an M-file and walk through some of the studies we did as we investigated how to proceed.

Original Code

Let me first show you the original code.

type isfieldOrig
function tf = isfieldOrig(s,fn)
%ISFIELDORIG True if field is in structure array.
%   F = isfieldOrig(S,'field') returns true if 'field' is the name of a field
%   in the structure array S, otherwise it returns false.

if isa(s,'struct')
  tf = any(strcmp(fieldnames(s),fn));
else
  tf = false;
end

The algorithm is straightforward and pretty easy to read. Only return true for structures, and only then if the requested field exists. So what's the problem? As I mentioned earlier, this code was slow for structures with a large number of fields. Why? First, fieldnames has to collect all the field names from the structure, and then strcmp compares each of them to the requested one. If there are lots and lots of fields, there need to be lots and lots of comparisons.
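Here's a rough sketch of how one might measure that cost. It is not the study we ran; the field names f1, f2, ... and the sizes n and reps are made up for illustration, and n is scaled down from the tens of thousands mentioned above so the script finishes quickly. It times the list-and-compare test from isfieldOrig against the built-in isfield.

```matlab
% Build a scalar struct with many fields. The names f1..fN are hypothetical.
n = 5000;
s = struct();
for k = 1:n
    s.(sprintf('f%d', k)) = k;
end
target = sprintf('f%d', n);   % a field that does exist
reps = 50;

% Approach of the original M-file: list every name, then compare each one.
tic
for r = 1:reps
    tfList = any(strcmp(fieldnames(s), target));
end
tList = toc;

% Built-in isfield, for comparison.
tic
for r = 1:reps
    tfBuiltin = isfield(s, target);
end
tBuiltin = toc;

fprintf('fieldnames + strcmp: %.4f s, built-in isfield: %.4f s\n', tList, tBuiltin);
```

Both tests should agree on the answer; only the elapsed times differ, and the gap should widen as n grows.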

Speedier with try

So I figured, what if I just try to address the structure field and return my success or failure from that? Here's isfieldTry.

type isfieldTry
function tf = isfieldTry(s,f)
%ISFIELD True if field is in structure array.
%   F = ISFIELDTRY(S,'field') returns true if 'field' is the name of a field
%   in the structure array S.
%
%   See also GETFIELD, SETFIELD, FIELDNAMES.

tf = true;
try
  tmp = s.(f);
catch
  tf = false;
end

This approach was much faster for structures with lots of fields. But I missed something when I did this: it might return true for objects, if the class designer allowed users to use dot notation to address some aspects of the object.
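To make the problem concrete, here is a small sketch that inlines the try-based test and applies it to an object. The choice of MException is just an assumption for illustration (it is an object with a public message property, and it is available in newer releases than the one this post was published with); any class that permits dot references would do.

```matlab
% An object, not a struct. The identifier and text are made up.
ME = MException('demo:id', 'demo message');

% Inline version of the try-based test.
tfTry = true;
try
    tmp = ME.('message');   % the dot reference succeeds on this object
catch
    tfTry = false;
end

% Built-in isfield only answers true for structs.
tfBuiltin = isfield(ME, 'message');

fprintf('try-based test: %d, built-in isfield: %d\n', tfTry, tfBuiltin);
```

The two answers disagree, which is exactly the behavior the struct check in the next version is meant to rule out.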

Return false for Objects

Next I added some logic to see if the first input is actually a struct. If not, return false straight away.

type isfieldReturnBeforeTry
function tf = isfieldReturnBeforeTry(s,f)
%ISFIELD True if field is in structure array.
%   F = ISFIELDRETURNBEFORETRY(S,'field') returns true if 'field' is the name of a field
%   in the structure array S.
%
%   See also GETFIELD, SETFIELD, FIELDNAMES.

tf = false;
if ~isstruct(s)
   return
end
try
  tmp = s.(f);
  tf = true;
catch
end

As you can see, the function is now getting a little bit more complicated than I originally thought would be required.

Speedier with eval, but Ugly

I next saw a user substitute this code. Any guesses what I think about this? Here's one article I've written about eval previously.

type isfieldEval
function tf = isfieldEval(s,f)
%ISFIELD True if field is in structure array.
%   F = ISFIELDEVAL(S,'field') returns true if 'field' is the name of a field
%   in the structure array S.
%
%   See also GETFIELD, SETFIELD, FIELDNAMES.

tf = true;
eval('tmp = s.(f);','tf = false;');

Correct Behavior with Decent Speed

Well folks, the new M-files still don't have the right behavior in MATLAB even though the answers are correct! If I use a version of the M-file that uses the try-catch syntax, and if the field does not exist, then the state of lasterror will be changed, even though the isfield* collection of functions never actually results in a user error, but rather always returns true or false. Let me show you what I mean.

lasterror reset % reset lasterror, just in case
isfieldTry(17,'fred')
lerr = lasterror
disp(lerr.message)
ans =

     0


lerr = 

       message: 'Attempt to reference field of non-structure array.'
    identifier: 'MATLAB:nonStrucReference'
         stack: [5x1 struct]

Attempt to reference field of non-structure array.

Now suppose that I want to use isfield in a larger application where I will sometimes need to check for errors. Depending on how I've written the program, I might well get a false positive error message. And I don't want that.
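As a sketch of how such a false positive could arise, suppose some later part of the application inspects lasterror to decide whether anything went wrong. The settings struct cfg and its field names tol and maxIter are hypothetical, and the try block is an inline copy of the test in isfieldTry.

```matlab
lasterror('reset');            % start from a clean error state
cfg = struct('tol', 1e-6);     % hypothetical settings struct

% Inline version of the try-based field test.
hasMaxIter = true;
try
    tmp = cfg.('maxIter');     % this field does not exist
catch
    hasMaxIter = false;        % handled quietly; the user never sees an error
end

% Later, unrelated code asks whether an error has occurred.
err = lasterror;
staleError = ~isempty(err.message);
if staleError
    fprintf('Reporting an error that never reached the user: %s\n', err.message);
end
```

The field test gave the right answer, yet the error state now holds a message left over from the failed dot reference.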

What I really need to do is to get the state of lasterror before I do any work, and reset its state to the previous error state after I finish. Finally, here's the code to do this.

type isfieldLasterror
function tf = isfieldLasterror(s,f)
%ISFIELD True if field is in structure array.
%   F = ISFIELDLASTERROR(S,'field') returns true if 'field' is the name of a field
%   in the structure array S.
%
%   See also GETFIELD, SETFIELD, FIELDNAMES.

tf = false;
if ~isa(s,'struct')
   % non-structs shouldn't have exposed fields unless specific objects
   % overload isfield for themselves
   return
end
try
  l = lasterror;  % capture current error state
  tmp = s.(f);
  tf = true;
catch
  lasterror(l);  % reset previous error state
end

Result

In the end, we chose to build isfield into MATLAB where we have access to the internal data structures. At that point, it is very quick and easy to ascertain if the first input is a structure. If so, it's a quick hash table lookup to see if the requested fieldname exists and to return the answer without disrupting the error state.

Do you have functions that alter a state or property for expediency and then need to reset it? Have you remembered to reset it? I'd love to hear your thoughts.


Published with MATLAB® 7.3

10 Responses to “Evolution of the Function isfield”

  1. Duane Hanselman replied on :

    I never liked the original isfield because it didn’t need the if statement. Just combine the if test with the logical statement:

    tf = isa(s,'struct') && ...
    any(strcmp(fieldnames(s),fn));

    I am glad to see isfield as a built in, so this is gone.

    The other one that puzzles me is

    if ~isempty(msg)
    error(msg)
    end
    This one used to appear in dozens of places and perhaps still does. It can be simply replaced by

    error(msg)

    since error does nothing with an empty argument.

    Thanks for doing this blog. It is a wonderful educational tool.

  2. Oliver A. Chapman, PE replied on :

    Loren,

    Thanks for sharing the evolution of this code, it is a good series of examples on programming style.

    However, I don’t like the two exit paths in your last bit of code. I would have used an if / else / end approach.

    It seems like Duane’s code is even simpler and just as easy to follow. Although, maybe it has the same performance issues that you discussed.

    But, in answer to your question, I’ve been lucky in that I’ve been able to come down on the side of easy to read code and sacrifice performance. So, I would have used the first set of code and looked for other ways to get the performance like faster or more computers, compiling the code, breaking the problem up into smaller pieces or hiring more people to manage the running. So, no, I haven’t had to save and reset this type of variable, yet.

    But still, I’m happy to see these examples.

  3. Loren replied on :

    Duane, Charles-

    Thanks to both of you for great comments.

    The reason we had the initial if statement in the original isfield was so that if/when MATLAB allows subclassing of built-in classes, such as struct, an object derived from struct might return true for a field that was not meant to be public, unless the class designer also overloaded isfield.

    The reason for the two exit paths is to not waste any time doing calculations for the non-struct if it was an object where subsref had been overloaded to allow dot-referencing. If that happened, I wanted to be sure the class designer had to intentionally overload isfield to be sure which fields to make public. It’s possible that this is overkill. But it does guarantee that the M-file version behaves identically to the built-in one, and is exactly the design we intended.

    –Loren

  4. Loren replied on :

    Oops, I misunderstood Duane’s comments. You are right that the statements could be combined.

    –Loren

  5. Gary Roth replied on :

    Loren,
    Is the rewrite of isfield something that is new for 2006b, or was it implemented earlier?

  6. Loren replied on :

    Gary-

    I think it was built into R14sp3.

    –Loren

  7. Priyantha replied on :

    When I tried to assign a file using the following sequence, I got the message
    Attempt to reference field of non-structure array.

    load M:\WCCT_05.dat;
    wc=WCCT_05.dat

    What does this mean?

  8. Loren replied on :

    Priyantha-

    This question is not relevant to this article. Please contact technical support.

    –Loren

  9. Kieran Parsons replied on :

    Hi Loren,

    I came across this old post while trying to figure out why isfield() does not work on objects in 2009a. I think that since fieldnames() does work on an object that isfield() should do as well (without needing to overload isfield for the specific classes). I have added a feature request. In the meantime I came up with the following, based on your code above, while also adding cell field support:

    function tf = isfield2(s, field)
    %ISFIELD2 Adds object support to isfield.
    %   TF = ISFIELD2(S, FIELD) checks if FIELD is a field of S (a structure or an object).
    %
    %   See also ISFIELD.
    %
    tf = false;  % default when s is neither a struct nor an object
    if isstruct(s)
      tf = isfield(s, field);
    elseif isobject(s)
      tf = false;
      if iscell(field)
        tf = false(size(field));
        for idx = 1:numel(tf)
          try
            l = lasterror;
            tmp = s.(field{idx});
            tf(idx) = true;
          catch
            lasterror(l);
          end
        end
      else
        try
          l = lasterror;
          tmp = s.(field);
          tf = true;
        catch
          lasterror(l);
        end
      end
    end
    end
    

    While writing this I also found out that true/false can take a size argument which I never knew before.

    Can you please point me to how to provide formatted code in these blog posts. I could not find this info on the blog page.

    Thanks,
    Kieran

  10. Loren replied on :

    Kieran-

    Use the tags pre and /pre inside angle brackets to denote preformatted code. I fixed that for your last comment.

    -Loren

Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.