Loren on the Art of MATLAB

Working with Arrays of Structures

Posted by Loren Shure,

Though I have covered this topic somewhat in the past, it seems like a good time to refresh the information. There are recent posts on the MATLAB newsgroup relating to this topic such as this one.

Original Question

Suppose I have a structure array and want to find all the entries with a value of 4, without looping, something like [m,n] = find(f.h == 4).

f(1).h = [1 2 3 4];
f(2).h = [5 6 7 8];
try
 [m,n] = find(f.h == 4);
end

Why can't I use the find statement directly? Let's take a look at the error message to understand.

lerr = lasterror;
disp(lerr.message)
Error using ==> eq
Too many input arguments.

Too many input arguments? What is f.h? For that matter, what exactly is f again?

f
f = 
1x2 struct array with fields:
    h

f is a struct array, and f.h is a comma-separated list.

f.h
ans =
     1     2     3     4
ans =
     5     6     7     8
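To see why eq complains, it may help to look at how the comma-separated list expands. This is only a small sketch; the names a, b and nArgs are mine and are not part of the original question.

```matlab
f(1).h = [1 2 3 4];
f(2).h = [5 6 7 8];

% f.h expands to one value per element of f, as if I had typed f(1).h, f(2).h
nArgs = numel({f.h});      % number of items in the list

% The list can be captured in separate variables, one per element
[a, b] = f.h;

% So f.h == 4 effectively hands eq three inputs (f(1).h, f(2).h and 4),
% which is one more than eq accepts
```

With two elements in f, the list has two items, so the comparison ends up with three arguments instead of two.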

Alternatives

To turn this list into a MATLAB construct I can use, I'd normally either wrap it inside [] or {}. If I wrap f.h inside [], I lose the information about what is in the first element of f and what is in the second.

[f.h]
ans =
     1     2     3     4     5     6     7     8

Wrapping f.h inside {}, I have a cell array to work with.

{f.h}
ans = 
    [1x4 double]    [1x4 double]

I still can't immediately use find or numeric functions on this array.

try
    [m,n] = find({f.h} == 4);
end
lerr = lasterror;
disp(lerr.message)
Error using ==> evalin
Undefined function or method 'eq' for input arguments of type 'cell'.
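One possible way forward from the cell array is cellfun, which applies a function to the contents of each cell. This is a sketch of an alternative, not the approach taken in the rest of this post; the variable c is introduced here only for illustration.

```matlab
f(1).h = [1 2 3 4];
f(2).h = [5 6 7 8];

c = {f.h};   % 1x2 cell array, one vector per element of f

% Apply find to the contents of each cell. The results differ in size
% from cell to cell, so they have to stay in cell arrays.
[m, n] = cellfun(@(v) find(v == 4), c, 'UniformOutput', false);
```

Here m{1} and n{1} hold the row and column of the match in f(1).h, and the second cells are empty because f(2).h contains no 4.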

Solution

What I'd like is a way to work with my struct without writing too much code and without looping, ideally using a pattern I can reuse as my problem evolves. This is exactly what arrayfun was designed to help with. It works on each element of an array; I just need to tell it what operation to perform on one element and which array to work on.

Let's first find the values in the struct array f equal to 4. Since I have 2 arrays embedded in f, and they may each have different numbers of outputs, I have to clearly state that the outputs need to go into a cell array.

[m,n] = arrayfun(@(x)find(x.h==4),f,'uniformoutput',false)
m = 
    [1]    [1x0 double]
n = 
    [4]    [1x0 double]

This becomes even more obvious if I create another array, g, that is even less "regular" than f.

g = f;
g(3).h = [1 2 17 4];
g(4).h = [1 3 17 5 9 17];
[mg,ng] = arrayfun(@(x)find(x.h==17),g,'uniformoutput',false)
mg = 
    [1x0 double]    [1x0 double]    [1]    [1x2 double]
ng = 
    [1x0 double]    [1x0 double]    [3]    [1x2 double]
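If a flat list of (element, column) pairs is easier to work with than the cell arrays, the results can be unpacked afterwards. The following is a sketch under the assumption that every h is a row vector; the names counts, idxCells, elemIdx, colIdx and pairs are mine.

```matlab
g(1).h = [1 2 3 4];
g(2).h = [5 6 7 8];
g(3).h = [1 2 17 4];
g(4).h = [1 3 17 5 9 17];

[mg, ng] = arrayfun(@(x) find(x.h == 17), g, 'UniformOutput', false);

% Number of matches found in each element of g
counts = cellfun(@numel, ng);

% Repeat each element index once per match, then flatten the pieces
idxCells = arrayfun(@(k) repmat(k, 1, counts(k)), 1:numel(g), 'UniformOutput', false);
elemIdx = [idxCells{:}];   % which element of g each match came from
colIdx  = [ng{:}];         % column of the match within that element's h

% One row per match: [element of g, column in h]
pairs = [elemIdx(:), colIdx(:)];
```

Each row of pairs reads as "g(row 1).h(row 2) is 17". The empty cells contribute nothing, which is why elements 1 and 2 do not appear.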

Some problems are more benign, however, and it would be wasteful to return results in a cell array only to unpack them into a numeric array. One example is the function max, which generally returns a single value for each input vector.

[maxval,idx] = arrayfun(@(x)max(x.h),f)
[maxval,idx] = arrayfun(@(x)max(x.h),g)
maxval =
     4     8
idx =
     4     4
maxval =
     4     8    17    17
idx =
     4     4     3     3
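The same uniform-output pattern fits any question whose answer is one scalar per element. As a sketch (nMatches and hasMatch are names I made up for this illustration), counting or testing for matches avoids cell arrays entirely:

```matlab
g(1).h = [1 2 3 4];
g(2).h = [5 6 7 8];
g(3).h = [1 2 17 4];
g(4).h = [1 3 17 5 9 17];

% One scalar per element, so the default uniform output is fine
nMatches = arrayfun(@(x) nnz(x.h == 17), g);   % how many 17s in each h
hasMatch = arrayfun(@(x) any(x.h == 17), g);   % does each h contain a 17?
```

Both results are ordinary 1x4 arrays that can be used directly for indexing, for example g(hasMatch).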

Related Topics

Here are some links to related blogs and MATLAB reference pages.

Your Thoughts

  • Do you use struct arrays?
  • If yes, do you use arrayfun, or do you use loops? Whichever your choice is, can you say more about why it's your choice?
  • Do you avoid struct arrays all together and use something else? If so, what data representations do you use instead?

Let's see your feedback here.


Published with MATLAB® 7.3

30 Comments

In general, is arrayfun faster than looping? It seems to me that arrayfun would have to perform a loop anyway.

Jessee-

arrayfun can be much faster than looping. First, results are automatically preallocated in size. Second, the loop inside means that MATLAB doesn’t reinterpret the statement each time and pieces of the code can’t have side effects that change results for the rest of the array. arrayfun can take advantage of this in ways that it would be hard or impossible to do in a for loop.

–Loren
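Relative speed is likely to depend on the MATLAB release and on the work done per element, so it is probably worth measuring on your own data. A rough sketch of such a comparison follows; the struct array s, its size n and the timing variable names are all made up for this illustration.

```matlab
% Hypothetical test data: n-by-1 struct array, each h a 1x4 row vector
n = 1e4;
s = struct('h', num2cell(rand(n, 4), 2));

tic
viaArrayfun = arrayfun(@(x) max(x.h), s);
tArrayfun = toc;

tic
viaLoop = zeros(size(s));          % preallocate the result by hand
for k = 1:numel(s)
    viaLoop(k) = max(s(k).h);
end
tLoop = toc;

fprintf('arrayfun: %.4f s, loop: %.4f s\n', tArrayfun, tLoop);
```

A single tic/toc pair is a coarse measurement, so repeating the run a few times gives a more trustworthy picture. The two approaches should at least return identical results.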

Loren,
Actually the area where I tend to run into the most problems is in constructing my structure array. I have a situation where I get back a collection of possible options, e.g.
S.filtSize=[10,20,30];S.name={'john','george'};
S.type={@hann};

I spent quite a while trying to find a non-loop method for creating a structure array with all possible combinations of settings (taking one from each category),
for example: S(1)=struct('filtSize',10,'name','john','type',@hann);
S(2)=struct('filtSize',20,'name','john','type',@hann);…

This is a simple example where creating the struct array from a cell array would work, but in actuality I have some 10 fields with any number of entries of various types in each field. I’ve been looking at this post trying to find a way to use the techniques you describe to address this, without any luck… Any thoughts?

how about:

[m,n] = find(cat(1,f.h) == 4)

cat(1,f.h) creates an array by concatenating the two lists produced by f.h along the first dimension (vertically).

Reza-

That works fine if all the original arrays are the same length. But that’s not true for the extended example with g.

Then I get this error message:

??? Error using ==> cat
CAT arguments dimensions are not consistent.

–Loren

Dan-

I am not sure that I completely understand your problem. But here’s an idea to start you off. Try using ndgrid to get indices for your various fields.

S.filtSize=[10,20,30];
S.name={'john','george'};
S.type={@hann};
[filtInd, nameInd, typeInd] = ndgrid(1:length(S.filtSize),1:length(S.name),1:length(S.type))

filtInd =
     1     1
     2     2
     3     3
nameInd =
     1     2
     1     2
     1     2
typeInd =
     1     1
     1     1
     1     1

Using triplets of those indices gets you all the combinations of inputs.

–Loren
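The index triplets can then be turned into a struct array in one call, because struct accepts cell arrays of values and creates one element per cell. This is a sketch under my own assumptions: the options live in a separate variable opts, the helper col and the result name combos are made up, and @hann is only stored as a handle and never called.

```matlab
opts.filtSize = [10 20 30];
opts.name     = {'john', 'george'};
opts.type     = {@hann};

% One index per option category, covering every combination
[fi, ni, ti] = ndgrid(1:numel(opts.filtSize), 1:numel(opts.name), 1:numel(opts.type));

% Force everything into columns so the three cell arrays match in size
col = @(c) c(:);
combos = struct( ...
    'filtSize', num2cell(col(opts.filtSize(fi(:)))), ...
    'name',     col(opts.name(ni(:))), ...
    'type',     col(opts.type(ti(:))));
```

combos is a 6x1 struct array here, with filtSize varying fastest. Adding a category means adding one more ndgrid input and one more field, so with ten fields it might be tidier to generate the index arrays into a cell array instead.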

Loren,

This column introduced me to the “arrayfun,” et al. and helped me see a potential use for function handles. After reviewing the documentation, I can follow part of what’s going on.

At the end of your first example, we have two cell array variables, m & n. The entire purpose of this batch of code is to generate the indices that will allow subsequent code to find the values indicated. Based on your example structure arrays, I expect the output for the first example to be something like:

1, 4

And, for the second:

1, 3
2, 3
2, 6

But, when I run your examples, I don’t know where to find these indices. I mean, I’ve poked around in these output cell arrays and I can find the values I know should be someplace. But I don’t understand why they ended up where they are and why some of the cells are empty.

Maybe you can add a bit more detail about what is happening. Maybe you can write a batch of code using for loops that produces the same results so we can see the data flow.

However, even if I did understand all the details, I’m not sure if I’d ever use this technique. All of us here who looked at this don’t think our coworkers would follow what is going on.

Oliver-

Here’s the equivalent for-loop if f is a vector:

for i=1:numel(f)
   [ml{i} nl{i}] = find(f(i).h == 4)
end

isequal(m,ml)
isequal(n,nl)

–Loren

You have got to be kidding ! This type of thing is why Matlab scares most of my students, undergrads and postgrads.

> What specifically are you reacting to here?
Pretty much everything in the above.

Firstly error messages that give you only the barest of clues as to why your code didn’t work …. and no help as to where to find the right way of doing it.

Error using ==> evalin
Undefined function or method 'eq' for input arguments of type 'cell'.

And then the most arcane unintuitive code one could possibly imagine ….
[mg,ng] = arrayfun(@(x)find(x.h==17),g,'uniformoutput',false)

When a language gets this arcane it’s time to find another language.

And I’m a Matlab fan !

Tim-

The error from evalin is an artifact of publishing. Running the code without publishing yields this message:

Undefined function or method 'eq' for input arguments of type 'cell'.

I realize that’s only a small portion of your comments but thought it was worth clarifying here.

–Loren

Thanks for the clarification Loren, but let’s not miss the wood for the trees shall we ? :-)

The major issue here is that things that intuitively (to the average user) should just ‘work’, instead require the most arcane piece of code, that a minuscule fraction of Matlab users will hit upon.

It might offend the purists (0.00001% of Matlab users ?), but I’m of the opinion that stuff like ….
find(f.h == 4)
… should just ‘work’. Please make it so.

Matlab is about usability, not adherence to arcane rules that only computer scientists understand.

I feel similarly about the move towards object-oriented programming in Matlab. As a biomedical scientist who at one time regularly programmed in C, I have never written a piece of production code in C++ – despite having learned how to do so. It’s all well and good to try to hook professional programmers, but if Matlab comes to be perceived by scientists in the same way as C++, you will have lost the plot.

Loren:

My question concerns the conversion between an array of structures and a structure of arrays.

For purposes of my work, I often work with large arrays of structures, in order to keep information pertaining to a particular object collected and easily accessible. With this, however, the structural architecture is often quite extensive, with many nested structures, which are typically 1-by-1 (e.g., S(100).s1.s2.s3.s4.s5.s6 etc.).

While this is effective for moving and sorting data particular to a given object, when I want to perform some kind of batch analysis of a particular field across all objects (e.g., S(1:1000).s1.s2.s3.s4.s5.s6), I often want to convert the array of structures into a structure of arrays (e.g., S(1:1000).s1.s2.s3.s4.s5.s6 –> V.s1.s2.s3.s4.s5.s6(1:1000, :)). Due to large data sets and often thousands of independent objects associated with the same fields, ‘for’ loops, in which I grab the particular field values I want and put them in another array, continuously doing so for all objects, are not an efficient or reasonable choice, due to slow computation time and intensive memory allocation (e.g. 1000 objects, data sets of 400,000 pts, and 50 different field values to compare and filter).

Thus, I have managed to write rudimentary code to do this for known fields and a known number of levels, but I am stuck in crafting a generic algorithm for unknown fields and an unknown number of levels.

I was curious if you might have any guidance in this matter, concerning converting arrays of structures to structures of arrays. Any conceptual advice would be greatly appreciated.

Thanks.

-Kris

Kris-

I have no sage advice for you. You might get somewhere with struct2cell and perhaps permute. There also might be some tools on the File Exchange that could help.

One thing to ask yourself is if you really need to convert to the second deeply nested struct. Or can you operate on the extracted values and put them into some other data format.

–Loren
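For a single level of fields, one possible starting point is to stack each field with vertcat and dynamic field names. The sketch below does not handle the nested case Kris describes; it assumes flat fields whose values have matching sizes, and the field names pos and mass are invented for the illustration.

```matlab
% Hypothetical array of structures with two flat fields
S(1).pos = [1 2];  S(1).mass = 10;
S(2).pos = [3 4];  S(2).mass = 20;
S(3).pos = [5 6];  S(3).mass = 30;

names = fieldnames(S);
V = struct();
for k = 1:numel(names)
    % Stack the k-th field of every element as the rows of one array
    V.(names{k}) = vertcat(S.(names{k}));
end
```

The loop runs over fields rather than over objects, so its cost grows with the number of fields, not with the number of elements in S. Extending it to nested structs would need some form of recursion over isstruct fields, which is left out here.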

Loren,

I’m trying to match a string, coming from varargin, against the “name” field of an array of structs I created. All names have different lengths.

At first I tried the suggestion reza provided (comment nr 4), but as you said in your reply (comment nr 5), this doesn’t work because of the different lengths. So I switched to the arrayfun code sample you provided

However, it keeps giving me the following error:

??? Error using ==> eq
Matrix dimensions must agree.

Which points to the arrayfun function call

Below is the code I’m using

all = [struct('name', 'HOLD_OFF', 'check', @(x)islogical(x)||isnumeric(x))
   struct('name', 'FIGH', 'check', @isnumeric)
   struct('name', 'S', 'check', @ischar)
   struct('name', 'LINE_DIR', 'check', @ischar)];

c = arrayfun(@(x)find(x.name == varargin{1}), all, 'uniformoutput', false);

varargin{1} contains the value ‘FIGH’

I’ve been staring at this error and googling for an answer for quite some time, but I still haven’t found what is causing the error.

Do you have any idea what is causing this error?

Thanks

-Thomas

Thomas-

Not sure exactly because I don’t see your workspace and variable types. You might be better off doing string comparisons with strcmp though. I think the error comes from comparing FIGH with HOLD_OFF, which are different lengths; == can’t do that.

–Loren
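A sketch of the strcmp approach, using the struct array from the question. I have renamed the variable from all to checks so that it does not shadow the built-in function all, and key stands in for varargin{1}.

```matlab
checks = [struct('name', 'HOLD_OFF', 'check', @(x) islogical(x) || isnumeric(x));
          struct('name', 'FIGH',     'check', @isnumeric);
          struct('name', 'S',        'check', @ischar);
          struct('name', 'LINE_DIR', 'check', @ischar)];

key = 'FIGH';   % stand-in for varargin{1}

% strcmp compares whole strings and accepts a cell array of names,
% so names of different lengths are not a problem
idx = find(strcmp({checks.name}, key));

% The matching validator can then be looked up directly
validator = checks(idx).check;
```

strcmp returns false for names that differ in length instead of raising an error, which is what makes it usable here where == is not.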

Loren-

using strcmp instead of == did the trick. Thank you very much! Eventually I switched to the strfind function that you talked about in one of your other blog posts.

Is there a reason why Matlab can’t handle strings of different lengths when comparing them to each other? As far as I know no other programming language has this issue.

-Thomas

Thomas-

We *could* make == work for strings but then some other nice properties allowing mixing of strings and doubles wouldn’t work. == will only work on arrays of the same size, or if one of them is a scalar value.

–Loren

Loren -

is there an elegant way to transform a Nx1 array of structures into a Nx1 cell array, each cell containing one structure?

Thanks – Emil

Emil-

struct2cell should get you part of the way there, but the data will not be arranged as a struct per se. You can use a for loop. Or cellfun (which effectively has a loop inside it).

–Loren
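Another possibility, offered as a sketch, is num2cell, which puts each element of an array into its own cell and also accepts struct arrays. The struct array s and its field id are made up for this example.

```matlab
s = struct('id', {1; 2; 3});     % 3x1 struct array

% Each cell of c holds one 1x1 struct
c = num2cell(s);

% An arrayfun version that produces the same cell array
c2 = arrayfun(@(x) x, s, 'UniformOutput', false);

% Going back from the cell array to an Nx1 struct array
sBack = vertcat(c{:});
```

Both c and c2 keep the Nx1 shape of the input.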

Hi Loren

I am beginning to learn Matlab and I have a question related to this post. Using your example, if i wanted to, first, find certain values in a struct and then replace them by another value, say 0. So the first thing would be like this:

f(1).h = [1 2 3 4];
f(2).h = [5 6 3 8];
[m,n] = arrayfun(@(x)find(x.h==3),f,'uniformoutput',false)

Now that I have the indexes, how can I replace those “3” by, for example, 0? I mean without using loops. Thanks a lot in advance.

Miguel-

How large is f? I wonder if the code would be clearer if you wrote the for-loop.

One way is to convert the struct to a cell and work on each cell. Then convert back to a struct. Not sure this is worth the effort however.

Others?
–loren
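The convert-to-cell, modify, convert-back route might look something like the sketch below. It assumes numeric data; oldVal, newVal, vals and swap are names introduced for this illustration. The indices from find are not needed for this particular task.

```matlab
f(1).h = [1 2 3 4];
f(2).h = [5 6 3 8];

oldVal = 3;
newVal = 0;

% Pull every h out into a cell array, one cell per element of f
vals = {f.h};

% Replace oldVal with newVal inside each cell (numeric data assumed)
swap = @(v) v + (v == oldVal) .* (newVal - oldVal);
vals = cellfun(swap, vals, 'UniformOutput', false);

% Write the modified vectors back, one per element of f
[f.h] = vals{:};
```

Whether this reads better than a short for-loop is a judgment call; for a small f the loop may well be clearer.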

Loren,

struct arrays seem very unwieldy, they are hard to create via preallocation, and seem impossible to merge without looping, e.g. if you have a struct array O where O(i) is combination of the two structs a and b, then there is little hope of creating O(i) = mergestruct(a,b) (substitute for mergestruct your favourite vertcat function for structs here).

It would be great if mathworks provided some more tools for struct arrays, a separate class name, detection via is* functions, and vertcat methods, easier syntax for creating (preallocation especially), etc.

Best, HH

HH,

Thanks for sharing your thoughts. I have made sure they got into the bug/enhancement list.

–Loren

Hi Loren,

Can you tell me how to use arrayfun to locate values that match entire rows in data?

i.e.

If I do this, then I get….
??? Error using ==> eq
Matrix dimensions must agree.

Error in ==> @(x)find(x.h==A)

Thanks,

Anna

Sorry here is the code…..

A = [1 2 3; 1 2 4]

f(1).h = [1 2 5; 2 5 6; 1 2 3; 3 5 6]
f(2).h = [1 2 6; 3 5 6; 1 5 3; 3 5 6]
f(3).h = [1 2 4; 3 4 5; 1 1 3; 3 4 6]

[m,n] = arrayfun(@(x)find(x.h==A),f,'uniformoutput',false)
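The error arises because x.h is 4-by-3 while A is 2-by-3, and == needs matching sizes. One way to look for whole rows, sketched here on the data above, is ismember with the 'rows' option; the output name rowIdx is mine.

```matlab
A = [1 2 3; 1 2 4];

f(1).h = [1 2 5; 2 5 6; 1 2 3; 3 5 6];
f(2).h = [1 2 6; 3 5 6; 1 5 3; 3 5 6];
f(3).h = [1 2 4; 3 4 5; 1 1 3; 3 4 6];

% For each element of f, list the rows of h that also appear as a row of A
rowIdx = arrayfun(@(x) find(ismember(x.h, A, 'rows')), f, 'UniformOutput', false);
```

rowIdx{k} holds the row numbers within f(k).h that match any row of A, and is empty when there is no match.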

Hi Loren,

I get you on how struct array references like temp(3:5).A return comma lists… and, like you, don’t get the various confusions this seems to cause.

But I’m trying to understand an inconsistency (BUG?) in this behavior:

[temp(false)] = nan or [temp(4:3)] = nan

correctly, does nothing when temp is a regular array, BUT when temp is a structure containing a field “A”:

[temp(false).A] = nan or [temp(4:3).A] = nan

(Which looks to me like an IDENTICAL case, given the comma list interpretation)

gives an error about “too many elements” on the right side.

Is there a rational reason for this inconsistency that you can help me follow… or is it a BUG? It certainly complicates code to special-handle the null case.

Or, am I missing a shortcut there?

Thanks,
Darin

Darin-

I am not sure. I would recommend contacting support and reporting the behavior as a bug, which it seems like to me. But if I heard a reason that made sense, I might be swayed the other way.

–Loren

These postings are the author's and don't necessarily represent the opinions of MathWorks.