This post continues in the theme of other recent ones, questions frequently posed to me (and probably others). It has to do with initializing structures. There is a healthy set of posts on the MATLAB newsgroup devoted to this topic. So let's peel things apart today.
Structures - Mental Model
It first helps to understand how MATLAB treats structures and their fields. First clear the workspace.
clear variables
close all
Let's just start with a scalar structure.
mystruct.FirstName = 'Loren';
mystruct.Height = 150
mystruct =
FirstName: 'Loren'
Height: 150
Each field in the structure mystruct appears to be a separate MATLAB array. The first one, FirstName, is a string of length 5, and the second, Height, is a scalar double (in cm, for those who are paying attention to units).
I can add another field and its contents can be the contents of any valid MATLAB variable. Each field is independent in size and datatype.
Array of structs
Suppose I want to extend this array to include other people and measurements. I can grow this array an element at a time.
mystruct(2).FirstName = 'Fred';
mystruct(2)
ans =
FirstName: 'Fred'
Height: []
You can see here that since the field Height does not yet have a value, its value is set to empty ([]).
Don't Grow Arrays
Over the years, we have learned that growing arrays is a poor use of resources in MATLAB and that preallocation is helpful in terms of both not fragmenting memory and not spending time looking for a large enough memory slot. So, if I know I want to have 100 names in my struct, I can initialize the struct to be the right size. I may or may not feel the need to initialize the contents of the struct array however, since each field element is essentially its own MATLAB array.
How to Initialize a struct Array
Here are 2 ways to initialize the struct.
mystruct(100).FirstName = 'George';
With this method, we can see that elements are filled in with empty arrays.
mystruct(17)
ans =
FirstName: []
Height: []
There's another way to initialize the struct and that is fill it with initial values.
If we were building our struct with the 5 sons of George Foreman, we might create it like this.
georgeStruct = struct('FirstName','George','Height', ...
    {195 189 190 194 193})
georgeStruct =
1x5 struct array with fields:
FirstName
Height
Looking at the contents of georgeStruct we see that his sons are all named George
{georgeStruct.FirstName}
ans =
'George' 'George' 'George' 'George' 'George'
and I made up their heights
[georgeStruct.Height]
ans = 195 189 190 194 193
To see when and how to use cell arrays in the initialization, read the struct reference page carefully. If you want a field to contain a cell array, you must embed that cell inside another cell array.
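As a small illustration of that last point (the field name Names and its values are made up for this sketch), compare wrapping the cell in another cell with passing it directly:

```matlab
% Wrapping the cell array in another cell yields ONE struct whose
% field holds the whole 1x3 cell array.
s1 = struct('Names', {{'Ann', 'Bob', 'Cy'}});

% Passing the cell array directly yields a 1x3 struct ARRAY, with one
% name per element.
s3 = struct('Names', {'Ann', 'Bob', 'Cy'});

size(s1)          % scalar struct
class(s1.Names)   % the field itself is a cell
size(s3)          % struct array, one element per cell entry
```

The extra pair of braces is what tells struct to treat the cell as field contents rather than as a list of per-element values.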
Initializing the Contents
How important is it to initialize the contents of the struct? Of course it depends on your specifics, but since each field is its own MATLAB array, there is not necessarily a need to initialize them all up front. The key, however, is to try to not grow either the struct itself or any of its contents incrementally.
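Here is a minimal sketch of that idea, assuming a made-up element count n and placeholder values: the struct array is sized once, the fields start out empty, and each field is then assigned exactly once.

```matlab
n = 100;   % assumed number of elements, known ahead of time

% A 1-by-n cell array as a field value makes struct return a 1-by-n
% struct array; every FirstName and Height starts out as [].
people = struct('FirstName', cell(1, n), 'Height', []);

% Fill in the contents without ever growing the struct array.
for k = 1:n
    people(k).FirstName = sprintf('Person%d', k);   % placeholder name
    people(k).Height = 150 + k;                     % placeholder height in cm
end
```

Neither the struct array nor any individual field is resized inside the loop.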
Your Use of Structures
What do you use structures for? Are you able to populate the contents of your struct up front? Or at least pin down the sizes early in your application? To tell me about your usage, please post details here.
Published with MATLAB® 7.5



Some C++ types (e.g., std::vector) deal with allocation as follows: 1. The space allocated grows exponentially, so that the amortized cost of adding a single element is constant. 2. A reserve() function is provided, letting you preallocate space without adding elements.
Any thoughts about adding this to Matlab?
Tom-
Can you explain what the specific benefit in MATLAB would be? Each structure member is its own entity. And the overall structure can be allocated all at once.
–Loren
Loren,
Would I be correct in thinking that a struct is really an array of pointers? By comparison that would make an array of struct an array of arrays of pointers? What implications does this have in terms of the relative efficiency of nesting structs (for example: mystruct.Physiology.height) as opposed to setting up a struct that is entirely flat at the top level? The principal reason I’m asking is that I have an application which has modularly segmented data. As a simplified example, suppose I have a pipe with material characteristics that I need in one subroutine, temperature characteristics I need in another, and stress characteristics I need in a third. Is there a penalty to creating a struct with three sub-structs under it and just passing those sub-structs into the subroutines? Assume for argument’s sake the two cases where each substruct is either large or small, in terms of its contents. Does that change the answer? I hope that I’ve asked the questions clearly…
Thanks,
Dan
I’ve grown to really appreciate the power of structures recently. Your article about vectorizing access to an array of structures was incredibly helpful, though sometimes I don’t know whether or not I am using them most efficiently. It’s an ongoing problem in my code. I generally preallocate what I call the namespace of structures whose values and lengths will be unknown. For instance, I might say:
data = repmat(struct('field1',[],'field2',[],…,'fieldN',[]),1,M);
for M instances of N fields. This way, at least I know what the data structure is supposed to look like and how big it is supposed to be. Among other things, this helps me troubleshoot various aspects of the code, when fields are unpopulated when they should be, etc.
Other preallocation, such as x = zeros(1,L) for some loop that writes to x(1,a) for a = 1:L is more useful in terms of memory preallocation, and I find that structure name preallocation doesn’t seem to have a huge performance difference. Additionally, I have found that, in the first example, if ‘field1′ happened to be a scalar double, there doesn’t seem to be an advantage to preallocating the memory space (as opposed to the so-called name space) by saying:
data = repmat(struct('field1',zeros(1,1),'field2',[],…,'fieldN',[]),1,M);
I think my benchmarks on my code showed no performance increase (and possibly a performance decrement), and presumably the reason was because when it writes a single scalar value to that space, it has to write it twice when preallocated instead of once. I think in this case that operation actually ended up being more tedious for some reason.
It’s worth playing around with. Those were some of my findings on that particular issue, of course in some arbitrary case that I happened to be looking at. Perhaps it’s not generalizable. For instance, if one preallocates a vector as in the second example, such that x = zeros(1,L) for length L, if you write a vector ‘x’ later on in your code that assigns x all at once, then it seems to slow the code down once again because now MATLAB is writing a vector twice instead of just writing it once. This showed me that preallocation must be carefully considered in instances when one’s code is well-vectorized.
Shane-
Really nice point about initialization. I totally agree that it can go overboard. If you are not replacing just a few values, but replacing the whole array, then preallocating is potentially costly.
–Loren
Dan-
A struct is essentially an array of pointers to other MATLAB arrays. In nested structs, the nested levels might not all be the same, and they themselves are also arrays of pointers (all under the hood of course) to other MATLAB arrays. When you pass a struct into a function AND return it changed from that function, only the fields that got changed will have copies made. MATLAB treats each field separately and smartly does a lazy copy of the data, or copy on write. The only penalty in passing the sub-structs is the creation of an intermediate struct, but you are not copying the data at all (except for the pieces that change).
Not sure if that really answers your question. Please feel free to nudge me in a different direction if you need more information.
–Loren
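A rough sketch of the nested layout Dan describes might look like the following. The field names (Material, Thermal, Stress and their contents), the values, and the helper meanTemperature are all hypothetical, chosen only to show a sub-struct being handed to a routine on its own:

```matlab
% Hypothetical pipe description split into three sub-structs.
pipe.Material.Density    = 7850;            % made-up value
pipe.Thermal.Temperature = [300 310 320];   % made-up values
pipe.Stress.Hoop         = zeros(1, 3);

% Hypothetical helper that only needs the thermal data.
meanTemperature = @(thermal) mean(thermal.Temperature);

% Only the Thermal sub-struct is passed in, and the helper only reads it.
tMean = meanTemperature(pipe.Thermal);
```

Going by the reply above, a read-only helper like this should not trigger a copy of the underlying field data.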
Loren,
You came close to covering all the pieces of the question. To give a concrete example of the last piece of it:
Suppose I have:
a.b.c = ones(1000,1000);
a.b.d = a.b.c;
a.b.e = a.b.c;
Then I pass a.b into my function doStuff
a.b = doStuff(a.b,'c');
function out = doStuff(in,fieldname)
out = in; % Yes I know I’m not inplacing here
out.(fieldname) = sqrt(in.(fieldname));
In a case like this, what happens to a.b.d, and a.b.e? Do they get re-copied, or is Matlab smart enough to recognize that it’s the same thing as if I were at the top level of a structure? The other piece is wondering whether I’m paying a penalty for accessing a pointer to a pointer to a variable. Does this double (or worse) the memory access time in the look up process? Finally if an allocation changes, or I create a new field, in a low (deep?) level of the structure, do I pay an additional overhead as all of the layers of pointers above it need to be reallocated? I think I may not have been clear on that last question. Let me know if I should try to clarify further.
Thanks,
Dan
I’ve found that my code is easier to maintain if I pass all variables to & from functions. This makes it much easier to follow the data flow. But, this can lead to long variable lists in the function calls.
MatLab Data Structures are a handy way to collect associated groups of variables together. This can make the function calls easier to read, while still clearly showing the source & destination of the data.
Unfortunately, many novice MatLab users aren’t aware of MatLab Data Structures so they can be confused when I use them. So, I tend to restrict my use of them to places where the name of the structure can give the reader another cue to help them understand.
For example, maybe a function needs a series of defined constants, for example, length & weight. These could be packaged into a structure called const such that when they encounter:
const.length
or
const.weight
Novice users are likely to follow the syntax when presented with the additional cue in the name of the structure.
This appears to be similar to the design of the various dynamic system objects in the Control System Toolbox. For example, the SS objects are state space systems and appear to have the fields of A, B, C, D & E for the respectively named matrices that define such state space systems. Although I haven’t found these fields defined in the MatLab documentation, many a hacker like me has stumbled upon these fields and used them when, for example, access to the system matrix is required.
I often use structures to pass parameters from one function to another. This helps keep argument lists short. I only rarely use structures of another dimension than 1×1.
Further, when programming large projects, I use Matlab classes and objects. The way I access my objects is just like I access structures. One advantage of using classes and objects is that you can never accidentally create a new field instead of replacing the value of an existing one. Also, you can organize functions into class directories. The only drawback is that accessing objects is far slower than accessing structures, even if they seem to be very similar.
Loren, maybe classes and objects could be worth another blog entry.
Markus
A particular problem I had was adding a field to an empty array of structs.
This is sometimes needed if you want to concatenate arrays of structs - it would be helpful if MATLAB could be a little more forgiving in type checking empty arrays in such circumstances!
Loren,
I was responding to the ‘don’t grow arrays in a loop’ comment. The C++ STL approach partially separates allocating memory from adding a value. A benefit of this is to allow growing an array within a loop to be more efficient. This is useful in case one does not know how big the array will be, the user is less knowledgeable about the impact of growing an array on efficiency, or maybe even sometimes a knowledgeable user would rather not preallocate.
The ‘reserve’ facility would address Shane’s comment, in that it just preallocates space, and there would be no inefficiency from a possibly unnecessary initialization. Semantically, if all that has been done to an array is reserve space, it is still empty.
What Tom is referring to is the dynamic-array method. Suppose you don’t know how big an array should be. The rule is, if you add a new element and run out of space, double its size. Then the amortized time of each insertion is O(1). Consider this poor code, which takes O(n^2) time:
x = 0
for i = 1:n
x (i) = i ;
end
Now consider a variant that takes O(n) total time, or O(1) *amortized* per iteration:
x = 0 ;
len = 1 ;    % current capacity of x
for i = 1:n
    if (i > len)
        % out of space: double the size of x
        len = 2*len ;
        x (len) = nan ;
    end
    x (i) = i ;
end
% trim x to the number of elements actually stored
x = x (1:n) ;
This works fine in MATLAB, and it’s a replacement for code that truly can’t tell how big x should be at the beginning.
The problem with trying to do this inside MATLAB itself is that it would be a huge change to the internal data structure (not having *seen* it, of course). Each MATLAB array would have to have some kind of notion of “capacity” (len in the example above) which is >= the size of the array. That would not be easy to change, I would guess, since the changes would percolate wildly.
An array of structures can be a very neat way to organize data; however, we should be aware of the price we pay in performance (pointers storage) when working with such a data structure.
For example:
a=repmat(struct('f1',{{}},'f2',[1 2; 3 4]),100000,1);
whos a
Name Size Bytes Class Attributes
a 100000×1 15200128 struct
takes a little more than four times the memory of
a=struct('f1',{cell(100000,1)},'f2',repmat([1 2; 3 4],[1,1,100000]));
whos a
Name Size Bytes Class Attributes
a 1×1 3600248 struct
This ratio increases significantly as the structure becomes more complicated (data types)
Tom-
Thanks for the clarification. As of now, there are no empty arrays in MATLAB that don’t have at least one dimension of size 0. So to reserve space currently you have to fill the array with something, be it zeros, blanks, nans, etc. I am unaware of plans for changes to this.
–Loren
Dan,
I am going to first reproduce your code so I can discuss it:
When you call doStuff with a.b, you are passing in a new temporary variable and will not be affecting the a struct in your workspace at all. In any case, the fields a.b.d and a.b.e are unaffected because they have not been changed at all, even wrt your temporary a.b. So there is very low impact to those fields being there, even via a function call, when they are being ignored. No realloc’ing of existing fields happens when a new field gets added. The struct itself may need to realloc space because of one new array header, but each of the pre-existing arrays will stay put.
Does that help?
–Loren
Very much. Thank you.
Dan
How can I create a pointer to a struct?
I want to create a linked list using structures.
Priyanka-
MATLAB does not have pointers. You can create a linked list either using nested functions or, in R2008a, using the newer object oriented class system and derive your class from the handle class.
–Loren
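As a minimal sketch of the handle-class route mentioned above (the class name ListNode and its properties Data and Next are hypothetical, and the class would need to live in its own file, ListNode.m), a singly linked list node could look like this:

```matlab
% File: ListNode.m (hypothetical class, requires R2008a or later)
classdef ListNode < handle
    properties
        Data        % payload stored in this node
        Next = []   % handle to the next node, or [] at the end of the list
    end
    methods
        function node = ListNode(data)
            % Store the payload if one was supplied.
            if nargin > 0
                node.Data = data;
            end
        end
    end
end
```

With that class on the path, a list can be built and walked by following the Next handles:

```matlab
head = ListNode(1);
head.Next = ListNode(2);
head.Next.Next = ListNode(3);

% Walk the list and sum the payloads.
node = head;
total = 0;
while ~isempty(node)
    total = total + node.Data;
    node = node.Next;
end
```

Because ListNode derives from handle, assigning a node to Next stores a reference to it rather than a copy, which is what makes the linking work.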
Loren,
I have a question that is somewhat related to initializing structs. I have an object class that has a setup function and a number of structs that are global to that class, like this:
function aObject()
anObjectIOwn = []; % this will be set to a bunch of Fhs
function setupObject(parameters)
aStruct = anotherObject();
end
end
—
The ‘anotherObject’ object passes back a struct of function handles.
I get an error “Conversion to double from struct is not possible.”
If I initialize anObjectIOwn with:
anObjectIOwn = struct();
Then I get an error ‘??? Subscripted assignment between dissimilar structures.’
If I do a clear in my setup function of the ‘anObjectIOwn’ before I try to set it, it works.
I know I could just remove the setup function and do all that work in the body of the main function and it would work (I did this before). Is there a good way of doing this short of clearing the variable? Am I missing some way of initializing a variable to be a structure which is not defined yet?
Thanks - great blog :)
Oops, I made a mistake above: the code should read:
—
function aObject()
anObjectIOwn = []; % this will be set to a bunch of Fhs
function setupObject(parameters)
anObjectIOwn = anotherObject();
end
end
—
Greg-
Do you know what field(s) the struct will have? If so, try initializing the struct (still empty) but with those fields. I don’t know if that will fix things, but it’s worth a try.
But if anObjectIOwn is really an object and not a struct, perhaps you’ll need to overload subsasgn???
–Loren
I second poster #10 above: wish Matlab could be more forgiving so that appending the 1st struct to an array of structs is easier. I am aware that
>> a.a='q';a.b=1;q=struct('a',{},'b',{});q(end+1)=a
q =
a: 'q'
b: 1
works. But frequently I don’t know what the fields in the structure are (nor do I care/need that dependency), just want to collect them in an array, and then don’t know what to initialise q to so that q(end+1) still works.
Ljubomir
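One possible workaround, sketched here under the assumption that all the structs being collected share the same set of fields, is to gather them in a cell array first and concatenate at the end, so that no field names need to be known up front. The variable names collected, a, b and q are illustrative only:

```matlab
% Two structs with the same (not pre-declared) fields.
a.a = 'q';  a.b = 1;
b.a = 'r';  b.b = 2;

% Collect the structs in a cell array; a cell accepts any contents.
collected = {};
collected{end+1} = a;
collected{end+1} = b;

% Concatenate into a struct array once everything has been gathered.
q = [collected{:}];
```

This still grows the cell array one element at a time, so it addresses the convenience problem rather than the preallocation one.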
Hi,
I just wanted to know how to implement singly linked and doubly linked lists in matlab….using arrays.
Suhas-
See answer #18 above. You might also check the file exchange for solutions.
–Loren