Loren on the Art of MATLAB

Structure Initialization

Posted by Loren Shure

This post continues in the theme of other recent ones, questions frequently posed to me (and probably others). It has to do with initializing structures. There is a healthy set of posts on the MATLAB newsgroup devoted to this topic. So let's peel things apart today.

Structures - Mental Model

It first helps to understand how MATLAB treats structures and their fields. First clear the workspace.

clear variables
close all

Let's just start with a scalar structure.

mystruct.FirstName = 'Loren';
mystruct.Height = 150
mystruct = 
    FirstName: 'Loren'
       Height: 150

Each field in the structure mystruct appears to be a separate MATLAB array. The first one, FirstName, is a string of length 5, and the second, Height, is a scalar double (in cm, for those who are paying attention to units).

I can add another field and its contents can be the contents of any valid MATLAB variable. Each field is independent in size and datatype.
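
As a minimal sketch of that independence (the variable person and the fields Scores and Tags are made up for illustration), each field below holds a different class and size:

```matlab
% Each field is its own MATLAB array, so class and size can differ per field.
person.FirstName = 'Loren';             % 1x5 char
person.Height    = 150;                 % scalar double
person.Scores    = [90 85; 70 60];      % 2x2 double (hypothetical field)
person.Tags      = {'blog', 'MATLAB'};  % 1x2 cell (hypothetical field)

% Report the class of every field; the result is a struct with the same field names.
fieldClasses = structfun(@class, person, 'UniformOutput', false);
```

Inspecting fieldClasses is a quick way to confirm that no field constrains the type or shape of another.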

Array of structs

Suppose I want to extend this array to include other people and measurements. I can grow this array an element at a time.

mystruct(2).FirstName = 'Fred';
mystruct(2)
ans = 
    FirstName: 'Fred'
       Height: []

You can see here that since the field Height does not yet have a value, its value is set to empty ([]).

Don't Grow Arrays

Over the years, we have learned that growing arrays is a poor use of resources in MATLAB and that preallocation is helpful in terms of both not fragmenting memory and not spending time looking for a large enough memory slot. So, if I know I want to have 100 names in my struct, I can initialize the struct to be the right size. I may or may not feel the need to initialize the contents of the struct array however, since each field element is essentially its own MATLAB array.

How to Initialize a struct Array

Here are 2 ways to initialize the struct.

mystruct(100).FirstName = 'George';

With this method, we can see that elements are filled in with empty arrays.

mystruct(17)
ans = 
    FirstName: []
       Height: []

There's another way to initialize the struct and that is fill it with initial values.

If we were building our struct with the 5 sons of George Foreman, we might create it like this.

georgeStruct = struct('FirstName','George','Height', ...
    {195 189 190 194 193})
georgeStruct = 
1x5 struct array with fields:
    FirstName
    Height

Looking at the contents of georgeStruct we see that his sons are all named George

{georgeStruct.FirstName}
ans = 
    'George'    'George'    'George'    'George'    'George'

and I made up their heights

[georgeStruct.Height]
ans =
   195   189   190   194   193

To see when and how to use cell arrays in the initialization, read the struct reference page carefully. If you want a field to contain a cell array, you must embed that cell inside another cell array.
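
Here is a small sketch of the difference, using a made-up field name Nickname. A cell array passed directly is spread across the elements of a struct array, while a cell wrapped in another cell ends up as the contents of a single field:

```matlab
% A cell array passed directly creates one struct element per cell.
spread = struct('Nickname', {'Ann', 'Bob'});     % 1x2 struct array

% Wrapping the cell in another cell stores the whole cell in one field.
nested = struct('Nickname', {{'Ann', 'Bob'}});   % 1x1 struct; field is a 1x2 cell

spreadSize = size(spread);
nestedSize = size(nested);
```

Comparing spreadSize and nestedSize shows which form you got if you are ever unsure.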

Initializing the Contents

How important is it to initialize the contents of the struct? Of course it depends on your specifics, but since each field is its own MATLAB array, there is not necessarily a need to initialize them all up front. The key however is to try to not grow either the struct itself or any of its contents incrementally.
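
One possible pattern, sketched here with a made-up element count and placeholder data, is to fix the size and field names of the struct array once, leave the contents empty, and then fill the elements in place:

```matlab
numPeople = 100;   % hypothetical number of elements

% cell(1,numPeople) sets the array size; the non-cell value [] is
% copied into the Height field of every element.
people = struct('FirstName', cell(1, numPeople), 'Height', []);

% Fill elements in place; the struct array itself never grows.
for k = 1:numPeople
    people(k).FirstName = sprintf('Person%d', k);   % placeholder data
    people(k).Height    = 150 + k;                  % placeholder data
end
```

The fields start out as empty arrays, just as with the mystruct(100) approach, but here every field name is declared up front.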

Your Use of Structures

What do you use structures for? Are you able to populate the contents of your struct up front? Or at least pin down the sizes early in your application? To tell me about your usage, please post details here.


Published with MATLAB® 7.5

50 Comments

Some C++ types (e.g., std::vector) deal with allocation as follows: 1. The space allocated grows exponentially, so that the amortized cost of adding a single element is constant. 2. A reserve() function is provided, letting you preallocate space without adding elements.

Any thoughts about adding this to Matlab?

Tom-

Can you explain what the specific benefit in MATLAB would be? Each structure member is its own entity. And the overall structure can be allocated all at once.

–Loren

Loren,
Would I be correct in thinking that a struct is really an array of pointers? By comparison that would make an array of struct an array of arrays of pointers? What implications does this have in terms of the relative efficiency of nesting structs (for example: mystruct.Physiology.height) as opposed to setting up a struct that is entirely flat at the top level? The principal reason I’m asking is that I have an application which has modularly segmented data. As a simplified example, suppose I have a pipe with material characteristics that I need in one subroutine, temperature characteristics I need in another, and stress characteristics I need in a third. Is there a penalty to creating a struct with three sub-structs under it and just passing those sub-structs into the subroutines? Assume for argument’s sake the two cases where each substruct is either large or small, in terms of its contents. Does that change the answer? I hope that I’ve asked the questions clearly…

Thanks,
Dan

I’ve grown to really appreciate the power of structures recently. Your article about vectorizing access to an array of structures was incredibly helpful, though sometimes I don’t know whether or not I am using them most efficiently. It’s an ongoing problem in my code. I generally preallocated what I call the namespace of structures whose values and lengths will be unknown. For instance, I might say:

data = repmat(struct('field1',[],'field2',[],…,'fieldN',[]),1,M);

for M instances of N fields. This way, at least I know what the data structure is supposed to look like and how big it is supposed to be. Among other things, this helps me troubleshoot various aspects of the code, when fields are unpopulated when they should be, etc.

Other preallocation, such as x = zeros(1,L) for some loop that writes to x(1,a) for a = 1:L is more useful in terms of memory preallocation, and I find that structure name preallocation doesn’t seem to have a huge performance difference. Additionally, I have found that, in the first example, if ‘field1′ happened to be a scalar double, there doesn’t seem to be an advantage to preallocating the memory space (as opposed to the so-called name space) by saying:

data = repmat(struct('field1',zeros(1,1),'field2',[],…,'fieldN',[]),1,M);

I think my benchmarks on my code showed no performance increase (and possibly a performance decrement), and presumably the reason was because when it writes a single scalar value to that space, it has to write it twice when preallocated instead of once. I think in this case that operation actually ended up being more tedious for some reason.

It’s worth playing around with. Those were some of my findings on that particular issue, of course in some arbitrary case that I happened to be looking at. Perhaps it’s not generalizable. For instance, if one preallocates a vector as in the second example, such that x = zeros(1,L) for length L, if you write a vector ‘x’ later on in your code that assigns x all at once, then it seems to slow the code down once again because now MATLAB is writing a vector twice instead of just writing it once. This showed me that preallocation must be carefully considered in instances when one’s code is well-vectorized.

Shane-

Really nice point about initialization. I totally agree that it can go overboard. If you are not replacing just a few values, but replacing the whole array, then preallocating is potentially costly.

–Loren

Dan-

A struct is essentially an array of pointers to other MATLAB arrays. In nested structs, the nested levels might not all be the same, and they themselves are also arrays of pointers (all under the hood of course) to other MATLAB arrays. When you pass a struct into a function AND return it changed from that function, only the fields that got changed will have copies made. MATLAB treats each field separately and smartly does a lazy copy of the data, or copy on write. The only penalty in passing the sub-structs is the creation of an intermediate struct, but you are not copying the data at all (except for the pieces that change).

Not sure if that really answers your question. Please feel free to nudge me in a different direction if you need more information.

–Loren

Loren,
You came close to covering all the pieces of the question. To give a concrete example of the last piece of it:

Suppose I have:

a.b.c = ones(1000,1000);
a.b.d = a.b.c;
a.b.e = a.b.c;

Then I pass a.b into my function doStuff
a.b = doStuff(a.b,'c');

function out = doStuff(in,fieldname)
out = in; % Yes I know I'm not inplacing here
out.(fieldname) = sqrt(in.(fieldname));

In a case like this, what happens to a.b.d and a.b.e? Do they get re-copied, or is MATLAB smart enough to recognize that it's the same thing as if I were at the top level of a structure? The other piece is wondering whether I'm paying a penalty for accessing a pointer to a pointer to a variable. Does this double (or worse) the memory access time in the lookup process? Finally, if an allocation changes, or I create a new field, in a low (deep?) level of the structure, do I pay an additional overhead as all of the layers of pointers above it need to be reallocated? I think I may not have been clear on that last question. Let me know if I should try to clarify further.

Thanks,
Dan

I’ve found that my code is easier to maintain if I pass all variables to & from functions. This makes it much easier to follow the data flow. But, this can lead to long variable lists in the function calls.

MATLAB data structures are a handy way to collect associated groups of variables together. This can make the function calls easier to read, while still clearly showing the source & destination of the data.

Unfortunately, many novice MATLAB users aren't aware of MATLAB data structures, so they can be confused when I use them. So, I tend to restrict my use of them to places where the name of the structure can give the reader another cue to help them understand.

For example, maybe a function needs a series of defined constants, for example, length & weight. These could be packaged into a structure called const such that when they encounter:

const.length
or
const.weight

Novice users are likely to follow the syntax when presented with the additional cue in the name of the structure.

This appears to be similar to the design of the various dynamic system objects in the Control System Toolbox. For example, the SS objects are state space systems and appear to have the fields A, B, C, D & E for the respectively named matrices that define such state space systems. Although I haven't found these fields defined in the MATLAB documentation, many a hacker like me has stumbled upon these fields and used them when, for example, access to the system matrix is required.

I often use structures to pass parameters from one function to another. This helps keep argument lists short. I only rarely use structures of another dimension than 1×1.

Further, when programming large projects, I use MATLAB classes and objects. The way I access my objects is just like I access structures. One advantage of using classes and objects is that you can never accidentally create a new field instead of replacing the value of an existing one. Also, you can organize functions into class directories. The only drawback is that accessing objects is far slower than accessing structures, even if they seem to be very similar.

Loren, maybe classes and objects could be worth another blog entry.

Markus

A particular problem I had was adding a field to an empty array of structs.

This is sometimes needed if you want to concatenate arrays of structs – it would be helpful if MATLAB could be a little more forgiving in type checking empty arrays in such circumstances!

Loren,

I was responding to the ‘don’t grow arrays in a loop’ comment. The C++ STL approach partially separates allocating memory from adding a value. A benefit of this is to allow growing an array within a loop to be more efficient. This is useful in case one does not know how big the array will be, the user is less knowledgeable about the impact of growing an array on efficiency, or maybe even sometimes a knowledgeable user would rather not preallocate.

The ‘reserve’ facility would address Shane’s comment, in that it just preallocates space, and there would be no inefficiency from a possibly unnecessary initialization. Semantically, if all that has been done to array is reserve space, it is still empty.

What Tom is referring to is the dynamic-array method. Suppose you don’t know how big an array should be. The rule is, if you add a new element and run out of space, double its size. Then the amortized time of each insertion is O(1). Consider this poor code, which takes O(n^2) time:

x = 0
for i = 1:n
x (i) = i ;
end

Now consider a variant that takes O(n) total time, or O(1) *amortized* per iteration:

x = 0 ;
len = 1 ;                 % current capacity of x
for i = 1:n
    if (i > len)
        % out of space: double the capacity of x
        len = 2*len ;
        x (len) = nan ;
    end
    x (i) = i ;
end
% trim x to the n elements actually used
x = x (1:n) ;

This works fine in MATLAB, and it’s a replacement for code that truly can’t tell how big x should be at the beginning.

The problem with trying to do this inside MATLAB itself is that it would be a huge change to the internal data structure (not having *seen* it, of course). Each MATLAB array would have to have some kind of notion of “capacity” (len in the example above) which is >= the size of the array. That would not be easy to change, I would guess, since the changes would percolate wildly.

An array of structures can be a very neat way to organize data; however, we should be aware of the price we pay in performance (pointers storage) when working with such a data structure.
For example:

a=repmat(struct('f1',{{}},'f2',[1 2; 3 4]),100000,1);
whos a
  Name           Size              Bytes  Class     Attributes

  a         100000x1            15200128  struct

takes a little more than four times the memory of

a=struct('f1',{cell(100000,1)},'f2',repmat([1 2; 3 4],[1,1,100000]));
whos a
  Name      Size               Bytes  Class     Attributes

  a         1x1              3600248  struct

This ratio increases significantly as the structure becomes more complicated (data types)

Tom-

Thanks for the clarification. As of now, there are no empty arrays in MATLAB that don’t have at least one dimension of size 0. So to reserve space currently you have to fill the array with something, be it zeros, blanks, nans, etc. I am unaware of plans for changes to this.

–Loren

Dan,

I am going to first reproduce your code so I can discuss it:

a.b.c = ones(1000,1000);
a.b.d = a.b.c;
a.b.e = a.b.c;

Then I pass a.b into my function doStuff
a.b = doStuff(a.b,'c');

function out = doStuff(in,fieldname)
out = in; % Yes I know I'm not inplacing here
out.(fieldname) = sqrt(in.(fieldname));

When you call doStuff with a.b, you are passing in a new temporary variable and will not be affecting the a struct in your workspace at all. In any case, the fields a.b.d and a.b.e are unaffected because they have not been changed at all, even with respect to your temporary a.b. So there is very low impact from those fields being there, even via a function call, when they are being ignored. No reallocation of existing fields happens when a new field gets added. The struct itself may need to reallocate space because of one new array header, but each of the pre-existing arrays will stay put.

Does that help?
–Loren

Priyanka-

MATLAB does not have pointers. You can create a linked list either using nested functions or, in R2008a, using the newer object oriented class system and derive your class from the handle class.

–Loren

Loren,
I have a question that is somewhat related to initializing structs. I have an object class that has a setup function and a number of structs that are global to that class, like this:

function aObject()

anObjectIOwn = []; % this will be set to a bunch of Fhs

function setupObject(parameters)
aStruct = anotherObject();
end
end


The ‘anotherObject’ object passes back a struct of function handles.

I get an error “Conversion to double from struct is not possible.”
If I initialize anObjectIOwn with:
anObjectIOwn = struct();
Then I get an error ‘??? Subscripted assignment between dissimilar structures.’

If I do a clear in my setup function of the ‘anObjectIOwn’ before I try to set it, it works.

I know I could just remove the setup function and do all that work in the body of the main function and it would work (I did this before). Is there a good way of doing this short of clearing the variable? Am I missing some way of initializing a variable to be a structure which is not defined yet?

Thanks – great blog :)

Oops, I made a mistake above: the code should read:


function aObject()

anObjectIOwn = []; % this will be set to a bunch of Fhs

function setupObject(parameters)
anObjectIOwn = anotherObject();
end
end

Greg-

Do you know what field(s) the struct will have? If so, try initializing the struct (still empty) but with those fields. I don’t know if that will fix things, but it’s worth a try.

But if anObjectIOwn is really an object and not a struct, perhaps you’ll need to overload subsasgn???

–Loren

I second poster #10 above: wish Matlab could be more forgiving so that appending the 1st struct to an array of structs is easier. I am aware that
>> a.a='q';a.b=1;q=struct('a',{},'b',{});q(end+1)=a
q =
a: 'q'
b: 1
works. But frequently I don’t know what the fields in the structure are (nor do I care/need that dependency), just want to collect them in an array, and then don’t know what to initialise q to so that q(end+1) still works.
Ljubomir
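
One workaround that avoids naming the fields up front, offered as a sketch and not as an officially recommended idiom, is to start from a plain empty array and concatenate instead of using indexed assignment. Concatenating with [] simply returns the other operand, so the first struct sets the field names. This still grows the array one element at a time, so it only suits cases where the final size really is unknown:

```matlab
item.a = 'q';
item.b = 1;

collected = [];                   % plain empty array; no field names needed
collected = [collected, item];    % concatenating with [] yields item itself

item.a = 'r';
item.b = 2;
collected = [collected, item];    % later structs must have the same fields
```

If a later struct has different fields, the concatenation errors out, which at least flags the mismatch immediately.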

Hi,

I just wanted to know how to implement singly linked and doubly linked lists in matlab….using arrays.

Hi quick query.
General:
Are structures as easy (or easier) to operate on as normal arrays?

Specific:
Say I had a structure set up with a field called time which always had 3 values such as [0 5 10] and I wanted to find all entries with those certain values, would it recognise that [0 10 5] is equivalent? That is what I would like it to do.

Baalzamon-

You’d have to set up a test that equated permutations of vectors, perhaps using ismember. If that doesn’t work, you can write a function that does the comparison you want, but would have to extract the data from the struct most likely, before doing the comparison.

–Loren
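
A minimal sketch of such a comparison follows. It swaps ismember for sorting both vectors before calling isequal, which treats permutations as equal and also respects repeated values. The struct array entries and its data are made up for illustration:

```matlab
% Hypothetical struct array whose 'time' field always holds 3 values.
entries(1).time = [0 5 10];
entries(2).time = [0 5 15];
entries(3).time = [10 0 5];

target = [0 10 5];

% Sorting both sides makes the comparison ignore element order.
isMatch = arrayfun(@(e) isequal(sort(e.time), sort(target)), entries);
matches = entries(isMatch);
```

Here isMatch is a logical vector with one entry per struct element, so it can be used directly to index the struct array.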

One area not covered (as far as I can see) is the vectorised initialisation/population of structure fields in a structure array. Vectorised extraction has been covered:

all_field1 = [tmp_struct(:).field1];

However, what if you want to do the opposite – say you take all the field1 values and multiply them by a scalar and then try to write the answer back to the respective fields in the structure array:

all_field1 = [tmp_struct(:).field1];
new_vals = all_field1.*2;

%  Try a vectorised write operation to the fields in the structure array

tmp_struct(:).field1 = new_vals; % Fail

I always have to resort to looping through the structure array and do it the long winded way (which is frustrating):

for i = 1:length(tmp_struct)
   tmp_struct(i).field1 = new_vals(i);
end

Any ideas on a more concise method for doing this?

Paul-

Depends what you mean by concise. Three lines of code isn't that long. You could use struct2cell, work on values in cells (or convert to numeric and work on them) and then convert back. I doubt that's more concise. Under some conditions, it might be faster than the loop, perhaps. There isn't a one-liner to do what you want, however.

–Loren

Paul – you can use the following vectorized approach, which is faster than a for loop;

new_vals = mat2cell(new_vals,1,ones(size(new_vals)));
[tmp_struct.field1] = new_vals{:};

You can choose to work with cell-arrays from the onset, saving the mat2cell conversion. In your example above:

all_field1 = {tmp_struct.field1};
new_vals = cellfun(@(d)d*2, all_field1, 'UniformOutput',false);
[tmp_struct.field1] = new_vals{:};

note that there is no need to use (:) anywhere, only {:}

Yair

I realize this may seem a bit against the spirit of using structs, but I found that, instead of an Nx1 struct with several scalar fields, a 1×1 struct with Nx1 vectors (one in each field) is much easier to work with. It’s easy to initialize, and interfacing the data with vectorized code is instant (both reading and writing to the struct).

An example of some vectorized code using this scheme:

N = 10;
objects.x = rand(N,1);  %defined struct that holds a few "objects"
objects.y = rand(N,1);
norms = sqrt(objects.x.^2 + objects.y.^2)  %retrieved values from all objects and combined them
objects.x = objects.x ./ norms  %writing back values is just as easy
objects.y = objects.y ./ norms

As you can see, it’s very friendly to vectorized code, and doesn’t give up any of the advantages of using a struct!

Jotaf-

That’s a completely fine way to use structs and the way I use them quite often. There is nothing enforcing that .x and .y have the same length in the scalar struct version you show however, so you could have issues if the data stored in the fields isn’t commensurate. struct arrays can certainly have the same issue, but being a scalar struct doesn’t get you the guarantee either. FWIW, the scalar struct version is more memory efficient as well.

–Loren

Hi Loren,

I had a question for you in January on this post (at #12)
http://blogs.mathworks.com/loren/2006/08/09/essence-of-indexing/

After considering cells as you indicated, I also tried structures (a tip from Doug Hull). I have been playing with structures for a few months now and they seem to be the perfect solution to what I wanted to do.

I like the flexibility they afford.
And I like to use your first method for initializing.

Here is a piece of code that does exactly what I was looking for and uses the first method for initializing.

j=1:1:40;
data=repmat([1:1:40],40,1);
test.input=[];
test.sigma=j;
test.filters={[]};
test.output ={[]};
test.filterNames={[]};
test.outputNames={[]};
for i=1:10
    test.filterNames{i,:} = sprintf('filter%d', (i));
    test.outputNames{i,:} = sprintf('output%d', (i));
    test.input(:,:,i)= data(:,:);
    test.filters{:,:,i} = ones(2*i+1);
    f=cell2mat(test.filters(:,:,i));
    lp= filter2(f,test.input(:,:,i));
    test.output{:,:,i}=lp;
end
clear i j f lp;

But I have a question for you on my next step:

What if I now wanted to write the (i)th output to an individual matrix with the (i)th outputNames as its name?
Thank you

Matteo

Hi Loren,

thanks for the advice and great reference.
I did not intend to do that as an input to further analysis, only as a way to easily export variables to give to people that do not have experience with structures. But I see I can use this tip “How do I dynamically generate a filename for SAVE?” for that, so thanks twice. Cheers. Matteo
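
For this kind of export, one option is the '-struct' form of save, which writes each field of a scalar struct as its own variable in the MAT-file. The sketch below uses small stand-ins for test.outputNames and test.output, and a temporary file name chosen only for illustration:

```matlab
% Hypothetical stand-ins for test.outputNames and test.output.
outputNames = {'output1', 'output2'};
outputs     = {magic(3), ones(2)};

% Build a scalar struct whose field names are the desired variable names.
exported = struct();
for i = 1:numel(outputNames)
    exported.(outputNames{i}) = outputs{i};
end

% '-struct' saves each field as a separate variable in the MAT-file.
matFile = [tempname '.mat'];            % hypothetical file location
save(matFile, '-struct', 'exported');

% List the variables stored in the file, then clean up.
savedNames = who('-file', matFile);
delete(matFile);
```

Someone who loads the resulting file sees ordinary variables named output1 and output2, with no struct syntax involved.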

Hi Loren,

I have been learning/coding in MATLAB for a few months now, and can say that in comparison to my graduate studies in Fortran 77 and Perl, it has been nothing but a pleasure—especially while having your posts as a resource, they are so very beneficial. This is my first post, as I have finally hit a snag I haven’t yet been able to solve on my own, or find the solution through these forums and others. I apologize in advance if this has been discussed in another post.

I recently began working with structures, and am looking to develop a quick way to look up the data they contain, using a hash table. I have been following the examples for Map Containers at:
http://www.mathworks.com/help/techdoc/matlab_prog/brqqo5e-1.html

So, first I create some structure, s, so that:

s(1).ticketNum = '2S185'; s(1).destination = 'Barbados';
s(1).reserved = '06-May-2008'; s(1).origin = 'La Guardia';
s(2).ticketNum = '947F4'; s(2).destination = 'St. John';
s(2).reserved = '14-Apr-2008'; s(2).origin = 'Oakland';
s(3).ticketNum = 'A479GY'; s(3).destination = 'St. Lucia';
s(3).reserved = '28-Mar-2008'; s(3).origin = 'JFK';
s(4).ticketNum = 'B7398'; s(4).destination = 'Granada';
s(4).reserved = '30-Apr-2008'; s(4).origin = 'JFK';
s(5).ticketNum = 'NZ1452'; s(5).destination = 'Aruba';
s(5).reserved = '01-May-2008'; s(5).origin = 'Denver';

Where we can view the first index in the structure as:

>> s(1)

ans = 

      ticketNum: '2S185'
    destination: 'Barbados'
       reserved: '06-May-2008'
         origin: 'La Guardia'

Continuing with the example, we are shown how one can map keys to the elements of a structure array by defining a containers.Map as:

seatingMap = containers.Map( ...
    {'23F', '15C', '15B', '09C', '12D'}, ...
    {s(1), s(2), s(3), s(4), s(5)});

which works wonderfully.

>> seatingMap('15B')

ans = 

      ticketNum: 'A479GY'
    destination: 'St. Lucia'
       reserved: '28-Mar-2008'
         origin: 'JFK'

Now, assume I have a structure, S, with N number of elements. Additionally I have a name_S, a 1xN character cell-array that contains the keys which I wish to assign as keys to the structure indexes. Being that N is quite large (near 1000), I am looking for a way to perform the mapping in a vector operation, however, can’t find a solution. I realize I could do this with a for loop, but am always trying to keep my code fast and clean.

Using the previously defined structure, I have tried to do so with the following statement (and many others), all to no avail.

seatingMap = containers.Map( ...
    {'23F', '15C', '15B', '09C', '12D'}, ...
    s(1:5));
??? Error using ==> containers.Map
Specified value type does not match the type expected for this container.

I guess this is a more general question of how we can express a range of structure elements to be operated on in one statement. Any advice is greatly appreciated.
Thank you much,
Nico.

Nico-

I don’t have time to try anything out right now, but I would look into arrayfun (which has a for loop inside) or some combination of struct2cell and indexing, and possibly deal. You might not be able to do it in one line. You might need to create a temporary cell array and then use that for the final input to containers.Map.

–Loren
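
One way to build that temporary cell array without an explicit loop is num2cell, which places each element of the struct array in its own cell. The sketch below uses a shortened, made-up version of the ticket data (the variable name tickets is an assumption):

```matlab
% Small stand-in for the structure array in the question.
tickets(1).ticketNum = '2S185';  tickets(1).destination = 'Barbados';
tickets(2).ticketNum = '947F4';  tickets(2).destination = 'St. John';
tickets(3).ticketNum = 'A479GY'; tickets(3).destination = 'St. Lucia';

seatKeys = {'23F', '15C', '15B'};

% num2cell wraps each struct element in its own cell, giving the
% cell-array-of-values form that containers.Map accepts.
seatingMap = containers.Map(seatKeys, num2cell(tickets));

ticket = seatingMap('15B');
```

The same call should scale to the N near 1000 case, as long as the key cell array and the struct array have the same number of elements.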

I am wondering about how to initialize a structure array with more elements than I want to type in manually. For example, if I read a directory of 500 dicom headers using 'dicominfo', I get a structure info(1:500), where there might be ~100 field entries for each array element. One solution might be to read one header, then use it as the preallocation template, but is there a way to be more generic?

Chris-

If you have all the info at once, you can use the struct command itself which will allow you to create an array and initialize the values.

–Loren

Hi, I’m new to matlab. I would like to create empty struct and add its values using a for loop. I have an image which I divide into non-overlapping blocks. For each block I calculate the quality measure. What I would like to do is take the block store it in a struct with its corresponding quality value. Later in my code I would like to compute the mean of the quality measure for all blocks and set those blocks less then this quality mean to zeros.

Nteza-

No reason why you can't do this. Just define your struct, but if you know how many blocks there are, you don't need to make it an empty struct (size 0×0); make it the correct size so it doesn't grow. Specify the fields. So you could initialize like this

a(20,30).imageBlock = [];
a(20,30).quality = [];

Then use your for loop to fill them. Then do your other calculations.

–Loren
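
The whole workflow might look like the sketch below. The image is random test data, the block and grid sizes are made up, and the variance of each block stands in for whatever quality measure you actually compute:

```matlab
% Hypothetical 4-by-6 grid of 8-by-8 blocks cut from a random test image.
blockSize = 8;
nRows = 4;
nCols = 6;
img = rand(nRows*blockSize, nCols*blockSize);

% Preallocate the struct array at its final size.
blocks(nRows, nCols).imageBlock = [];
blocks(nRows, nCols).quality    = [];

for r = 1:nRows
    for c = 1:nCols
        rowIdx = (r-1)*blockSize + (1:blockSize);
        colIdx = (c-1)*blockSize + (1:blockSize);
        blocks(r,c).imageBlock = img(rowIdx, colIdx);
        % Placeholder quality measure (block variance); substitute your own.
        blocks(r,c).quality = var(blocks(r,c).imageBlock(:));
    end
end

% Collect all quality values, then zero out the below-average blocks.
meanQuality = mean([blocks.quality]);
for k = find([blocks.quality] < meanQuality)
    blocks(k).imageBlock(:) = 0;
end
```

Because the struct array is created at full size before the loops, neither the array nor its fields grow while it is being filled.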

My contribution.

In my lab we make many runs of our experiment. Each run is a data set consisting of many fields (time, energy, polarization, momentum, jitter…). This means that I need to create an array of structures. Each structure contains the data for a run with these field names.

The problem is that each structure contains maaaany fields, and I do NOT want to write each field separately as you do here. Instead, I first create a cell array with the strings of every field name, like this:

fieldNames = {'field1', 'field2', … , 'fieldN'};

Now I will pass each of these strings in the cell array as a dynamic field name to my structure, that is:

for c = fieldNames
mystruct.(c{1}) = someData; % In each iteration c is a 1×1 cell array containing the next fieldName.
end

Now, this should work for a single structure with maaany fields. In my case I still need more than this. I need an array of such structures, one for each run of experiment. In that case, I will include another for loop that goes through each run number. I will also need to initialize somehow this array of structures. Here’s the whole (but simple) code:

fieldNames = {'field1', 'field2', … , 'fieldN'};
mystruct(numberOfRuns).field1 = 0; % Initialize the array of structures with at least one field each.
for i=1:numberOfRuns
for c = fieldNames
mystruct(i).(c{1}) = someData;
end
end

In my opinion this is an elegant (at least code-friendly) way to work with large arrays of structures that contain many fields. A good analogy is a dataBase containing many other personal data besides the usual ‘name’ and ‘height’.

I would anyway appreciate comments on the efficiency of this code.

Thanks!

Gerard-

If you have access to Statistics Toolbox, you may find dataset arrays very useful for your application. The idea of arrays of structures can be quite useful, but it is costly in terms of memory and access.

–Loren

Thank you Loren, I have been checking the statistics toolbox and looks very promising.

However, in all examples they generate datasets from already existing variables. I do not see how you could open, say, 10 bin files and store them directly into a dataset with the corresponding label from the cell array (which is what I am doing in my code if I replace 'someData' by e.g. the 'fread' function).

Gerard, there are two ways to create a dataset array from scratch: from workspace variables, or by reading from a text or Excel file. In your case, if possible, it would be fastest to create one dataset variable at a time by reversing the order of your loops, and putting something like this line inside the outer (fields) loop, but following the inner (runs) loop:

myData.(varNames{j}) = aVectorOfData;

You can certainly use your existing code, and replace the innermost line with

myData.(c{1})(i) = someData;

but it is fastest to work on one entire variable at a time.

In either case, you would need to preallocate the dataset array before doing the above, much as you did for the structure array.

Thanks for your support Peter.

In my case, the ‘fieldNames’ cell array contains the strings that are, at the same time, both the names of each bin file to be imported AND the field names I want to use as variables in the structures.

This means I still have not created any variables of the bin files. I am just reading the files DIRECTLY into each field of the structure.

fid = fopen([c{1} '.bin']);        % fread needs a file identifier, not a file name
mystruct(i).(c{1}) = fread(fid);   % It was maybe unclear before, but I read a whole vector in each field of the structure, not a single data value.
fclose(fid);
% Each 'i' has the run number and each field has a vector of data.

I don't see how I could read these files DIRECTLY into a dataset (it seems it is not possible, as you say). So I should first read the files into variables anyway using my code, and then create the dataset. This looks a bit messy to me.

Also, the bin files have different data size. ‘dataset’ is apparently used with equal size variables. When used with different variable sizes I cannot visualize it well in the workspace (harder to debug, then?). Moreover, with my code I have now an array of structures that contains a different structure for each run, all packed in one variable in the workspace (very clean). I guess dataset arrays cannot have more than two dimensions (?), so as to make the third dimension the run number (this way I could have sheets of datasets for each run, like excel sheets, and access them very easily).

Are in this case dataset arrays still worth the effort?

Gerard, two things:

> This means I still have not created any variables of the
> bin files. I am just reading the files DIRECTLY into
> each field of the structure.

Sure you have. You just haven’t given them a name. They’re temporary values that exist only for the lifespan of that one line. You can do the same thing with dataset.

> Also, the bin files have different data size. ‘dataset’
> is apparently used with equal size variables.

Yes. A dataset array is like a table where each column (dataset variable) has the same length. So if your data aren’t like that, then dataset isn’t for you. The 2-D-ness is perhaps also a problem, but dataset arrays do allow you to have variables that are themselves matrices. That can sometimes be convenient for higher dimensional data.

Thanks Peter. Well then, I guess I will have to stick with my code, since datasets cannot handle different sizes and multiple dimensions. It is a pity; they look very flexible and efficient for vectorizing and also for code simplification.

I won’t get rid of the two for loops that I have to carry during the whole code :( (I have to make calculations for each run and for every field).

One more question: how to retrieve data from a field for all the structures of an array of structures? That is, this line

myStruct(:).fieldName

does only retrieve myStruct(1).fieldName, but not all of them.

Gerard-

myStruct(:).fieldname produces a comma-separated list in MATLAB (see the doc to learn more about this). To collect all the outputs, assuming they are scalar numeric, into a single vector, enclose the expression in square brackets [myStruct(:).fieldname]. To stick the values into a cell array, in case they won’t concatenate properly, do the same trick but wrap with curly braces {myStruct(:).fieldname}

–Loren
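
A short sketch of both forms, using a made-up struct array runs with one scalar field and one vector field:

```matlab
% Hypothetical struct array with one scalar field and one vector field.
runs(1).energy = 1.5;  runs(1).samples = [1 2 3];
runs(2).energy = 2.5;  runs(2).samples = [4 5];
runs(3).energy = 3.5;  runs(3).samples = [6 7 8 9];

allEnergy  = [runs.energy];    % 1x3 numeric vector
perRun     = {runs.samples};   % 1x3 cell, one vector per struct element
allSamples = [runs.samples];   % row vectors of different lengths join end to end
```

If the fields hold column vectors instead of rows, vertcat(runs.samples) stacks them in the same way.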

Hi Loren, thanks for your support again.

In my case, each ‘fieldName’ contains a vector, so ‘myStruct(:).fieldName’ does only produce the first ‘fieldName’ vector, the same as (myStruct(:).fieldName).

Only {myStruct(:).fieldName} is left. Could you help me out on how to concatenate this in a for loop? This doesn’t seem correct:

fieldNames = {'field1', … , 'fieldN'}

for C=fieldNames
temp = { myStruct(:).(C{1}) };

for i=1:size(myStruct.(C{1}),2)
dataCat.(C{1}) = [dataCat.(C{1}) temp{i}];
end

end

Gerard-

Please contact technical support for more help. I don’t exactly understand your situation and am not near MATLAB to try things out.

–Loren

These postings are the author's and don't necessarily represent the opinions of MathWorks.