From struct to dataset

Posted by Loren Shure, May 20, 2009


When I got to work last Friday, I saw an email discussion, started on behalf of a customer, about finding a good way to add a new field to a struct array. This post starts with that problem and then shows a different way to collect the same information, in a dataset array.

Initial struct and New Data
First Pass - for loop
Second Pass - arrayfun
Third Pass - deal
Fourth Pass - Comma-separated List
Same Results?
What's the Data Look Like?
Completely Different View
Concatenate dataset Arrays
How Do You Arrange Your Data?

Initial struct and New Data

Let's create some information to store in a struct.

names = {'John'; 'Henri'};
ages = {26; 18};
initS = struct('Name', names, 'Age', ages);

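Note that the ages data is a cell array: struct creates one element per cell, so a plain numeric vector here would be copied whole into every element's Age field. In addition to Name and Age, I have Height information in a numeric, not cell, array.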

Heights = [168; 175];
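A quick sketch of what happens if the ages are passed to struct as a numeric vector instead of a cell array (badS is an illustrative name, not from the original code):

```matlab
% Ages as a numeric vector instead of a cell array
names = {'John'; 'Henri'};
badS = struct('Name', names, 'Age', [26; 18]);

size(badS)      % 2 1: the size comes from the cell array names
badS(1).Age     % [26; 18]: the whole vector is copied into each element
```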

How do I add this information to my struct? What follows is a series of possibilities, definitely not an exhaustive list!

First Pass - for loop

Let's start with a for loop. I add Height information to each element of the struct array, one at a time.

S1 = initS;
for index = 1:length(S1)
    S1(index).Height = Heights(index);
end

Second Pass - arrayfun

I can use arrayfun to remove the loop.

S2 = initS;
F = @(S,h) setfield(S, 'Height', h);
S2 = arrayfun(F, S2, Heights);
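arrayfun calls F once for each pair S2(k), Heights(k) and concatenates the structs that F returns, so the struct array and the data must have the same size. A sketch of what happens when they don't, assuming a hypothetical row vector heightsRow while the struct array is a column; reshaping the data to size(initS) resolves the mismatch:

```matlab
names = {'John'; 'Henri'};
ages = {26; 18};
initS = struct('Name', names, 'Age', ages);   % 2-by-1 struct array
F = @(S,h) setfield(S, 'Height', h);

heightsRow = [168 175];                       % hypothetical 1-by-2 input
try
    arrayfun(F, initS, heightsRow);           % 2-by-1 vs 1-by-2: error
catch err
    disp(err.message)
end

% Reshape the data to match the struct array before calling arrayfun
S2b = arrayfun(F, initS, reshape(heightsRow, size(initS)));
[S2b.Height]                                  % 168   175
```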

Third Pass - deal

If the data were in a cell array, I could easily distribute it to multiple outputs with deal. So here I convert the height data to a cell array with num2cell and deal it out.

S3 = initS;
cH = num2cell(Heights);
[S3.Height] = deal(cH{:});
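deal also has a second mode: with a single input, it copies that input to every output. That is handy for initializing a new field to the same value in every element. A sketch, using 0 as an assumed placeholder value:

```matlab
names = {'John'; 'Henri'};
ages = {26; 18};
S = struct('Name', names, 'Age', ages);

% One input, many outputs: every element gets the same value
[S.Height] = deal(0);
[S.Height]        % 0   0
```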

Fourth Pass - Comma-separated List

Once the data is in a cell array, I don't need deal at all: I can assign the comma-separated list cH{:} directly to the comma-separated list of outputs, [S4.Height].

S4 = initS;
cH = num2cell(Heights);
[S4.Height] = cH{:};
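To see what cH{:} does on its own, here is a short sketch with ordinary variables on the left-hand side (h1, h2, a, b, c are illustrative names). The expansion behaves as if you had typed cH{1}, cH{2}, and the number of outputs must not exceed the number of cells:

```matlab
Heights = [168; 175];
cH = num2cell(Heights);     % {168; 175}

% cH{:} is equivalent to typing cH{1}, cH{2}
[h1, h2] = cH{:};           % h1 = 168, h2 = 175

% Asking for more outputs than the list provides is an error
try
    [a, b, c] = cH{:};
catch err
    disp(err.message)
end
```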

Same Results?

Let's quickly check that we get the same results with each technique.

allsame = isequal(S1,S2,S3,S4)

allsame =
     1

What's the Data Look Like?

It's hard to see all of the data here (in S1, for example) because each field of each struct element can hold anything the user chooses, so MATLAB doesn't lay out a struct array as a table. So I can look at one array element at a time.

S1(1)

ans = 
      Name: 'John'
       Age: 26
    Height: 168

Or I can look at all of the data in a single field at once.

[S1.Age]

ans =
    26    18

But I don't get to see all of the data at a glance.
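One way to get a single view of a struct array without leaving core MATLAB is struct2cell, which returns one row per field and one column per element for a column struct array; transposing gives one row per record. A sketch that rebuilds S1 so the block runs on its own:

```matlab
names = {'John'; 'Henri'};
ages = {26; 18};
S1 = struct('Name', names, 'Age', ages);
[S1.Height] = deal(168, 175);

% Transpose so each row is one person and each column one field
C = struct2cell(S1)';          % 2-by-3 cell array
fieldnames(S1)'                % column headings: Name, Age, Height
disp(C)
```

Unlike a dataset, though, the cell array loses the field names, and its columns are no longer numeric arrays you can compute with directly.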

Completely Different View

And now for something completely different. I've blogged before about dataset arrays from Statistics Toolbox. Here's another instance where one might be useful. I treat the columns as individual fields and the rows as individual records. Each column contains data of a single datatype. Here's the data.

names = {'John'; 'Henri'}
ages = [26; 18];
d1 = dataset({names, 'Name'}, {ages, 'Age'})

names = 
    'John'
    'Henri'
d1 = 
    Name           Age
    'John'         26 
    'Henri'        18

Two things to note here in contrast to using a struct to contain the information. First, the name and the data appear in the opposite order: struct takes a field name followed by its values, while dataset takes a cell array holding the data followed by the variable name. Second, the numeric data doesn't need to be placed in a cell array for the dataset, which makes managing the data more natural, in my opinion.
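Newer MATLAB releases (R2013b and later) include the table data type in core MATLAB, and the dataset documentation now recommends table for new code. A sketch of d1 as a table, assuming such a release; like dataset, table takes numeric data directly, with the names supplied through the 'VariableNames' parameter:

```matlab
names = {'John'; 'Henri'};
ages = [26; 18];               % numeric, no cell array needed

t1 = table(names, ages, 'VariableNames', {'Name', 'Age'})

% Adding a variable uses the same struct-like syntax as a dataset
t1.Height = [168; 175];
```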

Let me make a second dataset containing the additional data, the heights.

d2 = dataset({names, 'Name'}, {[168; 175], 'Height'})

d2 = 
    Name           Height
    'John'         168   
    'Henri'        175

Concatenate dataset Arrays

Now let me combine the original dataset d1 with the new information in d2. There are several ways to do this. First, just use square brackets ([]) as you would for regular array concatenation.

dnew1 = [d1 d2]

dnew1 = 
    Name           Age    Height
    'John'         26     168   
    'Henri'        18     175

Another way to do this is to add the information in a struct-like way to the original dataset.

dnew2 = d1;
dnew2.Height = [168; 175]

dnew2 = 
    Name           Age    Height
    'John'         26     168   
    'Henri'        18     175
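Like a struct, a dataset gives dot access to a whole variable, and like an array, it gives parenthesis indexing to whole records, so both struct views from earlier are one expression away. A minimal sketch, assuming Statistics Toolbox is installed (dataset lives there), that rebuilds dnew2 so the block runs on its own:

```matlab
names = {'John'; 'Henri'};
dnew2 = dataset({names, 'Name'}, {[26; 18], 'Age'});
dnew2.Height = [168; 175];

dnew2.Age            % one variable for all records, like [S1.Age]
dnew2(1, :)          % one record with all variables, like S1(1)
mean(dnew2.Height)   % variables are ordinary arrays: 171.5
```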

Now let's make a different dataset with the same height information, but with the two rows in the opposite order.

d3 = dataset({{'Henri'; 'John'}, 'Name'}, {[175; 168] 'Height'})

d3 = 
    Name           Height
    'Henri'        175   
    'John'         168

What happens if we try to collect d1 and d3 together into one dataset?

try
    dnew3 = [d1 d3];
catch ExcDataset
    disp(ExcDataset.message)
end

Duplicate variable names with distinct data.

As you can see, I can't simply concatenate them: concatenation pairs rows by position, so the two Name variables disagree. However, join matches rows by the key variable Name, so it combines the two datasets correctly.

dnew3 = join(d1, d3, 'Name')

dnew3 = 
    Name           Age    Height
    'John'         26     168   
    'Henri'        18     175

Notice how easily I can see all the data at once here, compared to the struct array.
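In MATLAB R2013b and later, table offers an equivalent join, with the key variables named through the 'Keys' parameter. A sketch assuming such a release, using tables t1 and t3 (illustrative names) that mirror d1 and d3:

```matlab
t1 = table({'John'; 'Henri'}, [26; 18], 'VariableNames', {'Name', 'Age'});
t3 = table({'Henri'; 'John'}, [175; 168], 'VariableNames', {'Name', 'Height'});

% Rows are matched on Name, so the order of t3 does not matter;
% the result keeps the row order of t1
t = join(t1, t3, 'Keys', 'Name')
```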

How Do You Arrange Your Data?

Do you use either of these strategies for arranging your data (struct or dataset arrays)? Or do you do something different? I'd love to hear your experiences here.

Published with MATLAB® 7.8

Categories: New Feature, Structures

Loren on the Art of MATLAB
Turn ideas into MATLAB
