# From struct to dataset

When I got to work last Friday, I saw an email discussion, started on behalf of a customer, about finding a good way to add a new field to a struct array. This post starts with that problem, and then shows a different way to collect the same information: in a dataset array.

### Initial struct and New Data

Let's create some information to store in a struct.

names = {'John'; 'Henri'};
ages = {26; 18};
initS = struct('Name', names, 'Age', ages);
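
To see why the values go in cell arrays, here is a small sketch of how struct treats cell and non-cell inputs. The name wrongS is made up for illustration and isn't used elsewhere in this post.

```matlab
% Cell-array values give one struct element per cell.
names = {'John'; 'Henri'};
ages = {26; 18};
initS = struct('Name', names, 'Age', ages);
size(initS)            % 2-by-1 struct array, one element per person

% A plain numeric array is not split up: every element gets the whole array.
% (wrongS is a hypothetical name, used only for this comparison.)
wrongS = struct('Name', names, 'Age', [26; 18]);
size(wrongS(1).Age)    % 2-by-1, so Age is not a scalar per person
```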

Note that the ages data is a cell array. In addition to Name and Age, I have Height information in a numeric, not cell, array.

Heights = [168; 175];

How do I add this information to my struct? What follows is a series of possibilities, definitely not exhaustive!

### First Pass - for loop

Let's start with a for loop. I add Height information to each element of the struct array, one at a time.

S1 = initS;
for index = 1:length(S1)
    S1(index).Height = Heights(index);
end

### Second Pass - arrayfun

I can use arrayfun to remove the loop.

S2 = initS;
F = @(S,h) setfield(S, 'Height', h);
S2 = arrayfun(F, S2, Heights);

### Third Pass - deal

If the data were in a cell array, I could easily distribute it to multiple outputs. Here I store the height data in a cell and deal it out.

S3 = initS;
cH = num2cell(Heights);
[S3.Height] = deal(cH{:});

### Fourth Pass - Comma-separated List

If the data is in a cell array already, I can skip the step with deal and just dish out different cells to different outputs.

S4 = initS;
cH = num2cell(Heights);
[S4.Height] = cH{:};
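
As a rough illustration of what the comma-separated list is doing, the sketch below spells out the expansion on a fresh struct array and then reads the values back. The names S and heightsBack are made up for this example. Note that the number of cells has to match the number of struct elements.

```matlab
S = struct('Name', {'John'; 'Henri'});   % hypothetical 2-by-1 struct array
cH = num2cell([168; 175]);               % {168; 175}

% cH{:} expands to the list 168, 175; [S.Height] expands to
% S(1).Height, S(2).Height. So this line behaves like
% [S(1).Height, S(2).Height] = deal(168, 175);
[S.Height] = cH{:};

% Going the other way, square brackets collect the list into an array.
heightsBack = [S.Height];                % [168 175]
```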

### Same Results?

Let's quickly check that we get the same results with each technique.

allsame = isequal(S1,S2,S3,S4)
allsame =
1


### What Does the Data Look Like?

It's hard to look at all of the data here (in S1, for example) because the contents of each struct element are completely at the user's disposal; each field can hold anything. I can, however, look at one array element at a time.

S1(1)
ans =
Name: 'John'
Age: 26
Height: 168


Or I can look at all of the data in a single field at once.

[S1.Age]
ans =
26    18


But I don't get to see all of the data in one glance.
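
One possible workaround, sketched here with made-up variable names (C, header, allData), is to convert the struct array to a cell array with struct2cell and put the field names on top. This is just one way to get a table-like view, and it assumes every element has the same fields, which is always true for a struct array.

```matlab
S = struct('Name', {'John'; 'Henri'}, 'Age', {26; 18}, 'Height', {168; 175});

% struct2cell returns fields-by-elements here (3-by-2); transpose it
% so that each row is one person and each column is one field.
C = squeeze(struct2cell(S)).';     % 2-by-3 cell array

header = fieldnames(S);            % 3-by-1 cell array of field names
header = header.';                 % 1-by-3, to sit above the columns

allData = [header; C]              % 3-by-3 cell array, displayed in one go
```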

### Completely Different View

And now for something completely different. I've blogged before about dataset arrays from Statistics Toolbox. Here's another instance where one might be useful. I treat the columns like individual fields, and the rows as individual records. Each column contains data of a single datatype. Here's the data.

names = {'John'; 'Henri'}
ages = [26; 18];
d1 = dataset({names, 'Name'}, {ages, 'Age'})
names =
'John'
'Henri'
d1 =
Name           Age
'John'         26
'Henri'        18


Two things to note here in contrast to using a struct to contain the information. First, the arguments appear in a different order in the two solutions: struct takes each field name followed by its values, while dataset takes each variable as a {values, name} pair. Second, the numeric data doesn't need to be placed in a cell array for the dataset, making the data management more natural, in my opinion.

Let me make a new dataset with additional data, heights.

d2 = dataset({names, 'Name'}, {[168; 175], 'Height'})
d2 =
Name           Height
'John'         168
'Henri'        175


### Concatenate dataset Arrays

Now let me collect the original dataset d1 with the new information in d2. Here are some ways to achieve this. First, just use square brackets ([]) as you would for regular array concatenation.

dnew1 = [d1 d2]
dnew1 =
Name           Age    Height
'John'         26     168
'Henri'        18     175


Another way to do this is to add the information in a struct-like way to the original dataset.

dnew2 = d1;
dnew2.Height = [168; 175]
dnew2 =
Name           Age    Height
'John'         26     168
'Henri'        18     175


Now let's make a different dataset with new information, but with the order of the two entries swapped.

d3 = dataset({{'Henri'; 'John'}, 'Name'}, {[175; 168], 'Height'})
d3 =
Name           Height
'Henri'        175
'John'         168


What happens if we try to collect d1 and d3 together into one dataset?

try
    dnew3 = [d1 d3];
catch ExcDataset
    disp(ExcDataset.message)
end
Duplicate variable names with distinct data.


As you can see, I can't just collect them together via concatenation. However, I can combine or join the two datasets correctly.

dnew3 = join(d1,d3,'Name')
dnew3 =
Name           Age    Height
'John'         26     168
'Henri'        18     175


Notice how easily I can see all the data at once here, compared to the struct array.
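
For a feel of what joining on 'Name' involves when the rows are in a different order, here is a rough by-hand sketch using ismember on plain arrays. It is not how join is implemented, only the same key-matching idea, and the variable names (keysLeft, keysRight, heightRight, heightLeft) are made up for the example.

```matlab
keysLeft    = {'John'; 'Henri'};    % the Name column of the first dataset
keysRight   = {'Henri'; 'John'};    % the Name column of the reordered dataset
heightRight = [175; 168];           % heights, in the reordered row order

% For each name on the left, find the matching row on the right.
[found, loc] = ismember(keysLeft, keysRight);
assert(all(found), 'Every key on the left needs a match on the right.')

% Reorder the heights so they line up with the left-hand rows.
heightLeft = heightRight(loc);      % [168; 175]
```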

### How Do You Arrange Your Data?

Do you use either of these strategies for arranging your data (struct or dataset arrays)? Or do you do something different? I'd love to hear your experiences here.

Published with MATLAB® 7.8
