From struct to dataset
When I got to work last Friday, I saw an email discussion, started on behalf of a customer, about a good way to add a new field to a struct array. This post starts with that problem, and then shows a different way to collect the same information: in a dataset array.
Initial struct and New Data
Let's create some information to store in a struct.
names = {'John'; 'Henri'};
ages = {26; 18};
initS = struct('Name', names, 'Age', ages);
Note that the ages data is a cell array. In addition to Name and Age, I have Height information in a numeric, not cell, array.
Heights = [168; 175];
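As an aside on why the cell arrays matter: struct expands each cell-array argument into one struct element per cell, which is what makes initS a struct array rather than a single struct. The minimal sketch below checks that, and shows that wrapping a value in an extra layer of braces stores it whole in a scalar struct instead. The name oneS is made up for this illustration.

```matlab
names = {'John'; 'Henri'};
ages  = {26; 18};

% Each 2-by-1 cell argument becomes one field value per struct element.
initS = struct('Name', names, 'Age', ages);
sizeArray = size(initS);      % 2-by-1 struct array

% Hypothetical contrast: an extra layer of braces keeps the cell intact,
% so the result is a single struct whose fields hold whole cell arrays.
oneS = struct('Name', {names}, 'Age', {ages});
sizeScalar = size(oneS);      % 1-by-1 struct
```

This is also why the numeric Heights vector cannot simply be handed to struct as is: a numeric array is not expanded, so every element would receive the entire vector.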
How do I add this information to my struct? What follows is a series of possibilities, definitely not exhaustive!
First Pass - for loop
Let's start with a for loop. I add Height information to each element of the struct array, one at a time.
S1 = initS;
for index = 1:length(S1)
    S1(index).Height = Heights(index);
end
Second Pass - arrayfun
I can use arrayfun to remove the loop.
S2 = initS;
F = @(S,h) setfield(S, 'Height', h);
S2 = arrayfun(F, S2, Heights);
Third Pass - deal
If the data were in a cell array, I could easily distribute it to multiple outputs. Here I store the height data in a cell and deal it out.
S3 = initS;
cH = num2cell(Heights);
[S3.Height] = deal(cH{:});
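Where deal is handy in its own right is when a single value should be copied to every element: called with one input, deal copies it to each output. Here is a small sketch of that. The Units field and the 'cm' value are invented for illustration and are not part of the customer's data.

```matlab
S = struct('Name', {'John'; 'Henri'}, 'Age', {26; 18});

% One input, many outputs: deal copies 'cm' into every element's new field.
% (Units is a hypothetical field used only in this example.)
[S.Units] = deal('cm');

firstUnits  = S(1).Units;
secondUnits = S(2).Units;
```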
Fourth Pass - Comma-separated List
If the data is in a cell array already, I can skip the step with deal and just dish out different cells to different outputs.
S4 = initS;
cH = num2cell(Heights);
[S4.Height] = cH{:};
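To see why that last line works, it can help to look at a comma-separated list on its own. The sketch below uses stand-in names (h1, h2, and a small struct S) that are assumptions for illustration only.

```matlab
cH = num2cell([168; 175]);    % 2-by-1 cell array: {168; 175}

% cH{:} expands to the list cH{1}, cH{2}, so it can feed two outputs.
[h1, h2] = cH{:};

% S.Height on the left is a list too: S(1).Height, S(2).Height.
% The two lists pair up element by element.
S = struct('Name', {'John'; 'Henri'});
[S.Height] = cH{:};
```

The number of items on each side has to agree, which is why the heights are converted with num2cell first.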
Same Results?
Let's quickly check that we get the same results with each technique.
allsame = isequal(S1,S2,S3,S4)
allsame = 1
What's the Data Look Like?
It's hard to look at the data here (in, e.g., S1) because the contents of each struct element are completely at the user's disposal. So I can look at one array element at a time.
S1(1)
ans =
      Name: 'John'
       Age: 26
    Height: 168
Or I can look at all of the data in a single field at once.
[S1.Age]
ans = 26 18
But I don't get to see all of the data in one glance.
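One workaround, short of changing containers, is to flatten the struct array into a cell array with struct2cell and put the field names on top. This is only a sketch; allData is a name I made up, and the result is a plain cell array that knows nothing about which column is which.

```matlab
S1 = struct('Name', {'John'; 'Henri'}, 'Age', {26; 18}, 'Height', {168; 175});

% struct2cell gives one row per field and one column per element here,
% so transpose to get one row per person, then add a header row.
allData = [fieldnames(S1).'; struct2cell(S1).'];
disp(allData)
```

It displays everything at once, but each value is boxed in its own cell, so it is more of a peek than a way to work with the data.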
Completely Different View
And now for something completely different. I've blogged before about dataset arrays from Statistics Toolbox. Here's another instance where one might be useful. I treat the columns like individual fields, and the rows as individual records. Each column contains data of a single datatype. Here's the data.
names = {'John'; 'Henri'}
ages = [26; 18];
d1 = dataset({names, 'Name'}, {ages, 'Age'})
names =
    'John'
    'Henri'
d1 =
    Name       Age
    'John'     26
    'Henri'    18
Two things to note here in contrast to using a struct to contain the information. First, the arguments appear in a different order in the two solutions. Second, the numeric data doesn't need to be placed in a cell array for the dataset, making the data management more natural, in my opinion.
Let me make a new dataset with additional data, heights.
d2 = dataset({names, 'Name'}, {[168 ;175] 'Height'})
d2 =
    Name       Height
    'John'     168
    'Henri'    175
Concatenate dataset Arrays
Now let me collect the original dataset d1 with the new information in d2. Here are some ways to achieve this. First, just use square brackets ([]) as you would for regular array concatenation.
dnew1 = [d1 d2]
dnew1 =
    Name       Age    Height
    'John'     26     168
    'Henri'    18     175
Another way to do this is to add the information in a struct-like way to the original dataset.
dnew2 = d1; dnew2.Height = [168; 175]
dnew2 =
    Name       Age    Height
    'John'     26     168
    'Henri'    18     175
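Because each column of a dataset is an ordinary array, the same dot syntax reads a whole column back, and regular subscripts pick out records. Here is a small sketch, assuming Statistics Toolbox is available. The age threshold of 21 and the names meanHeight and older are mine, chosen only for illustration.

```matlab
d = dataset({{'John'; 'Henri'}, 'Name'}, {[26; 18], 'Age'});
d.Height = [168; 175];

% A column comes back as a plain numeric vector, so ordinary math applies.
meanHeight = mean(d.Height);

% Logical row indexing keeps whole records (hypothetical cutoff of 21).
older = d(d.Age >= 21, :);
olderName = older.Name;
```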
Now let's make a different dataset with new information, but with the order of the two entries swapped.
d3 = dataset({{'Henri'; 'John'}, 'Name'}, {[175; 168] 'Height'})
d3 =
    Name       Height
    'Henri'    175
    'John'     168
What happens if we try to collect d1 and d3 together into one dataset?
try
    dnew3 = [d1 d3];
catch ExcDataset
    disp(ExcDataset.message)
end
Duplicate variable names with distinct data.
As you can see, I can't just collect them together via concatenation. However, I can combine or join the two datasets correctly.
dnew3 = join(d1,d3,'Name')
dnew3 =
    Name       Age    Height
    'John'     26     168
    'Henri'    18     175
Notice how easily I can see all the data at once here, compared to the struct array.
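For comparison, matching records by name in the struct array version takes some bookkeeping of my own. The sketch below is one possible way to do it with ismember, not the only one; newNames and newHeights are stand-ins for data arriving in a different order.

```matlab
S = struct('Name', {'John'; 'Henri'}, 'Age', {26; 18});

% Hypothetical new data, listed in the opposite order from S.
newNames   = {'Henri'; 'John'};
newHeights = [175; 168];

% loc(k) is the position in newNames of the k-th name in S.
[found, loc] = ismember({S.Name}, newNames);
assert(all(found), 'Every name in S needs a match in newNames.');

% Reorder the heights to line up with S, then hand them out as before.
cH = num2cell(newHeights(loc));
[S.Height] = cH{:};
```

With a dataset, the key matching is part of join, so none of this bookkeeping shows up in my own code.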
How Do You Arrange Your Data?
Do you use either of these strategies for arranging your data (struct or dataset arrays)? Or do you do something different? I'd love to hear your experiences here.