Does anyone know whether it’s possible to construct a nested dataset (a dataset of datasets)?

It seems to work with 7.6 but not 7.8.

Thanks a lot for your help

Good to know that

d1.Var2(:) == 3

works. Actually, this is a fast trick to get a variable as a cell array, and it helps solve the problem of using strfind:

strfind(ds.Var1,'a word')

works.

About the command

datasetfun(@strfind,ds,{'stringVar1' 'stringVar2' …}, 'uniformOutput',false)

I don’t see how to pass the expression ‘a word’ to strfind as an argument, but it does not matter now that I found the shorter way to use strfind.

As for speed, I still recommend converting to a cell array first, before accessing the values in a loop.

Thank you for the help,

Arnaud

I’m back, and I found the simple answer!

For two datasets, ds1 and ds2,

[c ia ib] = intersect(ds1.Properties.ObsNames,ds2.Properties.ObsNames)

gives indexes to the common observations (ia into ds1, ib into ds2); so ds1(ia,:) and ds2(ib,:) are matched row-by-row and can be concatenated horizontally.
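A minimal sketch of that matching step. The data, variable names, and observation names here are made up for illustration:

```matlab
% Hypothetical example data; variable and observation names are invented.
Height = [160; 170; 180];
Weight = [55; 65; 75];
ds1 = dataset(Height, 'ObsNames', {'ann'; 'bob'; 'cat'});
ds2 = dataset(Weight, 'ObsNames', {'cat'; 'dan'; 'ann'});

% c holds the shared observation names;
% ia indexes rows of ds1, ib indexes rows of ds2.
[c, ia, ib] = intersect(ds1.Properties.ObsNames, ds2.Properties.ObsNames);

% Both selections now list the same observations in the same order,
% so they can be placed side by side.
both = [ds1(ia,:) ds2(ib,:)];
```

Note that each index vector must be used with its own dataset: ia only makes sense for ds1 and ib only for ds2.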

paul

Thank you for your informative article. I have a question:

Is there an easy way to find the common elements of two datasets (a kind of ‘intersect’ function based on dataset.Properties.ObsNames)?

My digging in the documents and fiddling with ‘join’ hasn’t produced anything obvious.

Thank you!

Paul

1) You’re right, there are not so many methods (so far) that work on a dataset array as a whole. Your example is strfind; let me try to explain the reasoning why strfind _doesn’t_ work, and what you might do instead.

A dataset array is intended to hold variables of different types. So, for example, you can’t add 1 to a dataset array, for the same reason you can’t add 1 to a cell array: addition would make no sense in general because the contents need not be numeric. You could argue that if all of the variables in the array were numeric, you should be able to add 1, analogous to the way various functions recognize a “cell array of strings” as a special case of cell arrays in general:

>> strfind({'abc' 'def' 'ghi'},'abc')

ans =

[1] [] []

>> strfind({'abc' 'def' 'ghi' 1:5},'abc')

??? Error using ==> cell.strfind at 35

If any of the input arguments are cell arrays, the first must be

a cell array of strings and the second must be a character array.

But the dataset array class is just not intended to be a surrogate for a numeric array, or for a cell array of strings in that way.

What you _can_ do, however, is to apply strfind to each variable (or to a subset of variables) in a dataset array using datasetfun, with the burden being on you to make sure that those variables are suitable. For example,

datasetfun(@strfind,ds,{'stringVar1' 'stringVar2' …}, 'uniformOutput',false)
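To pass a search string such as 'a word' along, one option is to wrap strfind in an anonymous function. This is only a sketch: the dataset and its two string variables are hypothetical, and it assumes the 'DataVars' parameter of datasetfun is used to select the variables.

```matlab
% Hypothetical dataset with two cell-array-of-strings variables.
stringVar1 = {'a word here'; 'nothing'; 'another a word'};
stringVar2 = {'x'; 'a word'; 'y'};
ds = dataset(stringVar1, stringVar2);

% The anonymous function fixes the pattern, so datasetfun only has to
% supply each selected variable as the first argument of strfind.
findWord = @(v) strfind(v, 'a word');

% One output cell per selected variable; each holds strfind's result
% for that variable.
hits = datasetfun(findWord, ds, ...
                  'DataVars', {'stringVar1', 'stringVar2'}, ...
                  'UniformOutput', false);
```

It remains up to the caller to select only variables that strfind can handle.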

2) You’re right, high frequency access of individual values in a dataset array is slower than for numeric, cell, or structure arrays, and you’ve put your finger on one of the reasons. However, the dataset array class is really designed more with large vectorized operations in mind, operations such as “find the mean height for all subjects over the age of 30”, or “log transform the weights of each subject.” For those kinds of operations, the access time difference from numeric arrays is not an issue.
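The two operations mentioned above might look like this. The subject data and the variable names Age, Height, and Weight are made up for illustration:

```matlab
% Hypothetical subject data.
Age    = [25; 34; 41; 29];
Height = [160; 172; 181; 168];
Weight = [55; 70; 82; 61];
subjects = dataset(Age, Height, Weight);

% Mean height for all subjects over the age of 30:
% one logical index over the whole variable, no loop.
meanHeightOver30 = mean(subjects.Height(subjects.Age > 30));

% Log transform the weights of every subject in one assignment,
% stored here as a new variable.
subjects.LogWeight = log(subjects.Weight);
```

Each line touches the dataset array only once or twice, so the per-access overhead is paid a handful of times rather than once per observation.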

3) The two examples you cite _can_ be done, just using different kinds of subscripting. The reason why the syntaxes you list _don’t_ work is that parenthesis subscripting in MATLAB preserves type, and the operations you’ve shown mix types, where no automatic conversion exists. However:

d1.Var2(:) == 3 % instead of d1(2,:)==3

and

d{3,2} = 3 % instead of d(3,2) = 3

do work. Admittedly, d{1:3,1:2} = X is not supported. You can write that in two lines as

d.Var1(1:3) = X(:,1);

d.Var2(1:3) = X(:,2);

and perhaps use a loop for a larger number of columns. Or,

d(1:3,1:2) = dataset({X,'Var1','Var2'})

Or, depending on what you have, it may be possible to restructure the array to have a variable with two columns, and rephrase this as

ds.Var(1:3,:) = X

Thanks for your comments; feedback like this is helpful.

Importantly, I would also like to mention that accessing an element of a dataset in a loop is very slow (several minutes for 17000 iterations), whereas it takes less than one second with a cell array. This is probably because just-in-time (JIT) compilation doesn’t work with datasets. I think this problem may strongly discourage people from using datasets, and it would be a big plus to have JIT compilation working on them.

Finally, it would be nice (but not urgent) if more functions were available for datasets. For instance, it would be convenient to be able to search fields with a syntax of the type d1(2,:)==3, or to assign with d(1:3,1:2)=X, as with usual arrays.

Cheers,

Arnaud

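function data = z(filename)
% Read a delimited text file with column headers into a struct array.
datum = importdata(filename);                    % numeric data plus headers
temp = num2cell(datum.data);                     % one cell per value
data = cell2struct(temp, datum.colheaders, 2);   % one field per column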

Now I’ll try to use datasets, as they seem easy to work with.

The @ sign lets me create an anonymous function in MATLAB. I then apply that function to each element in my array. It’s a great way to create and evaluate a function without using eval. There are some posts on this blog about them (under the category of Function Handles) and good information in the MATLAB documentation as well.
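As a small illustration of how such a function behaves when called directly (the struct s and its values are made up here):

```matlab
% Hypothetical scalar struct.
s = struct('Name', 'Ann', 'Height', 0);

% F takes a struct S and a value h, and returns a copy of S
% with its Height field set to h.
F = @(S,h) setfield(S, 'Height', h);

% Call F like any other function. s itself is not modified;
% the updated copy is returned in s2.
s2 = F(s, 170);
```
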

–Loren

In the May 20 post “From struct to dataset”, what is the @ symbol in this line doing?

F = @(S,h) setfield(S, 'Height', h);
