<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: From struct to dataset</title>
	<atom:link href="http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/</link>
	<description>Loren Shure works on design of the MATLAB language at MathWorks. She writes here about once a week on MATLAB programming and related topics.</description>
	<lastBuildDate>Wed, 08 May 2013 12:21:15 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-31264</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Tue, 13 Apr 2010 14:09:03 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-31264</guid>
		<description><![CDATA[Djames, can you be more specific about what you&#039;re trying to do?  There are at least a couple of different things that would fit the description &quot;dataset of dataset&quot;.  Thanks.]]></description>
		<content:encoded><![CDATA[<p>Djames, can you be more specific about what you&#8217;re trying to do?  There are at least a couple of different things that would fit the description &#8220;dataset of dataset&#8221;.  Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Djames</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-31263</link>
		<dc:creator>Djames</dc:creator>
		<pubDate>Tue, 13 Apr 2010 10:02:09 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-31263</guid>
		<description><![CDATA[I was just searching for information about datasets in MATLAB and found this article.
Does anyone know whether it&#039;s possible to construct a nested dataset (dataset of dataset)?
It seems to work with 7.6 but not 7.8.

Thanks a lot for your help]]></description>
		<content:encoded><![CDATA[<p>I was just searching for information about datasets in MATLAB and found this article.<br />
Does anyone know whether it&#8217;s possible to construct a nested dataset (dataset of dataset)?<br />
It seems to work with 7.6 but not 7.8.</p>
<p>Thanks a lot for your help</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arnaud Amzallag</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30480</link>
		<dc:creator>Arnaud Amzallag</dc:creator>
		<pubDate>Mon, 20 Jul 2009 14:06:36 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30480</guid>
		<description><![CDATA[Thank you, Peter Perkins, for your detailed answer.

Good to know that &lt;pre&gt; d1.Var2(:) == 3 &lt;/pre&gt; works. Actually, this is a fast trick to get a variable as a cell array, and it helps solve the problem of using strfind:
&lt;pre&gt;
strfind(ds.Var1,&#039;a word&#039;)
&lt;/pre&gt;
works.

About the command
&lt;pre&gt;
datasetfun(@strfind,ds,{&#039;stringVar1&#039; &#039;stringVar2&#039; ...}, &#039;uniformOutput&#039;,false)&lt;/pre&gt;
I don&#039;t see how to pass the expression &#039;a word&#039; to strfind as an argument, but it does not matter now that I found the shorter way to use strfind.

About speed issues, I still recommend converting to a cell array first before using the data in a loop.

Thank you for the help,

Arnaud]]></description>
		<content:encoded><![CDATA[<p>Thank you, Peter Perkins, for your detailed answer.</p>
<p>Good to know that</p>
<pre> d1.Var2(:) == 3 </pre>
<p>works. Actually, this is a fast trick to get a variable as a cell array, and it helps solve the problem of using strfind:</p>
<pre>
strfind(ds.Var1,'a word')
</pre>
<p>works.</p>
<p>About the command</p>
<pre>
datasetfun(@strfind,ds,{'stringVar1' 'stringVar2' ...}, 'uniformOutput',false)</pre>
<p>I don&#8217;t see how to pass the expression &#8216;a word&#8217; to strfind as an argument, but it does not matter now that I found the shorter way to use strfind.</p>
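<p>For readers with the same question, one possible way to pass the search text is to wrap strfind in an anonymous function. This is only a sketch: it assumes ds has cell-array-of-strings variables named stringVar1 and stringVar2 (placeholder names), and that datasetfun accepts the 'DataVars' parameter to select variables.</p>
<pre>
% Sketch only: stringVar1 and stringVar2 are placeholder variable names.
findWord = @(v) strfind(v, 'a word');   % the anonymous function fixes the pattern
hits = datasetfun(findWord, ds, 'DataVars', {'stringVar1' 'stringVar2'}, ...
                  'UniformOutput', false);   % one cell of match indices per variable
</pre>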
<p>About speed issues, I still recommend converting to a cell array first before using the data in a loop.</p>
<p>Thank you for the help,</p>
<p>Arnaud</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul R Martin</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30474</link>
		<dc:creator>Paul R Martin</dc:creator>
		<pubDate>Fri, 17 Jul 2009 03:54:29 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30474</guid>
		<description><![CDATA[Dear Loren,

I&#039;m back, and I found the simple answer!

for two datasets, ds1 and ds2,

&lt;pre&gt;
[c ia ib] = intersect(ds1.Properties.ObsNames,ds2.Properties.ObsNames)
&lt;/pre&gt;

gives indexes to the common observations; so ds1(ia,:) and ds2(ib,:) are matched row-by-row and can be concatenated horizontally.

paul]]></description>
		<content:encoded><![CDATA[<p>Dear Loren,</p>
<p>I&#8217;m back, and I found the simple answer!</p>
<p>for two datasets, ds1 and ds2,</p>
<pre>
[c ia ib] = intersect(ds1.Properties.ObsNames,ds2.Properties.ObsNames)
</pre>
<p>gives indexes to the common observations; so ds1(ia,:) and ds2(ib,:) are matched row-by-row and can be concatenated horizontally.</p>
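<p>As a rough sketch of that last step (assuming ds1 and ds2 both have observation names set and share no variable names, which horizontal concatenation requires):</p>
<pre>
[c, ia, ib] = intersect(ds1.Properties.ObsNames, ds2.Properties.ObsNames);
% ia indexes rows of ds1 and ib indexes rows of ds2, both in the order of c
matched = [ds1(ia,:) ds2(ib,:)];
</pre>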
<p>paul</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul R Martin</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30473</link>
		<dc:creator>Paul R Martin</dc:creator>
		<pubDate>Fri, 17 Jul 2009 03:32:44 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30473</guid>
		<description><![CDATA[Dear Loren,

Thank you for your informative article. I have a question.

Is there an easy way to find the common elements of two datasets (a kind of &#039;intersect&#039; function based on dataset.Properties.ObsNames)?

My digging in the documents and fiddling with &#039;join&#039; hasn&#039;t produced anything obvious.

Thank you!

Paul]]></description>
		<content:encoded><![CDATA[<p>Dear Loren,</p>
<p>Thank you for your informative article. I have a question.</p>
<p>Is there an easy way to find the common elements of two datasets (a kind of &#8216;intersect&#8217; function based on dataset.Properties.ObsNames)?</p>
<p>My digging in the documents and fiddling with &#8216;join&#8217; hasn&#8217;t produced anything obvious.</p>
<p>Thank you!</p>
<p>Paul</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30471</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Thu, 16 Jul 2009 18:46:35 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30471</guid>
		<description><![CDATA[Arnaud, let me try to respond to each of your comments.

1) You&#039;re right, there are not so many methods (so far) that work on a dataset array as a whole.  Your example is strfind; let me try to explain the reasoning why strfind _doesn&#039;t_ work, and what you might do instead.

A dataset array is intended to hold variables of different types.  So, for example, you can&#039;t add 1 to a dataset array, for the same reason you can&#039;t add 1 to a cell array: addition would make no sense in general because the contents need not be numeric.  You could argue that if all of the variables in the array were numeric, you should be able to add 1, analogous to the way various functions recognize a &quot;cell array of strings&quot; as a special case of cell arrays in general:

&gt;&gt; strfind({&#039;abc&#039; &#039;def&#039; &#039;ghi&#039;},&#039;abc&#039;)
ans = 
    [1]     []     []
&gt;&gt; strfind({&#039;abc&#039; &#039;def&#039; &#039;ghi&#039; 1:5},&#039;abc&#039;)
??? Error using ==&gt; cell.strfind at 35
If any of the input arguments are cell arrays, the first must be
a cell array of strings and the second must be a character array.

But the dataset array class is just not intended to be a surrogate for a numeric array, or for a cell array of strings in that way.

What you _can_ do, however, is to apply strfind to each variable (or to a subset of variables) in a dataset array using datasetfun, with the burden being on you to make sure that those variables are suitable.  For example,

datasetfun(@strfind,ds,{&#039;stringVar1&#039; &#039;stringVar2&#039; ...}, &#039;uniformOutput&#039;,false)


2)  You&#039;re right, high frequency access of individual values in a dataset array is slower than for numeric, cell, or structure arrays, and you&#039;ve put your finger on one of the reasons.  However, the dataset array class is really designed more with large vectorized operations in mind, operations such as &quot;find the mean height for all subjects over the age of 30&quot;, or &quot;log transform the weights of each subject.&quot;  For those kinds of operations, the access time difference from numeric arrays is not an issue.


3) The two examples you cite _can_ be done, just using different kinds of subscripting.  The reason why the syntaxes you list _don&#039;t_ work is that parenthesis subscripting in MATLAB preserves type, and the operations you&#039;ve shown mix types, where no automatic conversion exists.  However:

d1.Var2(:) == 3   % instead of d1(2,:)==3

and

d{3,2} = 3 % instead of d(3,2) = 3

do work.  Admittedly, d{1:3,1:2} = X is not supported.  You can write that in two lines as 

d.Var1(1:3) = X(:,1);
d.Var2(1:3) = X(:,2);

and perhaps use a loop for a larger number of columns.  Or,

d(1:3,1:2) = dataset({X,&#039;Var1&#039;,&#039;Var2&#039;})

Or, depending on what you have, it may be possible to restructure the array to have a variable with two columns, and rephrase this as

ds.Var(1:3,:) = X

Thanks for your comments; feedback like this is helpful.]]></description>
		<content:encoded><![CDATA[<p>Arnaud, let me try to respond to each of your comments.</p>
<p>1) You&#8217;re right, there are not so many methods (so far) that work on a dataset array as a whole.  Your example is strfind; let me try to explain the reasoning why strfind _doesn&#8217;t_ work, and what you might do instead.</p>
<p>A dataset array is intended to hold variables of different types.  So, for example, you can&#8217;t add 1 to a dataset array, for the same reason you can&#8217;t add 1 to a cell array: addition would make no sense in general because the contents need not be numeric.  You could argue that if all of the variables in the array were numeric, you should be able to add 1, analogous to the way various functions recognize a &#8220;cell array of strings&#8221; as a special case of cell arrays in general:</p>
<pre>
&gt;&gt; strfind({'abc' 'def' 'ghi'},'abc')
ans =
    [1]     []     []
&gt;&gt; strfind({'abc' 'def' 'ghi' 1:5},'abc')
??? Error using ==&gt; cell.strfind at 35
If any of the input arguments are cell arrays, the first must be
a cell array of strings and the second must be a character array.
</pre>
<p>But the dataset array class is just not intended to be a surrogate for a numeric array, or for a cell array of strings in that way.</p>
<p>What you _can_ do, however, is to apply strfind to each variable (or to a subset of variables) in a dataset array using datasetfun, with the burden being on you to make sure that those variables are suitable.  For example,</p>
<p>datasetfun(@strfind,ds,{'stringVar1' 'stringVar2' ...}, 'uniformOutput',false)</p>
<p>2)  You&#8217;re right, high frequency access of individual values in a dataset array is slower than for numeric, cell, or structure arrays, and you&#8217;ve put your finger on one of the reasons.  However, the dataset array class is really designed more with large vectorized operations in mind, operations such as &#8220;find the mean height for all subjects over the age of 30&#8221;, or &#8220;log transform the weights of each subject.&#8221;  For those kinds of operations, the access time difference from numeric arrays is not an issue.</p>
<p>3) The two examples you cite _can_ be done, just using different kinds of subscripting.  The reason why the syntaxes you list _don&#8217;t_ work is that parenthesis subscripting in MATLAB preserves type, and the operations you&#8217;ve shown mix types, where no automatic conversion exists.  However:</p>
<p>d1.Var2(:) == 3   % instead of d1(2,:)==3</p>
<p>and</p>
<p>d{3,2} = 3 % instead of d(3,2) = 3</p>
<p>do work.  Admittedly, d{1:3,1:2} = X is not supported.  You can write that in two lines as </p>
<p>d.Var1(1:3) = X(:,1);<br />
d.Var2(1:3) = X(:,2);</p>
<p>and perhaps use a loop for a larger number of columns.  Or,</p>
<p>d(1:3,1:2) = dataset({X,'Var1','Var2'})</p>
<p>Or, depending on what you have, it may be possible to restructure the array to have a variable with two columns, and rephrase this as</p>
<p>ds.Var(1:3,:) = X</p>
<p>Thanks for your comments; feedback like this is helpful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arnaud Amzallag</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30467</link>
		<dc:creator>Arnaud Amzallag</dc:creator>
		<pubDate>Tue, 14 Jul 2009 10:35:55 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30467</guid>
		<description><![CDATA[I recently discovered datasets in MATLAB, and they seem appropriate for the data I handle, that is, genomic annotations: a table with fields name, chromosome, coordinates, etc. However, the dataset type is not so easy to handle, probably because it is not as widely supported by classic MATLAB functions as cell arrays are. For instance, strfind works on cell arrays but not on datasets. I contributed a short piece of code that converts the dataset to a cell array in order to perform a search in the dataset (strfind for datasets, at http://www.mathworks.com/matlabcentral/fileexchange/24690).

Importantly, I would also like to mention that accessing an element of a dataset in a loop is very slow (several minutes for 17000 iterations), whereas it takes less than one second with a cell array. It is probably because the just-in-time compilation doesn&#039;t work with datasets. I think this problem may strongly discourage people from using datasets, and it would be a big plus to have the JIT compilation working on them.

Finally, it would be nice (but not urgent) if more functions were available for datasets. For instance, it would be convenient to be able to search fields with a syntax of the type d1(2,:)==3, or to assign with d(1:3,1:2)=X, as with usual arrays.

Cheers,

Arnaud]]></description>
		<content:encoded><![CDATA[<p>I recently discovered datasets in MATLAB, and they seem appropriate for the data I handle, that is, genomic annotations: a table with fields name, chromosome, coordinates, etc. However, the dataset type is not so easy to handle, probably because it is not as widely supported by classic MATLAB functions as cell arrays are. For instance, strfind works on cell arrays but not on datasets. I contributed a short piece of code that converts the dataset to a cell array in order to perform a search in the dataset (strfind for datasets, at <a href="http://www.mathworks.com/matlabcentral/fileexchange/24690" rel="nofollow">http://www.mathworks.com/matlabcentral/fileexchange/24690</a>).</p>
<p>Importantly, I would also like to mention that accessing an element of a dataset in a loop is very slow (several minutes for 17000 iterations), whereas it takes less than one second with a cell array. It is probably because the just-in-time compilation doesn&#8217;t work with datasets. I think this problem may strongly discourage people from using datasets, and it would be a big plus to have the JIT compilation working on them.</p>
<p>Finally, it would be nice (but not urgent) if more functions were available for datasets. For instance, it would be convenient to be able to search fields with a syntax of the type d1(2,:)==3, or to assign with d(1:3,1:2)=X, as with usual arrays.</p>
<p>Cheers,</p>
<p>Arnaud</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marcelo</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30338</link>
		<dc:creator>Marcelo</dc:creator>
		<pubDate>Sat, 23 May 2009 16:23:16 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30338</guid>
		<description><![CDATA[I have been using a function to import csv files (with headers) as struct:

&lt;pre&gt;
function data=z(filename)
datum=importdata(filename);
temp=num2cell(datum.data);
data=cell2struct(temp,datum.colheaders,2);
&lt;/pre&gt;

Now I&#039;ll try to use datasets, as they seem easy to work with.]]></description>
		<content:encoded><![CDATA[<p>I have been using a function to import csv files (with headers) as struct:</p>
<pre>
function data=z(filename)
datum=importdata(filename);
temp=num2cell(datum.data);
data=cell2struct(temp,datum.colheaders,2);
</pre>
<p>Now I&#8217;ll try to use datasets, as they seem easy to work with.</p>
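<p>A possible dataset version of the same import, as a sketch: it assumes a comma-delimited file whose first row holds the column headers, and the function name csv2dataset is made up for illustration.</p>
<pre>
function ds = csv2dataset(filename)
% Read a CSV file with a header row into a dataset array.
% Variable names are taken from the first row (the default behavior).
ds = dataset('File', filename, 'Delimiter', ',');
</pre>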
]]></content:encoded>
	</item>
	<item>
		<title>By: Loren</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30337</link>
		<dc:creator>Loren</dc:creator>
		<pubDate>Thu, 21 May 2009 15:03:03 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30337</guid>
		<description><![CDATA[Chris-

The @ sign is letting me create an anonymous function in MATLAB.  I then apply that function to each element in my array.  It&#039;s a great way to allow me to create and evaluate a function without using eval.  There are some posts on this blog about them (under the category of Function Handles) and good information in the MATLAB documentation as well.

--Loren]]></description>
		<content:encoded><![CDATA[<p>Chris-</p>
<p>The @ sign is letting me create an anonymous function in MATLAB.  I then apply that function to each element in my array.  It&#8217;s a great way to allow me to create and evaluate a function without using eval.  There are some posts on this blog about them (under the category of Function Handles) and good information in the MATLAB documentation as well.</p>
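<p>A small sketch of how such a handle can be used; the struct S and its field values here are made up for illustration:</p>
<pre>
S = struct('Name', 'Ann', 'Height', 150);   % hypothetical example struct
F = @(S,h) setfield(S, 'Height', h);        % anonymous function of S and h
S2 = F(S, 162);                             % returns a copy with Height set to 162
disp(S2.Height)                             % S itself is unchanged
</pre>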
<p>&#8211;Loren</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Eklund</title>
		<link>http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30336</link>
		<dc:creator>Chris Eklund</dc:creator>
		<pubDate>Thu, 21 May 2009 14:59:01 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/#comment-30336</guid>
		<description><![CDATA[Ms. Shure:

In the May 20 post &quot;From struct to dataset&quot;, what is the @ symbol in this line doing?

 F = @(S,h) setfield(S, &#039;Height&#039;, h);]]></description>
		<content:encoded><![CDATA[<p>Ms. Shure:</p>
<p>In the May 20 post &#8220;From struct to dataset&#8221;, what is the @ symbol in this line doing?</p>
<p> F = @(S,h) setfield(S, 'Height', h);</p>
]]></content:encoded>
	</item>
</channel>
</rss>
