<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Using MATLAB  to Grade</title>
	<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/</link>
	<description>Loren Shure  works on design of the MATLAB language at &#60;a href="http://www.mathworks.com/"&#62;The MathWorks&#60;/a&#62;. She writes here about once a week on MATLAB programming and related topics. &#60;br&#62;&#60;br&#62;&#60;a href="/images/loren-full.jpg"&#62;&#60;img src="/images/loren.jpg"&#62;&#60;/a&#62;</description>
	<pubDate>Mon, 23 Nov 2009 01:03:25 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
		<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30796</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Mon, 16 Nov 2009 15:27:40 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30796</guid>
		<description>Bassam, I can't tell from your description exactly what you intend to do, partially because there's a cut-and-paste mistake (your first line).  But let me take a shot at explaining what's happening, and what you might do.  First, set up some arrays:
&lt;pre&gt;
&#62;&#62; d = dataset({1,'name1'})
d = 
    name1
    1    
&#62;&#62; e = dataset({2,'name2'})
e = 
    name2
    2    
&lt;/pre&gt;
(These are 1x1 to make things short.)

Now assign e *into an existing subset* of d.
&lt;pre&gt;
&#62;&#62; d(1,1) = e
d = 
    name1
    2    
&lt;/pre&gt;
Notice that d's names don't change.  That's intentional -- it only assigns values.  You were doing something more like this:
&lt;pre&gt;
&#62;&#62; d(:,2) = e
d = 
    name1    Var2
    2        2   
&lt;/pre&gt;
Even here, e's name hasn't carried over, because the same rule applies, for consistency -- the names are not propagated from the RHS to the LHS if you *assign into*.  So how to do what (I think) you want?  You can explicitly specify a new name (even multiple names) as part of the assignment:
&lt;pre&gt;
&#62;&#62; d(:,'name2') = e
d = 
    name1    name2
    1        2    
&lt;/pre&gt;

Or you can concatenate:
&lt;pre&gt;
&#62;&#62; f = [d e]
f = 
    name1    name2
    1        2    
&lt;/pre&gt;
Both create the name you want as part of the assignment.

If you want to change the name of an existing variable in a dataset array, you can assign directly to the name:
&lt;pre&gt;
&#62;&#62; d.Properties.VarNames{2} = 'name2'
d = 
    name1    name2
    1        2    

&lt;/pre&gt;
The list of variable names is a cell array of strings, so you can assign to them all, or one, or even a single character of one.  There's also a SET method, similar to what you'd do with Handle Graphics.

It's apparent that the documentation was not sufficiently clear here; I'll make a note to have that looked into.  In the meantime, I hope this helps.</description>
		<content:encoded><![CDATA[<p>Bassam, I can&#8217;t tell from your description exactly what you intend to do, partially because there&#8217;s a cut-and-paste mistake (your first line).  But let me take a shot at explaining what&#8217;s happening, and what you might do.  First, set up some arrays:</p>
<pre>
&gt;&gt; d = dataset({1,'name1'})
d =
    name1
    1
&gt;&gt; e = dataset({2,'name2'})
e =
    name2
    2
</pre>
<p>(These are 1&#215;1 to make things short.)</p>
<p>Now assign e *into an existing subset* of d.</p>
<pre>
&gt;&gt; d(1,1) = e
d =
    name1
    2
</pre>
<p>Notice that d&#8217;s names don&#8217;t change.  That&#8217;s intentional &#8212; it only assigns values.  You were doing something more like this:</p>
<pre>
&gt;&gt; d(:,2) = e
d =
    name1    Var2
    2        2
</pre>
<p>Even here, e&#8217;s name hasn&#8217;t carried over, because the same rule applies, for consistency &#8212; the names are not propagated from the RHS to the LHS if you *assign into*.  So how to do what (I think) you want?  You can explicitly specify a new name (even multiple names) as part of the assignment:</p>
<pre>
&gt;&gt; d(:,'name2') = e
d =
    name1    name2
    1        2
</pre>
<p>Or you can concatenate:</p>
<pre>
&gt;&gt; f = [d e]
f =
    name1    name2
    1        2
</pre>
<p>Both create the name you want as part of the assignment.</p>
<p>If you want to change the name of an existing variable in a dataset array, you can assign directly to the name:</p>
<pre>
&gt;&gt; d.Properties.VarNames{2} = 'name2'
d =
    name1    name2
    1        2    
</pre>
<p>The list of variable names is a cell array of strings, so you can assign to them all, or one, or even a single character of one.  There&#8217;s also a SET method, similar to what you&#8217;d do with Handle Graphics.</p>
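<p>As a minimal sketch of those options (the names 'id', 'score', and 'mark' are made up for illustration, and this assumes d has two variables, as above):</p>
<pre>
d.Properties.VarNames = {'id','score'};   % rename every variable at once
d.Properties.VarNames{2} = 'mark';        % rename just the second variable
d.Properties.VarNames{2}(1) = 'M';        % change a single character: 'mark' becomes 'Mark'
d = set(d,'VarNames',{'name1','name2'});  % SET method, restoring the original names
</pre>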
<p>It&#8217;s apparent that the documentation was not sufficiently clear here; I&#8217;ll make a note to have that looked into.  In the meantime, I hope this helps.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bassam</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30788</link>
		<dc:creator>Bassam</dc:creator>
		<pubDate>Fri, 13 Nov 2009 19:36:28 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30788</guid>
		<description>I am trying out dataset.

There doesn't seem to be a way to change individual VarNames, or a subset of them, in a dataset.

You can create a dataset as documented 
&lt;pre&gt;
d(1,2) = dataset({2,'name2'})

d = 
    name1
    1    
&lt;/pre&gt;

but if you try to add to it in an intuitive fashion the VarName specified is not used:
&lt;pre&gt;
d(1,2) = dataset({2,'name2'})
d = 
    name1    Var2
    1        2   
&lt;/pre&gt;

Furthermore, this doesn't work either:
&lt;pre&gt;
d = dataset({1,'name1'},{2,'name2'})
d(1,2)   =dataset('VarNames',{'name3'})
??? Error using ==&#62; setvarnames at 21
NEWNAMES must have one name for each variable in A.

Error in ==&#62; dataset.dataset&#62;dataset.dataset at 274
&lt;/pre&gt;
Is there any way to specify a single VarName, or a subset of VarNames?</description>
		<content:encoded><![CDATA[<p>I am trying out dataset.</p>
<p>There doesn&#8217;t seem to be a way to change individual VarNames, or a subset of them, in a dataset.</p>
<p>You can create a dataset as documented </p>
<pre>
d(1,2) = dataset({2,'name2'})

d =
    name1
    1
</pre>
<p>but if you try to add to it in an intuitive fashion the VarName specified is not used:</p>
<pre>
d(1,2) = dataset({2,'name2'})
d =
    name1    Var2
    1        2
</pre>
<p>Furthermore, this doesn&#8217;t work either:</p>
<pre>
d = dataset({1,'name1'},{2,'name2'})
d(1,2)   =dataset('VarNames',{'name3'})
??? Error using ==&gt; setvarnames at 21
NEWNAMES must have one name for each variable in A.

Error in ==&gt; dataset.dataset&gt;dataset.dataset at 274
</pre>
<p>Is there any way to specify a single VarName, or a subset of VarNames?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30096</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Mon, 09 Mar 2009 13:59:03 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30096</guid>
		<description>Sung, the Statistics Toolbox has functions such as NANMEAN and NANSTD with "NAN" explicitly in the name, and those all treat NaNs as "missing values" and remove them.  But many other Statistics Toolbox functions also treat NaN as a missing value flag, even if "NAN" is not in the name.</description>
		<content:encoded><![CDATA[<p>Sung, the Statistics Toolbox has functions such as NANMEAN and NANSTD with &#8220;NAN&#8221; explicitly in the name, and those all treat NaNs as &#8220;missing values&#8221; and remove them.  But many other Statistics Toolbox functions also treat NaN as a missing value flag, even if &#8220;NAN&#8221; is not in the name.</p>
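<p>A small illustration of the difference, assuming the Statistics Toolbox is on the path (the vector x is made up):</p>
<pre>
x = [1 NaN 3];
mean(x)      % NaN, because the NaN propagates through the sum
nanmean(x)   % 2, the mean of the non-NaN values 1 and 3
nanstd(x)    % about 1.4142, the standard deviation of [1 3]
</pre>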
]]></content:encoded>
	</item>
	<item>
		<title>By: Sung Soo</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30092</link>
		<dc:creator>Sung Soo</dc:creator>
		<pubDate>Fri, 06 Mar 2009 20:51:36 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30092</guid>
		<description>Honestly, I welcome this 'dataset' feature. BTW when was this introduced? I haven't been aware of it.

At first glance, it really looks like the basic data type in the 'R statistics language'. Though the R language is (in my personal opinion) not modern at all and unintuitive, it has a great advantage in handling data. That is mainly because of its data structure, which is almost identical to the 'dataset' introduced in this blog.

Another advantage of R is its huge user base (most of them former users of SAS or SPSS) and the many tools they have contributed to R.

I don't want R's not-modern programming style to creep into MATLAB's Statistics Toolbox, but I really hope MATLAB can cover the essential features of the R language. 'dataset' looks like a very good addition. The next thing I want from MATLAB is a broad range of essential functions that properly handle 'NaN' as missing data. If most Statistics Toolbox functions had a 'NaN' version, it would be a great help to most statisticians.</description>
		<content:encoded><![CDATA[<p>Honestly, I welcome this &#8216;dataset&#8217; feature. BTW when was this introduced? I haven&#8217;t been aware of it.</p>
<p>At first glance, it really looks like the basic data type in the &#8216;R statistics language&#8217;. Though the R language is (in my personal opinion) not modern at all and unintuitive, it has a great advantage in handling data. That is mainly because of its data structure, which is almost identical to the &#8216;dataset&#8217; introduced in this blog.</p>
<p>Another advantage of R is its huge user base (most of them former users of SAS or SPSS) and the many tools they have contributed to R.</p>
<p>I don&#8217;t want R&#8217;s not-modern programming style to creep into MATLAB&#8217;s Statistics Toolbox, but I really hope MATLAB can cover the essential features of the R language. &#8216;dataset&#8217; looks like a very good addition. The next thing I want from MATLAB is a broad range of essential functions that properly handle &#8216;NaN&#8217; as missing data. If most Statistics Toolbox functions had a &#8216;NaN&#8217; version, it would be a great help to most statisticians.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29738</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Thu, 04 Sep 2008 12:45:35 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29738</guid>
		<description>Jasmine, I'm not exactly sure of the situation that you're describing.  There's no reason to preallocate an array in MATLAB _just_ because it's big.  However, it is good practice to preallocate an array if, for example, you are going to fill it in one row at a time in a loop, especially if it will be big.  So I'm guessing you mean something like, "I will be storing rows of data one at a time and end up with a large dataset array, so I want to preallocate it."

The way to do that is more or less just as with any other array: create variables using, for example, ZEROS, and create a dataset array from those.  Then overwrite each row as you get the real data.

For categorical variables, it's hard to say what your code should look like without knowing what data you are converting, but it will definitely be advantageous to "pre-define" the levels you care about, using the third input to the NOMINAL/ORDINAL constructor.

Hope this helps.</description>
		<content:encoded><![CDATA[<p>Jasmine, I&#8217;m not exactly sure of the situation that you&#8217;re describing.  There&#8217;s no reason to preallocate an array in MATLAB _just_ because it&#8217;s big.  However, it is good practice to preallocate an array if, for example, you are going to fill it in one row at a time in a loop, especially if it will be big.  So I&#8217;m guessing you mean something like, &#8220;I will be storing rows of data one at a time and end up with a large dataset array, so I want to preallocate it.&#8221;</p>
<p>The way to do that is more or less just as with any other array: create variables using, for example, ZEROS, and create a dataset array from those.  Then overwrite each row as you get the real data.</p>
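<p>A rough sketch of that approach, with a made-up row count, made-up variable names, and placeholder values standing in for the real data:</p>
<pre>
n = 1000;                                                  % hypothetical number of rows
d = dataset({zeros(n,1),'score'},{zeros(n,1),'weight'});   % preallocate with placeholder zeros
for i = 1:n
    d.score(i)  = i;      % overwrite row i as the real data arrives
    d.weight(i) = 2*i;    % (i and 2*i are stand-ins for real values)
end
</pre>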
<p>For categorical variables, it&#8217;s hard to say what your code should look like without knowing what data you are converting, but it will definitely be advantageous to &#8220;pre-define&#8221; the levels you care about, using the third input to the NOMINAL/ORDINAL constructor.</p>
<p>Hope this helps.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jasmine</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29736</link>
		<dc:creator>jasmine</dc:creator>
		<pubDate>Wed, 03 Sep 2008 20:31:22 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29736</guid>
		<description>Hi Loren,

I am trying to store both numerical and categorical values in a dataset array.  As the data size is big, I want to initialize the dataset array so that I can speed up the operation.  How can I do that?

Many thanks!</description>
		<content:encoded><![CDATA[<p>Hi Loren,</p>
<p>I am trying to store both numerical and categorical values in a dataset array.  As the data size is big, I want to initialize the dataset array so that I can speed up the operation.  How can I do that?</p>
<p>Many thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jessee</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29652</link>
		<dc:creator>Jessee</dc:creator>
		<pubDate>Tue, 05 Aug 2008 12:20:25 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29652</guid>
		<description>Peter, I suppose the data I typically use is homogeneous in the sense that the columns are all doubles.  You're right though, the dataset array is better for mixed data types.</description>
		<content:encoded><![CDATA[<p>Peter, I suppose the data I typically use is homogeneous in the sense that the columns are all doubles.  You&#8217;re right though, the dataset array is better for mixed data types.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dimitri Shvorob</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29634</link>
		<dc:creator>Dimitri Shvorob</dc:creator>
		<pubDate>Wed, 30 Jul 2008 16:50:32 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29634</guid>
		<description>Kudos to Loren for highlighting this neat (relatively) new feature of Statistics Toolbox. I hope dataset arrays' functionality will be expanding in forthcoming releases.</description>
		<content:encoded><![CDATA[<p>Kudos to Loren for highlighting this neat (relatively) new feature of Statistics Toolbox. I hope dataset arrays&#8217; functionality will be expanding in forthcoming releases.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29613</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Thu, 24 Jul 2008 15:29:50 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29613</guid>
		<description>Jessee, there is a property that you can use to tag variables with units.  For example,

&#62;&#62; load hospital % sample file in Statistics Toolbox
&#62;&#62; hospital.Properties.Units
ans = 
     ''     ''    'Yrs'    'Lbs'     ''    'mm Hg'    'Counts'
&#62;&#62; hospital.Weight = hospital.Weight/2.2;
&#62;&#62; hospital.Properties.Units{4} = 'kg';

The units also show up in the description of each dataset variable if you use the summary method on a dataset array.  Note that these are just for the purpose of labelling; there is no provision for conversions, units checking in math, or anything like that.

I'm curious about your comment about large datasets.  Certainly if you have data that are homogeneous, you are better off using a matrix.  But if your columns have different types, the dataset array is every bit as efficient as a scalar structure, and a good deal more efficient (memory-wise) than a structure array that has one element for every row of your data.  It has the benefit over a scalar structure that you can easily subscript across dataset variables, which correspond to fields in the scalar structure solution.  And it allows you to use names for both dataset variables and observations.</description>
		<content:encoded><![CDATA[<p>Jessee, there is a property that you can use to tag variables with units.  For example,</p>
<p>&gt;&gt; load hospital % sample file in Statistics Toolbox<br />
&gt;&gt; hospital.Properties.Units<br />
ans =<br />
     ''     ''    'Yrs'    'Lbs'     ''    'mm Hg'    'Counts'<br />
&gt;&gt; hospital.Weight = hospital.Weight/2.2;<br />
&gt;&gt; hospital.Properties.Units{4} = 'kg';</p>
<p>The units also show up in the description of each dataset variable if you use the summary method on a dataset array.  Note that these are just for the purpose of labelling; there is no provision for conversions, units checking in math, or anything like that.</p>
<p>I&#8217;m curious about your comment about large datasets.  Certainly if you have data that are homogeneous, you are better off using a matrix.  But if your columns have different types, the dataset array is every bit as efficient as a scalar structure, and a good deal more efficient (memory-wise) than a structure array that has one element for every row of your data.  It has the benefit over a scalar structure that you can easily subscript across dataset variables, which correspond to fields in the scalar structure solution.  And it allows you to use names for both dataset variables and observations.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jessee</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29612</link>
		<dc:creator>Jessee</dc:creator>
		<pubDate>Thu, 24 Jul 2008 12:55:40 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29612</guid>
		<description>I could potentially see myself using dataset for casually looking at data, but from an application standpoint where you might be processing large data sets I think I'd stick with structures and matrices.

Is there any way to associate units with a column in the data set?</description>
		<content:encoded><![CDATA[<p>I could potentially see myself using dataset for casually looking at data, but from an application standpoint where you might be processing large data sets I think I&#8217;d stick with structures and matrices.</p>
<p>Is there any way to associate units with a column in the data set?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
