<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Using MATLAB  to Grade</title>
	<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/</link>
	<description>Loren Shure  works on design of the MATLAB language at &#60;a href="http://www.mathworks.com/"&#62;The MathWorks&#60;/a&#62;. She writes here about once a week on MATLAB programming and related topics. &#60;br&#62;&#60;br&#62;&#60;a href="/images/loren-full.jpg"&#62;&#60;img src="/images/loren.jpg"&#62;&#60;/a&#62;</description>
	<pubDate>Sun, 08 Nov 2009 04:08:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
		<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30096</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Mon, 09 Mar 2009 13:59:03 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30096</guid>
		<description>Sung, the Statistics Toolbox has functions such as NANMEAN and NANSTD with "NAN" explicitly in the name, and those all treat NaNs as "missing values" and remove them.  But many other Statistics Toolbox functions also treat NaN as a missing value flag, even if "NAN" is not in the name.</description>
		<content:encoded><![CDATA[<p>Sung, the Statistics Toolbox has functions such as NANMEAN and NANSTD with &#8220;NAN&#8221; explicitly in the name, and those all treat NaNs as &#8220;missing values&#8221; and remove them.  But many other Statistics Toolbox functions also treat NaN as a missing value flag, even if &#8220;NAN&#8221; is not in the name.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sung Soo</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30092</link>
		<dc:creator>Sung Soo</dc:creator>
		<pubDate>Fri, 06 Mar 2009 20:51:36 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-30092</guid>
		<description>Honestly, I welcome this 'dataset' feature. BTW when was this introduced? I haven't been aware of it.

At first glance, it really looks like the basic data type in the 'R statistics language'. Though the R language is (in my personal opinion) not modern at all and unintuitive, it has a great advantage in handling data. This is mainly because of its data structure, which is almost identical to the 'dataset' introduced in this blog.

Another advantage of R is its huge user base (most of them are previous users of SAS or SPSS), and their contribution of so many tools to R.

I don't want R's not-modern programming style to creep into MATLAB's Statistics Toolbox, but I really hope MATLAB can support the essential features of the R language. 'dataset' looks like a very good addition. The next thing I want from MATLAB is a broad range of essential functions that properly deal with 'NaN' as missing data. If most functions of the Statistics Toolbox had a 'NaN' version, it would be a great help to most statisticians.</description>
		<content:encoded><![CDATA[<p>Honestly, I welcome this &#8216;dataset&#8217; feature. BTW when was this introduced? I haven&#8217;t been aware of it.</p>
<p>At first glance, it really looks like the basic data type in the &#8216;R statistics language&#8217;. Though the R language is (in my personal opinion) not modern at all and unintuitive, it has a great advantage in handling data. This is mainly because of its data structure, which is almost identical to the &#8216;dataset&#8217; introduced in this blog.</p>
<p>Another advantage of R is its huge user base (most of them are previous users of SAS or SPSS), and their contribution of so many tools to R.</p>
<p>I don&#8217;t want R&#8217;s not-modern programming style to creep into MATLAB&#8217;s Statistics Toolbox, but I really hope MATLAB can support the essential features of the R language. &#8216;dataset&#8217; looks like a very good addition. The next thing I want from MATLAB is a broad range of essential functions that properly deal with &#8216;NaN&#8217; as missing data. If most functions of the Statistics Toolbox had a &#8216;NaN&#8217; version, it would be a great help to most statisticians.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29738</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Thu, 04 Sep 2008 12:45:35 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29738</guid>
		<description>Jasmine, I'm not exactly sure of the situation that you're describing.  There's no reason to preallocate an array in MATLAB _just_ because it's big.  However, it is good practice to preallocate an array if, for example, you are going to fill it in one row at a time in a loop, especially if it will be big.  So I'm guessing you mean something like, "I will be storing rows of data one at a time and end up with a large dataset array, so I want to preallocate it."

The way to do that is more or less just as with any other array: create variables using, for example, ZEROS, and create a dataset array from those.  Then overwrite each row as you get the real data.

For categorical variables, it's hard to say what your code should look like without knowing what data you are converting, but it will definitely be advantageous to "pre-define" the levels you care about, using the third input to the NOMINAL/ORDINAL constructor.

Hope this helps.</description>
		<content:encoded><![CDATA[<p>Jasmine, I&#8217;m not exactly sure of the situation that you&#8217;re describing.  There&#8217;s no reason to preallocate an array in MATLAB _just_ because it&#8217;s big.  However, it is good practice to preallocate an array if, for example, you are going to fill it in one row at a time in a loop, especially if it will be big.  So I&#8217;m guessing you mean something like, &#8220;I will be storing rows of data one at a time and end up with a large dataset array, so I want to preallocate it.&#8221;</p>
<p>The way to do that is more or less just as with any other array: create variables using, for example, ZEROS, and create a dataset array from those.  Then overwrite each row as you get the real data.</p>
<p>For categorical variables, it&#8217;s hard to say what your code should look like without knowing what data you are converting, but it will definitely be advantageous to &#8220;pre-define&#8221; the levels you care about, using the third input to the NOMINAL/ORDINAL constructor.</p>
<p>Hope this helps.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jasmine</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29736</link>
		<dc:creator>jasmine</dc:creator>
		<pubDate>Wed, 03 Sep 2008 20:31:22 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29736</guid>
		<description>Hi Loren,

I am trying to store both numerical and categorical values in a dataset array.  As the data size is big, I want to initialize the dataset array so that I can speed up the operation.  How can I do that?

Many thanks!</description>
		<content:encoded><![CDATA[<p>Hi Loren,</p>
<p>I am trying to store both numerical and categorical values in a dataset array.  As the data size is big, I want to initialize the dataset array so that I can speed up the operation.  How can I do that?</p>
<p>Many thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jessee</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29652</link>
		<dc:creator>Jessee</dc:creator>
		<pubDate>Tue, 05 Aug 2008 12:20:25 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29652</guid>
		<description>Peter, I suppose the data I typically use is homogeneous in the sense that the columns are all doubles.  You're right though, the dataset array is better for mixed data types.</description>
		<content:encoded><![CDATA[<p>Peter, I suppose the data I typically use is homogeneous in the sense that the columns are all doubles.  You&#8217;re right though, the dataset array is better for mixed data types.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dimitri Shvorob</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29634</link>
		<dc:creator>Dimitri Shvorob</dc:creator>
		<pubDate>Wed, 30 Jul 2008 16:50:32 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29634</guid>
		<description>Kudos to Loren for highlighting this neat (relatively) new feature of Statistics Toolbox. I hope dataset arrays' functionality will be expanding in forthcoming releases.</description>
		<content:encoded><![CDATA[<p>Kudos to Loren for highlighting this neat (relatively) new feature of Statistics Toolbox. I hope dataset arrays&#8217; functionality will be expanding in forthcoming releases.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Perkins</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29613</link>
		<dc:creator>Peter Perkins</dc:creator>
		<pubDate>Thu, 24 Jul 2008 15:29:50 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29613</guid>
		<description>Jessee, there is a property that you can use to tag variables with units.  For example,

&#62;&#62; load hospital % sample file in Statistics Toolbox
&#62;&#62; hospital.Properties.Units
ans = 
     ''     ''    'Yrs'    'Lbs'     ''    'mm Hg'    'Counts'
&#62;&#62; hospital.Weight = hospital.Weight/2.2;
&#62;&#62; hospital.Properties.Units{4} = 'kg';

The units also show up in the description of each dataset variable if you use the summary method on a dataset array.  Note that these are just for the purpose of labelling; there is no provision for conversions or units checking in math or anything like that.

I'm curious about your comment about large datasets.  Certainly if you have data that are homogeneous, you are better off using a matrix.  But if your columns have different types, the dataset array is every bit as efficient as a scalar structure, and a good deal more efficient (memory-wise) than a structure array that has one element for every row of your data.  It has the benefit over a scalar structure that you can easily subscript across dataset variables, which correspond to fields for the scalar structure solution.  And it allows you to use names for both dataset variables and observations.</description>
		<content:encoded><![CDATA[<p>Jessee, there is a property that you can use to tag variables with units.  For example,</p>
<p>&gt;&gt; load hospital % sample file in Statistics Toolbox<br />
&gt;&gt; hospital.Properties.Units<br />
ans =<br />
     ''     ''    'Yrs'    'Lbs'     ''    'mm Hg'    'Counts'<br />
&gt;&gt; hospital.Weight = hospital.Weight/2.2;<br />
&gt;&gt; hospital.Properties.Units{4} = 'kg';</p>
<p>The units also show up in the description of each dataset variable if you use the summary method on a dataset array.  Note that these are just for the purpose of labelling; there is no provision for conversions or units checking in math or anything like that.</p>
<p>I&#8217;m curious about your comment about large datasets.  Certainly if you have data that are homogeneous, you are better off using a matrix.  But if your columns have different types, the dataset array is every bit as efficient as a scalar structure, and a good deal more efficient (memory-wise) than a structure array that has one element for every row of your data.  It has the benefit over a scalar structure that you can easily subscript across dataset variables, which correspond to fields for the scalar structure solution.  And it allows you to use names for both dataset variables and observations.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jessee</title>
		<link>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29612</link>
		<dc:creator>Jessee</dc:creator>
		<pubDate>Thu, 24 Jul 2008 12:55:40 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/07/23/using-matlab-to-grade/#comment-29612</guid>
		<description>I could potentially see myself using dataset for casually looking at data, but from an application standpoint where you might be processing large data sets I think I'd stick with structures and matrices.

Is there any way to associate units with a column in the data set?</description>
		<content:encoded><![CDATA[<p>I could potentially see myself using dataset for casually looking at data, but from an application standpoint where you might be processing large data sets I think I&#8217;d stick with structures and matrices.</p>
<p>Is there any way to associate units with a column in the data set?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
