<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Analyzing Addresses Using Different Data Structures</title>
	<link>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/</link>
	<description>Loren Shure  works on design of the MATLAB language at &#60;a href="http://www.mathworks.com/"&#62;The MathWorks&#60;/a&#62;. She writes here about once a week on MATLAB programming and related topics. &#60;br&#62;&#60;br&#62;&#60;a href="/images/loren-full.jpg"&#62;&#60;img src="/images/loren.jpg"&#62;&#60;/a&#62;</description>
	<pubDate>Mon, 23 Nov 2009 00:10:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
		<item>
		<title>By: OysterEngineer</title>
		<link>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-30721</link>
		<dc:creator>OysterEngineer</dc:creator>
		<pubDate>Tue, 03 Nov 2009 17:11:41 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-30721</guid>
		<description>I love this type of post since each solution shows something about the background of the developer.  Kind of like, "If the only tool you have is a hammer, every problem looks like a nail."

Still, Loren's solution is the easiest to follow.

Although I can follow a lot of Seth's solution, I am stumped by the regexp call &#38; the guts of the for loop.

I understand the power of regular expressions &#38; know that they are complex enough that several books have been published on just them.  But, I've always had trouble understanding the syntax &#38; I've never found a complete, comprehensive FRP for just the arguments for regexp.

Looking at Seth's command &#38; the MATLAB FRP for regexp, he is clearly using the 3rd syntax option.  I understand that s is the cell array from above &#38; under Remarks in the FRP, it does explain why s can be a cell array.

Further, I see that the expression is the '@[\w.]+' string.  But, I can't find any documentation that allows the use of the square brackets in this syntax.  In reading the FRP of regexp, it seems to me that '@' all by itself should work.

Next, I see that 'match' is an allowable qualifier that forces the returned result to be the matching string.  But, that isn't what he wants here.  He wants to return the index of the '@', not the '@' itself.  And, frankly, since you know you are searching for the '@' character, you could populate the cell array sl with the '@' characters without needing to use the regexp function.

Finally, I see how 'once' is an allowable option.

This illustrates why MATLAB needs either a more user-friendly pattern matching function or a much better written FRP for regexp.

I understand that, given the history of regular expressions, it clearly is important for MATLAB to include them.  But, since the documentation for regular expressions out there in the world of Unix, C, and C-like languages is generally inadequate for a user to master them, MATLAB must not just base its documentation on these flawed sources &#38; must write its own comprehensive set.</description>
		<content:encoded><![CDATA[<p>I love this type of post since each solution shows something about the background of the developer.  Kind of like, &#8220;If the only tool you have is a hammer, every problem looks like a nail.&#8221;</p>
<p>Still, Loren&#8217;s solution is the easiest to follow.</p>
<p>Although I can follow a lot of Seth&#8217;s solution, I am stumped by the regexp call &amp; the guts of the for loop.</p>
<p>I understand the power of regular expressions &amp; know that they are complex enough that several books have been published on just them.  But, I&#8217;ve always had trouble understanding the syntax &amp; I&#8217;ve never found a complete, comprehensive FRP for just the arguments for regexp.</p>
<p>Looking at Seth&#8217;s command &amp; the MATLAB FRP for regexp, he is clearly using the 3rd syntax option.  I understand that s is the cell array from above &amp; under Remarks in the FRP, it does explain why s can be a cell array.</p>
<p>Further, I see that the expression is the &#8216;@[\w.]+&#8217; string.  But, I can&#8217;t find any documentation that allows the use of the square brackets in this syntax.  In reading the FRP of regexp, it seems to me that &#8216;@&#8217; all by itself should work.</p>
<p>Next, I see that &#8216;match&#8217; is an allowable qualifier that forces the returned result to be the matching string.  But, that isn&#8217;t what he wants here.  He wants to return the index of the &#8216;@&#8217;, not the &#8216;@&#8217; itself.  And, frankly, since you know you are searching for the &#8216;@&#8217; character, you could populate the cell array sl with the &#8216;@&#8217; characters without needing to use the regexp function.</p>
<p>Finally, I see how &#8216;once&#8217; is an allowable option.</p>
<p>This illustrates why MATLAB needs either a more user-friendly pattern matching function or a much better written FRP for regexp.</p>
<p>I understand that, given the history of regular expressions, it clearly is important for MATLAB to include them.  But, since the documentation for regular expressions out there in the world of Unix, C, and C-like languages is generally inadequate for a user to master them, MATLAB must not just base its documentation on these flawed sources &amp; must write its own comprehensive set.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Mullhaupt</title>
		<link>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-30185</link>
		<dc:creator>Andrew Mullhaupt</dc:creator>
		<pubDate>Sat, 04 Apr 2009 01:21:12 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-30185</guid>
		<description>I need something more like an associative array, since my keys are integer vectors.

Cell arrays could, in principle, be used, except they appear to use way too much memory. (Mine would be very sparse.)</description>
		<content:encoded><![CDATA[<p>I need something more like an associative array, since my keys are integer vectors.</p>
<p>Cell arrays could, in principle, be used, except they appear to use way too much memory. (Mine would be very sparse.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Seth Popinchalk</title>
		<link>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-29889</link>
		<dc:creator>Seth Popinchalk</dc:creator>
		<pubDate>Sun, 23 Nov 2008 21:11:11 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-29889</guid>
		<description>Dave - great analysis.  Interesting - the length of my loop changes with the number of unique domains.  I like the vectorized methods because of the use of tricks like additional outputs from UNIQUE or ACCUMARRAY.</description>
		<content:encoded><![CDATA[<p>Dave - great analysis.  Interesting - the length of my loop changes with the number of unique domains.  I like the vectorized methods because of the use of tricks like additional outputs from UNIQUE or ACCUMARRAY.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave Tarkowski</title>
		<link>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-29879</link>
		<dc:creator>Dave Tarkowski</dc:creator>
		<pubDate>Thu, 20 Nov 2008 16:11:28 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-29879</guid>
		<description>I was interested to see the relative timing for these methods.  I wrote up a little script to generate random e-mail addresses in MATLAB.  Here are the results for 10,000 e-mail addresses with 100 unique domains:

&#62;&#62; doCompare(10000)

ans = 

     seth: 1.5446
     dave: 2.2605
    steve: 0.4783
    loren: 0.1521

What I found interesting is how the timing changes if you change the number of unique domains.  Here are the results with 10,000 e-mail addresses, but only 10 unique domains:

&#62;&#62; doCompare(10000)

ans = 

     seth: 0.1973
     dave: 2.2455
    steve: 0.4725
    loren: 0.1494

Of course, running time, although relatively easy to measure, is only one aspect to consider.  One big difference between these algorithms is their memory usage.</description>
		<content:encoded><![CDATA[<p>I was interested to see the relative timing for these methods.  I wrote up a little script to generate random e-mail addresses in MATLAB.  Here are the results for 10,000 e-mail addresses with 100 unique domains:</p>
<p>&gt;&gt; doCompare(10000)</p>
<p>ans = </p>
<p>     seth: 1.5446<br />
     dave: 2.2605<br />
    steve: 0.4783<br />
    loren: 0.1521</p>
<p>What I found interesting is how the timing changes if you change the number of unique domains.  Here are the results with 10,000 e-mail addresses, but only 10 unique domains:</p>
<p>&gt;&gt; doCompare(10000)</p>
<p>ans = </p>
<p>     seth: 0.1973<br />
     dave: 2.2455<br />
    steve: 0.4725<br />
    loren: 0.1494</p>
<p>Of course, running time, although relatively easy to measure, is only one aspect to consider.  One big difference between these algorithms is their memory usage.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marc</title>
		<link>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-29875</link>
		<dc:creator>Marc</dc:creator>
		<pubDate>Wed, 19 Nov 2008 22:27:17 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/11/19/analyzing-addresses-using-different-data-structures/#comment-29875</guid>
		<description>Been wanting a container/map/hashtable to be native in ML for a while.  Thanks to the devs.</description>
		<content:encoded><![CDATA[<p>Been wanting a container/map/hashtable to be native in ML for a while.  Thanks to the devs.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
