<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: More Ways to Find Matching Data</title>
	<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/</link>
	<description>Loren Shure  works on design of the MATLAB language at &#60;a href="http://www.mathworks.com/"&#62;The MathWorks&#60;/a&#62;. She writes here about once a week on MATLAB programming and related topics. &#60;br&#62;&#60;br&#62;&#60;a href="/images/loren-full.jpg"&#62;&#60;img src="/images/loren.jpg"&#62;&#60;/a&#62;</description>
	<pubDate>Mon, 23 Nov 2009 00:39:40 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
		<item>
		<title>By: dpath2o</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30767</link>
		<dc:creator>dpath2o</dc:creator>
		<pubDate>Wed, 11 Nov 2009 22:06:20 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30767</guid>
		<description>Building on the example above, here is my solution. It works for exact matches but doesn't allow for near misses, i.e. times that are close but not exact:

&lt;pre&gt;
x=[5.1 12.23 0.567 12.01 0.555;
      9.5  12.03 0.578 12.11 0.595;
      20   12.06 0.588 12.12 0.596;
      31   12.09  0.591 12.20 0.601]; 
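A = x(:,1);                        % time stamps of the data
B = 5:5:30;                        % complete time grid
[tf,loc] = ismember(A,B);          % loc(k) is the position of A(k) in B
X = NaN(length(B),size(x,2)-1);    % one row per grid time, NaN marks a gap
for l1 = 1:length(tf)
    if tf(l1)
        X(loc(l1),:) = x(l1,2:end);
    end
end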
&lt;/pre&gt;

Not that pretty and it doesn't allow for near misses, but I can't think of a different approach.

Any other thoughtful clues?

Thanks!

Cheers,
Dan</description>
		<content:encoded><![CDATA[<p>Building on the example above, here is my solution. It works for exact matches but doesn&#8217;t allow for near misses, i.e. times that are close but not exact:</p>
<pre>
x=[5.1 12.23 0.567 12.01 0.555;
      9.5  12.03 0.578 12.11 0.595;
      20   12.06 0.588 12.12 0.596;
      31   12.09  0.591 12.20 0.601];
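A = x(:,1);                        % time stamps of the data
B = 5:5:30;                        % complete time grid
[tf,loc] = ismember(A,B);          % loc(k) is the position of A(k) in B
X = NaN(length(B),size(x,2)-1);    % one row per grid time, NaN marks a gap
for l1 = 1:length(tf)
    if tf(l1)
        X(loc(l1),:) = x(l1,2:end);
    end
end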
</pre>
<p>Not that pretty and it doesn&#8217;t allow for near misses, but I can&#8217;t think of a different approach.</p>
<p>Any other thoughtful clues?</p>
<p>Thanks!</p>
<p>Cheers,<br />
Dan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dpath2o</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30766</link>
		<dc:creator>dpath2o</dc:creator>
		<pubDate>Wed, 11 Nov 2009 21:40:38 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30766</guid>
		<description>Loren,

Thanks for the help. However, this approach constructs a length(A)xlength(B(1,:)) matrix, and that is not what the job requires.

What I have is a time array of MATLAB datenums. Sometimes there are gaps in this time array. The gaps are not filled with NaNs or zeros; the samples are simply missing.

So I need to construct a complete array of times and then find the times that match the in-situ data. Using the indices of the matching times, I will put the in-situ results back into a large matrix that is prefilled with NaNs, so that gaps in time are represented by NaN.

This is my hurdle.

Cheers,
Dan</description>
		<content:encoded><![CDATA[<p>Loren,</p>
<p>Thanks for the help. However, this approach constructs a length(A)xlength(B(1,:)) matrix, and that is not what the job requires.</p>
<p>What I have is a time array of MATLAB datenums. Sometimes there are gaps in this time array. The gaps are not filled with NaNs or zeros; the samples are simply missing.</p>
<p>So I need to construct a complete array of times and then find the times that match the in-situ data. Using the indices of the matching times, I will put the in-situ results back into a large matrix that is prefilled with NaNs, so that gaps in time are represented by NaN.</p>
<p>This is my hurdle.</p>
<p>Cheers,<br />
Dan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Loren</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30765</link>
		<dc:creator>Loren</dc:creator>
		<pubDate>Wed, 11 Nov 2009 12:24:53 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30765</guid>
		<description>Dpath2o-

You can try bsxfun with an anonymous function (@(x,y) x-y+2.5) and turn one of the arrays into a column.  You will end up with a (possibly large) matrix with 1s where the corresponding row and column have the relationship you want.  With floating point though, checking for equality is not likely to get you results you really want.  See my little example here.  You probably want to build in some tolerance.

&lt;pre class="code"&gt;
A = 1:3
B = [1.1 3.5 6]
f = @(x,y) (x-y == 2.5)
bsxfun(f,A',B)
ans =
     0     0     0
     0     0     0
     0     0     0
&lt;/pre&gt;


Do you really mean to compare all elements of A with all elements of B?

--Loren</description>
		<content:encoded><![CDATA[<p>Dpath2o-</p>
<p>You can try bsxfun with an anonymous function (@(x,y) x-y+2.5) and turn one of the arrays into a column.  You will end up with a (possibly large) matrix with 1s where the corresponding row and column have the relationship you want.  With floating point though, checking for equality is not likely to get you results you really want.  See my little example here.  You probably want to build in some tolerance.</p>
<pre class="code">
A = 1:3
B = [1.1 3.5 6]
f = @(x,y) (x-y == 2.5)
bsxfun(f,A',B)
ans =
     0     0     0
     0     0     0
     0     0     0
</pre>
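<p>A minimal sketch of what building in a tolerance could look like, using the A and B from your question. The variable names tol and near are only illustrative, and the tolerance of 2.5 is assumed from your loop; adjust it to your data.</p>
<pre class="code">
A = [5.1 9.5 20 31];
B = 5:5:30;
tol = 2.5;                                   % assumed tolerance
% near(i,j) is true when A(i) is within tol of B(j)
near = bsxfun(@(x,y) abs(x-y) &lt;= tol, A', B);
[i2,i1] = find(near);                        % i2 indexes A, i1 indexes B
</pre>
<p>For these values, near is true at (1,1), (2,2), (3,4) and (4,6). Note that near is numel(A)-by-numel(B), so for very large arrays it may need too much memory.</p>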
<p>Do you really mean to compare all elements of A with all elements of B?</p>
<p>&#8211;Loren</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dpath2o</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30763</link>
		<dc:creator>dpath2o</dc:creator>
		<pubDate>Wed, 11 Nov 2009 11:57:05 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30763</guid>
		<description>Using ismember is a novel technique, and reading through this discussion has been very informative. I'm wondering if anyone here has had to deal with tolerances in their matching. What I mean is:

A = [ 5.1 9.5 20 31];
B = [ 5 10 15 20 25 30];

My way of finding near and exact matches:
i1=[];
i2=[];
for l1=1:numel(B)
for l2=1:numel(A)
if (abs(B(l1)-A(l2)) &#60;= 2.5)
i1 = [i1;l1];
i2 = [i2;l2];
end
end
end

This works fine for small arrays but starts taking a long time when A and B are large, which they are in my real-world case.

Does anyone have a suggestion for how to vectorize this or speed it up?

Thanks</description>
		<content:encoded><![CDATA[<p>Using ismember is a novel technique, and reading through this discussion has been very informative. I&#8217;m wondering if anyone here has had to deal with tolerances in their matching. What I mean is:</p>
<p>A = [ 5.1 9.5 20 31];<br />
B = [ 5 10 15 20 25 30];</p>
<p>My way of finding near and exact matches:<br />
i1=[];<br />
i2=[];<br />
for l1=1:numel(B)<br />
for l2=1:numel(A)<br />
if (abs(B(l1)-A(l2)) &lt;= 2.5)<br />
i1 = [i1;l1];<br />
i2 = [i2;l2];<br />
end<br />
end<br />
end</p>
<p>This works fine for small arrays but starts taking a long time when A and B are large, which they are in my real-world case.</p>
<p>Does anyone have a suggestion for how to vectorize this or speed it up?</p>
<p>Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Loren</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30747</link>
		<dc:creator>Loren</dc:creator>
		<pubDate>Sun, 08 Nov 2009 03:45:17 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30747</guid>
		<description>Hi Ghazanfar-

Please look at the help for the function hist (and perhaps histc).  They will do what you want for these integer values. 

--Loren</description>
		<content:encoded><![CDATA[<p>Hi Ghazanfar-</p>
<p>Please look at the help for the function hist (and perhaps histc).  They will do what you want for these integer values. </p>
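<p>As a rough sketch for the example vector in the question (assuming, as in that example, that each value occurs in a single run), something along these lines should give the counts. The variable names are only illustrative.</p>
<pre>
A = [1 1 1 2 2 3 3 3 3 4 4 4 4 4 5 6 6];
B = histc(A, unique(A))                         % B = [3 2 4 5 1 2]
% If a value can appear in more than one run, count the run lengths instead:
runs = diff(find([true, diff(A) ~= 0, true]))   % also [3 2 4 5 1 2] here
</pre>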
<p>&#8211;Loren</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ghazanfar Ali</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30746</link>
		<dc:creator>Ghazanfar Ali</dc:creator>
		<pubDate>Sun, 08 Nov 2009 02:24:05 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30746</guid>
		<description>Hi Miss Loren 
I need an algorithm to count consecutive duplicate values in a one-dimensional matrix. For example, A=[1 1 1 2 2 3 3 3 3 4 4 4 4 4 5 6 6] should give me another matrix with the exact count of these repeated values, e.g. B=[3 2 4 5 1 2]. If you could kindly send me the code at my email address, i.e. ghazanfar.ali@live.com, I would be very obliged. Thanks and Best Regards. Ghazanfar Ali. Pakistan</description>
		<content:encoded><![CDATA[<p>Hi Miss Loren<br />
I need an algorithm to count consecutive duplicate values in a one-dimensional matrix. For example, A=[1 1 1 2 2 3 3 3 3 4 4 4 4 4 5 6 6] should give me another matrix with the exact count of these repeated values, e.g. B=[3 2 4 5 1 2]. If you could kindly send me the code at my email address, i.e. <a href="mailto:ghazanfar.ali@live.com">ghazanfar.ali@live.com</a>, I would be very obliged. Thanks and Best Regards. Ghazanfar Ali. Pakistan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30019</link>
		<dc:creator>Ben</dc:creator>
		<pubDate>Fri, 06 Feb 2009 02:55:01 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30019</guid>
		<description>Users working with larger arrays could consider the following algorithm. Its complexity, O(N*log(N)), is about the same as that of ismember if the arrays have similar size, but surprisingly, given that it is up against MEX code, it seems to run up to 2x faster. If one of the arrays is small then ismember is faster. I haven't tested it on distributed arrays but it should work if sort does.

&lt;pre&gt; &lt;code&gt;function y=ismember2(a,b)

[as,ia]=sort(a(:));
bs=sort(b(:));
iis=false(size(as));
n=numel(as);
m=numel(bs);
i=1;
j=1;
while (i&#60;=n)&#38;&#38;(j&#60;=m),
  if (as(i)&#62;bs(j)),
    j=j+1;
  else
    iis(i)=(as(i)==bs(j));
    i=i+1;
  end;
end;
y=false(size(a));
y(ia)=iis;
end&lt;/code&gt; &lt;/pre&gt;

Timings:

&lt;code&gt;&#62;&#62; n=2e7;a=ceil(n*rand(n,1));b=ceil(n*rand(n,1));
&#62;&#62; tic;ismember(a,b);toc;
Elapsed time is 39.229270 seconds.
&#62;&#62; tic;ismember2(a,b);toc;
Elapsed time is 21.876701 seconds.
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>Users working with larger arrays could consider the following algorithm. Its complexity, O(N*log(N)), is about the same as that of ismember if the arrays have similar size, but surprisingly, given that it is up against MEX code, it seems to run up to 2x faster. If one of the arrays is small then ismember is faster. I haven&#8217;t tested it on distributed arrays but it should work if sort does.</p>
<pre> <code>function y=ismember2(a,b)

[as,ia]=sort(a(:));
bs=sort(b(:));
iis=false(size(as));
n=numel(as);
m=numel(bs);
i=1;
j=1;
while (i&lt;=n)&amp;&amp;(j&lt;=m),
  if (as(i)&gt;bs(j)),
    j=j+1;
  else
    iis(i)=(as(i)==bs(j));
    i=i+1;
  end;
end;
y=false(size(a));
y(ia)=iis;
end</code> </pre>
<p>Timings:</p>
<p><code>&gt;&gt; n=2e7;a=ceil(n*rand(n,1));b=ceil(n*rand(n,1));<br />
&gt;&gt; tic;ismember(a,b);toc;<br />
Elapsed time is 39.229270 seconds.<br />
&gt;&gt; tic;ismember2(a,b);toc;<br />
Elapsed time is 21.876701 seconds.<br />
</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vijay Pappu</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30016</link>
		<dc:creator>Vijay Pappu</dc:creator>
		<pubDate>Wed, 04 Feb 2009 22:26:46 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30016</guid>
		<description>I just happened to see this post today and thought of giving it a try:
My solution is:

C = zeros(numel(A),1);
for i = 1:numel(A)
          C(i) = (any(B == A(i)));
end
C = reshape(C,size(A));

I am sure there are more elegant solutions for this but this is one of the ways of doing it.</description>
		<content:encoded><![CDATA[<p>I just happened to see this post today and thought of giving it a try:<br />
My solution is:</p>
<p>C = zeros(numel(A),1);<br />
for i = 1:numel(A)<br />
          C(i) = (any(B == A(i)));<br />
end<br />
C = reshape(C,size(A));</p>
<p>I am sure there are more elegant solutions for this but this is one of the ways of doing it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris S</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30015</link>
		<dc:creator>Chris S</dc:creator>
		<pubDate>Tue, 03 Feb 2009 17:17:24 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30015</guid>
		<description>Loren, 

Good question about the single-letter strings. I am quickly discovering that if the strings in S and C are more than a single letter, the problem becomes a bit more difficult.

Chris</description>
		<content:encoded><![CDATA[<p>Loren, </p>
<p>Good question about the single-letter strings. I am quickly discovering that if the strings in S and C are more than a single letter, the problem becomes a bit more difficult.</p>
<p>Chris</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: matt fig</title>
		<link>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30008</link>
		<dc:creator>matt fig</dc:creator>
		<pubDate>Mon, 02 Feb 2009 17:54:12 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2009/01/20/more-ways-to-find-matching-data/#comment-30008</guid>
		<description>Loren,
That is the approach I took. I didn't post to the blog because I was hoping Chris would take the question to the newsgroup. One problem is that it is not entirely clear what to do if more than one match is found. That is why I have a try/catch block in my code, which should be replaced once the appropriate action is clear.

&lt;pre&gt; &lt;code&gt;

[r,c] = size(C);
Sc = char(S)';
Cc = reshape(char(C),r,c);
nm2kp = 2;  % The number to keep after the match.
C(r + 1:r + nm2kp,:) = {[]}; % Re-use C to make X.

for ii = 1:c
    try
    idx = strfind(Sc,Cc(:,ii)');
    % This next line will error if more than one match.
    C(r+1 : r+nm2kp,ii) = S(idx+r : idx+r+1);
    catch
    end
end

&lt;/code&gt; &lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>Loren,<br />
That is the approach I took. I didn&#8217;t post to the blog because I was hoping Chris would take the question to the newsgroup. One problem is that it is not entirely clear what to do if more than one match is found. That is why I have a try/catch block in my code, which should be replaced once the appropriate action is clear.</p>
<pre> <code>

[r,c] = size(C);
Sc = char(S)';
Cc = reshape(char(C),r,c);
nm2kp = 2;  % The number to keep after the match.
C(r + 1:r + nm2kp,:) = {[]}; % Re-use C to make X.

for ii = 1:c
    try
    idx = strfind(Sc,Cc(:,ii)');
    % This next line will error if more than one match.
    C(r+1 : r+nm2kp,ii) = S(idx+r : idx+r+1);
    catch
    end
end

</code> </pre>
]]></content:encoded>
	</item>
</channel>
</rss>
