Here is a new MATLAB Puzzler.
Given two vectors representing the X and Y coordinates of some points, find the two closest points in Cartesian space.
n = 10;
x = rand(1,n);
y = rand(1,n);

Brett and I took very different paths to solving this problem, he knows more about the statistics toolbox than I do and found a much shorter solution because there is a function that basically does this. As a benchmark. sticking with base MATLAB, I solved this in seven lines of code.
Post your solutions in the comments area along with how long it took you to solve.

You can follow any comments to this entry through the RSS 2.0 feed.
You can skip to the end and leave a comment. Pinging is currently not allowed.

25 CommentsOldest to Newest

n = 10;
x = rand(1,n);
y = rand(1,n);
min_dist = Inf;
min_p1 = 0;
min_p2 = 0;
for p1 = 1:n
for p2 = 2:n
d = (x(p1)-x(p2))^2+(y(p1)-y(p2))^2;
if d < min_dist && p1 ~= p2
min_p1 = p1;
min_p2 = p2;
min_dist = d;
end
end
end
plot(x,y,'b.',[x(min_p1),x(min_p2)],[y(min_p1),y(min_p2)],'ro');

I like your solution for putting more of Matlab’s unique features to use. For large N, I have a suspicion that the matrix based approach will become a bit of a memory issue, but I it’ll be much faster for smaller N, taking advantage of the fast matrix/vector operations in Matlab.

The code above is the trivial solution of forming all combinations of points in sequence, measuring the distance and keeping track of the minimum (eliminating the case where both chosen points are the same point).

Chris,

Excellent solution. Yes, your solution will scale much better than mine. There are trade-offs with such decisions as you pointed out.

In your solution, you initiate min_p1, and min_p2 to zero. It is not needed to initiate these, but it is needed for min_dist. This does not hurt anything, just an observation.

Thanks for playing!
Doug

%% Given code
n = 10;
x = rand(1,n);
y = rand(1,n);
%% Rob's solution
pairs = nchoosek(1:n,2);
dist = sqrt((diff(x(pairs),1,2)).^2 + (diff(y(pairs),1,2)).^2);
[B,IX] = sort(dist);
scatter(x,y);
hold on;
plot([x(pairs(IX(1),:))],[y(pairs(IX(1),:))],'-+r');

Doug,

This took about 15 mins. I first thought of trying bwdist, but couldn’t make it go. Love these problems!

Rob

Rob,

Also an excellent solution that uses different functions. There are many ways to solve these problems.

I guess not as many people know about HYPOT as I thought, since I see both of you calculated the distances by hand. I see that Chris actually skipped the SQRT part of the calculation since he was only looking for the greatest magnitude and was saving a few ‘computrons’ here and there.

Glad you enjoy the challenges, I plan to do them from time to time. If you have a good challenge, send it to me, and I will post it.

Doug

I was a little excessive earlier with generating pairs, here’s a better version of my earlier code (note loop indices)

min_dist = Inf;
for p1 = 1:(n-1)
for p2 = (p1+1):n
d = (x(p1)-x(p2))^2+(y(p1)-y(p2))^2;
if d < min_dist
min_p1 = p1;
min_p2 = p2;
min_dist = d;
end
end
end

@ Rob
Good thinking to use the built in combination generator.

@Doug
Keep putting up these problems, they’re fun and short enough that I can still get some real work done.

Doug,
Ah yes, HYPOT. Simplifies things nicely.

Cris also taught me something I have forgotten about PLOT, so I could have gotten down to 4 lines of code. Dangit!

BTW Doug, saw the interview you granted the guys over at http://www.blinkdagger.com – very cool of you to do that. I like the ‘behind the scenes’ type questions once in a while.

Cheers,
Rob

Doug, using the PDIST function from the Statistics Toolbox, I came up with this one-liner (okay, technically, two lines combined):
dist = squareform(pdist([x',y']));

Oops…forgot the mindist step:
[ii,jj] = find(dist == min(dist(dist~=0)));

(or DISTS for the second case.)

Brett again

Hi.
I notice all fo you use the following construction:

[ix,iy] = find(dist==min(dist));

This is an operation of O(2n), first find the min(O(n)) and then find the index if the min (O(n)). For large N, the following might be faster (O(n)), although of course it requires 2 lines due to matlab syntax.

Five lines. Five minutes. After I wrote it, I looked at Doug’s solution, and I think I took the long way around with my use of the meshgrid function (generating indices rather than x,y matrices). Also, mine would fail if two of the x,y points were identical…

% Find closest pair of points
% starter code
n=10;
x=rand(n,1);
y=rand(n,1);
% go to complex vector representation
z = x + i*y;
% find distances between all pairs
a = kron(z, ones(1, length(z)));
b = kron(z.', ones(size(z)));
zdist = abs(a-b);
% only need to look in lower triangle for symmetric pair-wise dists
mindist = min(zdist(tril(zdist)~=0));
%row and col; will contain indexes for the pair, but only need to keep one
[row, col] = find(zdist==mindist);
%plot
figure(1);
scatter(real(z), imag(z), '.b');
hold on;
scatter(real(z(row)), imag(z(row)), 'or');
hold off;

% I always forget to use the meshgrid function unless I’m doing something 3-d, but some people might want to know about kron products anyway.
% Tril (and triu) can be convenient filters.
% And for 2-d, complex variables work ok, but going into 3+ dimensions, then I imagine HYPOT or it’s higher-dim cousins would keep working.

n = 10;
x = rand(1,n);
y = rand(1,n);
% Find the closest points
[xx,yy]=meshgrid(x,y); %(1)
zz = triu((xx-xx.').^2 + (yy-yy.').^2); %(2)
zz(zz==0)=inf; %(3)
[d,i]=min(zz(:)); %(4) should use ind2sub, but requires 2 outputs
disp(['The closes points are: ' num2str([mod(i,10)+(mod(i,10)==0), ceil(i/10)])]) %(5)
%% Plotting code
ij = [mod(i,10)+(mod(i,10)==0), ceil(i/10)];
plot(x,y,'*b', x(ij),y(ij),'or-')
text(x(ij(1)),y(ij(1))-.05,num2str(d))
axis equal

I think it takes a full 5 lines to get the solution. I took about 15 minutes and had to add the graphing code to make sure I was getting it right. I definitely wanted a shortcut syntax to get the index of the minimum value without having to get both outputs...

two comments
1) setting the diagonal elements directly – more efficient memory-wise. for readability may use
sub2ind([n,n],1:n,1:n)
instead of
1+(0:(n-1))*(n+1)
2) by using the min function with two outputs, get a unique answer (instead of at least two when using find)

Given:

n=10;
x=rand(n,1);
y=rand(n,1);

This version uses complex numbers to simplify the expressions. I think it’s very readable, too (except for the odd “nan” thingy which tells “min” to ignore the diagonal). I came up with it in about 5 minutes, without reading any of the above first:

[X,Y] = meshgrid (x + 1i*y) ;
D = abs (X-Y) + diag (nan (n,1)) ;
[d k] = min (D (:)) ;
i = mod (k-1,n) + 1 ;
j = fix ((k-1)/n) + 1 ;

p.s. when you all post code snippets, you should use the “pre” and “code” format. It makes the code easier to read. Maybe Doug could edit the above posts and add the code and pre’s.

<pre><code>

code goes here.

</pre></code>

[edit by Doug to have both <pre> and <code> tags in suggestion. Also edited old comment to do this.

Now, the real challenge … can anyone solve it in O(n log n) time using recursive divide-and-conquer? How’s that for a new Puzzler?

dist=sqrt(bsxfun(@minus,x,x').^2+bsxfun(@minus,y,y').^2);
dist(dist==0)=10; % Cancel the main diag
[i,j]=find(dist==min(dist(:)))
plot(x,y,'.',x(i),y(i),'o')

I really like the topic.

I haven’t used meshgrid or surf before so I’m learning something new. I must be missing something since it seems like meshgrid was written to address a limitation of the syntax option of surf(X, Y, Z) when X & Y are arrays. I’m having trouble thinking of why anyone would choose to make X & Y arrays instead of vectors.

The video worked very well to talk me thru the example and you did a very fine job of recording & editing a very understandable lesson. The only negative is that after I finished the video, I went to read the FRPs for meshgrid & surf, but I didn’t have your code to reference since the video had ended & reset.

@Dan: You might want to increase the ‘dummy value’ for dist to INF since in some cases 10 might be the smallest (not in this case, but if the code were used elsewhere with different inputs.

@Oliver: As a best practice, I will post my solution as the first comment so it is accessible.

Doug

If we consider the simplest solution, using pdist might be the solution:

n = 10;
x = rand(1,n);
y = rand(1,n);
allDist = pdist([x;y]');
[minDist,minLoc] = min(allDist);
[loc1,loc2] = find(tril(ones(n),-1));
minLoc1 = loc1(minLoc);
minLoc2 = loc2(minLoc);

Alternatively, we can use bsxfun to build up the distance map.

z = x + i*y;
distMap = tril(abs(bsxfun(@minus,z,z.')));
distMap(~distMap) = Inf;
[minCol,rowLoc] = min(distMap);
[minDist,colLoc] = min(minCol);
rowLoc = rowLoc(colLoc);

Nice solution, but it call pdist twice unnecessarily. If you wish to reduce the length of code, the following is shorter but also more readable:

d = pdist([x;y]');
[i,j] = find(squareform(d)==min(d),1,'first');

The length of code can be checked as follows:

t = mtree('[i,j]=find(squareform(pdist([x; y]''))==min(pdist([x; y]'')),1,''first'')')

which gives 33 nodes.

t = mtree('d=pdist([x; y]'');[i,j]=find(squareform(d)==min(d),1,''first'')')

only 28 nodes.

hth
Yi

Yi,

You bring up an interesting point about metrics on code. If people were to specify metrics to judge code on, how would they be defined?

* Binary (works/does not work)
* Length (MATLAB golf)
* Node count
* Community “thumbs up/down” voting (not automatable)

Are there automatable metrics that could be used to give an idea of what code is best? Personally, I think Golf is an interesting challenge, but I prefer readability by making variable with descriptive names on separate lines:

flagIsBig = (variable > 1000);
if flagIsBig
%Do big code
else %not flagIsBig
%Do not big code
end

This takes more characters, but is very readable. I think only community voting can clear this up. What do you think?

-Doug

Doug,

Personally, I like the node count because it encourages to use meaningful variable names. A variable with a name ‘d’ has the same node count as using name ‘distance’, but latter clearly is more readable.

## 25 CommentsOldest to Newest

I like your solution for putting more of Matlab’s unique features to use. For large N, I have a suspicion that the matrix based approach will become a bit of a memory issue, but I it’ll be much faster for smaller N, taking advantage of the fast matrix/vector operations in Matlab.

The code above is the trivial solution of forming all combinations of points in sequence, measuring the distance and keeping track of the minimum (eliminating the case where both chosen points are the same point).

Chris,

Excellent solution. Yes, your solution will scale much better than mine. There are trade-offs with such decisions as you pointed out.

In your solution, you initiate min_p1, and min_p2 to zero. It is not needed to initiate these, but it is needed for min_dist. This does not hurt anything, just an observation.

Thanks for playing!

Doug

Doug,

This took about 15 mins. I first thought of trying bwdist, but couldn’t make it go. Love these problems!

Rob

Rob,

Also an excellent solution that uses different functions. There are many ways to solve these problems.

I guess not as many people know about HYPOT as I thought, since I see both of you calculated the distances by hand. I see that Chris actually skipped the SQRT part of the calculation since he was only looking for the greatest magnitude and was saving a few ‘computrons’ here and there.

Glad you enjoy the challenges, I plan to do them from time to time. If you have a good challenge, send it to me, and I will post it.

Doug

I was a little excessive earlier with generating pairs, here’s a better version of my earlier code (note loop indices)

@ Rob

Good thinking to use the built in combination generator.

@Doug

Keep putting up these problems, they’re fun and short enough that I can still get some real work done.

Doug,

Ah yes, HYPOT. Simplifies things nicely.

Cris also taught me something I have forgotten about PLOT, so I could have gotten down to 4 lines of code. Dangit!

BTW Doug, saw the interview you granted the guys over at http://www.blinkdagger.com – very cool of you to do that. I like the ‘behind the scenes’ type questions once in a while.

Cheers,

Rob

Doug, using the PDIST function from the Statistics Toolbox, I came up with this one-liner (okay, technically, two lines combined):

dist = squareform(pdist([x',y']));

WITHOUT the toolbox, check this out:

`[xs,ys] = meshgrid(x,y);`

dists = sqrt((xs-xs').^2 + (ys-ys').^2);

Cheers!

Brett

Oops…forgot the mindist step:

[ii,jj] = find(dist == min(dist(dist~=0)));

(or DISTS for the second case.)

Brett again

Hi.

I notice all fo you use the following construction:

`[ix,iy] = find(dist==min(dist));`

This is an operation of O(2n), first find the min(O(n)) and then find the index if the min (O(n)). For large N, the following might be faster (O(n)), although of course it requires 2 lines due to matlab syntax.

`[minVal, minIndex] = min(dist);`

[ix,iy] = ind2sub(size(dist),minIndex);

–DA

Not better just different:

Then I looked at other solutions. Very interesting ideas. I learned a few things this morning. :)

Cheers

Bob

Five lines. Five minutes. After I wrote it, I looked at Doug’s solution, and I think I took the long way around with my use of the meshgrid function (generating indices rather than x,y matrices). Also, mine would fail if two of the x,y points were identical…

% I always forget to use the meshgrid function unless I’m doing something 3-d, but some people might want to know about kron products anyway.

% Tril (and triu) can be convenient filters.

% And for 2-d, complex variables work ok, but going into 3+ dimensions, then I imagine HYPOT or it’s higher-dim cousins would keep working.

my suggestion:

two comments

1) setting the diagonal elements directly – more efficient memory-wise. for readability may use

sub2ind([n,n],1:n,1:n)

instead of

1+(0:(n-1))*(n+1)

2) by using the min function with two outputs, get a unique answer (instead of at least two when using find)

Given:

This version uses complex numbers to simplify the expressions. I think it’s very readable, too (except for the odd “nan” thingy which tells “min” to ignore the diagonal). I came up with it in about 5 minutes, without reading any of the above first:

p.s. when you all post code snippets, you should use the “pre” and “code” format. It makes the code easier to read. Maybe Doug could edit the above posts and add the code and pre’s.

<pre><code>

</pre></code>

[edit by Doug to have both <pre> and <code> tags in suggestion. Also edited old comment to do this.

Now, the real challenge … can anyone solve it in O(n log n) time using recursive divide-and-conquer? How’s that for a new Puzzler?

http://en.wikipedia.org/wiki/Closest_pair_of_points

:-)

My solution…

I really like the topic.

I haven’t used meshgrid or surf before so I’m learning something new. I must be missing something since it seems like meshgrid was written to address a limitation of the syntax option of surf(X, Y, Z) when X & Y are arrays. I’m having trouble thinking of why anyone would choose to make X & Y arrays instead of vectors.

The video worked very well to talk me thru the example and you did a very fine job of recording & editing a very understandable lesson. The only negative is that after I finished the video, I went to read the FRPs for meshgrid & surf, but I didn’t have your code to reference since the video had ended & reset.

@Dan: You might want to increase the ‘dummy value’ for dist to INF since in some cases 10 might be the smallest (not in this case, but if the code were used elsewhere with different inputs.

@Oliver: As a best practice, I will post my solution as the first comment so it is accessible.

Doug

If we consider the simplest solution, using pdist might be the solution:

Alternatively, we can use bsxfun to build up the distance map.

Ben,

Nice solution, but it call pdist twice unnecessarily. If you wish to reduce the length of code, the following is shorter but also more readable:

The length of code can be checked as follows:

which gives 33 nodes.

only 28 nodes.

hth

Yi

Yi,

You bring up an interesting point about metrics on code. If people were to specify metrics to judge code on, how would they be defined?

* Binary (works/does not work)

* Length (MATLAB golf)

* Node count

* Community “thumbs up/down” voting (not automatable)

Are there automatable metrics that could be used to give an idea of what code is best? Personally, I think Golf is an interesting challenge, but I prefer readability by making variable with descriptive names on separate lines:

This takes more characters, but is very readable. I think only community voting can clear this up. What do you think?

-Doug

Doug,

Personally, I like the node count because it encourages to use meaningful variable names. A variable with a name ‘d’ has the same node count as using name ‘distance’, but latter clearly is more readable.

Yi