Collinearity
I recently heard Greg Wilson speak about a topic dear to him, Beautiful Code. He and Andy Oran edited a book on this topic in which programmers explain their thought processes. The essay Writing Programs for "The Book", by Brian Hayes, really resonates for me, in part because of the actual problem he discusses, in part because of the eureka moment he describes. Brian wrote his code in Lisp. I will write similar code in MATLAB to illustrate the points.
Contents
Problem Description
Given 3 points in a plane, find out if they lie on a single line, i.e., find out if the points are collinear. Here are the first set of points we'll use for trying out algorithms.
p1 = [1 1]; p2 = [3.5 3.5]; p3 = [-7.2 -7.2];
Method 1 - See if Third Point Obeys Same Slope
In this variant, we compute the straight lines connected 2 pairs of points, and then check that the third point fits the same line.
collinear1(p1,p2,p3)
ans = 1
dbtype collinear1
1 function tf = collinear1(p1,p2,p3) 2 m = slope(p1,p2); 3 b = intercept(p1,p2); 4 tf = (m*p3(1)+b) == p3(2); 5 end 6 function m = slope(p1,p2) 7 m = (p2(2)-p1(2))/(p2(1)-p1(1)); 8 end 9 function b = intercept(p1,p2) 10 m = slope(p1,p2); 11 b = -p1(2)+m*p1(1); 12 end
Method 2 - Method 1 Modified
There's a flaw in method one in the case where the points are collinear and lie on a vertical line, i.e., where there are repeated x values. Then the denominator we calculate will have the differences of two x values that are the same, resulting in division by 0 and a non-finite slope. First, let's see what happens with the first algorithm and vertically aligned points.
q1 = [0 1]; q2 = [0 3.5]; q3 = [0 -7.2]; collinear1(q1,q2,q3)
ans = 0
The first algorithm says these points are not collinear. Time to patch the algorithm.
collinear2(q1,q2,q3)
ans = 1
dbtype collinear2
1 function tf = collinear2(p1,p2,p3) 2 m = slope(p1,p2); 3 b = intercept(p1,p2); 4 % put in check for m being finite in case 5 % points are all on y axis 6 if ~isfinite(m) 7 if (p3(1)==p1(1)) 8 tf = true; 9 else 10 tf = false; 11 end 12 else 13 tf = (m*p3(1)+b) == p3(2); 14 end 15 end 16 function m = slope(p1,p2) 17 m = (p2(2)-p1(2))/(p2(1)-p1(1)); 18 end 19 function b = intercept(p1,p2) 20 m = slope(p1,p2); 21 b = -p1(2)+m*p1(1); 22 end
Methods 3 and 4 - Compare Two Slopes Only
The next thing Brian noticed was he really didn't need to see if the 3rd point fit the same line. What he really needed to determine is if two lines had the same slope. If so, all three lines have the same slope (convince yourself that this is true, at least for Euclidean geometry).
Let's try this algorithm on our two sets of collinear points.
collinear3(p1,p2,p3) collinear3(q1,q2,q3)
ans = 1 ans = 0
dbtype collinear3
1 function tf = collinear3(p1,p2,p3) 2 m1 = slope(p1,p2); 3 m2 = slope(p1,p3); 4 tf = isequal(m1,m2); 5 end 6 function m = slope(p1,p2) 7 q = p2-p1; 8 m = q(2)/q(1); 9 end
Once again we need to modify the code for infinite slopes.
collinear4(q1,q2,q3)
ans = 1
dbtype collinear4
1 function tf = collinear3(p1,p2,p3) 2 m1 = slope(p1,p2); 3 m2 = slope(p1,p3); 4 if isfinite(m1) 5 tf = isequal(m1,m2); 6 else 7 if ~isfinite(m2) 8 tf = true; 9 else 10 tf = false; 11 end 12 end 13 end 14 function m = slope(p1,p2) 15 q = p2-p1; 16 m = q(2)/q(1); 17 end
Method 5 and 6 - Summing Lengths of Sides of Triangle
The next idea that Brian explores is exploiting the relationship between the sides of the triangle that 3 points define. The longest leg is shorter than the sum of the lengths of the other 2 sides, except when the points are collinear. Then the length of the longest side is equal to the sum of the 2 other lengths. Let's try it out.
collinear5(p1,p2,p3) collinear5(q1,q2,q3)
ans = 0 ans = 1
Let's use the point values that Brian uses.
bh1 = [0 1]; bh2 = [14 5]; bh3 = [8 4]; collinear5(bh1,bh2,bh3);
dbtype collinear5
1 function tf = collinear5(p1,p2,p3) 2 side(3) = norm(p2-p1); 3 side(2) = norm(p3-p1); 4 side(1) = norm(p3-p2); 5 lengths = sort(side,'descend'); 6 tf = isequal(lengths(1), ... 7 (lengths(2)+lengths(3))); 8 end
So far, so good. Points that are not collinear don't claim to be. But the first set was and the answer came out negative. Let's try another batch of points to be sure.
bh4 = [0 0]; bh5 = [3 3]; bh6 = [5 5]; collinear5(bh1,bh2,bh3) collinear5(bh1*1e5,bh2*1e5,bh3*1e5)
ans = 0 ans = 0
Uh-oh! What's happened? Looking back at collinear5, we see we are no longer doing just addition, subtraction, multiplication, and division. Now we're computing the norm which involves computing a sqrt. As a result, we have more opportunity for numerical roundoff issues, even if the points are confined to be rational numbers (including integers).
collinear6(bh1,bh2,bh3) collinear6(bh1*1e5,bh2*1e5,bh3*1e5)
ans = 0 ans = 0
Version collinear6 no longer does an exact equality comparison but looks for differences on the order of eps for the appropriate lengths.
dbtype collinear6
1 function tf = collinear6(p1,p2,p3) 2 side(3) = norm(p2-p1); 3 side(2) = norm(p3-p1); 4 side(1) = norm(p3-p2); 5 lengths = sort(side,'descend'); 6 tf = ... 7 abs(lengths(1)-(lengths(2)+lengths(3))) ... 8 <= eps(lengths(1)); 9 end
Method 7 - Eureka, Compute Triangle Area!
Finally, as Brian describes it, he realized there was a completely different way to see if 3 points were collinear, and that was to compute the area of the triangle they define.
collinear7(bh1,bh2,bh3) collinear7(bh1*1e5,bh2*1e5,bh3*1e5)
ans = 0 ans = 0
To compute the area, you could compute base * height / 2, but again this would involve square roots. The trig approach would involve transcendental functions. Another way to think of this is to view any 2 pairs of the points as defining a parallelogram, and the triangle area is half that area. To compute the area, treat the sides as vectors which you translate to the origin. Then the area computation is straight-forward, boiling down to something proportional to the determinant If the result is 0, the points are collinear.
dbtype collinear7
1 function tf = collinear7(p1,p2,p3) 2 mat = [p1(1)-p3(1) p1(2)-p3(2); ... 3 p2(1)-p3(1) p2(2)-p3(2)]; 4 tf = det(mat) == 0;
This algorithm requires an equality test following 6 arithmetic operations. Pretty simple, very elegant, much less prone to numerical issues than other algorithms. Much nicer than all of the other code above, no accounting for singular behavior necessary.
Note: Added 9 June 2008
If you read the comments for this blog, you will find an even more satisfactory solution relying on svd. svd (and the related rank function) have better numerical attributes than det.
What's Your Recent Aha Moment?
Have you had a similar epiphany recently? We'd love to hear about it here.