Doug’s MATLAB Video Tutorials
August 18th, 2008
Puzzler: Find largest connected island
Every now and then we send MATLAB challenges through the department to help sharpen our programming skills. What amazes me is the sheer variety of solutions that people come up with.
Today, I have a challenge for the community here. Given a matrix of random, positive, whole numbers, find the largest group of identical, four connected values.
If there are two groups tied for the same size, choose the largest valued. If there is still a tie, it is a free choice which to use. Simply return a binary matrix with ones where the island of connected pixels is.

I suspect that this challenge will be much easier for people with access to the Image Processing Toolbox. I am curious to see what kind of creative solutions people will find with and without the toolbox.
Here is the starter code:
n = 10; % size of matrix
v = 9; % max value of matrix, different values give bigger islands
a = round(rand(n)*v);
flag = mySolver(a); % mySolver will return a binary matrix
Your mission, write mySolver.m and place it in the comment section.
When you post your code in the comments, please use the tags
<pre> <code>
all the code so someone can just copy and paste it from the comments.
</code> </pre>
09:38 UTC |
Posted in Topic: Puzzler |
Permalink |
You can follow any responses to this entry through the RSS 2.0 feed.
You can skip to the end and leave a response. Pinging is currently not allowed.
Leave a Reply
|
Not at all elegant, but I just wanted to see if I could do it…
To make the post come out right, I had to change all less than statements to use greater than instead. Something funny with html tags, I guess.
Not having the image toolbox I gave this a shot using core matlab, but the only solutions I come up with are non-vectorized, and use a multitude of counters. At the very best, my code would become something like what Anne allready wrote, so I am refraining from submitting.
Think this one works (not using the toolbox)
brett’s version seems to fail for certain examples, haven’t quite pinned down why yet.
I wasn’t sure whether to optimize for length of code or speed of execution. I opted for the latter, so I added the first part of the code to first locate the connected islands and then search through them instead of searching (looping) through the entire matrix.
Everyone,
I think this exercise shows the value of having a large library of functions to call upon. By far, the easiest code to maintain and understand is the code that used Image Processing Toolbox to figure out the connectivity of the values.
There are very often calculations and visualizations that will be done by many people in your community. That community may be your work group, your company, people in your field, or just people using MATLAB in general.
Many companies have internal MATLAB toolboxes for their specialized functions, The MathWorks often has specialized toolboxes for fields of study (Image Processing, Signal processing, Parallel Computing, etc…) and the whole of the MATLAB community produces high quality functions also.
I encourage you to seek-out (or create) user groups within your company, look for toolboxes that The MathWorks has made, and to look to The MATLAB Central File Exchange. This search will very often save you from re-inventing the Wheel. Even better, it will keep you from having to maintain code if it comes from a trusted source.
-Doug
Everyone,
I found it interesting how there are a few different ways to solve this problem. Brett and Ted both made use of the Image Processing Toolbox to deal with the connectivity, that makes the problem a lot easier. It is mostly bookkeeping from there to make sure you find the biggest connected island.
Anne and Zachary used clever logical short circuiting in their solutions. If MATLAB is ever faced with an *condition(1) and condition(2)* type of operation if condition(1) is false, the entire statement is automatically false, so condition(2) is never executed. They use this idea to make sure they never step out of bounds on a matrix.
David uses convolution with a plus shaped mask to flood fill and find connectivity. I had not thought of that solution.
Thanks for all the interesting ways to solve this problem.
-Doug
David,
I tested all the posted solutions thus far. For every test case, they all gave valid solutions. I ran hundreds of random tests.
In your comment you said that Brett’s fails for certain inputs. Many inputs have multiple correct answers. For simplicity, I specified to make an arbitrary choice. My strong suspicion is that the “failures” you saw were cases where each of you chose different correct answers.
-Doug
Here is my solution. It is long, but it appears to be as fast or faster than any of the other non-toolbox codes. Thanks for the puzzle!
Matt,
Looking at your code, it is clear that you are optimizing for speed. I did two tests to compare all of the current solutions. They are all fast when doing one iteration of the problem, essentially instant. So for many people that means there is no need to optimize.
However, if this is a problem that were to scale larger, there are two ways to do that. You could do many small problems, or one larger problem. I tried both, and for the most part all of the solutions scaled well. Ted, Zach and you had solutions that scaled the best in both cases.
The profiler is very good at finding bottlenecks in code. Oddly enough, the biggest bottleneck when running lots of small problems was in Brett’s code in STRMATCH, which is called by REGIONPROPS. This kind of bottleneck is not noticed until you run a function thousands of times.
I say, just write your code in the clearest manner possible, then optimize for speed only if you need to. Very often you will not need to anyways.
-Doug
PS. A video on profiler:
http://blogs.mathworks.com/pick/2006/10/19/profiler-to-find-code-bottlenecks/
Anne M,
Your code will not find the island if it is made of zeros.
All other codes found this island.
Matt,
Excellent catch. I am surprised that my testing did not find that! You would think that running so many random test cases would have found a similar case. What is the feature that causes the failure?
This shows that good testing is important and tough to do.
Doug
Waitaminute…is zero a positive whole number? If it is not, I maintain that I solved the problem as it was presented. Although I must admit that the illustrative graphic doesn’t match the way I considered the problem. But just change
in my code to
and it’s fixed.
Oh my! It seems Anne is right, in fact hers is the only right answer of the bunch! I would love to take credit for making a very clever “special case” into the fabric of the challenge, but alas, I did not. Even better catch by Anne!
-Doug
Doug,
I don’t agree that mine is the only right answer, because according to the exact problem definition, there should be no zeros anywhere in the matrix, so from what I understand, everybody else’s code does in fact succeed on all legally defined matrices. The only point I was trying to make was, my code at least behaved the way I intended it to, and if I was wrong about the problem definition a simple change would rectify that.
-Anne
Anne,
At this point I will leave it up as is, since I made a mistake in the problem statement. Instead of “positive whole numbers” I should have said “nonnegative integers” (0, 1, 2, 3,…)
Thanks for keeping me honest on terminology, once I get your address, a bunch of MATLAB swag will be on its way!
Thanks,
Doug
Good catch Anne, sorry about that.
Hey Doug, I don’t have access to the toolboxes where I work yet, could you just say a little about how our (the non-toolbox folks) solutions compare speed-wise and scale-wise to the code using the toolboxes?
Also, speaking of scaling, I ran the profiler and found the bottleneck in my code, it was mostly the find statement at the end. I spent a few minutes fixing it, and now it is way faster. Especially for a larger problems, for example:
This compares with about 50 seconds for my first crack at it! This simply proves that a few minutes spent profiling can pay off big time.
Here is the code:
Try this one. It doesn’t require any toolboxes, just “sparse” and “dmperm”.
Here are some timings. With
The code above takes just under 9 seconds, which is about 20 times faster than at least one of the earlier posts. “dmperm” is also what the Image Processing Toolbox uses. This code uses dmperm directly.
I was a little confused by all the back-and-forth discussion of zero blocks. Is a block of zeros to be found, if it’s the biggest? Or are zeros to be ignored? The version above intentionally ignores zeros. If you want to include blocks of zeros, then do this instead. Take your pick:
I left out the comments in the code, above. Most of the functions should be familiar to most MATLAB users. dmperm is the odd one; you can read more about dmperm and how it works, and look at its source code, here:
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=11740&objectType=FILE
Tim,
There is a problem in your second code. In the case when a zero block is the same size as a non-zero block, your code sometimes returns the zero block.
a = zeros(4) + triu(ones(4)) + diag([-1 -1 0 0])
vs.
a = zeros(4) + triu(ones(4)) + diag([0 0 -1 -1])
Matt,
Good point, thanks for catching that. I missed that in the problem statement.
It’s easy to fix. The size of each component is given by diff(r), where the kth component is nodes p(r(k):r(k+1)-1) where a “node” is an entry in the linear-index order of the image a.
max(diff(r)) just finds the biggest one. It’s a small tweak to pick amongst the ties. I’ll post an update.
The fix is to replace the max(diff(r)) line with:
I’ll post a file on the File Exchange that’s commented so you can see what the algorithm is doing. The non-commented code is:
Thanks for pointing out the glitch.
Oh, I’m a bit dyslexic. The code has a variable “west” that should be called “east.” East and West are the same to me, and I always get rows vs columns mixed up too …
The code on the File Exchange will have the right name.
Check out find_components, on the File Exchange. It doesn’t require the Image Toolbox, just MATLAB itself:
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=21366
which has lots of comments that explains what the code does, and some plots of the graph showing the connected components (graph, as in graph theory). This Puzzler is a fun example of the kind of things dmperm is good for.
(Except that the Code Metrics don’t display properly … sigh).