Let's start looking at connected component labeling algorithms. In this post I want to explain how you can think of pixel neighborhood relationships in terms of a graph. We'll look at how to represent and visualize a graph in MATLAB, as well as how to compute the connected components of a graph.
Here's our sample binary image:
bw = logical([ ...
0 0 0 0 0 0 0
0 1 1 0 0 0 0
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0 ])
bw =
0 0 0 0 0 0 0
0 1 1 0 0 0 0
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0
Each foreground pixel becomes a node in the graph. Let's number the nodes. Theoretically, the node ordering doesn't matter very much, but for us it's convenient to use the same order as the find function:
0 0 0 0 0 0 0 0 1 3 0 0 0 0 0 2 4 0 0 0 0 0 0 0 5 7 9 0 0 0 0 6 8 10 0
Next, make a list of which nodes in the graph are connected to each other. Two nodes are connected if the corresponding pixels are neighbors. Using 4-connectivity, for example, node 1 is connected to nodes 2 and 3. Node 4 is connected to nodes 2 and 3. Let's capture these pairwise connections in a matrix:
pairs = [...
1 3; 1 2;
2 1; 2 4;
3 1; 3 4;
4 2; 4 3;
5 6; 5 7;
6 5; 6 8;
7 5; 7 8; 7 9;
8 6; 8 7; 8 10;
9 7; 9 10;
10 8; 10 9]
pairs =
1 3
1 2
2 1
2 4
3 1
3 4
4 2
4 3
5 6
5 7
6 5
6 8
7 5
7 8
7 9
8 6
8 7
8 10
9 7
9 10
10 8
10 9
From these pairs you can build up a graph representation called an adjacency matrix. An adjacency matrix has one row and column for each graph node. A nonzero element (p,q) means that there is a connection between nodes p and q. In other words, nodes p and q are adjacent. You can build up an adjacency matrix from the list of pairs by using accumarray, like this:
A = accumarray(pairs, 1)
A =
0 1 1 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0
0 0 0 0 1 0 0 1 0 0
0 0 0 0 1 0 0 1 1 0
0 0 0 0 0 1 1 0 0 1
0 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 1 1 0
For graphs containing many nodes, you are better off constructing a sparse adjacency matrix, like this:
A = sparse(pairs(:,1), pairs(:,2), 1);
spy(A)
title('Spy plot of sparse adjacency matrix')
MATLAB has a function called dmperm, which computes the Dulmage-Mendelsohn decomposition of a matrix. If the matrix is an adjacency matrix, dmperm can be used to compute the connected components of the corresponding graph. Here's how to do it.
First, you have to put 1s on the diagonal of A:
A(1:11:end) = 1;
[Note added June 29, 2007: Tim Davis pointed out this the above line is an inefficient way to get ones onto the diagonal of a sparse matrix. Please see his comments for an explanation and for better ways to do it.]
Next, call dmperm:
[p,q,r,s] = dmperm(A);
The output p contains a permuted list of nodes:
p
p =
3 4 2 1 10 9 7 8 6 5
And the output r tells you which nodes of p belong to the same connected component:
r
r =
1 5 11
Specifically, the first connected component contains these nodes:
p(r(1):r(2)-1)
ans =
3 4 2 1
The second connected component contains these nodes:
p(r(2):r(3)-1)
ans =
10 9 7 8 6 5
These two sets of nodes are the connected components of our original binary image.
Let's finish this example by using 8-connectivity instead of 4-connectivity. Our list of pairs becomes:
pairs = [...
1 2; 1 3; 1 4;
2 1; 2 3; 2 4;
3 1; 3 2; 3 4;
4 1; 4 2; 4 3; 4 5;
5 4; 5 6; 5 7; 5 8;
6 5; 6 7; 6 8;
7 5; 7 6; 7 8; 7 9; 7 10;
8 5; 8 6; 8 7; 8 9; 8 10;
9 7; 9 8; 9 10;
10 7; 10 8; 10 9];Now repeat the steps of building the sparse adjacency matrix and calling dmperm:
A = sparse(pairs(:,1), pairs(:,2), 1); A(1:11:end) = 1; [p,q,r,s] = dmperm(A); r
r =
1 11
[Note added June 29, 2007: See Tim's comment about A(1:11:end) = 1.]
Now there's only one connected component:
p(r(1):r(2)-1)
ans =
10 9 8 7 6 5 4 3 2 1
Next time we'll look at computing connected components using a kind of "flood fill" approach.
Get
the MATLAB code
Published with MATLAB® 7.4



Hi, Steve:
It is easy to “find” and number the nodes with the “find” function, but how can I programatically find the “pairs” matrix (4- or 8-connected)? Is there any function in Matlab that can do this for us?
JanKees - Good question. I usually approach vectorized neighbor searching in MATLAB by using linear indexing and linear neighbor index offsets. It’s a pretty useful technique for a variety of problems, so I’ll insert a discussion of it into my planned postings on connected components.
Hi, Steve:
Thanks for your answer. I will wait anxiously :-) for that post.
Hi Steve- I’ve been using dmperm to find the connected components of a graph for a little while. Unfortunately, it seems to require a symmetric adjacency matrix. Is there an analogous function that works without explicit symmetrization of the matrix (which requires twice the storage). Or would I have to implement a DFS algorithm?
Srini - I’m not aware of anything already in MATLAB that’ll find connected components without using a symmetric adjacency matrix. You could implement depth-first or breadth-first search yourself. The binary image connected component method I described in part 4 of this series is breadth-first. Or you could try using something like union-find, which I’ll describe soon.
dmperm doesn’t require a symmetric nonzero pattern. It can find the strongly connected components of a directed graph (is that what you are trying to compute?). For that kind of matrix, you need to ensure that A has a zero-free diagonal first. Otherwise, it will permute the rows and columns differently by first finding a maximum matching (a permutation for a zero-free diagonal). That permutation (part 1 of dmperm) is an unsymmetric permutation.
You can skip that step by using cs_scc in CSparse (see the File Exchange), instead. It which finds the strongly connected components of a graph (which is part 2 of the dmperm algorithm). Also in CSparse (or CXSparse if you have a 64-bit MATLAB) is a faster dmperm and better than the one in the current version of MATLAB.
Tim—Thanks! For others interested in dmperm, here’s a reference that Tim just posted in comp.soft-sys.matlab: Direct Methods for Sparse Linear Systems, SIAM, 2006.
One other comment. A(1:11:end) = 1 is a very slow way to add the diagonal to a matrix. It always best to avoid A(i,j) = … when constructing sparse matrices (see http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/ )
Here’s an example. First, I create two matrices with the same pattern. To ensure B=A actually makes a real copy and not a “shallow” one, I do B=A*.2, just for kicks. Cut and past the following:
The A(1:(n+1):end) = 1 statement takes 0.17 seconds in MATLAB 7.4. The B=B+speye(n) statement takes 0.0056 seconds, which is 30 times faster. This is a contrived example but typical of the kind of performance difference you would see in matrices and graphs arising in practice.
If the diagonal of A started out with -1’s, then you would get cancellation (that doesn’t happen here). So do this instead:
if that’s an issue.
Yet another way to do it, and faster still, is to do the following to construct A in the first place:
Moral: Avoid A(i,j) = x if at all possible (sparse sub assignment). Less of a problem but still slow is the sparse sub reference, x = A(i,j).
For comparison, for this matrix x=A\b where b is a dense vector takes 0.45 seconds and does 56 million floating-point operations. The A(1:(n+1):end)=1 statement should take at most O(nnz(A)) time, where in this case nnz(A) is about 96,000. So A\b is working at 1.2 Gflops (excluding all the integer work that it also has to do…), and A(1:(n+1):end)=1 is working at 56 thousand “index operations per second”. That’s a factor almost exactly one thousand times slower.
I realize of course that this connected components problem has nothing to do with solving A*x=b. What this does point out is just how slow, by comparison, sparse sub assignment truly is.
uhh … silly me … I can’t divide. The subsref is working at 560 thousand “ops per second”. x=A\b is working at 1.2 Gflops. So subsref is over Two Thousand times slower than backslash, not one thousand.
Sorry for the confusion.
Dear Steve,
I would like to count the size of connectors for each “cluster” in the following example
bw =
0 0 0 0 0 0 1
0 1 1 0 0 0 1
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0
Then based on 8-connectivity, define
cluster=
0 0 0 0 0 0 2
0 1 1 0 0 0 2
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0
Thus,
cluster one has 10 pixels and cluster 2= 2 pixels. How could I use the above code to get the size of cluster?
Best regards,
Ake
Ake—Use bwlabel and regionprops.
dmperm in MATLAB 7.5 or later has been modified, since this article was first written (MATLAB is now using the dmperm in CSparse). Among other things, dmperm now returns each connected component in their natural order. That is, p(r(1):r(2)-1) is now [1 2 3 4], not [3 4 2 1], for this example.
There’s nothing special to the ordering of nodes within a single connected component, so it makes more sense to return them in a natural order instead of a rather arbitrarily jumbled order.
Does this have any noticeable effect on bwlabel or other functions?
Tim—I don’t think it affects bwlabel. I can’t speak to the question of “other functions.” :-)
OK. I was thinking it might affect other functions such as mindread.m, or time-transfer-related functions such as doc(’who’) … ;-)
Tim—I knew that. ;-)
am doing project on handwritten character segmentation….can i know the BFS(Breadth First Search) code for finding out the connected components?
Angel—We provide connected component code in the form of the Image Processing Toolbox. :-) If you have that, there’s no need to implement anything yourself. If you can’t use the Image Processing Toolbox, then there’s a wealth of information readily accessible on the web about connected-component labeling and breadth-first searching algorithms.