Let's start looking at connected component labeling algorithms. In this post I want to explain how you can think of pixel neighborhood relationships in terms of a graph. We'll look at how to represent and visualize a graph in MATLAB, as well as how to compute the connected components of a graph.
Here's our sample binary image:
bw = logical([ ...
0 0 0 0 0 0 0
0 1 1 0 0 0 0
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0 ])
bw =
0 0 0 0 0 0 0
0 1 1 0 0 0 0
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0
Each foreground pixel becomes a node in the graph. Let's number the nodes. Theoretically, the node ordering doesn't matter very much, but for us it's convenient to use the same order as the find function:
0 0 0 0 0 0 0 0 1 3 0 0 0 0 0 2 4 0 0 0 0 0 0 0 5 7 9 0 0 0 0 6 8 10 0
Next, make a list of which nodes in the graph are connected to each other. Two nodes are connected if the corresponding pixels are neighbors. Using 4-connectivity, for example, node 1 is connected to nodes 2 and 3. Node 4 is connected to nodes 2 and 3. Let's capture these pairwise connections in a matrix:
pairs = [...
1 3; 1 2;
2 1; 2 4;
3 1; 3 4;
4 2; 4 3;
5 6; 5 7;
6 5; 6 8;
7 5; 7 8; 7 9;
8 6; 8 7; 8 10;
9 7; 9 10;
10 8; 10 9]
pairs =
1 3
1 2
2 1
2 4
3 1
3 4
4 2
4 3
5 6
5 7
6 5
6 8
7 5
7 8
7 9
8 6
8 7
8 10
9 7
9 10
10 8
10 9
From these pairs you can build up a graph representation called an adjacency matrix. An adjacency matrix has one row and column for each graph node. A nonzero element (p,q) means that there is a connection between nodes p and q. In other words, nodes p and q are adjacent. You can build up an adjacency matrix from the list of pairs by using accumarray, like this:
A = accumarray(pairs, 1)
A =
0 1 1 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0
0 0 0 0 1 0 0 1 0 0
0 0 0 0 1 0 0 1 1 0
0 0 0 0 0 1 1 0 0 1
0 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 1 1 0
For graphs containing many nodes, you are better off constructing a sparse adjacency matrix, like this:
A = sparse(pairs(:,1), pairs(:,2), 1);
spy(A)
title('Spy plot of sparse adjacency matrix')
MATLAB has a function called dmperm, which computes the Dulmage-Mendelsohn decomposition of a matrix. If the matrix is an adjacency matrix, dmperm can be used to compute the connected components of the corresponding graph. Here's how to do it.
First, you have to put 1s on the diagonal of A:
A(1:11:end) = 1;
[Note added June 29, 2007: Tim Davis pointed out this the above line is an inefficient way to get ones onto the diagonal of a sparse matrix. Please see his comments for an explanation and for better ways to do it.]
Next, call dmperm:
[p,q,r,s] = dmperm(A);
The output p contains a permuted list of nodes:
p
p =
3 4 2 1 10 9 7 8 6 5
And the output r tells you which nodes of p belong to the same connected component:
r
r =
1 5 11
Specifically, the first connected component contains these nodes:
p(r(1):r(2)-1)
ans =
3 4 2 1
The second connected component contains these nodes:
p(r(2):r(3)-1)
ans =
10 9 7 8 6 5
These two sets of nodes are the connected components of our original binary image.
Let's finish this example by using 8-connectivity instead of 4-connectivity. Our list of pairs becomes:
pairs = [...
1 2; 1 3; 1 4;
2 1; 2 3; 2 4;
3 1; 3 2; 3 4;
4 1; 4 2; 4 3; 4 5;
5 4; 5 6; 5 7; 5 8;
6 5; 6 7; 6 8;
7 5; 7 6; 7 8; 7 9; 7 10;
8 5; 8 6; 8 7; 8 9; 8 10;
9 7; 9 8; 9 10;
10 7; 10 8; 10 9];Now repeat the steps of building the sparse adjacency matrix and calling dmperm:
A = sparse(pairs(:,1), pairs(:,2), 1); A(1:11:end) = 1; [p,q,r,s] = dmperm(A); r
r =
1 11
[Note added June 29, 2007: See Tim's comment about A(1:11:end) = 1.]
Now there's only one connected component:
p(r(1):r(2)-1)
ans =
10 9 8 7 6 5 4 3 2 1
Next time we'll look at computing connected components using a kind of "flood fill" approach.
Get
the MATLAB code
Published with MATLAB® 7.4


Hi, Steve:
It is easy to “find” and number the nodes with the “find” function, but how can I programatically find the “pairs” matrix (4- or 8-connected)? Is there any function in Matlab that can do this for us?
JanKees – Good question. I usually approach vectorized neighbor searching in MATLAB by using linear indexing and linear neighbor index offsets. It’s a pretty useful technique for a variety of problems, so I’ll insert a discussion of it into my planned postings on connected components.
Hi, Steve:
Thanks for your answer. I will wait anxiously :-) for that post.
Hi Steve- I’ve been using dmperm to find the connected components of a graph for a little while. Unfortunately, it seems to require a symmetric adjacency matrix. Is there an analogous function that works without explicit symmetrization of the matrix (which requires twice the storage). Or would I have to implement a DFS algorithm?
Srini – I’m not aware of anything already in MATLAB that’ll find connected components without using a symmetric adjacency matrix. You could implement depth-first or breadth-first search yourself. The binary image connected component method I described in part 4 of this series is breadth-first. Or you could try using something like union-find, which I’ll describe soon.
dmperm doesn’t require a symmetric nonzero pattern. It can find the strongly connected components of a directed graph (is that what you are trying to compute?). For that kind of matrix, you need to ensure that A has a zero-free diagonal first. Otherwise, it will permute the rows and columns differently by first finding a maximum matching (a permutation for a zero-free diagonal). That permutation (part 1 of dmperm) is an unsymmetric permutation.
You can skip that step by using cs_scc in CSparse (see the File Exchange), instead. It which finds the strongly connected components of a graph (which is part 2 of the dmperm algorithm). Also in CSparse (or CXSparse if you have a 64-bit MATLAB) is a faster dmperm and better than the one in the current version of MATLAB.
Tim—Thanks! For others interested in dmperm, here’s a reference that Tim just posted in comp.soft-sys.matlab: Direct Methods for Sparse Linear Systems, SIAM, 2006.
One other comment. A(1:11:end) = 1 is a very slow way to add the diagonal to a matrix. It always best to avoid A(i,j) = … when constructing sparse matrices (see http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/ )
Here’s an example. First, I create two matrices with the same pattern. To ensure B=A actually makes a real copy and not a “shallow” one, I do B=A*.2, just for kicks. Cut and past the following:
The A(1:(n+1):end) = 1 statement takes 0.17 seconds in MATLAB 7.4. The B=B+speye(n) statement takes 0.0056 seconds, which is 30 times faster. This is a contrived example but typical of the kind of performance difference you would see in matrices and graphs arising in practice.
If the diagonal of A started out with -1′s, then you would get cancellation (that doesn’t happen here). So do this instead:
if that’s an issue.
Yet another way to do it, and faster still, is to do the following to construct A in the first place:
Moral: Avoid A(i,j) = x if at all possible (sparse sub assignment). Less of a problem but still slow is the sparse sub reference, x = A(i,j).
For comparison, for this matrix x=A\b where b is a dense vector takes 0.45 seconds and does 56 million floating-point operations. The A(1:(n+1):end)=1 statement should take at most O(nnz(A)) time, where in this case nnz(A) is about 96,000. So A\b is working at 1.2 Gflops (excluding all the integer work that it also has to do…), and A(1:(n+1):end)=1 is working at 56 thousand “index operations per second”. That’s a factor almost exactly one thousand times slower.
I realize of course that this connected components problem has nothing to do with solving A*x=b. What this does point out is just how slow, by comparison, sparse sub assignment truly is.
uhh … silly me … I can’t divide. The subsref is working at 560 thousand “ops per second”. x=A\b is working at 1.2 Gflops. So subsref is over Two Thousand times slower than backslash, not one thousand.
Sorry for the confusion.
Dear Steve,
I would like to count the size of connectors for each “cluster” in the following example
bw =
0 0 0 0 0 0 1
0 1 1 0 0 0 1
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0
Then based on 8-connectivity, define
cluster=
0 0 0 0 0 0 2
0 1 1 0 0 0 2
0 1 1 0 0 0 0
0 0 0 1 1 1 0
0 0 0 1 1 1 0
0 0 0 0 0 0 0
Thus,
cluster one has 10 pixels and cluster 2= 2 pixels. How could I use the above code to get the size of cluster?
Best regards,
Ake
Ake—Use bwlabel and regionprops.
dmperm in MATLAB 7.5 or later has been modified, since this article was first written (MATLAB is now using the dmperm in CSparse). Among other things, dmperm now returns each connected component in their natural order. That is, p(r(1):r(2)-1) is now [1 2 3 4], not [3 4 2 1], for this example.
There’s nothing special to the ordering of nodes within a single connected component, so it makes more sense to return them in a natural order instead of a rather arbitrarily jumbled order.
Does this have any noticeable effect on bwlabel or other functions?
Tim—I don’t think it affects bwlabel. I can’t speak to the question of “other functions.” :-)
OK. I was thinking it might affect other functions such as mindread.m, or time-transfer-related functions such as doc(‘who’) … ;-)
Tim—I knew that. ;-)
am doing project on handwritten character segmentation….can i know the BFS(Breadth First Search) code for finding out the connected components?
Angel—We provide connected component code in the form of the Image Processing Toolbox. :-) If you have that, there’s no need to implement anything yourself. If you can’t use the Image Processing Toolbox, then there’s a wealth of information readily accessible on the web about connected-component labeling and breadth-first searching algorithms.
I found your example very useful for finding connected nodes in a large set of input pairs. I start with an Nx6 array of pixel matches between video frames, and needed to find all points in every frame that are actually of the same location on the ground. I use Delauney tessellation to come up with unique identifiers for every node, then build the sparse adjacency matrix and apply dmperm. Thanks for you example, it was a big help!
David—I’m glad you found it useful!
Hi, i am facing a problem. i have a network data which starts like this
10010 17693
10010 18139
10010 23625
10010 23785
10010 23795
10010 2497
10010 48326
10010 7690
10010 9615
10012 23720
10013 7514
and after 45000 lines of this array it ends up with a number
9989 2647
9989 3320
9989 5384
9989 5511
9989 6453
9989 6762
9989 7568
9989 8966
9993 2497
now problem is this that i tried to use the accumarray command but that is good only for few data. Actually i have to first find the adjacency matrix of this data and then i have to plot the connection graph between these nodes.
Can anyone please help me in this?
Is there a given code for component algorithm that accepts a predicate to evaluale on each pair of neighboring pixels (not from binary image but for matrix with varies valus?)
Ronit—Not in the Image Processing Toolbox.
and in other place?
Ronit—Not that I personally know about. But you should look.
A—It sounds like you are trying to create an extremely large adjacency matrix. You’ll probably need to construct a sparse matrix. Read this post over again; it tells you how to construct a sparse adjacency matrix.
Sir,
I am working on text extraction in colored images. I want to know if there is a connected component labeling function similar to bwlabel for grayscale images? I want to label the components and then threshold and binarize them. I do no want to operate on binary images initially.
Amey—The Image Processing Toolbox does not have such a function.
hi Steve! how can you “find” the number of nodes with the “find” function?
Respect your dmperm ( dulmage-mendelsohn descomposition) I dont understand. Can you explain that algorithm? thank you in advance
Stefano—I don’t understand your first question. As to the second, I am not personally familiar the details of the dmperm algorithm; you’ll need to research that yourself.
ok Steve thanks anyway. but why do you have to put 1s on the diagonal of A so that you put dmperm? . what is the propose of dmperm? why do you use this command?
Stefano—To compute the connected components of the graph represented by the adjacency matrix.
do you know about K nearest neighbors? can you explain this into a post?
http://www.mathworks.com/matlabcentral/fileexchange/15562-k-nearest-neighbors
Jorge—What do want to know about it?
Mr. Steve , why did you have to put 1s on the diagonal of A?
Hello Steve,
I have a kinda-similar problem. I am trying to detect buildings and obtain a labeled image after segmentation. I have also binary mask of buildings and I want to train my system for each building object. Thus, I have a labeled image and a binary mask for the desired objects. However, there are several labels in the region of corresponding component in binary mask and I want to merge these regions in labeled image and give them a single label for each object. I mean, I want to merge building segment labels which I know they are part of a building object indeed. I guess traversing all the pixels of objects in the binary mask is the most cumbersome way, but I can’t think of any other thing. Any suggestions will be appreciated. Thanks a lot.
Sorry for the previous comment. I thought I was writing for the post titled “Almost-connected-component labeling”. I posted my comment there as well, thanks.