Steve on Image Processing

March 20th, 2007

Connected component labeling - Part 3

Let's start looking at connected component labeling algorithms. In this post I want to explain how you can think of pixel neighborhood relationships in terms of a graph. We'll look at how to represent and visualize a graph in MATLAB, as well as how to compute the connected components of a graph.

Here's our sample binary image:

bw = logical([ ...
    0 0 0 0 0 0 0
    0 1 1 0 0 0 0
    0 1 1 0 0 0 0
    0 0 0 1 1 1 0
    0 0 0 1 1 1 0
    0 0 0 0 0 0 0 ])
bw =

     0     0     0     0     0     0     0
     0     1     1     0     0     0     0
     0     1     1     0     0     0     0
     0     0     0     1     1     1     0
     0     0     0     1     1     1     0
     0     0     0     0     0     0     0

Each foreground pixel becomes a node in the graph. Let's number the nodes. Theoretically, the node ordering doesn't matter very much, but for us it's convenient to use the same order as the find function:

0  0  0  0  0  0  0
0  1  3  0  0  0  0
0  2  4  0  0  0  0
0  0  0  5  7  9  0
0  0  0  6  8 10  0

Next, make a list of which nodes in the graph are connected to each other. Two nodes are connected if the corresponding pixels are neighbors. Using 4-connectivity, for example, node 1 is connected to nodes 2 and 3. Node 4 is connected to nodes 2 and 3. Let's capture these pairwise connections in a matrix:

pairs = [...
    1 3; 1 2;
    2 1; 2 4;
    3 1; 3 4;
    4 2; 4 3;
    5 6; 5 7;
    6 5; 6 8;
    7 5; 7 8; 7 9;
    8 6; 8 7; 8 10;
    9 7; 9 10;
    10 8; 10 9]
pairs =

     1     3
     1     2
     2     1
     2     4
     3     1
     3     4
     4     2
     4     3
     5     6
     5     7
     6     5
     6     8
     7     5
     7     8
     7     9
     8     6
     8     7
     8    10
     9     7
     9    10
    10     8
    10     9

From these pairs you can build up a graph representation called an adjacency matrix. An adjacency matrix has one row and column for each graph node. A nonzero element (p,q) means that there is a connection between nodes p and q. In other words, nodes p and q are adjacent. You can build up an adjacency matrix from the list of pairs by using accumarray, like this:

A = accumarray(pairs, 1)
A =

     0     1     1     0     0     0     0     0     0     0
     1     0     0     1     0     0     0     0     0     0
     1     0     0     1     0     0     0     0     0     0
     0     1     1     0     0     0     0     0     0     0
     0     0     0     0     0     1     1     0     0     0
     0     0     0     0     1     0     0     1     0     0
     0     0     0     0     1     0     0     1     1     0
     0     0     0     0     0     1     1     0     0     1
     0     0     0     0     0     0     1     0     0     1
     0     0     0     0     0     0     0     1     1     0

For graphs containing many nodes, you are better off constructing a sparse adjacency matrix, like this:

A = sparse(pairs(:,1), pairs(:,2), 1);
spy(A)
title('Spy plot of sparse adjacency matrix')

MATLAB has a function called dmperm, which computes the Dulmage-Mendelsohn decomposition of a matrix. If the matrix is an adjacency matrix, dmperm can be used to compute the connected components of the corresponding graph. Here's how to do it.

First, you have to put 1s on the diagonal of A:

A(1:11:end) = 1;

[Note added June 29, 2007: Tim Davis pointed out this the above line is an inefficient way to get ones onto the diagonal of a sparse matrix. Please see his comments for an explanation and for better ways to do it.]

Next, call dmperm:

[p,q,r,s] = dmperm(A);

The output p contains a permuted list of nodes:

p
p =

     3     4     2     1    10     9     7     8     6     5

And the output r tells you which nodes of p belong to the same connected component:

r
r =

     1     5    11

Specifically, the first connected component contains these nodes:

p(r(1):r(2)-1)
ans =

     3     4     2     1

The second connected component contains these nodes:

p(r(2):r(3)-1)
ans =

    10     9     7     8     6     5

These two sets of nodes are the connected components of our original binary image.

Let's finish this example by using 8-connectivity instead of 4-connectivity. Our list of pairs becomes:

pairs = [...
    1 2; 1 3; 1 4;
    2 1; 2 3; 2 4;
    3 1; 3 2; 3 4;
    4 1; 4 2; 4 3; 4 5;
    5 4; 5 6; 5 7; 5 8;
    6 5; 6 7; 6 8;
    7 5; 7 6; 7 8; 7 9; 7 10;
    8 5; 8 6; 8 7; 8 9; 8 10;
    9 7; 9 8; 9 10;
    10 7; 10 8; 10 9];

Now repeat the steps of building the sparse adjacency matrix and calling dmperm:

A = sparse(pairs(:,1), pairs(:,2), 1);
A(1:11:end) = 1;
[p,q,r,s] = dmperm(A);
r
r =

     1    11

[Note added June 29, 2007: See Tim's comment about A(1:11:end) = 1.]

Now there's only one connected component:

p(r(1):r(2)-1)
ans =

    10     9     8     7     6     5     4     3     2     1

Next time we'll look at computing connected components using a kind of "flood fill" approach.


Get the MATLAB code

Published with MATLAB® 7.4

18 Responses to “Connected component labeling - Part 3”

  1. JanKees replied on :

    Hi, Steve:

    It is easy to “find” and number the nodes with the “find” function, but how can I programatically find the “pairs” matrix (4- or 8-connected)? Is there any function in Matlab that can do this for us?

  2. Steve replied on :

    JanKees - Good question. I usually approach vectorized neighbor searching in MATLAB by using linear indexing and linear neighbor index offsets. It’s a pretty useful technique for a variety of problems, so I’ll insert a discussion of it into my planned postings on connected components.

  3. JanKees replied on :

    Hi, Steve:

    Thanks for your answer. I will wait anxiously :-) for that post.

  4. Srini replied on :

    Hi Steve- I’ve been using dmperm to find the connected components of a graph for a little while. Unfortunately, it seems to require a symmetric adjacency matrix. Is there an analogous function that works without explicit symmetrization of the matrix (which requires twice the storage). Or would I have to implement a DFS algorithm?

  5. Steve replied on :

    Srini - I’m not aware of anything already in MATLAB that’ll find connected components without using a symmetric adjacency matrix. You could implement depth-first or breadth-first search yourself. The binary image connected component method I described in part 4 of this series is breadth-first. Or you could try using something like union-find, which I’ll describe soon.

  6. Tim Davis replied on :

    dmperm doesn’t require a symmetric nonzero pattern. It can find the strongly connected components of a directed graph (is that what you are trying to compute?). For that kind of matrix, you need to ensure that A has a zero-free diagonal first. Otherwise, it will permute the rows and columns differently by first finding a maximum matching (a permutation for a zero-free diagonal). That permutation (part 1 of dmperm) is an unsymmetric permutation.

    You can skip that step by using cs_scc in CSparse (see the File Exchange), instead. It which finds the strongly connected components of a graph (which is part 2 of the dmperm algorithm). Also in CSparse (or CXSparse if you have a 64-bit MATLAB) is a faster dmperm and better than the one in the current version of MATLAB.

  7. Steve replied on :

    Tim—Thanks! For others interested in dmperm, here’s a reference that Tim just posted in comp.soft-sys.matlab: Direct Methods for Sparse Linear Systems, SIAM, 2006.

  8. Tim Davis replied on :

    One other comment. A(1:11:end) = 1 is a very slow way to add the diagonal to a matrix. It always best to avoid A(i,j) = … when constructing sparse matrices (see http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/ )

    Here’s an example. First, I create two matrices with the same pattern. To ensure B=A actually makes a real copy and not a “shallow” one, I do B=A*.2, just for kicks. Cut and past the following:

    n=1000;
    A=sprand(n,n,.1) ;
    B=A*2;  % just to make sure the timing is fair
    tic; A(1:(n+1):end) = 1 ; toc
    tic ; B=B+speye(n) ; toc
    spones(A) - spones (B)
    

    The A(1:(n+1):end) = 1 statement takes 0.17 seconds in MATLAB 7.4. The B=B+speye(n) statement takes 0.0056 seconds, which is 30 times faster. This is a contrived example but typical of the kind of performance difference you would see in matrices and graphs arising in practice.

    If the diagonal of A started out with -1’s, then you would get cancellation (that doesn’t happen here). So do this instead:

    A = spones(A) + speye(n)
    

    if that’s an issue.

    Yet another way to do it, and faster still, is to do the following to construct A in the first place:

    A = sparse([pairs(:,1) (1:n)'], [pairs(:,2) (1:n)'], 1);
    

    Moral: Avoid A(i,j) = x if at all possible (sparse sub assignment). Less of a problem but still slow is the sparse sub reference, x = A(i,j).

  9. Tim Davis replied on :

    For comparison, for this matrix x=A\b where b is a dense vector takes 0.45 seconds and does 56 million floating-point operations. The A(1:(n+1):end)=1 statement should take at most O(nnz(A)) time, where in this case nnz(A) is about 96,000. So A\b is working at 1.2 Gflops (excluding all the integer work that it also has to do…), and A(1:(n+1):end)=1 is working at 56 thousand “index operations per second”. That’s a factor almost exactly one thousand times slower.

    I realize of course that this connected components problem has nothing to do with solving A*x=b. What this does point out is just how slow, by comparison, sparse sub assignment truly is.

  10. Tim Davis replied on :

    uhh … silly me … I can’t divide. The subsref is working at 560 thousand “ops per second”. x=A\b is working at 1.2 Gflops. So subsref is over Two Thousand times slower than backslash, not one thousand.

    Sorry for the confusion.

  11. Ake Lu replied on :

    Dear Steve,

    I would like to count the size of connectors for each “cluster” in the following example
    bw =

    0 0 0 0 0 0 1
    0 1 1 0 0 0 1
    0 1 1 0 0 0 0
    0 0 0 1 1 1 0
    0 0 0 1 1 1 0
    0 0 0 0 0 0 0
    Then based on 8-connectivity, define
    cluster=
    0 0 0 0 0 0 2
    0 1 1 0 0 0 2
    0 1 1 0 0 0 0
    0 0 0 1 1 1 0
    0 0 0 1 1 1 0
    0 0 0 0 0 0 0
    Thus,
    cluster one has 10 pixels and cluster 2= 2 pixels. How could I use the above code to get the size of cluster?

    Best regards,

    Ake

  12. Steve replied on :

    Ake—Use bwlabel and regionprops.

  13. Tim Davis replied on :

    dmperm in MATLAB 7.5 or later has been modified, since this article was first written (MATLAB is now using the dmperm in CSparse). Among other things, dmperm now returns each connected component in their natural order. That is, p(r(1):r(2)-1) is now [1 2 3 4], not [3 4 2 1], for this example.

    There’s nothing special to the ordering of nodes within a single connected component, so it makes more sense to return them in a natural order instead of a rather arbitrarily jumbled order.

    Does this have any noticeable effect on bwlabel or other functions?

  14. Steve replied on :

    Tim—I don’t think it affects bwlabel. I can’t speak to the question of “other functions.” :-)

  15. Tim Davis replied on :

    OK. I was thinking it might affect other functions such as mindread.m, or time-transfer-related functions such as doc(’who’) … ;-)

  16. Steve replied on :

    Tim—I knew that. ;-)

  17. angel replied on :

    am doing project on handwritten character segmentation….can i know the BFS(Breadth First Search) code for finding out the connected components?

  18. Steve replied on :

    Angel—We provide connected component code in the form of the Image Processing Toolbox. :-) If you have that, there’s no need to implement anything yourself. If you can’t use the Image Processing Toolbox, then there’s a wealth of information readily accessible on the web about connected-component labeling and breadth-first searching algorithms.

Leave a Reply

Wrap code fragments inside <pre> tags, like this:

<pre class="code">
a = magic(3);
sum(a)
</pre>

If you have a "<" character in your code, either follow it with a space or replace it with "&lt;" (including the semicolon).


Steve Eddins manages the Image & Geospatial development team at The MathWorks and coauthored Digital Image Processing Using MATLAB. He writes here about image processing concepts, algorithm implementations, and MATLAB.

  • murat: Hi Steve, I have an rgb image of a kind of cream and it contains some small black particles (black dots). In...
  • Steve: Ernest—Look at setting the FaceColor property. The code for setting that is shown on the page you asked...
  • Ernest Miller: Hi Steve, Understood. However, can you explain how to change the colors? Thanks, Ernest
  • Jan: Hi Steve Very useful code, yet what if I parts of my rotated+translated object are outside the original...
  • Steve: MoHDa—It might be possible. You’ll need to use one of the options that produces closed edge...
  • MoHDa: I have one question about the ROIPOLY: I have an image with stripes, I use the “edge” command for...
  • Steve: Shahn—My November 17, 2006 post shows you how to do it.
  • Steve: Kay-Uwe—Thanks for following up. I am planning to make it easier to use test directories in a package....
  • shahn: Hello Steve Instead of superimposing a star on the image to show the centroide. How would you superimpose a...
  • Kay-Uwe: Having TestSuite.fromPackag e() would be nice to have, but so far using simple “test” subdirs...

These postings are the author's and don't necessarily represent the opinions of The MathWorks.