Steve on Image Processing

March 20th, 2007

Connected component labeling – Part 3

Let's start looking at connected component labeling algorithms. In this post I want to explain how you can think of pixel neighborhood relationships in terms of a graph. We'll look at how to represent and visualize a graph in MATLAB, as well as how to compute the connected components of a graph.

Here's our sample binary image:

bw = logical([ ...
    0 0 0 0 0 0 0
    0 1 1 0 0 0 0
    0 1 1 0 0 0 0
    0 0 0 1 1 1 0
    0 0 0 1 1 1 0
    0 0 0 0 0 0 0 ])
bw =

     0     0     0     0     0     0     0
     0     1     1     0     0     0     0
     0     1     1     0     0     0     0
     0     0     0     1     1     1     0
     0     0     0     1     1     1     0
     0     0     0     0     0     0     0

Each foreground pixel becomes a node in the graph. Let's number the nodes. Theoretically, the node ordering doesn't matter very much, but for us it's convenient to use the same order as the find function:

0  0  0  0  0  0  0
0  1  3  0  0  0  0
0  2  4  0  0  0  0
0  0  0  5  7  9  0
0  0  0  6  8 10  0

Next, make a list of which nodes in the graph are connected to each other. Two nodes are connected if the corresponding pixels are neighbors. Using 4-connectivity, for example, node 1 is connected to nodes 2 and 3. Node 4 is connected to nodes 2 and 3. Let's capture these pairwise connections in a matrix:

pairs = [...
    1 3; 1 2;
    2 1; 2 4;
    3 1; 3 4;
    4 2; 4 3;
    5 6; 5 7;
    6 5; 6 8;
    7 5; 7 8; 7 9;
    8 6; 8 7; 8 10;
    9 7; 9 10;
    10 8; 10 9]
pairs =

     1     3
     1     2
     2     1
     2     4
     3     1
     3     4
     4     2
     4     3
     5     6
     5     7
     6     5
     6     8
     7     5
     7     8
     7     9
     8     6
     8     7
     8    10
     9     7
     9    10
    10     8
    10     9

From these pairs you can build up a graph representation called an adjacency matrix. An adjacency matrix has one row and column for each graph node. A nonzero element (p,q) means that there is a connection between nodes p and q. In other words, nodes p and q are adjacent. You can build up an adjacency matrix from the list of pairs by using accumarray, like this:

A = accumarray(pairs, 1)
A =

     0     1     1     0     0     0     0     0     0     0
     1     0     0     1     0     0     0     0     0     0
     1     0     0     1     0     0     0     0     0     0
     0     1     1     0     0     0     0     0     0     0
     0     0     0     0     0     1     1     0     0     0
     0     0     0     0     1     0     0     1     0     0
     0     0     0     0     1     0     0     1     1     0
     0     0     0     0     0     1     1     0     0     1
     0     0     0     0     0     0     1     0     0     1
     0     0     0     0     0     0     0     1     1     0

For graphs containing many nodes, you are better off constructing a sparse adjacency matrix, like this:

A = sparse(pairs(:,1), pairs(:,2), 1);
spy(A)
title('Spy plot of sparse adjacency matrix')

MATLAB has a function called dmperm, which computes the Dulmage-Mendelsohn decomposition of a matrix. If the matrix is an adjacency matrix, dmperm can be used to compute the connected components of the corresponding graph. Here's how to do it.

First, you have to put 1s on the diagonal of A:

A(1:11:end) = 1;

[Note added June 29, 2007: Tim Davis pointed out this the above line is an inefficient way to get ones onto the diagonal of a sparse matrix. Please see his comments for an explanation and for better ways to do it.]

Next, call dmperm:

[p,q,r,s] = dmperm(A);

The output p contains a permuted list of nodes:

p
p =

     3     4     2     1    10     9     7     8     6     5

And the output r tells you which nodes of p belong to the same connected component:

r
r =

     1     5    11

Specifically, the first connected component contains these nodes:

p(r(1):r(2)-1)
ans =

     3     4     2     1

The second connected component contains these nodes:

p(r(2):r(3)-1)
ans =

    10     9     7     8     6     5

These two sets of nodes are the connected components of our original binary image.

Let's finish this example by using 8-connectivity instead of 4-connectivity. Our list of pairs becomes:

pairs = [...
    1 2; 1 3; 1 4;
    2 1; 2 3; 2 4;
    3 1; 3 2; 3 4;
    4 1; 4 2; 4 3; 4 5;
    5 4; 5 6; 5 7; 5 8;
    6 5; 6 7; 6 8;
    7 5; 7 6; 7 8; 7 9; 7 10;
    8 5; 8 6; 8 7; 8 9; 8 10;
    9 7; 9 8; 9 10;
    10 7; 10 8; 10 9];

Now repeat the steps of building the sparse adjacency matrix and calling dmperm:

A = sparse(pairs(:,1), pairs(:,2), 1);
A(1:11:end) = 1;
[p,q,r,s] = dmperm(A);
r
r =

     1    11

[Note added June 29, 2007: See Tim's comment about A(1:11:end) = 1.]

Now there's only one connected component:

p(r(1):r(2)-1)
ans =

    10     9     8     7     6     5     4     3     2     1

Next time we'll look at computing connected components using a kind of "flood fill" approach.


Get the MATLAB code

Published with MATLAB® 7.4

37 Responses to “Connected component labeling – Part 3”

  1. JanKees replied on :

    Hi, Steve:

    It is easy to “find” and number the nodes with the “find” function, but how can I programatically find the “pairs” matrix (4- or 8-connected)? Is there any function in Matlab that can do this for us?

  2. Steve replied on :

    JanKees – Good question. I usually approach vectorized neighbor searching in MATLAB by using linear indexing and linear neighbor index offsets. It’s a pretty useful technique for a variety of problems, so I’ll insert a discussion of it into my planned postings on connected components.

  3. JanKees replied on :

    Hi, Steve:

    Thanks for your answer. I will wait anxiously :-) for that post.

  4. Srini replied on :

    Hi Steve- I’ve been using dmperm to find the connected components of a graph for a little while. Unfortunately, it seems to require a symmetric adjacency matrix. Is there an analogous function that works without explicit symmetrization of the matrix (which requires twice the storage). Or would I have to implement a DFS algorithm?

  5. Steve replied on :

    Srini – I’m not aware of anything already in MATLAB that’ll find connected components without using a symmetric adjacency matrix. You could implement depth-first or breadth-first search yourself. The binary image connected component method I described in part 4 of this series is breadth-first. Or you could try using something like union-find, which I’ll describe soon.

  6. Tim Davis replied on :

    dmperm doesn’t require a symmetric nonzero pattern. It can find the strongly connected components of a directed graph (is that what you are trying to compute?). For that kind of matrix, you need to ensure that A has a zero-free diagonal first. Otherwise, it will permute the rows and columns differently by first finding a maximum matching (a permutation for a zero-free diagonal). That permutation (part 1 of dmperm) is an unsymmetric permutation.

    You can skip that step by using cs_scc in CSparse (see the File Exchange), instead. It which finds the strongly connected components of a graph (which is part 2 of the dmperm algorithm). Also in CSparse (or CXSparse if you have a 64-bit MATLAB) is a faster dmperm and better than the one in the current version of MATLAB.

  7. Steve replied on :

    Tim—Thanks! For others interested in dmperm, here’s a reference that Tim just posted in comp.soft-sys.matlab: Direct Methods for Sparse Linear Systems, SIAM, 2006.

  8. Tim Davis replied on :

    One other comment. A(1:11:end) = 1 is a very slow way to add the diagonal to a matrix. It always best to avoid A(i,j) = … when constructing sparse matrices (see http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/ )

    Here’s an example. First, I create two matrices with the same pattern. To ensure B=A actually makes a real copy and not a “shallow” one, I do B=A*.2, just for kicks. Cut and past the following:

    n=1000;
    A=sprand(n,n,.1) ;
    B=A*2;  % just to make sure the timing is fair
    tic; A(1:(n+1):end) = 1 ; toc
    tic ; B=B+speye(n) ; toc
    spones(A) - spones (B)
    

    The A(1:(n+1):end) = 1 statement takes 0.17 seconds in MATLAB 7.4. The B=B+speye(n) statement takes 0.0056 seconds, which is 30 times faster. This is a contrived example but typical of the kind of performance difference you would see in matrices and graphs arising in practice.

    If the diagonal of A started out with -1′s, then you would get cancellation (that doesn’t happen here). So do this instead:

    A = spones(A) + speye(n)
    

    if that’s an issue.

    Yet another way to do it, and faster still, is to do the following to construct A in the first place:

    A = sparse([pairs(:,1) (1:n)'], [pairs(:,2) (1:n)'], 1);
    

    Moral: Avoid A(i,j) = x if at all possible (sparse sub assignment). Less of a problem but still slow is the sparse sub reference, x = A(i,j).

  9. Tim Davis replied on :

    For comparison, for this matrix x=A\b where b is a dense vector takes 0.45 seconds and does 56 million floating-point operations. The A(1:(n+1):end)=1 statement should take at most O(nnz(A)) time, where in this case nnz(A) is about 96,000. So A\b is working at 1.2 Gflops (excluding all the integer work that it also has to do…), and A(1:(n+1):end)=1 is working at 56 thousand “index operations per second”. That’s a factor almost exactly one thousand times slower.

    I realize of course that this connected components problem has nothing to do with solving A*x=b. What this does point out is just how slow, by comparison, sparse sub assignment truly is.

  10. Tim Davis replied on :

    uhh … silly me … I can’t divide. The subsref is working at 560 thousand “ops per second”. x=A\b is working at 1.2 Gflops. So subsref is over Two Thousand times slower than backslash, not one thousand.

    Sorry for the confusion.

  11. Ake Lu replied on :

    Dear Steve,

    I would like to count the size of connectors for each “cluster” in the following example
    bw =

    0 0 0 0 0 0 1
    0 1 1 0 0 0 1
    0 1 1 0 0 0 0
    0 0 0 1 1 1 0
    0 0 0 1 1 1 0
    0 0 0 0 0 0 0
    Then based on 8-connectivity, define
    cluster=
    0 0 0 0 0 0 2
    0 1 1 0 0 0 2
    0 1 1 0 0 0 0
    0 0 0 1 1 1 0
    0 0 0 1 1 1 0
    0 0 0 0 0 0 0
    Thus,
    cluster one has 10 pixels and cluster 2= 2 pixels. How could I use the above code to get the size of cluster?

    Best regards,

    Ake

  12. Steve replied on :

    Ake—Use bwlabel and regionprops.

  13. Tim Davis replied on :

    dmperm in MATLAB 7.5 or later has been modified, since this article was first written (MATLAB is now using the dmperm in CSparse). Among other things, dmperm now returns each connected component in their natural order. That is, p(r(1):r(2)-1) is now [1 2 3 4], not [3 4 2 1], for this example.

    There’s nothing special to the ordering of nodes within a single connected component, so it makes more sense to return them in a natural order instead of a rather arbitrarily jumbled order.

    Does this have any noticeable effect on bwlabel or other functions?

  14. Steve replied on :

    Tim—I don’t think it affects bwlabel. I can’t speak to the question of “other functions.” :-)

  15. Tim Davis replied on :

    OK. I was thinking it might affect other functions such as mindread.m, or time-transfer-related functions such as doc(‘who’) … ;-)

  16. Steve replied on :

    Tim—I knew that. ;-)

  17. angel replied on :

    am doing project on handwritten character segmentation….can i know the BFS(Breadth First Search) code for finding out the connected components?

  18. Steve replied on :

    Angel—We provide connected component code in the form of the Image Processing Toolbox. :-) If you have that, there’s no need to implement anything yourself. If you can’t use the Image Processing Toolbox, then there’s a wealth of information readily accessible on the web about connected-component labeling and breadth-first searching algorithms.

  19. David Lalejini replied on :

    I found your example very useful for finding connected nodes in a large set of input pairs. I start with an Nx6 array of pixel matches between video frames, and needed to find all points in every frame that are actually of the same location on the ground. I use Delauney tessellation to come up with unique identifiers for every node, then build the sparse adjacency matrix and apply dmperm. Thanks for you example, it was a big help!

    %"matchedPixelUV" is an Nx6 array of matching pixel pairs between video
    %frames, in this order: frame# pixel_u pixel_v frame# pixel_u pixel_v.
    
    %Set up an array of unique points from all points in matchedPixelUV.
    points = [matchedPixelUV(:,1:3);matchedPixelUV(:,4:6)];
    points = unique(points,'rows');
    
    %Use Delaunay Tessellation to get the indices to the unique point list from
    %the original matchedPixelUV.  The k1 and k2 vectors are the matching node
    %pairs, which will be used to build the adjacency matrix.
    T = delaunayn(points);
    [k1 d1] = dsearchn(points,T,matchedPixelUV(:,1:3));
    [k2 d2] = dsearchn(points,T,matchedPixelUV(:,4:6));
    display(['distance checks should be zero: d1=' num2str(sum(d1)) ', d2=' num2str(sum(d1))])
    clear T d1 d2 matchedPixelUV
    
    %Now build a sparse adjacency matrix.  Make sure it is square by
    %temporarily putting ones in the upper left and lower right of the
    %diagonal, then remove them.
    A = sparse([1; k1; size(points,1)],[1; k2; size(points,1)],1);
    clear k1 k2
    A(1,1) = 0;
    A(size(points,1),size(points,1))=0;
    
    %Get the transpose of the matrix and add it back to it.  This is because
    %[k1 k2] is the "forward" connection between nodes, i.e. [1 2; 1 3; 2 3; 3 4].
    %However, it does not have the "return" paths [2 1; 3 1; 3 2; 4 3].
    A = A+A';
    % spy(A)
    
    %Add the identity matrix so we have ones along the diagonal.
    I = speye(size(A));
    A = A+I;
    clear I
    
    %Use dmperm (Dulmage-Mendelsohn decomposition) to compute the connected
    %components, see http://blogs.mathworks.com/steve/2007/03/20/connected-component-labeling-part-3/
    %for example.
    [p,q,r] = dmperm(A);
    
  20. Steve replied on :

    David—I’m glad you found it useful!

  21. A replied on :

    Hi, i am facing a problem. i have a network data which starts like this

    10010 17693
    10010 18139
    10010 23625
    10010 23785
    10010 23795
    10010 2497
    10010 48326
    10010 7690
    10010 9615
    10012 23720
    10013 7514
    and after 45000 lines of this array it ends up with a number
    9989 2647
    9989 3320
    9989 5384
    9989 5511
    9989 6453
    9989 6762
    9989 7568
    9989 8966
    9993 2497

    now problem is this that i tried to use the accumarray command but that is good only for few data. Actually i have to first find the adjacency matrix of this data and then i have to plot the connection graph between these nodes.

    Can anyone please help me in this?

  22. ronit replied on :

    Is there a given code for component algorithm that accepts a predicate to evaluale on each pair of neighboring pixels (not from binary image but for matrix with varies valus?)

  23. Steve replied on :

    Ronit—Not in the Image Processing Toolbox.

  24. ronit replied on :

    and in other place?

  25. Steve replied on :

    Ronit—Not that I personally know about. But you should look.

  26. Steve replied on :

    A—It sounds like you are trying to create an extremely large adjacency matrix. You’ll probably need to construct a sparse matrix. Read this post over again; it tells you how to construct a sparse adjacency matrix.

  27. Amey Dharwadker replied on :

    Sir,
    I am working on text extraction in colored images. I want to know if there is a connected component labeling function similar to bwlabel for grayscale images? I want to label the components and then threshold and binarize them. I do no want to operate on binary images initially.

  28. Steve replied on :

    Amey—The Image Processing Toolbox does not have such a function.

  29. stefano replied on :

    hi Steve! how can you “find” the number of nodes with the “find” function?

    Respect your dmperm ( dulmage-mendelsohn descomposition) I dont understand. Can you explain that algorithm? thank you in advance

  30. Steve replied on :

    Stefano—I don’t understand your first question. As to the second, I am not personally familiar the details of the dmperm algorithm; you’ll need to research that yourself.

  31. stefano replied on :

    ok Steve thanks anyway. but why do you have to put 1s on the diagonal of A so that you put dmperm? . what is the propose of dmperm? why do you use this command?

  32. Steve replied on :

    Stefano—To compute the connected components of the graph represented by the adjacency matrix.

  33. jorge replied on :

    do you know about K nearest neighbors? can you explain this into a post?
    http://www.mathworks.com/matlabcentral/fileexchange/15562-k-nearest-neighbors

  34. Steve replied on :

    Jorge—What do want to know about it?

  35. huambrillo replied on :

    Mr. Steve , why did you have to put 1s on the diagonal of A?

  36. Gulcan replied on :

    Hello Steve,
    I have a kinda-similar problem. I am trying to detect buildings and obtain a labeled image after segmentation. I have also binary mask of buildings and I want to train my system for each building object. Thus, I have a labeled image and a binary mask for the desired objects. However, there are several labels in the region of corresponding component in binary mask and I want to merge these regions in labeled image and give them a single label for each object. I mean, I want to merge building segment labels which I know they are part of a building object indeed. I guess traversing all the pixels of objects in the binary mask is the most cumbersome way, but I can’t think of any other thing. Any suggestions will be appreciated. Thanks a lot.

  37. Gulcan replied on :

    Sorry for the previous comment. I thought I was writing for the post titled “Almost-connected-component labeling”. I posted my comment there as well, thanks.


MathWorks
Steve Eddins is a software development manager in the MATLAB and image processing areas at MathWorks. Steve coauthored Digital Image Processing Using MATLAB. He writes here about image processing concepts, algorithm implementations, and MATLAB.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.