Steve on Image Processing

May 25th, 2007

Connected component labeling – Part 6

In this part of the connected component labeling series, I'll finally get to one of the algorithms actually used in the Image Processing Toolbox. It's based on a technique called union-find, as described in Sedgewick's Algorithms in C, Addison-Wesley, 1998, pp. 11-20. This algorithm is designed to be able to quickly form the union of two sets, and also to be able to quickly find which set contains a given element.

Here's the example I'll use to illustrate the method. On the left is a small binary image containing a single eight-connected component. We start by making each nonzero pixel a node in a graph. The nodes are numbered as shown on the right.

In its initial state, the graph contains 7 unconnected nodes, one for each nonzero pixel:

We can represent this graph in MATLAB by a vector:

nodes = 1:7
nodes =

     1     2     3     4     5     6     7

In this representation, nodes(k) gives the parent of node k. A node that is its own parent (that is, nodes(k) equals k) is a root node.

As we scan the image columnwise, we encounter pairs of pixels that are connected to each other. The first such pair is that pixel 2 is connected to pixel 1. We find the root node for the set containing pixel 2 (node 2) and the root node for the set containing pixel 1 (node 1). Since they are different, we form the union of those two sets simply by making the root node of one of the sets point to the root node of the other.

nodes(2) = 1
nodes =

     1     1     3     4     5     6     7

The second connection encountered in the scan is that pixel 3 is connected to pixel 2. The root node for the set containing pixel 3 is node 3; the root node for the set containing pixel 2 is node 1. So we connect node 3 to node 1.

nodes(3) = 1
nodes =

     1     1     1     4     5     6     7

Now continue to process connections. Pixel 4 is connected to pixel 3, so connect node 4 to node 1:

nodes(4) = 1
nodes =

     1     1     1     1     5     6     7

Pixel 6 is connected to pixel 5, so connect node 6 to node 5:

nodes(6) = 5
nodes =

     1     1     1     1     5     5     7

Pixel 7 is connected to pixel 6, so connect node 7 to node 5:

nodes(7) = 5
nodes =

     1     1     1     1     5     5     5

Now for the moment of truth. So far our graph still contains two sets (trees). But now we see that pixel 7 is connected to pixel 4. When we connect their corresponding roots (5 to 1), we are finally left with a single set:

nodes(5) = 1
nodes =

     1     1     1     1     1     5     5

According to Sedgewick, there are two important things to do to the graph while processing pairs:

  • When forming a union by connecting two trees in the graph, always connect the smaller of the two trees to the larger one. This prevents the growth of long paths in the trees.
  • When "walking" up a chain of nodes to find the root node, "flatten out" the tree using a technique called path compression. Here's an example showing path compression by halving. The tree on the left has a long path from node 7 up to node 1. As you follow the path, take two nodes at a time and make the bottom node point to the same parent as the top node. That give you the tree on the right.

If you keep doing this, then over time the trees in the graph stay almost completely flat. The flat tree is what makes the "find" operation as quick as the "union" operation.

The Image Processing Toolbox function bwlabeln, which performs connected component labeling for arbitrary dimension and arbitrary connectivity, uses the union-find technique. If you have the product, you can see the key parts of the implementation in the source files bwlabelnmex.cpp and unionfind.c, which are in toolbox\images\images\private.

I'll try to wrap this series up with one more post, which will be about the two-dimensional algorithm used by the function bwlabel.


Get the MATLAB code

Published with MATLAB® 7.4

25 Responses to “Connected component labeling – Part 6”

  1. Martin Isenburg replied on :

    hi steve,

    i understand that bwlabeln uses union-find to resolve the equivalences and that bwlabel uses dmperm for this task. but you write that bwlabel also uses a decomposition of pixels into runs. does bwlabeln also do this decomposition into runs (and therefore performs union-find on runs of pixels rather than on individual pixels)?

    cheers

    martin

  2. Steve replied on :

    Martin—Good question. No, bwlabeln does not use runs. It processes pixels individually.

  3. Martin Isenburg replied on :

    May I ask what is the rational behind the design decision to have bwlabeln work on individual pixels rather than on runs of pixels?

  4. Steve replied on :

    Martin—Excellent question. I just posted a detailed answer.

  5. Ashish Anant Saxena replied on :

    I am working to recognize handwritten characters .
    I need to extract out the individual characters from the entire page of text image. I try out image component labeling to get the connected components……but it is showing the region of 0′s bounded by 1′s as the components too. But what i wanna have is only the different components of 1′s …….plz help me .

  6. Ashish Anant Saxena replied on :

    I hope i can get it by modifying the code of bwlabel a bit to search only the different components of 1′s……will try that way

  7. Steve replied on :

    Ashish—I don’t understand what you mean. The bwlabel function labels only the nonzero elements of the input matrix. It does not label interior zero-valued elements.

  8. Ashish Anant Saxena replied on :

    Thank you Steve. I am sorry I wrongly interpreted the components. Please tell me the order in which the components are labelled
    …?? For my scanned text image it numbered the components
    at the bottom left as component no. 1 and the one at the top right corner as n0.2……I need to have the components labeled in the sequence as they appear on scanning from top left to right(in rows) moving to next row after reaching the last column of current row.
    I tried to manipulate the result of labeled components i got ……but i need a function to sort cell
    array using a particular dimension of cell array as key.
    Is there any inbuilt function for that.?

  9. Ashish Anant Saxena replied on :

    correction in previous comment…….line 4th “one at the top right corner as the last component”

  10. Steve replied on :

    Ashish—bwlabel scans image pixels columnwise, since that’s the way matrix elements are stored in memory. If you want a different order, you could transpose your matrix, or you could sort based on some other criterion. To sort by a different criterion, create a vector containing the values to be sorted, sort them and capture the 2nd output of the sort, and then use the 2nd output to rearrange your cell array.

  11. Jackey replied on :

    hello man, i am chinese, i just want to know some detail about the algorithm of the path compression by halving in C,thanks!

  12. Steve replied on :

    Jackey—See Algorithms in C by Sedgewick, 3rd edition, pp. 18–21.

  13. Shannon replied on :

    just to clarify, this can be used to represent cell connectivity, after you have segmented an image? like you have an image of red blood cells or something and want to figure out how many cells each cell is in contact with… bwlabeln will do it? this might just solve my problems.

    and i’m also doing porosity of a foam. after i do bwlabel and segmentation and all that, how can i subtract the area of the cells (L, right?) from the “nothing” black background to get the dead space/air/pores?

    if you can help or suggest any other code to help, that would be wonderful!

  14. Steve replied on :

    Shannon—No, bwlabel and bwlabeln compute the connected components of pixels, not cells. If cells are touching in the output of your segmentation step, then bwlabel will consider them to be part of the same object.

    You can count background pixels in a binary image bw with something like this:

    sum(~bw(:))
    
  15. Ardy replied on :

    Hi, i would like to know more details about union find.

    like in an image, may be:

    note 1 can be connected to 2, 3, 5
    note 2 can connect to 4, 6
    note 3 can connect to 2, 6

    how u put all the linked note together effectively?

  16. Steve replied on :

    Ardy—There is a lot information available online about the union find algorithm. I can also recommend Sedgewick, Algorithms in C, 3rd ed, Parts 1-4, pp. 10-23.

  17. anand replied on :

    Hi,
    I need to detect text (font) region in a image (a high resolution text imgae).. I just want to know , will this algo help me out?

    thanks
    Anand

  18. Steve replied on :

    Anand—Connected-component labeling will probably be useful to you, but you don’t need to implement the algorithm yourself. Just use bwlabel (or bwconncomp if you already have the R2009a release) from the Image Processing Toolbox.

  19. Evan replied on :

    Hi,
    I am currently working on writing my own connected components code in MATLAB. I have read through all of your posts and find them very helpful in the theory part of the code. I have tried searching online for a example of the algorithm that I can implement in MATLAB but i cant find any example of the equivalence matrix or union find codes. My question is what the pseudocode for doing the union-find/equivalence matrix. I have also not had any luck in getting my hands on any of the recommended books through interlibrary loan. Any help would be awesome! Thanks.

  20. Steve replied on :

    Evan—See the pseudocode and external links on this page.

  21. Hilda replied on :

    Hello,
    I have a set of code here that I need to translate to C by using the RTW. Most of it adheres to the Embedded MatLab subset, but the function bwlabel is included. In RTW C code generation you can insert functions already written in C. So my question is, is it feasible to include the .cpps in the image toolbox/private folder that perform most of the bwlabel operations to bypass having to write the algorithm with embedded matlab compliant functions?

  22. Steve replied on :

    Hilda—No, I doubt that it is feasible.

  23. Yang replied on :

    Hi Steve:

    I am a little confused. Do you use disjoint sets in bwlabel? I have read the source code, and it seems that you use dmperm instead. Why? Is dmperm faster than disjoint sets?

    Yang

  24. Steve replied on :

    Yang—dmperm was already available as a fast, built-in MATLAB operation. Also, bwlabel is based the idea of labeling runs of pixels instead of labeling individual pixels, so the equivalence-class determination step is a relatively small fraction of the overall performance picture of bwlabel.

  25. Yang replied on :

    OK, thanks for your reply! ^_^


MathWorks
Steve Eddins is a software development manager in the MATLAB and image processing areas at MathWorks. Steve coauthored Digital Image Processing Using MATLAB. He writes here about image processing concepts, algorithm implementations, and MATLAB.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.