Steve on Image Processing

May 25th, 2007

Connected component labeling - Part 6

In this part of the connected component labeling series, I'll finally get to one of the algorithms actually used in the Image Processing Toolbox. It's based on a technique called union-find, as described in Sedgewick's Algorithms in C, Addison-Wesley, 1998, pp. 11-20. This algorithm is designed to be able to quickly form the union of two sets, and also to be able to quickly find which set contains a given element.

Here's the example I'll use to illustrate the method. On the left is a small binary image containing a single eight-connected component. We start by making each nonzero pixel a node in a graph. The nodes are numbered as shown on the right.

In its initial state, the graph contains 7 unconnected nodes, one for each nonzero pixel:

We can represent this graph in MATLAB by a vector:

nodes = 1:7
nodes =

     1     2     3     4     5     6     7

In this representation, nodes(k) gives the parent of node k. A node that is its own parent (that is, nodes(k) equals k) is a root node.

As we scan the image columnwise, we encounter pairs of pixels that are connected to each other. The first such pair is that pixel 2 is connected to pixel 1. We find the root node for the set containing pixel 2 (node 2) and the root node for the set containing pixel 1 (node 1). Since they are different, we form the union of those two sets simply by making the root node of one of the sets point to the root node of the other.

nodes(2) = 1
nodes =

     1     1     3     4     5     6     7

The second connection encountered in the scan is that pixel 3 is connected to pixel 2. The root node for the set containing pixel 3 is node 3; the root node for the set containing pixel 2 is node 1. So we connect node 3 to node 1.

nodes(3) = 1
nodes =

     1     1     1     4     5     6     7

Now continue to process connections. Pixel 4 is connected to pixel 3, so connect node 4 to node 1:

nodes(4) = 1
nodes =

     1     1     1     1     5     6     7

Pixel 6 is connected to pixel 5, so connect node 6 to node 5:

nodes(6) = 5
nodes =

     1     1     1     1     5     5     7

Pixel 7 is connected to pixel 6, so connect node 7 to node 5:

nodes(7) = 5
nodes =

     1     1     1     1     5     5     5

Now for the moment of truth. So far our graph still contains two sets (trees). But now we see that pixel 7 is connected to pixel 4. When we connect their corresponding roots (5 to 1), we are finally left with a single set:

nodes(5) = 1
nodes =

     1     1     1     1     1     5     5

According to Sedgewick, there are two important things to do to the graph while processing pairs:

  • When forming a union by connecting two trees in the graph, always connect the smaller of the two trees to the larger one. This prevents the growth of long paths in the trees.
  • When "walking" up a chain of nodes to find the root node, "flatten out" the tree using a technique called path compression. Here's an example showing path compression by halving. The tree on the left has a long path from node 7 up to node 1. As you follow the path, take two nodes at a time and make the bottom node point to the same parent as the top node. That give you the tree on the right.

If you keep doing this, then over time the trees in the graph stay almost completely flat. The flat tree is what makes the "find" operation as quick as the "union" operation.

The Image Processing Toolbox function bwlabeln, which performs connected component labeling for arbitrary dimension and arbitrary connectivity, uses the union-find technique. If you have the product, you can see the key parts of the implementation in the source files bwlabelnmex.cpp and unionfind.c, which are in toolbox\images\images\private.

I'll try to wrap this series up with one more post, which will be about the two-dimensional algorithm used by the function bwlabel.


Get the MATLAB code

Published with MATLAB® 7.4

18 Responses to “Connected component labeling - Part 6”

  1. Martin Isenburg replied on :

    hi steve,

    i understand that bwlabeln uses union-find to resolve the equivalences and that bwlabel uses dmperm for this task. but you write that bwlabel also uses a decomposition of pixels into runs. does bwlabeln also do this decomposition into runs (and therefore performs union-find on runs of pixels rather than on individual pixels)?

    cheers

    martin

  2. Steve replied on :

    Martin—Good question. No, bwlabeln does not use runs. It processes pixels individually.

  3. Martin Isenburg replied on :

    May I ask what is the rational behind the design decision to have bwlabeln work on individual pixels rather than on runs of pixels?

  4. Steve replied on :

    Martin—Excellent question. I just posted a detailed answer.

  5. Ashish Anant Saxena replied on :

    I am working to recognize handwritten characters .
    I need to extract out the individual characters from the entire page of text image. I try out image component labeling to get the connected components……but it is showing the region of 0’s bounded by 1’s as the components too. But what i wanna have is only the different components of 1’s …….plz help me .

  6. Ashish Anant Saxena replied on :

    I hope i can get it by modifying the code of bwlabel a bit to search only the different components of 1’s……will try that way

  7. Steve replied on :

    Ashish—I don’t understand what you mean. The bwlabel function labels only the nonzero elements of the input matrix. It does not label interior zero-valued elements.

  8. Ashish Anant Saxena replied on :

    Thank you Steve. I am sorry I wrongly interpreted the components. Please tell me the order in which the components are labelled
    …?? For my scanned text image it numbered the components
    at the bottom left as component no. 1 and the one at the top right corner as n0.2……I need to have the components labeled in the sequence as they appear on scanning from top left to right(in rows) moving to next row after reaching the last column of current row.
    I tried to manipulate the result of labeled components i got ……but i need a function to sort cell
    array using a particular dimension of cell array as key.
    Is there any inbuilt function for that.?

  9. Ashish Anant Saxena replied on :

    correction in previous comment…….line 4th “one at the top right corner as the last component”

  10. Steve replied on :

    Ashish—bwlabel scans image pixels columnwise, since that’s the way matrix elements are stored in memory. If you want a different order, you could transpose your matrix, or you could sort based on some other criterion. To sort by a different criterion, create a vector containing the values to be sorted, sort them and capture the 2nd output of the sort, and then use the 2nd output to rearrange your cell array.

  11. Jackey replied on :

    hello man, i am chinese, i just want to know some detail about the algorithm of the path compression by halving in C,thanks!

  12. Steve replied on :

    Jackey—See Algorithms in C by Sedgewick, 3rd edition, pp. 18–21.

  13. Shannon replied on :

    just to clarify, this can be used to represent cell connectivity, after you have segmented an image? like you have an image of red blood cells or something and want to figure out how many cells each cell is in contact with… bwlabeln will do it? this might just solve my problems.

    and i’m also doing porosity of a foam. after i do bwlabel and segmentation and all that, how can i subtract the area of the cells (L, right?) from the “nothing” black background to get the dead space/air/pores?

    if you can help or suggest any other code to help, that would be wonderful!

  14. Steve replied on :

    Shannon—No, bwlabel and bwlabeln compute the connected components of pixels, not cells. If cells are touching in the output of your segmentation step, then bwlabel will consider them to be part of the same object.

    You can count background pixels in a binary image bw with something like this:

    sum(~bw(:))
    
  15. Ardy replied on :

    Hi, i would like to know more details about union find.

    like in an image, may be:

    note 1 can be connected to 2, 3, 5
    note 2 can connect to 4, 6
    note 3 can connect to 2, 6

    how u put all the linked note together effectively?

  16. Steve replied on :

    Ardy—There is a lot information available online about the union find algorithm. I can also recommend Sedgewick, Algorithms in C, 3rd ed, Parts 1-4, pp. 10-23.

  17. anand replied on :

    Hi,
    I need to detect text (font) region in a image (a high resolution text imgae).. I just want to know , will this algo help me out?

    thanks
    Anand

  18. Steve replied on :

    Anand—Connected-component labeling will probably be useful to you, but you don’t need to implement the algorithm yourself. Just use bwlabel (or bwconncomp if you already have the R2009a release) from the Image Processing Toolbox.

Leave a Reply

Wrap code fragments inside <pre> tags, like this:

<pre class="code">
a = magic(3);
sum(a)
</pre>

If you have a "<" character in your code, either follow it with a space or replace it with "&lt;" (including the semicolon).


Steve Eddins manages the Image & Geospatial development team at The MathWorks and coauthored Digital Image Processing Using MATLAB. He writes here about image processing concepts, algorithm implementations, and MATLAB.

  • Steve: Kezia—Try imrotate.
  • kezia: steve, how to perform rotation of structuring element by 15 degrees. kindly answer my question. thank u kezia...
  • Steve: Tasha—I only accept comments that are relevant to the particular blog post or are questions or comments...
  • Tasha: Steve,I send you a comment here but still didn’t get any reply yet.I did not see my comment posted here...
  • Steve: Carsten—Thanks for your input.
  • Carsten: Another vote for either imtranslate.m, or at least a blurb in the imtransform help why pure translation...
  • Loren Shure: If you look towards the end of the fftfilt program, you will see that there’s a check to see if...
  • Steve: Sonja—My imwritesize submission on the MATLAB Central File Exchange might be helpful. It was posted...
  • Steve: Grant—Sorry, but it won’t be for R2010a. That development deadline has already passed.
  • Sonja: My publisher is wanting images for a new book to be 300 dpi. Only 5 of the 19 images are 300, the rest are...

These postings are the author's and don't necessarily represent the opinions of The MathWorks.