Steve on Image Processing

May 25th, 2007

Connected component labeling - Part 6

In this part of the connected component labeling series, I'll finally get to one of the algorithms actually used in the Image Processing Toolbox. It's based on a technique called union-find, as described in Sedgewick's Algorithms in C, Addison-Wesley, 1998, pp. 11-20. This algorithm is designed to be able to quickly form the union of two sets, and also to be able to quickly find which set contains a given element.

Here's the example I'll use to illustrate the method. On the left is a small binary image containing a single eight-connected component. We start by making each nonzero pixel a node in a graph. The nodes are numbered as shown on the right.

In its initial state, the graph contains 7 unconnected nodes, one for each nonzero pixel:

We can represent this graph in MATLAB by a vector:

nodes = 1:7
nodes =

     1     2     3     4     5     6     7

In this representation, nodes(k) gives the parent of node k. A node that is its own parent (that is, nodes(k) equals k) is a root node.

As we scan the image columnwise, we encounter pairs of pixels that are connected to each other. The first such pair is that pixel 2 is connected to pixel 1. We find the root node for the set containing pixel 2 (node 2) and the root node for the set containing pixel 1 (node 1). Since they are different, we form the union of those two sets simply by making the root node of one of the sets point to the root node of the other.

nodes(2) = 1
nodes =

     1     1     3     4     5     6     7

The second connection encountered in the scan is that pixel 3 is connected to pixel 2. The root node for the set containing pixel 3 is node 3; the root node for the set containing pixel 2 is node 1. So we connect node 3 to node 1.

nodes(3) = 1
nodes =

     1     1     1     4     5     6     7

Now continue to process connections. Pixel 4 is connected to pixel 3, so connect node 4 to node 1:

nodes(4) = 1
nodes =

     1     1     1     1     5     6     7

Pixel 6 is connected to pixel 5, so connect node 6 to node 5:

nodes(6) = 5
nodes =

     1     1     1     1     5     5     7

Pixel 7 is connected to pixel 6, so connect node 7 to node 5:

nodes(7) = 5
nodes =

     1     1     1     1     5     5     5

Now for the moment of truth. So far our graph still contains two sets (trees). But now we see that pixel 7 is connected to pixel 4. When we connect their corresponding roots (5 to 1), we are finally left with a single set:

nodes(5) = 1
nodes =

     1     1     1     1     1     5     5

According to Sedgewick, there are two important things to do to the graph while processing pairs:

  • When forming a union by connecting two trees in the graph, always connect the smaller of the two trees to the larger one. This prevents the growth of long paths in the trees.
  • When "walking" up a chain of nodes to find the root node, "flatten out" the tree using a technique called path compression. Here's an example showing path compression by halving. The tree on the left has a long path from node 7 up to node 1. As you follow the path, take two nodes at a time and make the bottom node point to the same parent as the top node. That give you the tree on the right.

If you keep doing this, then over time the trees in the graph stay almost completely flat. The flat tree is what makes the "find" operation as quick as the "union" operation.

The Image Processing Toolbox function bwlabeln, which performs connected component labeling for arbitrary dimension and arbitrary connectivity, uses the union-find technique. If you have the product, you can see the key parts of the implementation in the source files bwlabelnmex.cpp and unionfind.c, which are in toolbox\images\images\private.

I'll try to wrap this series up with one more post, which will be about the two-dimensional algorithm used by the function bwlabel.


Get the MATLAB code

Published with MATLAB® 7.4

16 Responses to “Connected component labeling - Part 6”

  1. Martin Isenburg replied on :

    hi steve,

    i understand that bwlabeln uses union-find to resolve the equivalences and that bwlabel uses dmperm for this task. but you write that bwlabel also uses a decomposition of pixels into runs. does bwlabeln also do this decomposition into runs (and therefore performs union-find on runs of pixels rather than on individual pixels)?

    cheers

    martin

  2. Steve replied on :

    Martin—Good question. No, bwlabeln does not use runs. It processes pixels individually.

  3. Martin Isenburg replied on :

    May I ask what is the rational behind the design decision to have bwlabeln work on individual pixels rather than on runs of pixels?

  4. Steve replied on :

    Martin—Excellent question. I just posted a detailed answer.

  5. Ashish Anant Saxena replied on :

    I am working to recognize handwritten characters .
    I need to extract out the individual characters from the entire page of text image. I try out image component labeling to get the connected components……but it is showing the region of 0’s bounded by 1’s as the components too. But what i wanna have is only the different components of 1’s …….plz help me .

  6. Ashish Anant Saxena replied on :

    I hope i can get it by modifying the code of bwlabel a bit to search only the different components of 1’s……will try that way

  7. Steve replied on :

    Ashish—I don’t understand what you mean. The bwlabel function labels only the nonzero elements of the input matrix. It does not label interior zero-valued elements.

  8. Ashish Anant Saxena replied on :

    Thank you Steve. I am sorry I wrongly interpreted the components. Please tell me the order in which the components are labelled
    …?? For my scanned text image it numbered the components
    at the bottom left as component no. 1 and the one at the top right corner as n0.2……I need to have the components labeled in the sequence as they appear on scanning from top left to right(in rows) moving to next row after reaching the last column of current row.
    I tried to manipulate the result of labeled components i got ……but i need a function to sort cell
    array using a particular dimension of cell array as key.
    Is there any inbuilt function for that.?

  9. Ashish Anant Saxena replied on :

    correction in previous comment…….line 4th “one at the top right corner as the last component”

  10. Steve replied on :

    Ashish—bwlabel scans image pixels columnwise, since that’s the way matrix elements are stored in memory. If you want a different order, you could transpose your matrix, or you could sort based on some other criterion. To sort by a different criterion, create a vector containing the values to be sorted, sort them and capture the 2nd output of the sort, and then use the 2nd output to rearrange your cell array.

  11. Jackey replied on :

    hello man, i am chinese, i just want to know some detail about the algorithm of the path compression by halving in C,thanks!

  12. Steve replied on :

    Jackey—See Algorithms in C by Sedgewick, 3rd edition, pp. 18–21.

  13. Shannon replied on :

    just to clarify, this can be used to represent cell connectivity, after you have segmented an image? like you have an image of red blood cells or something and want to figure out how many cells each cell is in contact with… bwlabeln will do it? this might just solve my problems.

    and i’m also doing porosity of a foam. after i do bwlabel and segmentation and all that, how can i subtract the area of the cells (L, right?) from the “nothing” black background to get the dead space/air/pores?

    if you can help or suggest any other code to help, that would be wonderful!

  14. Steve replied on :

    Shannon—No, bwlabel and bwlabeln compute the connected components of pixels, not cells. If cells are touching in the output of your segmentation step, then bwlabel will consider them to be part of the same object.

    You can count background pixels in a binary image bw with something like this:

    sum(~bw(:))
    
  15. Ardy replied on :

    Hi, i would like to know more details about union find.

    like in an image, may be:

    note 1 can be connected to 2, 3, 5
    note 2 can connect to 4, 6
    note 3 can connect to 2, 6

    how u put all the linked note together effectively?

  16. Steve replied on :

    Ardy—There is a lot information available online about the union find algorithm. I can also recommend Sedgewick, Algorithms in C, 3rd ed, Parts 1-4, pp. 10-23.

Leave a Reply

Wrap code fragments inside <pre> tags, like this:

<pre class="code">
a = magic(3);
sum(a)
</pre>

If you have a "<" character in your code, either follow it with a space or replace it with "&lt;" (including the semicolon).


Steve Eddins manages the Image & Geospatial development team at The MathWorks and coauthored Digital Image Processing Using MATLAB. He writes here about image processing concepts, algorithm implementations, and MATLAB.

  • murat: Hi Steve, I have an rgb image of a kind of cream and it contains some small black particles (black dots). In...
  • Steve: Ernest—Look at setting the FaceColor property. The code for setting that is shown on the page you asked...
  • Ernest Miller: Hi Steve, Understood. However, can you explain how to change the colors? Thanks, Ernest
  • Jan: Hi Steve Very useful code, yet what if I parts of my rotated+translated object are outside the original...
  • Steve: MoHDa—It might be possible. You’ll need to use one of the options that produces closed edge...
  • MoHDa: I have one question about the ROIPOLY: I have an image with stripes, I use the “edge” command for...
  • Steve: Shahn—My November 17, 2006 post shows you how to do it.
  • Steve: Kay-Uwe—Thanks for following up. I am planning to make it easier to use test directories in a package....
  • shahn: Hello Steve Instead of superimposing a star on the image to show the centroide. How would you superimpose a...
  • Kay-Uwe: Having TestSuite.fromPackag e() would be nice to have, but so far using simple “test” subdirs...

These postings are the author's and don't necessarily represent the opinions of The MathWorks.