In this part of the connected component labeling series, I'll finally get to one of the algorithms actually used in the Image Processing Toolbox. It's based on a technique called union-find, as described in Sedgewick's Algorithms in C, Addison-Wesley, 1998, pp. 11-20. This algorithm is designed to be able to quickly form the union of two sets, and also to be able to quickly find which set contains a given element.
Here's the example I'll use to illustrate the method. On the left is a small binary image containing a single eight-connected component. We start by making each nonzero pixel a node in a graph. The nodes are numbered as shown on the right.
In its initial state, the graph contains 7 unconnected nodes, one for each nonzero pixel:
We can represent this graph in MATLAB by a vector:
nodes = 1:7
nodes =
1 2 3 4 5 6 7
In this representation, nodes(k) gives the parent of node k. A node that is its own parent (that is, nodes(k) equals k) is a root node.
As we scan the image columnwise, we encounter pairs of pixels that are connected to each other. The first such pair is that pixel 2 is connected to pixel 1. We find the root node for the set containing pixel 2 (node 2) and the root node for the set containing pixel 1 (node 1). Since they are different, we form the union of those two sets simply by making the root node of one of the sets point to the root node of the other.
nodes(2) = 1
nodes =
1 1 3 4 5 6 7
The second connection encountered in the scan is that pixel 3 is connected to pixel 2. The root node for the set containing pixel 3 is node 3; the root node for the set containing pixel 2 is node 1. So we connect node 3 to node 1.
nodes(3) = 1
nodes =
1 1 1 4 5 6 7
Now continue to process connections. Pixel 4 is connected to pixel 3, so connect node 4 to node 1:
nodes(4) = 1
nodes =
1 1 1 1 5 6 7
Pixel 6 is connected to pixel 5, so connect node 6 to node 5:
nodes(6) = 5
nodes =
1 1 1 1 5 5 7
Pixel 7 is connected to pixel 6, so connect node 7 to node 5:
nodes(7) = 5
nodes =
1 1 1 1 5 5 5
Now for the moment of truth. So far our graph still contains two sets (trees). But now we see that pixel 7 is connected to pixel 4. When we connect their corresponding roots (5 to 1), we are finally left with a single set:
nodes(5) = 1
nodes =
1 1 1 1 1 5 5
According to Sedgewick, there are two important things to do to the graph while processing pairs:
- When forming a union by connecting two trees in the graph, always connect the smaller of the two trees to the larger one. This prevents the growth of long paths in the trees.
- When "walking" up a chain of nodes to find the root node, "flatten out" the tree using a technique called path compression. Here's an example showing path compression by halving. The tree on the left has a long path from node 7 up to node 1. As you follow the path, take two nodes at a time and make the bottom node point to the same parent as the top node. That give you the tree on the right.
If you keep doing this, then over time the trees in the graph stay almost completely flat. The flat tree is what makes the "find" operation as quick as the "union" operation.
The Image Processing Toolbox function bwlabeln, which performs connected component labeling for arbitrary dimension and arbitrary connectivity, uses the union-find technique. If you have the product, you can see the key parts of the implementation in the source files bwlabelnmex.cpp and unionfind.c, which are in toolbox\images\images\private.
I'll try to wrap this series up with one more post, which will be about the two-dimensional algorithm used by the function bwlabel.
Get
the MATLAB code
Published with MATLAB® 7.4



hi steve,
i understand that bwlabeln uses union-find to resolve the equivalences and that bwlabel uses dmperm for this task. but you write that bwlabel also uses a decomposition of pixels into runs. does bwlabeln also do this decomposition into runs (and therefore performs union-find on runs of pixels rather than on individual pixels)?
cheers
martin
Martin—Good question. No, bwlabeln does not use runs. It processes pixels individually.
May I ask what is the rational behind the design decision to have bwlabeln work on individual pixels rather than on runs of pixels?
Martin—Excellent question. I just posted a detailed answer.
I am working to recognize handwritten characters .
I need to extract out the individual characters from the entire page of text image. I try out image component labeling to get the connected components……but it is showing the region of 0’s bounded by 1’s as the components too. But what i wanna have is only the different components of 1’s …….plz help me .
I hope i can get it by modifying the code of bwlabel a bit to search only the different components of 1’s……will try that way
Ashish—I don’t understand what you mean. The bwlabel function labels only the nonzero elements of the input matrix. It does not label interior zero-valued elements.
Thank you Steve. I am sorry I wrongly interpreted the components. Please tell me the order in which the components are labelled
…?? For my scanned text image it numbered the components
at the bottom left as component no. 1 and the one at the top right corner as n0.2……I need to have the components labeled in the sequence as they appear on scanning from top left to right(in rows) moving to next row after reaching the last column of current row.
I tried to manipulate the result of labeled components i got ……but i need a function to sort cell
array using a particular dimension of cell array as key.
Is there any inbuilt function for that.?
correction in previous comment…….line 4th “one at the top right corner as the last component”
Ashish—bwlabel scans image pixels columnwise, since that’s the way matrix elements are stored in memory. If you want a different order, you could transpose your matrix, or you could sort based on some other criterion. To sort by a different criterion, create a vector containing the values to be sorted, sort them and capture the 2nd output of the sort, and then use the 2nd output to rearrange your cell array.
hello man, i am chinese, i just want to know some detail about the algorithm of the path compression by halving in C,thanks!
Jackey—See Algorithms in C by Sedgewick, 3rd edition, pp. 18–21.
just to clarify, this can be used to represent cell connectivity, after you have segmented an image? like you have an image of red blood cells or something and want to figure out how many cells each cell is in contact with… bwlabeln will do it? this might just solve my problems.
and i’m also doing porosity of a foam. after i do bwlabel and segmentation and all that, how can i subtract the area of the cells (L, right?) from the “nothing” black background to get the dead space/air/pores?
if you can help or suggest any other code to help, that would be wonderful!
Shannon—No, bwlabel and bwlabeln compute the connected components of pixels, not cells. If cells are touching in the output of your segmentation step, then bwlabel will consider them to be part of the same object.
You can count background pixels in a binary image bw with something like this:
Hi, i would like to know more details about union find.
like in an image, may be:
note 1 can be connected to 2, 3, 5
note 2 can connect to 4, 6
note 3 can connect to 2, 6
how u put all the linked note together effectively?
Ardy—There is a lot information available online about the union find algorithm. I can also recommend Sedgewick, Algorithms in C, 3rd ed, Parts 1-4, pp. 10-23.
Hi,
I need to detect text (font) region in a image (a high resolution text imgae).. I just want to know , will this algo help me out?
thanks
Anand
Anand—Connected-component labeling will probably be useful to you, but you don’t need to implement the algorithm yourself. Just use bwlabel (or bwconncomp if you already have the R2009a release) from the Image Processing Toolbox.