Conditional Area Opening

From MATLAB Techniques for Image Processing by Steve Eddins.

Here's a problem posed by Craig Doolittle, reader of the "Steve on Image Processing" blog. Craig was writing MATLAB scripts to clean up scanned pages from old manuscripts. Here's a portion of a sample page from "Fragmentation of Service Projectiles," N.F. Mott, J. H. Wilkinson, and T.H. Wise, Ministry of Supply, Armament Research Department, Theoretical Research Report No. 37/44, December 1944. (The authors include a winner of the Nobel Prize for Physics and a winner of the Turing Award!)

page = imread('scanned_page.png');
bw = page(735:1280,11:511);
imshow(bw)
xlabel('Image courtesy Craig Doolittle')

Craig asked for suggestions on how to clean up the "noise" dots without removing portions of the text.

Let's try a simple area opening first. The call to bwareaopen below removes every connected foreground object that has fewer than 8 pixels.

bw2 = imcomplement(bw);  % We need to make the foreground white
                         % instead of black!
bw3 = bwareaopen(bw2,8);
imshow(imcomplement(bw3))

Unfortunately, this approach has removed portions of some of the characters. Let's explore the before-and-after comparison with linked zooming.

h1 = subplot(1,2,1); imshow(bw)
h2 = subplot(1,2,2); imshow(imcomplement(bw3))
linkaxes([h1 h2])
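
To see why small character parts are at risk, it can help to run bwareaopen on a tiny synthetic image. The sketch below is not part of Craig's data; the image toy and its three objects are made up for illustration. It assumes the Image Processing Toolbox is available.

```matlab
% Synthetic 12-by-12 test image (hypothetical data, not the scanned page).
toy = false(12,12);
toy(2:9,2:4) = true;   % 24-pixel "stroke": large enough to survive
toy(2:3,8:9) = true;   % 4-pixel "dot of an i": smaller than 8 pixels
toy(10,10)   = true;   % 1-pixel noise speck

% Keep only connected objects with at least 8 pixels.
opened = bwareaopen(toy,8);

% Count the objects before and after the area opening.
cc_before = bwconncomp(toy);
cc_after  = bwconncomp(opened);
fprintf('Objects before: %d, after: %d\n', ...
    cc_before.NumObjects, cc_after.NumObjects)
```

The area opening cannot tell the 4-pixel dot from the 1-pixel speck. Both fall below the size threshold, so both are removed, and only the stroke remains.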

Here's a method using bwlabel and regionprops to highlight the pixels that were removed.

removed = xor(bw2,bw3);          % pixels deleted by bwareaopen
L = bwlabel(removed);            % label each removed object
s = regionprops(L,'Centroid');
centroids = cat(1,s.Centroid);   % one [x y] row per removed object
imshow(bw)
hold on
plot(centroids(:,1),centroids(:,2), 'ro', 'MarkerSize', 15)
hold off

You can see that some of the removed dots were noise, while others were parts of the characters "e", "i", "m", etc.

One approach is to restore the removed objects that are "close" to the characters remaining after the area opening. We can do this using dilation and some logical operators.

bw4 = imdilate(bw3,strel('disk',5));
imshow(bw4)

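The size of the disk structuring element gives us our definition of "close." With a radius of 5, a removed pixel counts as close if it lies within roughly 5 pixels of a character that survived the area opening.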
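
Here is a small sketch of how the radius controls what counts as close. The images kept and specks are synthetic stand-ins (assumptions, not the scanned page). The sketch also passes 0 as the third argument to strel to request an exact disk, which makes the distances easy to reason about; the code above uses the default approximation instead.

```matlab
% Synthetic example (hypothetical data).
kept = false(20,30);
kept(5:15,5:6) = true;     % a stroke that survived the area opening

specks = false(20,30);
specks(10,9)  = true;      % 3 columns to the right of the stroke
specks(10,25) = true;      % 19 columns to the right of the stroke

% Dilate the surviving stroke with exact disks of two different radii.
near2 = imdilate(kept,strel('disk',2,0));
near5 = imdilate(kept,strel('disk',5,0));

% A speck is "close" if it falls inside the dilated region.
fprintf('radius 2: near speck %d, far speck %d\n', near2(10,9), near2(10,25))
fprintf('radius 5: near speck %d, far speck %d\n', near5(10,9), near5(10,25))
```

With a radius of 2 neither speck is reached. With a radius of 5 the near speck falls inside the dilated region and the far one does not. A larger radius restores more character fragments but also risks restoring noise that happens to sit near the text.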

Now do a logical AND of the dilated characters with the pixels removed by bwareaopen. These are the pixels we are going to put back.

overlaps = bw4 & removed;
imshow(overlaps)

Use a logical OR to put those pixels back, and then complement the result so that the text is black on a white background again.

bwout = imcomplement(bw3 | overlaps);
imshow(bwout)
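
One thing to watch for: the logical AND restores only the removed pixels that lie inside the dilated region, so a removed object that straddles the edge of that region comes back only in part. A possible variation, which is not part of the method above, is to use the overlap as a marker for morphological reconstruction with imreconstruct, so that each touched object is restored whole. The sketch below uses synthetic images (kept and fragment are hypothetical) and an exact disk to keep the arithmetic simple.

```matlab
% Synthetic example (hypothetical data).
kept = false(20,30);
kept(5:15,5:6) = true;          % a stroke that survived the area opening

fragment = false(20,30);
fragment(10,10:14) = true;      % 5-pixel removed object, 4 to 8 columns away

near = imdilate(kept,strel('disk',5,0));   % reaches column 11 on row 10

% Plain AND: only the part of the fragment inside the dilated region.
overlaps = near & fragment;

% Reconstruction: grow the overlap back to the full removed object.
restored = imreconstruct(overlaps,fragment);

fprintf('Pixels restored by AND: %d, by reconstruction: %d\n', ...
    nnz(overlaps), nnz(restored))
```

Whether the whole-object behavior is preferable depends on the document. It restores complete character fragments, but it also restores all of any noise blob that merely touches the dilated region.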

You could think of this process as a "conditional area opening."
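
If you want to reuse these steps on other pages, one option is to collect them into a function. The name conditionalAreaOpen and its parameters minArea and radius are hypothetical choices for this sketch. It assumes bw is a logical image with black text on a white background, as in the example above.

```matlab
function bwout = conditionalAreaOpen(bw,minArea,radius)
% conditionalAreaOpen  Sketch of the steps above as one function.
%   bw      - logical image, black (false) text on white (true)
%   minArea - objects with fewer pixels than this are candidates for removal
%   radius  - disk radius that defines "close" to a surviving object

fg      = imcomplement(bw);              % make the foreground white
opened  = bwareaopen(fg,minArea);        % drop small objects
removed = xor(fg,opened);                % pixels the area opening deleted
near    = imdilate(opened,strel('disk',radius));
bwout   = imcomplement(opened | (near & removed));  % put close pixels back
end
```

With this saved as conditionalAreaOpen.m, the result above would correspond to calling bwout = conditionalAreaOpen(bw,8,5).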