# Cleaning up scanned text – revisited

Have you ever used the distance transform?

For a binary image, the distance transform gives, for every pixel, the distance to the nearest foreground (nonzero) pixel. (Sometimes you'll see it defined the other way around. It doesn't really matter that much; you just need to pay attention to whatever convention is being used and complement the image as needed.)
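As a small illustration of this convention, here is a minimal sketch using bwdist (the Image Processing Toolbox function used later in this post) on a made-up 5-by-5 image with a single foreground pixel. The image and variable names are assumptions chosen just for the example.

```matlab
% A made-up 5-by-5 binary image with one foreground pixel at the center.
bw_tiny = false(5);
bw_tiny(3, 3) = true;

% Distance from every pixel to the nearest nonzero pixel.
D_tiny = bwdist(bw_tiny);

% For the opposite convention (distance to the nearest zero pixel),
% complement the image first.
D_other = bwdist(~bw_tiny);
```

Here D_tiny is 0 at the center, 2 at the middle of each edge, and sqrt(8) at the corners. D_other is 1 at the center and 0 everywhere else, because every other pixel is itself nonzero in the complemented image.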

Last week I posted a method for cleaning up scanned text. The method distinguished between small dots that were far from the pixels of large characters and small dots that were close to them. In that post I used dilation and logical operators to identify small dots that were far away from characters. Later it occurred to me that one could also use the distance transform for this purpose.

Here again are the first few steps. Use bwareaopen to remove the small dots, and use a logical operator to identify the pixels that got removed.

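url = 'https://blogs.mathworks.com/images/steve/186/scanned_page.png';
page = imread(url);            % read the scanned page (page was otherwise undefined)
bw = page(735:1280, 11:511);   % crop a region of the page
imshow(bw)
bw2 = imcomplement(bw);        % make the text the foreground
bw3 = bwareaopen(bw2, 8);      % remove components with fewer than 8 pixels
removed = xor(bw2, bw3);       % pixels that bwareaopen removed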

Next, use the distance transform to identify all pixels that are within a certain distance from foreground pixels in the image bw3.

D = bwdist(bw3);
within_hailing_distance = D <= 10;
imshow(within_hailing_distance)
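Thresholding the distance transform is closely related to the dilation approach. As a rough sketch on a made-up test image, D <= r should select the same pixels as dilating with a disk of radius r, provided the disk is built without strel's default approximation (the third argument 0). The test image and radius are assumptions for illustration, not the structuring element used in the earlier post.

```matlab
% Made-up test image: one small blob on an empty background.
bw_test = false(30, 30);
bw_test(14:16, 12:18) = true;

r = 4;  % illustrative radius

% Distance-transform version: pixels within r of the foreground.
near_dist = bwdist(bw_test) <= r;

% Dilation version: exact (non-approximated) disk of radius r.
near_dil = imdilate(bw_test, strel('disk', r, 0));

% Compare the two masks.
same_mask = isequal(near_dist, near_dil);
```

One practical difference is that the distance transform is computed once, so trying a different radius only means changing the threshold.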

Which removed pixels do we want to put back?

put_back_pixels = removed & within_hailing_distance;

Use a logical OR to put the pixels back.

gotta_go_the_holiday_party_is_about_to_start = bw3 | put_back_pixels;
imshow(~gotta_go_the_holiday_party_is_about_to_start)
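Because the steps above depend on downloading the scanned page, it can help to check the logic on a tiny synthetic image. The following is a minimal sketch under that assumption: the image, the dot positions, and the variable names are all made up, and only the sequence of operations mirrors the post.

```matlab
% Synthetic foreground image (true = ink), standing in for bw2.
fg = false(40, 60);
fg(10:30, 10:20) = true;   % a large "character"
fg(20, 25) = true;         % small dot 5 pixels from the character
fg(20, 50) = true;         % small dot 30 pixels from the character

big = bwareaopen(fg, 8);            % drop components with fewer than 8 pixels
gone = xor(fg, big);                % the pixels that were dropped
near = bwdist(big) <= 10;           % within 10 pixels of what remains
restored = big | (gone & near);     % put back only the nearby dots
```

With these made-up positions, the dot at (20, 25) is put back and the dot at (20, 50) stays removed.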

I suspect that not many people know about the "extra" feature that's tucked inside bwdist. Not only can bwdist compute the distance transform for you, but it can also compute a result sometimes called the "feature transform." For each pixel, the feature transform tells you which foreground pixel is nearest. You get the feature transform simply by using a second output argument when you call bwdist.
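Here is a minimal sketch of the two-output call on a made-up 5-by-5 image. The second output holds, for each pixel, the linear index of the nearest nonzero pixel; the image and variable names are assumptions for illustration.

```matlab
% Made-up 5-by-5 image with two foreground pixels.
bw_ft = false(5);
bw_ft(2, 2) = true;
bw_ft(4, 5) = true;

% Second output: linear index of the nearest nonzero pixel, per pixel.
[D_ft, idx] = bwdist(bw_ft);

% Convert linear indices to row/column coordinates of the nearest pixel.
[nearest_row, nearest_col] = ind2sub(size(bw_ft), double(idx));
```

For the top-left pixel, the nearest foreground pixel is (2, 2); for the bottom-right pixel, it is (4, 5). In the setting of this post, one possible use would be to index a label matrix with idx, for example bwlabel(bw3), to look up which character each restored dot is nearest to.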

Do you use this capability? Or do you think you might have a use for it? Please let me know. I'd love to hear about it.

Published with MATLAB® 7.5
