Gray-scale dilation equation
A question came into tech support a month or so ago regarding the documentation for imdilate. The question concerned an apparent discrepancy in the equations for binary and grayscale dilation. Here's the formula given for binary dilation:
$$A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \emptyset \}$$
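To make the set definition concrete, here is a minimal Python sketch. It assumes that a binary image and a structuring element are both represented as sets of (x, y) foreground coordinates, with the structuring element's origin at (0, 0). The helper names `reflect`, `translate`, and `binary_dilate` are hypothetical and are not how `imdilate` is implemented.

```python
def reflect(points):
    """Reflect a set of (x, y) offsets through the origin: (x, y) -> (-x, -y)."""
    return {(-x, -y) for (x, y) in points}


def translate(points, z):
    """Translate a set of (x, y) offsets by z = (zx, zy)."""
    zx, zy = z
    return {(x + zx, y + zy) for (x, y) in points}


def binary_dilate(A, B, candidates):
    """Binary dilation from the set definition.

    A, B: sets of (x, y) foreground coordinates (B is the structuring element).
    candidates: the finite set of locations z to test.
    Returns {z : the reflected B, translated to z, overlaps A}.
    """
    B_hat = reflect(B)
    return {z for z in candidates if translate(B_hat, z) & A}


# A single foreground pixel dilated by a two-pixel horizontal element.
grid = {(x, y) for x in range(-3, 4) for y in range(-3, 4)}
A = {(0, 0)}
B = {(0, 0), (1, 0)}
print(sorted(binary_dilate(A, B, grid)))  # [(0, 0), (1, 0)]
```

Because the reflected element translated to z contains exactly the points z - b, the test succeeds precisely when z = a + b for some a in A and b in B. In other words, the reflection in the definition is what makes the result equal the familiar "stamp B at every foreground pixel" picture.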
where $\hat{B}$ is the reflection of the structuring element $B$. The reference page goes on to say, "in other words, it is the set of pixel locations $z$ where the reflected structuring element overlaps with foreground pixels of $A$ when translated to $z$."
The reference page then gives another equation for gray-scale dilation.
$$(A \oplus B)(x,y) = \max \{ A(x - x', y - y') + B(x',y') | (x',y') \in D_B\}$$
where $D_B$ is the domain of the structuring element and $A(x,y)$ is assumed to be $-\infty$ outside the domain of the image.
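The gray-scale equation translates almost directly into code. The sketch below is a hedged illustration, not the toolbox implementation: it assumes the image and the structuring element are Python dicts mapping coordinates to values, treats any location missing from the image as $-\infty$, and (as an assumption chosen to mirror the "same size as the input" behavior) computes the output only over the image's own domain. The name `gray_dilate` is hypothetical.

```python
import math


def gray_dilate(A, B):
    """Gray-scale dilation using the equation as written:

        (A ⊕ B)(x, y) = max{ A(x - x', y - y') + B(x', y') : (x', y') in D_B }

    A: dict mapping (x, y) -> pixel value; locations not in A count as -inf.
    B: nonempty dict mapping (x', y') -> height; its keys form the domain D_B.
    The output is computed at every location in A's domain.
    """
    return {
        (x, y): max(A.get((x - xp, y - yp), -math.inf) + h
                    for (xp, yp), h in B.items())
        for (x, y) in A
    }


# A one-row image and a nonflat two-element structuring element.
A = {(0, 0): 1, (1, 0): 5, (2, 0): 2}
B = {(0, 0): 0, (1, 0): 1}
print(gray_dilate(A, B))  # {(0, 0): 1, (1, 0): 5, (2, 0): 6}
```

Setting every height in `B` to zero (a flat structuring element) turns this into a plain moving maximum over the neighborhood defined by the reflected domain.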
The user who contacted tech support wondered if there might be an error in the equation for gray-scale dilation because the equation doesn't show a reflection of the structuring element. The case was escalated to me for comment.
The gray-scale dilation equation above is correct. (Well, it's correct for about half the world. The other half uses a slightly different form.)
There are, however, several equivalent equations that can be used to define dilation, and each one corresponds to a different geometric interpretation. The equation above can be interpreted as follows:
To compute the output at $(x,y)$, flip (or reflect) $A$ through the origin and then slide the origin pixel over to $(x,y)$. Form the sums of the $A$ pixels with the structuring element heights underneath. Find the maximum of these sums and record the result as the output at $(x,y)$.
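The flip-and-slide description can be followed step by step in code. In this minimal sketch (same dict-based representation as an assumption; `reflect_and_slide` and `dilate_at` are hypothetical names), the image is physically moved while the structuring element stays fixed, and the value that ends up over element position $(x',y')$ is exactly $A(x - x', y - y')$.

```python
import math


def reflect_and_slide(A, x, y):
    """Reflect image A through the origin, then slide its origin to (x, y).

    The pixel originally at (a, b) ends up at (x - a, y - b), so the value
    sitting over structuring-element position (x', y') is A(x - x', y - y').
    """
    return {(x - a, y - b): val for (a, b), val in A.items()}


def dilate_at(A, B, x, y):
    """Output at (x, y): max over D_B of (moved image pixel + element height).

    Element positions not covered by the moved image contribute -inf.
    """
    moved = reflect_and_slide(A, x, y)
    return max(moved.get(pos, -math.inf) + h for pos, h in B.items())


A = {(0, 0): 1, (1, 0): 5, (2, 0): 2}
B = {(0, 0): 0, (1, 0): 1}
print([dilate_at(A, B, x, 0) for x in range(3)])  # [1, 5, 6]
```

For this one-row example the results agree term by term with evaluating the equation directly.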
As it turns out, dilation is commutative. That suggests that there is a form of the equation, and a corresponding geometric interpretation, in which the structuring element is reflected instead of the image. We can take the first step in that direction via a substitution of variables. Let $q = x - x'$ and $r = y - y'$. Then:
$$(A \oplus B)(x,y) = \max \{ A(q,r) + B(x-q, y-r) | (x-q, y-r) \in D_B \}$$
Since $q$ and $r$ are "dummy" variables, we can rewrite them as $x'$ and $y'$.
$$(A \oplus B)(x,y) = \max \{ A(x',y') + B(x-x',y-y') | (x-x',y-y') \in D_B \}$$
This second equation has the geometric interpretation of leaving the image in place, reflecting the structuring element and sliding its origin to $(x,y)$, summing each image pixel with the structuring element height on top of it, and then taking the maximum of the sums.
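A quick numerical check makes the equivalence of the two forms tangible. The sketch below (again assuming dict-based images and structuring elements; `dilate_form1` and `dilate_form2` are hypothetical names) evaluates the first form by iterating over the structuring element's domain and the second form by iterating over the image pixels, then confirms they agree at every pixel of a small test image.

```python
import math


def dilate_form1(A, B, x, y):
    """First form: max{ A(x - x', y - y') + B(x', y') : (x', y') in D_B }."""
    return max(A.get((x - xp, y - yp), -math.inf) + h
               for (xp, yp), h in B.items())


def dilate_form2(A, B, x, y):
    """Second form: max{ A(x', y') + B(x - x', y - y') : (x - x', y - y') in D_B }.

    The image stays in place. Reflecting B and sliding its origin to (x, y)
    puts the element at offset (x - x', y - y') on top of image pixel (x', y').
    Pixels outside A are -inf, so only A's own pixels need to be visited.
    """
    return max(
        (val + B[(x - xp, y - yp)]
         for (xp, yp), val in A.items()
         if (x - xp, y - yp) in B),
        default=-math.inf,
    )


A = {(0, 0): 1, (1, 0): 5, (2, 0): 2, (0, 1): 3, (1, 1): 0, (2, 1): 4}
B = {(0, 0): 0, (1, 0): 1, (0, 1): 2}
for (x, y) in sorted(A):
    assert dilate_form1(A, B, x, y) == dilate_form2(A, B, x, y)
```

The two functions visit the terms of the maximum in a different order, but after the substitution $q = x - x'$, $r = y - y'$ they are summing exactly the same pairs, so the results must match.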
So which equation should we use? Well, both are correct. Which form to use for a real implementation is completely up to the implementer. And these are not the only two equations and geometric interpretations that are valid.
I do think, though, that there is some merit in modifying the gray-scale dilation equation in our documentation to make it more consistent with the form used for binary dilation.
Dear reader, I am curious: do you use nonflat grayscale dilation or erosion in your work? As far as I can tell, there are not many practical applications for using nonflat structuring elements. If you have a use for it, please leave me a comment below.