Steve on Image Processing

Concepts, algorithms & MATLAB

Two-Dimensional Histograms 20

Posted by Steve Eddins,

Last week I showed this picture of M&Ms on my desk, and I used it to raise some vague questions about the definition of the color green. Today I'll use this picture as an opportunity to demonstrate how to construct and visualize a two-dimensional histogram.

url = '';
rgb = imread(url);

As a preparatory step, I'll first convert the pixels from the sRGB color space to the L*a*b* color space. I've written about the L*a*b* color before (09-May-2006, 19-Jun-2006). It does a couple of nice things. First, it separates perceived brightness (L*) from color (a* and b*). Second, as a rough approximation it is perceptually uniform. That is, Euclidean distances in the L*a*b* space correspond roughly uniformly to perceived color differences.

Here's how to do the conversion:

lab = applycform(rgb, makecform('srgb2lab'));

But I must note here that the values in lab aren't the normal mathematical quantities L*, a*, and b*. That's because those real values have been encoded (scaled, shifted, and quantized) to fit into unsigned 8-bit integers. So I'll use the function lab2double to convert these encoded values into the normal mathematical values.

lab = lab2double(lab);
a = lab(:,:,2);
b = lab(:,:,3);

I claimed above that the color information is in the a* and b* dimensions. Can we look at the a* and b* values in a way that lets us start to see the different M&M colors in the original image? I want to do it by constructing a two-dimensional histogram of the a* and b* values.

There are many different ways you could formulate and implement this in MATLAB. I've picked a method based on the functions interp1 and accumarray. The basic idea is to convert each a* value into a horizontal array subscript, convert each b* value into a vertical array subscript, and then use accumarray to accumulate values into a histogram.

First, let's set up a notion of our histogram bins in the a* and b* dimensions. I'll specify 101 bins according to their centers.

N = 101;
bin_centers = linspace(-100, 100, N);

I want to map a* and b* values such that bin centers become subscript values. I found it easy to think about this process in terms of the function interp1, but others might do it differently. I'll use linear interpolation and extrapolate to handle a* and b* values that are completely outside the range of the bin centers.

subscripts = 1:N;
ai = interp1(bin_centers, subscripts, a, 'linear', 'extrap');
bi = interp1(bin_centers, subscripts, b, 'linear', 'extrap');

Now I need to round the subscript matrices, ai and bi, and I also need to make sure all the subscript values lie between 1 and N.

ai = round(ai);
bi = round(bi);

ai = max(min(ai, N), 1);
bi = max(min(bi, N), 1);

Next we let accumarray do the heavy lifting of counting how many times each pair of subscript values occurs.

H = accumarray([bi(:), ai(:)], 1, [N N]);

H is our two-dimensional histogram. Here's my first attempt at visualizing it. The second argument to imshow, [], tells imshow to autoscale the range values in H for display.

imshow(H, [])

There are two issues I need to fix: the displayed image is a bit small, and the autoscaling didn't make the histogram detail very visible. Let's manually specify a different display scaling, and let's also ask imshow to display the image larger than normal.

imshow(H, [0 1000], 'InitialMagnification', 300)

How about axes tickmarks and labels? Can we get the real a* and b* ranges to show up? Sure. We tell imshow the real horizontal coordinates corresponding to the left column and right column (the "xdata"), as well as the vertical coordinates corresponding to the top row and bottom row (the "ydata").

xdata = [min(bin_centers), max(bin_centers)];
ydata = xdata;
imshow(H,[0 1000], 'InitialMagnification', 300, 'XData', xdata, 'YData', ydata)

Oops, still no axes tickmarks and labels. That's because imshow doesn't display them by default. But we can manually turn them on with the axis function.

axis on

That's our completed two-dimensional histogram and visualization.

I think that next time I'll explore segmenting in the a*-b* plane. But as usual I'm making up these blog posts as I go along, so no promises.

Get the MATLAB code

Published with MATLAB® 7.11


Comments are closed.

20 CommentsOldest to Newest

Vanessa replied on : 1 of 20

Hi Steve,

I just stumbled on these MATLAB blogs, and am enjoying them greatly. I use MATLAB primarily for neuroimaging with SPM, so working with color and graphics is slightly foreign to me. I was wondering if you might be able to, in a few sentences, explain exactly what the history is showing / teaching us, and how it relates to the original image? I would greatly appreciate a few details about this so I can think about how I might apply the technique to my own work.



Cmac replied on : 3 of 20

Vanessa — I know zip about the L*a*b* color scheme and really little about Matlab in general, but it looks like this plot is showing us the 7 distinct colors we perceive in the image, 6 different colors of M&Ms plus the color of the desk itself (which I’m guessing represents the largest, most intense blob). Each color can be represented by some pair of values (a*, b*), so the discrete intensities we see in this plot likely correspond to those colors. Is that about right? Love these posts, Steve, keep up the good work.

Patrick replied on : 5 of 20

Hello Steve,

first I want to thank you for writing this excellent blog, and then I want to ask you: why are you assuming that a* and b* are between -100 and 100? The lab2double(

function returns values in the range of -128 to 127 for a* and b*.

Yours sincerly Patrick

Steve replied on : 6 of 20

Patrick—No special reason. In the definition of the L*a*b* space, a* and b* are not bounded. The range -128 to 127 is an artifact of having gone through a uint8 encoding step.

kakxi replied on : 7 of 20

Hi steve, How do you get an N dimensional histogram? I work with hyperspectral images that contain 33 channels.

Steve replied on : 8 of 20

Kakxi—The same technique described here could be used to compute a 3-D histogram. Pass three subscripts and a 3-element size vector to accumarray.

Steve replied on : 10 of 20

Kakxi—That’s completely impractical. For example, if you used 10 bins along each dimension, you’d have a total of 10^33 bins.

Steve replied on : 12 of 20

Kakxi—Think carefully about what you are asking for. For example, if you wanted 10 bins along each dimension, you’d have 10^33 bins.

kakxi replied on : 13 of 20

Steve, I know it sounds impractical. I need to explain what I want to do. I have hyperspectral images with size 500x500x33. I want to obtain histograms which should be 10^33 for all 33 channels if i use 10 bins. I have spoken to my supervisor that it will not work as matlab has a maximum number of elements it accepts and also accumarray also has limits. He now wants me to use half of the channels i.e. 16. Most of the elements will obviously be zero. Is there a way i can do it with 16 channels? The best I have done is using 7 channels. I also heard I could implement it using a mex file but I dont know how to do this. Cheers.

Steve replied on : 14 of 20

Kakxi—I still can’t wrap my head around what you are trying to do. Even with “only” 16 channels, the idea of a histogram seems completely useless to me. Your comment that “most of the elements will be zero” is a vast understatement.

You’ve got 250,000 16-dimensional data samples. If you accumulate those values in a 16-dimensional space with 10 bins along each dimension, then by my calculation at least 99.9999999975% of the histogram values are going to be zero! And probably no two of those values will accumulate into the same bin, so the histogram accumulation process won’t tell you anything new.

Steve replied on : 15 of 20

Kakxi—I asked a statistics expert to look at this thread, and one of his comments was, “The list of 33-tuples is almost certainly 1:1 equivalent to the 33-dim cube of counts.” So why do it? Of what use will it be?

Kakxi replied on : 16 of 20

Steve- the aim is to show that using more than three channels (RGB) improves object recognition. Once I get the n- dimensional histograms I store them in a database then used histogram intersection to compare them. This will give a measure of similarity. The match values are between 0 and 1. As I said I have done it using 7 channels because that’s all I could get using accumarray and I want to use more channels as this should improve the matching results. I also intend to use real images which have sizes 1024x 1344 pixels.cheers.

Steve replied on : 17 of 20

Kakxi—With a 1024×1344 image, 16 channels per pixel, and 10 bins per dimension, your histogram array would require about 10 petabytes to store and would be about 99.999999986237441% empty. I’m not making that number up. This is going to be completely impractical and useless.

Steve replied on : 19 of 20

Kakxi—I don’t have a lot of experience in this sort of problem, but my feeling is that maybe seven is too high, not too low. Consider looking into dimensionality reducing techniques such as PCA.