By All Means
Ever find yourself wanting to get a sense of some data, but not sure the arithmetic mean is what you want? You might also consider the geometric mean (geomean from Statistics Toolbox). In the image processing world, I understand that some people think images often look crisper when the geometric mean is applied instead of the arithmetic mean. Today I want to talk about how to get accurate results for the geometric mean.
Geometric Mean
Let's assume we have a vector x so we can ignore dealing with different dimensions. I will first create function handles for the arithmetic mean and for the standard expression for the geometric mean. Here's the handle for the arithmetic mean
amn = @(x) mean(x)
amn = @(x)mean(x)
and for the geometric mean.
gmn = @(x) prod(x)^(1/numel(x))
gmn = @(x)prod(x)^(1/numel(x))
Some Data
Now let's create some data and compute the means.
xsmall = 100*rand(10,1); means = [amn(xsmall) gmn(xsmall)]
means = 42.403 27.898
More Challenging Data
Now let's suppose we have some data with much larger values and compute the means.
xlarge = 1e300*rand(1000,1); means = [amn(xlarge) gmn(xlarge)]
means = 5.1363e+299 Inf
While we got a finite answer for the arithmetic mean, we got Inf for the geometric mean. If you look at the expression for the geometric mean, we first calculated the product of all the numbers and then took the nth root. So we exceeded realmax in the calculation, hence the infinite result. Is there a way to circumvent this, at least for a while? Yes!
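To see the overflow in isolation, here is a minimal sketch with two made-up values (the vector xdemo and the variable names are my own, chosen so the numbers are easy to check by hand). The true geometric mean is 1e200, comfortably representable, but the intermediate product is not.

```matlab
% Hypothetical two-element example: the true geometric mean is 1e200.
xdemo = [1e200; 1e200];

p = prod(xdemo);                          % 1e400 exceeds realmax, so p is Inf
firstInf = find(isinf(cumprod(xdemo)), 1); % index where the running product overflows
naive = p^(1/numel(xdemo));               % taking the root of Inf still gives Inf

disp(realmax)                             % largest finite double, about 1.7977e+308
disp([p naive])
disp(firstInf)
```

Once the running product has overflowed, no later step can bring it back, so the nth root is applied too late to help.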
Safer Expression for Geometric Mean
We can recast the calculation of the product of some numbers to be the sum of their natural logs and then exponentiate that result. To get the nth root, we divide the sum by n, the number of elements. Here's a new expression for the geometric mean.
gm2 = @(x) exp(sum(log(x))/numel(x))
gm2 = @(x)exp(sum(log(x))/numel(x))
Here's the geometric mean applied to our two datasets.
[gm2(xsmall) gm2(xlarge)]
ans = 27.898 3.8763e+299
You can see that we get the same result for the perhaps more typical data, and have insulated ourselves from poor numerical results with the larger data values.
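The same reasoning applies at the other end of the range: a product of very small numbers can underflow to zero. Here's a small sketch of that case, using a made-up vector xtiny (my own example, not one of the datasets above) and the two handles redefined so the snippet runs on its own.

```matlab
% Redefine the two handles so this snippet is self-contained.
gmn = @(x) prod(x)^(1/numel(x));          % product first, then nth root
gm2 = @(x) exp(sum(log(x))/numel(x));     % sum of logs, then exponentiate

% Hypothetical data: the exact product, 1e-600, is far below the
% smallest positive double, so it underflows to 0.
xtiny = 1e-200*ones(3,1);

naive = gmn(xtiny);                       % 0, because prod(xtiny) underflowed
safer = gm2(xtiny);                       % close to the true value, 1e-200

disp([naive safer])
```

One caveat worth keeping in mind: the log-based form assumes positive data. A zero in x gives log(0) = -Inf and a geometric mean of 0, and negative values make log return complex numbers.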
How Do You Average Data?
If you have data that may contain NaN values, you can use nanmean from Statistics Toolbox. Do you have other expressions that are appropriate for averaging your datasets? Let me know here.
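If you want the log-based geometric mean to skip NaN values too, one possible sketch is to filter them out before taking logs. The handle name gmnan is hypothetical (it is not a toolbox function), and this version assumes x is a vector of positive values.

```matlab
% Hypothetical helper: geometric mean of the non-NaN elements of a vector.
% Uses the same log-sum idea, written with mean() instead of sum()/numel().
gmnan = @(x) exp(mean(log(x(~isnan(x)))));

xnan = [1 10 NaN 100];   % made-up data with one missing value
g = gmnan(xnan);         % geometric mean of 1, 10 and 100, approximately 10

disp(g)
```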