File Exchange Pick of the Week

Our best user submissions

Comparing multiple histograms

Posted by Jiro Doke,

Jiro's pick this week is "Comparing Multiple Histograms" by Jonathan C. Lansey.

One of the things you may want to do when analyzing two sets of data is to compare their distributions. There are various ways to do this. One is to fit each data set to a particular distribution using the function fitdist from the Statistics and Machine Learning Toolbox.

A.mu_is_Zero = randn(10^5,1);   % mean of 0
A.mu_is_Two = randn(10^5,1)+2;  % mean of 2

% This assumes you know the distribution
dist1 = fitdist(A.mu_is_Zero,'Normal')  % fit to a normal distribution
dist2 = fitdist(A.mu_is_Two,'Normal')   % fit to a normal distribution

x = -10:0.1:10;
y1 = pdf(dist1,x);
y2 = pdf(dist2,x);

plot(x,y1,x,y2)

legend(fieldnames(A),'interpreter','none')
dist1 = 
  NormalDistribution

  Normal Distribution
       mu = 0.00281815   [-0.00337045, 0.00900676]
    sigma =    0.99848   [0.994123, 1.00288]

dist2 = 
  NormalDistribution

  Normal Distribution
       mu = 2.00247   [1.99627, 2.00868]
    sigma = 1.00081   [0.996446, 1.00522]
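
If you want numbers rather than a plot, the fitted parameters are stored as properties of the distribution objects, and paramci returns the confidence intervals that appear in brackets in the display above (95% by default). The sketch below regenerates the same kind of random data, so its values will differ slightly from the ones printed here. It also checks that, for a normal fit, fitdist's estimates agree with the sample mean and standard deviation (fitdist uses the unbiased variance estimate for sigma).

```matlab
% Regenerate data like the example above (random, so values will vary)
A.mu_is_Zero = randn(10^5,1);      % mean of 0
A.mu_is_Two  = randn(10^5,1) + 2;  % mean of 2

dist1 = fitdist(A.mu_is_Zero,'Normal');
dist2 = fitdist(A.mu_is_Two,'Normal');

% Fitted parameters are properties of the NormalDistribution objects
muDiff = dist2.mu - dist1.mu       % close to 2 for this data

% Confidence intervals: row 1 = lower bounds, row 2 = upper bounds,
% column 1 = mu, column 2 = sigma
ci1 = paramci(dist1)

% For a normal fit, the estimates match mean() and std() of the sample
assert(abs(dist1.mu    - mean(A.mu_is_Zero)) < 1e-8)
assert(abs(dist1.sigma - std(A.mu_is_Zero))  < 1e-8)
```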

Another option is boxplot, also from the Statistics and Machine Learning Toolbox, which creates a box-and-whisker plot summarizing each data set's median, quartiles, and outliers.

clf
boxplot([A.mu_is_Zero, A.mu_is_Two],'labels',fieldnames(A))
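
To see which numbers a box plot summarizes, the sketch below computes them directly for a single data set. It assumes boxplot's default settings: the box spans the 25th to 75th percentiles, the line inside it is the median, and points more than 1.5 times the interquartile range beyond the box are drawn as outliers. The values should match the plot up to how the percentiles are interpolated.

```matlab
x = randn(10^5,1);   % same kind of data as A.mu_is_Zero

% Lower quartile, median, upper quartile
q    = prctile(x,[25 50 75]);
iqrX = q(3) - q(1);             % interquartile range (height of the box)

% Default whisker length is 1.5 * IQR beyond the box edges
upperLimit = q(3) + 1.5*iqrX;
lowerLimit = q(1) - 1.5*iqrX;

% Whiskers end at the most extreme points still inside those limits
upperWhisker = max(x(x <= upperLimit));
lowerWhisker = min(x(x >= lowerLimit));

% Everything beyond the limits is plotted as an individual outlier
nOutliers = sum(x > upperLimit | x < lowerLimit)
```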

Jonathan's nhist makes it easy to compare the histograms of the data sets.

clf
nhist(A)
ans = 
    mu_is_Zero: 'mu_is_Zero: mean=0.00, std=1.00, 3 points counted in the ...'
     mu_is_Two: 'mu_is_Two: mean=2.00, std=1.00, 1 points counted in the r...'

Note that it automatically uses the field names for the legend.

Of course, this is just the default behavior. The function comes with a wealth of options for controlling everything from line properties and graph orientation to histogram properties and the statistics displayed (mode, median, standard error, etc.). For example, to create two separate histograms in a greenish color that use the same bins:

nhist(A,'color',[.3 .8 .3],'separate','samebins','maxbins',50)
ans = 
    mu_is_Zero: 'mu_is_Zero: mean=0.00, std=1.00, 3 points counted in the ...'
     mu_is_Two: 'mu_is_Two: mean=2.00, std=1.00, 1 points counted in the r...'
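
The idea behind shared bins can be sketched in base MATLAB with histcounts (R2014b or later): compute one set of bin edges spanning both data sets, then count each data set against those same edges. This only illustrates the concept and is not how nhist implements 'samebins'; the 50 equal-width bins and the 'pdf' normalization are assumptions chosen for this sketch.

```matlab
A.mu_is_Zero = randn(10^5,1);
A.mu_is_Two  = randn(10^5,1) + 2;

% 50 equal-width bins: an assumption for this sketch, not nhist's rule
nBins = 50;
allData = [A.mu_is_Zero; A.mu_is_Two];
edges = linspace(min(allData), max(allData), nBins + 1);  % one shared set of edges

% Normalize as densities so data sets of different sizes stay comparable
c1 = histcounts(A.mu_is_Zero, edges, 'Normalization','pdf');
c2 = histcounts(A.mu_is_Two,  edges, 'Normalization','pdf');

% Plot both density estimates at the shared bin centers
centers = (edges(1:end-1) + edges(2:end))/2;
plot(centers, c1, centers, c2)
legend(fieldnames(A),'interpreter','none')
```

Because both histograms use identical edges, each bin covers the same range of values for both data sets, so the bars can be compared bin by bin.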

Comments

There are many other options, and the function comes with very detailed help. Give it a try, and let us know what you think here or leave a comment for Jonathan.



Published with MATLAB® R2015b

1 Comment

Jonathan Lansey replied:

Thanks for the feature!
Just a technicality, but 'samebins' makes the bins for both plots not only the same in number but actually identical. This is most useful when the two distributions have different N: by default, nhist uses the optimal number of bins for each distribution, so the actual bins often end up different.
Also, wow, interesting color choice there: sort of a deep algae green.