Thread Subject:
what's proper bin size for plotting using hist()?

Subject: what's proper bin size for plotting using hist()?

From: shinchan

Date: 10 Apr, 2009 19:32:02

Message: 1 of 2

I am a statistics newbie, so this may sound elementary, but I just don't know the answer. What is the proper way to choose the number of bins when using hist() to show a histogram? As the following code shows, 5 bins suggest a single zero-mean Gaussian population. But with 30 bins, the same data looks completely different, and it appears that there are multiple populations.

r=randn([40, 1]); %Generate Gaussian distribution made of 40 random numbers.
figure(1); hist(r, 5);
figure(2); hist(r, 30);

So if r is my experimental data, and I don't know a priori how many different populations exist, what should I do?

Subject: what's proper bin size for plotting using hist()?

From: Image Analyst

Date: 10 Apr, 2009 23:11:02

Message: 2 of 2

"shinchan " <shinchan75034@gmail.com> wrote in message <gro6rh$9gq$1@fred.mathworks.com>...
> I am a statistics newbie, so this may sound elementary, but I just don't know the answer. What is the proper way to choose the number of bins when using hist() to show a histogram? As the following code shows, 5 bins suggest a single zero-mean Gaussian population. But with 30 bins, the same data looks completely different, and it appears that there are multiple populations.
>
> r=randn([40, 1]); %Generate Gaussian distribution made of 40 random numbers.
> figure(1); hist(r, 5);
> figure(2); hist(r, 30);
>
> So if r is my experimental data, and I don't know a priori how many different populations exist, what should I do?
--------------------------------------------------------------------------------------------------------------
That's because with 40 numbers and 30 bins, you don't have enough counts in each bin to give a good shape. Look at the example below where I used 40,000 numbers in 30 bins instead of only 40 numbers. The shape is now nice. With only 40 numbers, you have many bins with only 0, 1, or 2 counts in them - nowhere near enough to visualize the true shape of the distribution.

If r is your data and you only have 40 observations, whether you can tell one population from two depends mostly on how much spread there is in your observations. A standard college statistics textbook covers this.
Regards,
ImageAnalyst
clc;
close all;
% Open a new figure to hold the three histograms.
figure;
randomNumbers1 = randn([40, 1]); % Generate 40 Gaussian-distributed random numbers.
counts1 = hist(randomNumbers1, 5);
subplot(1, 3, 1);
bar(counts1);
title('40 numbers in 5 bins');

counts2 = hist(randomNumbers1, 30);
subplot(1, 3, 2);
bar(counts2);
title('40 numbers in 30 bins');

randomNumbers = randn([40000, 1]); %Generate Gaussian distribution made of 40000 random numbers.
counts3 = hist(randomNumbers, 30);
subplot(1, 3, 3);
bar(counts3);
title('40000 numbers in 30 bins');
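
If you want a starting point for the number of bins rather than picking it by eye, two common rules of thumb are Sturges' rule and the Freedman-Diaconis rule. Neither comes from the posts above; the sketch below is only an illustration, and the variable names (kSturges, kFD, binWidth) are made up for it. The quartiles are estimated by simple linear interpolation on the sorted data so that no toolbox is needed, which is an assumption about how you want them computed.

```matlab
% Sketch: two rule-of-thumb bin counts for a sample r (illustrative only).
r = randn(40, 1);        % example data, same size as in the question
n = numel(r);

% Sturges' rule: roughly 1 + log2(n) bins.
kSturges = ceil(1 + log2(n));

% Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3).
% Quartiles are estimated by linear interpolation on the sorted sample.
s = sort(r);
q = interp1(1:n, s, [0.25 0.75] * (n - 1) + 1);
binWidth = 2 * (q(2) - q(1)) / n^(1/3);
kFD = ceil((max(r) - min(r)) / binWidth);

% Plot the same data with both bin counts for comparison.
figure;
subplot(1, 2, 1);
hist(r, kSturges);
title(sprintf('Sturges: %d bins', kSturges));
subplot(1, 2, 2);
hist(r, kFD);
title(sprintf('Freedman-Diaconis: %d bins', kFD));
```

With only 40 observations both rules suggest a small number of bins, which is consistent with the point above that 30 bins leaves too few counts per bin. These rules only give a reasonable default; they do not tell you how many populations are present.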
