Thread Subject:
Histogram to evaluate a correlation

Subject: Histogram to evaluate a correlation

From: Nima Azar

Date: 19 Jan, 2010 18:55:04

Message: 1 of 10

Hi guys:
I have a question about using histograms to see how well my measured data correlate with a pre-identified diagnosis.
Here's the story:
I have two sets of data. One set is numbers obtained from normal patients (the normals were identified before the data were taken); the other set is for non-normals (also identified before the study). Now I'd like to use a histogram to see how well my measurements correlate with the initial diagnosis.

I need help!

Subject: Histogram to evaluate a correlation

From: ImageAnalyst

Date: 19 Jan, 2010 19:00:10

Message: 2 of 10

And your question is......................?
You want to know how to do scatterplots?
You want to understand ROC curves? http://en.wikipedia.org/wiki/Roc_curve
What?

Subject: Histogram to evaluate a correlation

From: Nima Azar

Date: 19 Jan, 2010 19:16:07

Message: 3 of 10

So I'd like to know whether the command "hist" would do this, and also how I can quantify the correlation between my data and the pre-identified subjects.

thanks

Subject: Histogram to evaluate a correlation

From: ImageAnalyst

Date: 19 Jan, 2010 19:47:09

Message: 4 of 10

Is the result of your analysis a classification of either "normal" or
"abnormal"? Then you can use ROC curves to assess how well your
analysis works. This is the normal way it's done in the medical
field. See http://en.wikipedia.org/wiki/Roc_curve or elsewhere for
further explanation.

Or does your analysis produce some kind of number on a continuous
scale between some lower limit and an upper limit? For example you're
taking the patient's temperature and can get a number between 98 and
106 degrees F, and you have a patient's self diagnosis as to whether
they feel normal or abnormal, but the patient has only this binary
diagnosis and no continuous measurement to offer.

Subject: Histogram to evaluate a correlation

From: Rob Campbell

Date: 19 Jan, 2010 22:40:20

Message: 5 of 10

Do you mean you want to plot two histograms??

 e.g.
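r1=randn(1,100);        % simulated group 1
r2=randn(1,100)+2;      % simulated group 2, shifted by 2
hist(r1);
h=findobj(gca,'type','patch');   % patch drawn by the first hist call
set(h,'facecolor','k','edgecolor','k')
hold on
hist(r2);
h=findobj(gca,'type','patch');   % now finds both patches
set(h(1),'facecolor','r','edgecolor','r')   % h(1) should be the newest patch (r2)
hold off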

If you want to see how distinct the histograms are then, as the other poster suggests, an ROC analysis would work. This will basically tell you how much overlap there is in the histograms. A t-test might also be relevant.

I'm not clear what you want to correlate with what, however. The way I understand it you have two lists of numbers (i.e. two vectors) obtained from data on different people drawn from two groups. You can't cross-correlate those.
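
For what it's worth, here is a rough sketch of what the ROC analysis and t-test mentioned above might look like on the same kind of simulated data. It assumes the Statistics Toolbox (perfcurve and ttest2) and a MATLAB version that has rng; the seed, the variable names, and the choice of group 2 as the "positive" class are only illustrative.

```matlab
rng(0);                                  % fixed seed, only so the sketch is repeatable
r1 = randn(1,100);                       % simulated group 1
r2 = randn(1,100) + 2;                   % simulated group 2, shifted by 2
scores = [r1, r2]';                      % all measurements in one column
labels = [zeros(100,1); ones(100,1)];    % 0 = group 1, 1 = group 2
[fpr, tpr, ~, auc] = perfcurve(labels, scores, 1);   % treat group 2 as positive
plot(fpr, tpr)
xlabel('False positive rate'); ylabel('True positive rate')
[h, p] = ttest2(r1, r2);                 % two-sample t-test on the group means
fprintf('AUC = %.3f, t-test p = %.3g\n', auc, p)
```

An AUC near 1 means the two histograms barely overlap; an AUC near 0.5 means they overlap almost completely.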

Subject: Histogram to evaluate a correlation

From: Nima Azar

Date: 20 Jan, 2010 00:38:04

Message: 6 of 10

ImageAnalyst <imageanalyst@mailinator.com> wrote in message <08ed4cd9-65f3-4d43-ac9d-9a37d67121ca@m25g2000yqc.googlegroups.com>...
> Is the result of your analysis a classification of either "normal" or
> "abnormal"? Then you can use ROC curves to assess how well your
> analysis works. This is the normal way it's done in the medical
> field. See http://en.wikipedia.org/wiki/Roc_curve or elsewhere for
> further explanation.
>
> Or does your analysis produce some kind of number on a continuous
> scale between some lower limit and an upper limit? For example you're
> taking the patient's temperature and can get a number between 98 and
> 106 degrees F, and you have a patient's self diagnosis as to whether
> they feel normal or abnormal, but the patient has only this binary
> diagnosis and no continuous measurement to offer.
.
.
.
So basically I have twenty numbers for normals and twenty numbers for abnormals. I'd like to see whether each set of data falls in the "right" category. Does it make sense?

Subject: Histogram to evaluate a correlation

From: Rob Campbell

Date: 20 Jan, 2010 01:20:04

Message: 7 of 10

>...like to see whether each set of data falls in the "right" category. Does it make sense?

Not entirely. How do you define "right"? What exact hypothesis are you testing?
Whether each /set/ of data falls into the right category? Do you mean each data point?
If it's only 20 numbers then why not send them to us in your reply:
data1=[23,34,23]; etc

Sorry, but have you plotted your data? What does that look like? How much overlap in the groups? Have you done a t-test? Did you read about ROC to see if it addresses your issue? You can also run a classifier (which is really telling you the same thing as the ROC):

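r1=randn(20,1);r2=randn(20,1)+2;     % two simulated groups
rr=[r1;r2];
group=[ones(20,1);zeros(20,1)];      % true group labels
predicted=classify(rr,rr,group);     % named "predicted" to avoid shadowing the built-in class()
sum(predicted==group)/length(rr) %this is not cross-validated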

If you do that you *need* to cross-validate it.
http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
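
One possible way to do the cross-validation, sketched here as leave-one-out on the same simulated data: each point is classified by a model trained on the other 39. This assumes the Statistics Toolbox (classify) and a MATLAB version with rng; the loop and variable names are just an illustration, not the only way to do it.

```matlab
rng(0);                                   % fixed seed, only so the sketch is repeatable
r1 = randn(20,1); r2 = randn(20,1) + 2;   % two simulated groups
rr = [r1; r2];
group = [ones(20,1); zeros(20,1)];        % true group labels
n = length(rr);
predicted = zeros(n,1);
for i = 1:n
    train = true(n,1);
    train(i) = false;                     % hold out observation i
    predicted(i) = classify(rr(i), rr(train), group(train));
end
accuracy = mean(predicted == group);      % leave-one-out estimate of accuracy
fprintf('Leave-one-out accuracy = %.3f\n', accuracy)
```

The number this gives is usually a little lower than the non-cross-validated one, because no point is ever classified by a model that was trained on it.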

Subject: Histogram to evaluate a correlation

From: Nima Azar

Date: 20 Jan, 2010 01:45:40

Message: 8 of 10

"Rob Campbell" <matlab@robertREMOVEcampbell.removethis.co.uk> wrote in message <hj5lo4$c8i$1@fred.mathworks.com>...
> >...like to see whether each set of data falls in the "right" category. Does it make sense?
>
> Not entirely. How do you define "right"? What exact hypothesis are you testing?
> Whether each /set/ of data falls into the right category? Do you mean each data point?
> If it's only 20 numbers then why not send them to us in your reply:
> data1=[23,34,23]; etc
>
> Sorry, but have you plotted your data? What does that look like? How much overlap in the groups? Have you done a t-test? Did you read about ROC to see if it addresses your issue? You can also run a classifier (which is really telling you the same thing as the ROC):
>
> r1=randn(20,1);r2=randn(20,1)+2;
> rr=[r1;r2];
> group=[ones(20,1);zeros(20,1)];
> class=classify(rr,rr,group);
> sum(class==group)/length(rr) %this is not cross-validated
>
> If you do that you *need* to cross-validate it.
> http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
.
.
.
So here's the data:
normal = 5,5.3,5.4,5,4.9,5.2,5.2,5.3,5.1,5,4.9,4.8,5.1,5.1,5.2,5.3,5.2,5,5.1,5.2
Abnormal = 4,4.1,4.2,4,4.1,4.2,4.4,4.5,4.4,4.1,4.2,4.5,4.6,4.7,4.1,4,4,4.1,4.1,4.3

It seems like ROC would put me on the right track. As you can see, there's not much overlap. What I need to see is whether each point falls in its pre-identified category, and get some quantified information on that. Sorry, I'm not very into statistical work.
I really appreciate your help.

regards,

Subject: Histogram to evaluate a correlation

From: Rob Campbell

Date: 20 Jan, 2010 02:18:03

Message: 9 of 10

>and get some quantified information on that. sorry I'm not very into statistical work .
:-)
As it happens, you need to do almost zero statistics because your groups don't overlap at all! If they don't overlap then you can always tell who came from which group, right? There's no point even doing a t-test because that would test for a significant difference between *means*, which is obviously the case here. The plot below shows that the groups don't overlap *and that is the only statistical test you need*. It really is that simple. If anybody asks you for a p-value on this then they're talking out of their arse.

What you need to consider is whether the magnitude of the group difference is meaningful from a practical perspective. But that should be a common-sense question.

%Make box-plots (boxplot needs the Statistics Toolbox)
normal = [5,5.3,5.4,5,4.9,5.2,5.2,5.3,5.1,5,4.9,4.8,5.1,5.1,5.2,5.3,5.2,5,5.1,5.2];
Abnormal = [4,4.1,4.2,4,4.1,4.2,4.4,4.5,4.4,4.1,4.2,4.5,4.6,4.7,4.1,4,4,4.1,4.1,4.3];
d=[normal',Abnormal']; %one column per group
clf
boxplot(d); %groups are well separated!
hold on %overlay jittered raw data
plot(1+randn(size(normal))*0.1,normal,'ok')
plot(2+randn(size(Abnormal))*0.1,Abnormal,'ok')
hold off

%Hmmmm... really no overlap?
min(normal)

ans =

    4.8000

 
max(Abnormal)

ans =

    4.7000
%Yes!
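
If a single number is wanted anyway, one simple option (a sketch, not a formal test) is to put a cut-off in the gap between the groups and count how many points land on the expected side. The midpoint threshold below is an arbitrary illustrative choice, and because it is picked from these same 40 points, the result says nothing about how new patients would be classified.

```matlab
normal = [5,5.3,5.4,5,4.9,5.2,5.2,5.3,5.1,5,4.9,4.8,5.1,5.1,5.2,5.3,5.2,5,5.1,5.2];
Abnormal = [4,4.1,4.2,4,4.1,4.2,4.4,4.5,4.4,4.1,4.2,4.5,4.6,4.7,4.1,4,4,4.1,4.1,4.3];
threshold = (min(normal) + max(Abnormal))/2;   % midway in the gap between 4.7 and 4.8
nCorrect = sum(normal > threshold) + sum(Abnormal < threshold);
accuracy = nCorrect/(numel(normal) + numel(Abnormal));
fprintf('%d of %d points on the expected side of %.2f\n', ...
    nCorrect, numel(normal) + numel(Abnormal), threshold)
```

With these data every point falls on the expected side, so the accuracy is 1, which is just the "no overlap" result expressed as a number.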

Subject: Histogram to evaluate a correlation

From: Tom Lane

Date: 20 Jan, 2010 03:57:11

Message: 10 of 10

Nima, other posters had some good ideas.

But since you originally asked about histograms, if you have the Statistics
Toolbox, try this:

histfit(normal)
hold on; histfit(Abnormal)

This will show histograms of the two groups (same color, but you can tell
them apart because they don't overlap), plus fitted normal densities. You
can judge the amount of overlap from the two densities. You could edit this
to change the colors, of course.

Now, I have no idea if a normal distribution is a good model for your data.
The sample size is too small, in my opinion, to test that from the data. But
perhaps this is the kind of thing you wanted to do.

-- Tom
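
A possible way to do the recoloring mentioned above, as a sketch: histfit returns a handle vector whose first element is the histogram and whose second is the fitted curve, so the bars of each group can be colored separately. This assumes the Statistics Toolbox; the colors and the legend are arbitrary illustrative choices.

```matlab
normal = [5,5.3,5.4,5,4.9,5.2,5.2,5.3,5.1,5,4.9,4.8,5.1,5.1,5.2,5.3,5.2,5,5.1,5.2];
Abnormal = [4,4.1,4.2,4,4.1,4.2,4.4,4.5,4.4,4.1,4.2,4.5,4.6,4.7,4.1,4,4,4.1,4.1,4.3];
h1 = histfit(normal);                    % h1(1) = histogram, h1(2) = fitted normal curve
hold on
h2 = histfit(Abnormal);
hold off
set(h1(1), 'FaceColor', [0.6 0.6 0.9])   % bluish bars for the normal group
set(h2(1), 'FaceColor', [0.9 0.6 0.6])   % reddish bars for the abnormal group
legend([h1(1), h2(1)], {'normal', 'Abnormal'})
```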
