Thread Subject: chi-square two sample test

Subject: chi-square two sample test

From: wang wang

Date: 10 Jun, 2008 12:31:01

Message: 1 of 10

I have a data set, and I estimate the parameters of a
specified distribution, say the lognormal distribution, from
that data set. Now I want to test the goodness of fit of
that distribution to the data (note in particular that the
parameters of the distribution are estimated from the data
set). Can anyone tell me whether the chi-square two-sample
test is adequate for that? And is there an
existing function to do that work? I know that there
is a function named 'chi2gof' which can do the chi-square
goodness-of-fit test, but this function only tests whether
the data come from a normal distribution.

Subject: chi-square two sample test

From: Peter Perkins

Date: 10 Jun, 2008 13:24:39

Message: 2 of 10

wang wang wrote:
> I have a data set, and I estimate the parameters of a
> specified distribution, say the lognormal distribution, from
> that data set. Now I want to test the goodness of fit of
> that distribution to the data (note in particular that the
> parameters of the distribution are estimated from the data
> set). Can anyone tell me whether the chi-square two-sample
> test is adequate for that?

Is there a reason why you want to use a two-sample test? Your description
sounds like you want the usual chi-squared test against a parametric
distribution for which you have estimated the parameters.



> And is there an
> existing function to do that work? I know that there
> is a function named 'chi2gof' which can do the chi-square
> goodness-of-fit test, but this function only tests whether
> the data come from a normal distribution.

That's not correct. CHI2GOF allows you to specify any distribution you want, in
a couple of different ways.

Subject: chi-square two sample test

From: wang wang

Date: 11 Jun, 2008 07:06:02

Message: 3 of 10

Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com>
wrote in message <g2lvan$1a3$1@fred.mathworks.com>...
> wang wang wrote:
> > I have a data set, and I estimate the parameters of a
> > specified distribution, say the lognormal distribution, from
> > that data set. Now I want to test the goodness of fit of
> > that distribution to the data (note in particular that the
> > parameters of the distribution are estimated from the data
> > set). Can anyone tell me whether the chi-square two-sample
> > test is adequate for that?
>
> Is there a reason why you want to use a two-sample test? Your description
> sounds like you want the usual chi-squared test against a parametric
> distribution for which you have estimated the parameters.

Peter, thanks for your reply. The parametric distribution
which I used to fit the data set does not have an
analytical expression. Only its characteristic function is
known. So I think maybe I can use the two-sample test. I
do not know whether this is correct; please tell me if
it's wrong.

> > And is there an
> > existing function to do that work? I know that there
> > is a function named 'chi2gof' which can do the chi-square
> > goodness-of-fit test, but this function only tests whether
> > the data come from a normal distribution.
>
> That's not correct. CHI2GOF allows you to specify any distribution you want, in
> a couple of different ways.

Yes, I made a mistake about this.

Subject: chi-square two sample test

From: Peter Perkins

Date: 11 Jun, 2008 14:35:09

Message: 4 of 10

wang wang wrote:

> Peter, thanks for your reply. The parametric distribution
> which I used to fit the data set does not have an
> analytical expression. Only its characteristic function is
> known. So I think maybe I can use the two-sample test. I
> do not know whether this is correct; please tell me if
> it's wrong.

My question would be, "what's your second sample?"

Subject: chi-square two sample test

From: wang wang

Date: 12 Jun, 2008 05:08:01

Message: 5 of 10

Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com>
wrote in message <g2onqt$rmv$2@fred.mathworks.com>...
> wang wang wrote:
>
> > Peter, thanks for your reply. The parametric distribution
> > which I used to fit the data set does not have an
> > analytical expression. Only its characteristic function is
> > known. So I think maybe I can use the two-sample test. I
> > do not know whether this is correct; please tell me if
> > it's wrong.
>
> My question would be, "what's your second sample?"

Thank you Peter. The second sample can be generated from
the specific distribution whose parameters are estimated
from the data set. This sample can be regarded as coming from the
specific distribution, so maybe the chi-square two-sample
test can be used to test whether the generated sample and
the original data set come from a common distribution.

Subject: chi-square two sample test

From: Peter Perkins

Date: 12 Jun, 2008 13:33:49

Message: 6 of 10

wang wang wrote:

> Thank you Peter. The second sample can be generated from
> the specific distribution whose parameters are estimated
> from the data set. This sample can be regarded as coming from the
> specific distribution, so maybe the chi-square two-sample
> test can be used to test whether the generated sample and
> the original data set come from a common distribution.

I don't know what to tell you. Your description of your context is exactly what
a one-sample chi-squared test is for. I don't know why you would want to
artificially introduce another sample.

How your distribution is defined makes no difference at all. You are fitting a
distribution to data by estimating its parameters. If you can compute
cumulative probabilities from that fitted distribution, then that's all you need
to use chi2gof. If your problem is that you can't compute cumulative
probabilities, then I wonder how useful your model will actually be from a
predictive point of view.
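
The one-sample procedure described above can be sketched as follows. This is a minimal, language-neutral illustration in plain Python, not the implementation of chi2gof. The function names (lognormal_cdf, chi2_sf, chi2_gof_fitted), the bin edges, and the simulated data are all assumptions made for the example. It bins the data, takes expected counts from the fitted CDF, and subtracts one degree of freedom per estimated parameter, which is the usual textbook adjustment.

```python
import bisect
import math
import random


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-scale parameters mu, sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variate (series expansion;
    adequate for moderate x, not a production-quality routine)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_gof_fitted(data, edges, cdf, n_params):
    """One-sample chi-square test against a fitted CDF.

    edges are sorted interior bin edges; the outer bins extend to
    -inf and +inf. Returns (statistic, degrees of freedom, p-value).
    """
    cum = [0.0] + [cdf(e) for e in edges] + [1.0]
    expected = [len(data) * (hi - lo) for lo, hi in zip(cum, cum[1:])]
    observed = [0] * len(expected)
    for x in data:
        observed[bisect.bisect_left(edges, x)] += 1
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(expected) - 1 - n_params  # one df lost per estimated parameter
    return stat, df, chi2_sf(stat, df)


# Hypothetical example: simulate lognormal data, fit mu and sigma, test the fit.
random.seed(0)
data = [random.lognormvariate(0.0, 0.5) for _ in range(500)]
logs = [math.log(x) for x in data]
mu_hat = sum(logs) / len(logs)
sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / (len(logs) - 1))
edges = [0.5, 0.75, 1.0, 1.25, 1.5, 2.0]
stat, df, p = chi2_gof_fitted(
    data, edges, lambda x: lognormal_cdf(x, mu_hat, sigma_hat), n_params=2)
print(stat, df, p)
```

Only the fitted CDF enters the calculation, so the same function would work for any model whose cumulative probabilities can be evaluated, however the distribution itself is defined.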

Subject: chi-square two sample test

From: Mastaneh

Date: 30 Dec, 2008 20:56:02

Message: 7 of 10

Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com> wrote in message
>
> I don't know what to tell you. Your description of your context is exactly what
> a one-sample chi-squared test is for. I don't know why you would want to
> artificially introduce another sample.
>
> How your distribution is defined makes no difference at all. You are fitting a
> distribution to data by estimating its parameters. If you can compute
> cumulative probabilities from that fitted distribution, then that's all you need
> to use chi2gof. If your problem is that you can't compute cumulative
> probabilities, then I wonder how useful your model will actually be from a
> predictive point of view.

Hi,
I'm solving a problem regarding finding the chi-square goodness of fit, and I was wondering if I was making the same mistake.
I have a data sample which is autoscaled (normalized) up to the 2nd order. First, I find a fit for the data using the normfit and normpdf functions. Next, I use the first 4 moments of the sample to find a Pearson estimation. Now, if I want to compare the goodness of fit of each method, should I use the two-sample chi-square test to compare the pdf functions? I mean, once to compare the pdf of the autoscaled sample with that of the normpdf, and then the pdf of the autoscaled sample with that of the Pearson-generated sample?

Thanks,
Mastaneh

Subject: chi-square two sample test

From: Tom Lane

Date: 31 Dec, 2008 15:19:39

Message: 8 of 10

> I'm solving a problem regarding finding the chi-square goodness of fit,
> and I was wondering if I was making the same mistake.
> I have a data sample which is autoscaled (normalized) up to the 2nd order.
> First, I find a fit for the data using normfit and normpdf functions.
> Next, I use the first 4 moments of the sample to find a Pearson
> estimation. Now, if I want to compare the goodness of fit of each
> method, should I use the two-sample chi-square test to compare the pdf
> functions? I mean, once to compare the pdf of the autoscaled sample with
> that of the normpdf, and then pdf of the autoscaled sample with that of
> the Pearson-generated sample?

Mastaneh, when you compare the sample to the normal distribution, that is a
one-sample test. You could use a chi-square test or any of several other
tests.

For the Pearson comparison, it sounds like you really want to do a
one-sample test again, comparing the observed sample with expected values
under the Pearson distribution. The Statistics Toolbox, though, has a
function for generating random Pearson values but not for computing the cdf
of this distribution. Is that the issue?

You may be able to poke around at the code for pearsrnd and figure out how
to compute the cdf for some cases. Alternatively I suppose you could
generate an enormous number of random values to estimate the expected bin
proportions, then regard them as fixed. There is not a function in the
toolbox for comparing two finite samples to see if they have the same
distribution via a chi-square test.

-- Tom
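
The simulation idea in the reply above (estimate the expected bin proportions from a very large random sample, then treat them as fixed) can be sketched like this. It is a rough illustration in plain Python under stated assumptions: sampler is a hypothetical stand-in for a generator such as pearsrnd, replaced here by a standard normal draw, and the bin edges and observed counts are made up.

```python
import bisect
import random


def estimate_bin_proportions(sampler, edges, n_draws):
    """Estimate bin probabilities by simulation.

    sampler is a zero-argument function returning one random value;
    edges are sorted interior bin edges (outer bins are open-ended).
    """
    counts = [0] * (len(edges) + 1)
    for _ in range(n_draws):
        counts[bisect.bisect_left(edges, sampler())] += 1
    return [c / n_draws for c in counts]


def chi2_statistic(observed, proportions):
    """Pearson statistic with the simulated proportions treated as fixed."""
    n = sum(observed)
    if any(p == 0.0 for p in proportions):
        raise ValueError("empty simulated bin: merge bins or draw more values")
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))


# Hypothetical usage: a standard normal stands in for the fitted Pearson model.
random.seed(1)
edges = [-1.0, 0.0, 1.0]
props = estimate_bin_proportions(lambda: random.gauss(0.0, 1.0), edges, 200_000)
observed = [18, 30, 36, 16]  # made-up counts from a sample of 100
print(props, chi2_statistic(observed, props))
```

The simulation error in the proportions is ignored here, which is only reasonable when n_draws is much larger than the size of the observed sample.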

Subject: chi-square two sample test

From: Mastaneh

Date: 3 Jan, 2009 02:29:01

Message: 9 of 10

"Tom Lane" <tlane@mathworks.com> wrote in message <gjg2ic$foi$1@fred.mathworks.com>...
> > I'm solving a problem regarding finding the chi-square goodness of fit,
> > and I was wondering if I was making the same mistake.
> > I have a data sample which is autoscaled (normalized) up to the 2nd order.
> > First, I find a fit for the data using normfit and normpdf functions.
> > Next, I use the first 4 moments of the sample to find a Pearson
> > estimation. Now, if I want to compare the goodness of fit of each
> > method, should I use the two-sample chi-square test to compare the pdf
> > functions? I mean, once to compare the pdf of the autoscaled sample with
> > that of the normpdf, and then pdf of the autoscaled sample with that of
> > the Pearson-generated sample?
>
> Mastaneh, when you compare the sample to the normal distribution, that is a
> one-sample test. You could use a chi-square test or any of several other
> tests.
>
> For the Pearson comparison, it sounds like you really want to do a
> one-sample test again, comparing the observed sample with expected values
> under the Pearson distribution. The Statistics Toolbox, though, has a
> function for generating random Pearson values but not for computing the cdf
> of this distribution. Is that the issue?
>
> You may be able to poke around at the code for pearsrnd and figure out how
> to compute the cdf for some cases. Alternatively I suppose you could
> generate an enormous number of random values to estimate the expected bin
> proportions, then regard them as fixed. There is not a function in the
> toolbox for comparing two finite samples to see if they have the same
> distribution via a chi-square test.
>
> -- Tom
>

Thanks Tom.
Yes, using the Pearson function from the Statistics Toolbox I could only generate random values from the distribution. It's not possible to find the pdf or cdf of that distribution directly, so I used the histogram function to find the frequency counts and bin locations of both samples (my data and the Pearson sample). Then I used the algorithm in http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm to calculate the test statistic. It says there that the two-sample test is based on binning the data, so I thought it's what I should use anyway. Does that sound right?
I suppose it should be similar to your idea about having fixed bin proportions. Am I right?

Thanks once again,
Mastaneh

Subject: chi-square two sample test

From: Tom Lane

Date: 5 Jan, 2009 15:30:00

Message: 10 of 10

> Yes, using the Pearson function from the Statistics Toolbox I could only
> generate random values from the distribution. It's not possible to find the
> pdf or cdf of that distribution directly, so I used the histogram function to
> find the frequency counts and bin locations of both samples (my data and the
> Pearson sample). Then I used the algorithm in
> http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm
> to calculate the test statistic. It says there that the two-sample test is
> based on binning the data, so I thought it's what I should use anyway.
> Does that sound right?
> I suppose it should be similar to your idea about having fixed bin
> proportions. Am I right?

Mastaneh, sure, you could use a two-sample test if you want. It's
introducing extra variability (from the Pearson random sample), so it would
probably be less sensitive than a one-sample test, but I suppose it would be
valid. You won't be able to use the chi2gof function, though -- that is for
one-sample tests.

-- Tom
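
For readers who want to see what such a two-sample comparison involves, here is a minimal sketch in plain Python. It assumes both samples have been binned on the same edges, and it uses the scaled form of the statistic, sum((K1*R_i - K2*S_i)^2 / (R_i + S_i)) with K1 = sqrt(sum(S)/sum(R)) and K2 = sqrt(sum(R)/sum(S)), which is algebraically the usual chi-square homogeneity statistic for a 2-by-k table. Whether this matches every detail of the page linked in the quoted message is an assumption, and the function name and the counts are made up.

```python
import math


def two_sample_chi2(r, s):
    """Two-sample chi-square statistic from counts r and s on shared bins.

    Bins that are empty in both samples are dropped. Returns the
    statistic and the number of non-empty bins.
    """
    pairs = [(ri, si) for ri, si in zip(r, s) if ri + si > 0]
    total_r = sum(ri for ri, _ in pairs)
    total_s = sum(si for _, si in pairs)
    k1 = math.sqrt(total_s / total_r)  # scaling constants that allow
    k2 = math.sqrt(total_r / total_s)  # for unequal sample sizes
    stat = sum((k1 * ri - k2 * si) ** 2 / (ri + si) for ri, si in pairs)
    return stat, len(pairs)


# Made-up counts: the observed data and a simulated sample on the same bins.
data_counts = [12, 25, 40, 18, 5]
sim_counts = [30, 48, 75, 35, 12]
stat, k = two_sample_chi2(data_counts, sim_counts)
print(stat, k)
```

The degrees of freedom to pair with this statistic depend on the convention followed (k - 1 for the standard homogeneity test on k non-empty bins), so check the reference you rely on. Because both rows of counts are random, the comparison carries the extra variability mentioned in the reply above.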
