Path: news.mathworks.com!not-for-mail
From: "Mastaneh" <mtorkama@iupui.edu>
Newsgroups: comp.soft-sys.matlab
Subject: Re: chi-square two sample test
Date: Sat, 3 Jan 2009 02:29:01 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 38
Message-ID: <gjmihd$oap$1@fred.mathworks.com>
References: <g2ls65$p3u$1@fred.mathworks.com> <g2lvan$1a3$1@fred.mathworks.com> <g2ntgq$r5e$1@fred.mathworks.com> <g2onqt$rmv$2@fred.mathworks.com> <g2qavh$399$1@fred.mathworks.com> <g2r8jt$den$2@fred.mathworks.com> <gje1t2$lul$1@fred.mathworks.com> <gjg2ic$foi$1@fred.mathworks.com>
Reply-To: "Mastaneh" <mtorkama@iupui.edu>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1230949741 24921 172.30.248.37 (3 Jan 2009 02:29:01 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Sat, 3 Jan 2009 02:29:01 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 648205
Xref: news.mathworks.com comp.soft-sys.matlab:509592


"Tom Lane" <tlane@mathworks.com> wrote in message <gjg2ic$foi$1@fred.mathworks.com>...
> > I'm solving a problem regarding finding the chi-square goodness of fit, 
> > and I was wondering if I was making the same mistake.
> > I have a data sample which is autoscaled (normalized) up to the 2nd order. 
> > First, I find a fit for the data using normfit and normpdf functions. 
> > Next, I use the first 4 moments of the sample to find a Pearson 
> > estimation. Now, if I want to compare the goodness of fit for each 
> > method, should I use the two-sample Chi-square test to compare the pdf 
> > functions? I mean, once to compare the pdf of the autoscaled sample with 
> > that of the normpdf, and then pdf of the autoscaled sample with that of 
> > the Pearson-generated sample?
> 
> Mastaneh, when you compare the sample to the normal distribution, that is a 
> one-sample test.  You could use a chi-square test or any of several other 
> tests.
> 
> For the Pearson comparison, it sounds like you really want to do a 
> one-sample test again, comparing the observed sample with expected values 
> under the Pearson distribution.  The Statistics Toolbox, though, has a 
> function for generating random Pearson values but not for computing the cdf 
> of this distribution.  Is that the issue?
> 
> You may be able to poke around at the code for pearsrnd and figure out how 
> to compute the cdf for some cases.  Alternatively I suppose you could 
> generate an enormous number of random values to estimate the expected bin 
> proportions, then regard them as fixed.  There is not a function in the 
> toolbox for comparing two finite samples to see if they have the same 
> distribution via a chi-square test.
> 
> -- Tom 
> 

Thanks Tom. 
Yes, with the Pearson function in the Statistics Toolbox I could only generate random values from the distribution. It's not possible to get the pdf or cdf of that distribution directly, so I used the histogram function to find the frequency counts and bin locations of both samples (my data and the Pearson sample). Then I used the algorithm at http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm to calculate the test statistic. That page says the two-sample test is based on binned data, so I thought it was what I should use anyway. Does that sound right? 
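
For illustration only, here is a minimal sketch of a binned two-sample chi-square statistic of the kind described above. It is written in Python rather than MATLAB and is not the code used in this thread. It assumes both histograms were computed on the same bin edges and uses the common scaled form sum((K1*R - K2*S)^2 / (R + S)), with K1 = sqrt(sum(S)/sum(R)) and K2 = sqrt(sum(R)/sum(S)). The function name chi2_two_sample is hypothetical. The degrees of freedom and p-value are deliberately left out, since they depend on the convention being followed.

```python
import math


def chi2_two_sample(r_counts, s_counts):
    """Binned two-sample chi-square statistic.

    r_counts, s_counts: frequency counts of the two samples, computed on
    the SAME bin edges. Returns (statistic, number_of_bins_used).
    """
    if len(r_counts) != len(s_counts):
        raise ValueError("both histograms must use the same bins")

    n_r = sum(r_counts)
    n_s = sum(s_counts)
    if n_r == 0 or n_s == 0:
        raise ValueError("each sample needs at least one observation")

    # Scaling constants that compensate for unequal sample sizes.
    k1 = math.sqrt(n_s / n_r)
    k2 = math.sqrt(n_r / n_s)

    stat = 0.0
    used_bins = 0
    for r, s in zip(r_counts, s_counts):
        if r + s == 0:
            continue  # bins empty in both samples contribute nothing
        stat += (k1 * r - k2 * s) ** 2 / (r + s)
        used_bins += 1
    return stat, used_bins
```

Two histograms with proportional counts give a statistic of zero, even when the sample sizes differ. The count of bins actually used is returned because the degrees of freedom are normally based on the non-empty bins.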
I suppose it should be similar to your idea about having fixed bin proportions. Am I right? 
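
For comparison, the fixed-bin-proportions idea from the quoted reply can be sketched as follows, again in Python and only as an illustration. A very large simulated sample estimates the expected proportion in each bin, those proportions are then treated as fixed, and the observed counts are compared against them with the usual one-sample statistic sum((O - E)^2 / E). Assumptions: random.gauss is only a stand-in for a Pearson-system generator (the Python standard library has none), and the bin edges, sample sizes, and function names are made up for the example.

```python
import bisect
import random


def bin_counts(values, edges):
    """Count values per bin; values outside [edges[0], edges[-1]] are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # A value equal to the last edge goes into the last bin.
        i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def chi2_fixed_proportions(observed, proportions):
    """One-sample chi-square statistic against fixed expected proportions."""
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, proportions):
        expected = n * p
        if expected == 0:
            if o > 0:
                return float("inf")  # observed data where none is expected
            continue
        stat += (o - expected) ** 2 / expected
    return stat


rng = random.Random(0)
edges = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]  # illustrative bin edges

# Stand-in for an enormous sample drawn from the fitted distribution.
big_sample = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
big_counts = bin_counts(big_sample, edges)
proportions = [c / sum(big_counts) for c in big_counts]

# Stand-in for the observed (autoscaled) data.
data = [rng.gauss(0.0, 1.0) for _ in range(500)]
observed = bin_counts(data, edges)

stat = chi2_fixed_proportions(observed, proportions)
print("chi-square statistic:", stat)
```

The difference from the two-sample version is that the simulated sample is made so large that the noise in its bin proportions is ignored, which turns the comparison back into a one-sample test.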

Thanks once again,
Mastaneh