Thread Subject:
KStest---statistical test

Subject: KStest---statistical test

From: Yueping Xu

Date: 11 Jun, 2003 11:32:55

Message: 1 of 11

Dear all,


I want to fit some distributions to my data and run some tests. I plan
to use kstest. If the distribution I want to fit is a beta or
exponential distribution, how can I use this function? The point
here is how to define the cdf in the KSTEST function if we cannot
estimate it from the data.


Hope you can help me!


Yueping

Subject: KStest---statistical test

From: carlos

Date: 11 Jun, 2003 23:48:44

Message: 2 of 11


"Yueping Xu" <y.p.xu@ctw.utwente.nl> wrote in message
news:eebf397.-1@WebX.raydaftYaTP...
> Dear all,
>
>
> I want to fit some distributions to my data and do some tests.I plan
> to use kstest.If the distribution I want to fit is beta distribution
> or exponential distribution,how can i use this function?THe point
> here is how to define the cdf in KSTEST function if we can not
> estimate from the data?
>

Usually you would either:
1. Assume a distribution with specific parameters
2. Assume a distribution with parameters estimated from the data (if you
have few data points then of course these parameters will be poorly
estimated)

Why can't you estimate or just assume distribution parameters from your
data?

Carlos

Subject: KStest---statistical test

From: Pierpa

Date: 11 Jun, 2003 19:16:49

Message: 3 of 11

From the on-line help:


"The Kolmogorov-Smirnov test requires that cdf be predetermined. It
is not accurate if cdf is estimated from the data. To test X against
a normal distribution without specifying the parameters, use
lillietest instead"


An easy workaround is to bootstrap your data, fit your parametric
model (maximum likelihood, method of moments, whatever...), evaluate
the KS-metric between the empirical distribution function (i.e. the
distribution that assigns probability 1/N to each of your N data points)
and the fitted CDF, and repeat to find the quantile needed to build the
1-alpha KS-like test. Note that there are different methods to estimate
this quantile, and a good text about the bootstrap should shed light on
the bootstrap approach to testing statistical hypotheses (e.g. Davison &
Hinkley, Bootstrap Methods and Their Application).


As usual this is not the only way (and it's not even the neatest one!)
but, at least, it's simple...although, at first glance, a bit
"hysteric" (...to test the adaptation we're using the KS-metric that
is quite different from the one used to fit the model).


HTH

Subject: KStest---statistical test

From: Yueping Xu

Date: 12 Jun, 2003 02:21:51

Message: 4 of 11

Yes, as Pierpa said, the on-line help tells us it is not accurate if
the cdf is estimated from the data. That is why I cannot use KSTEST.


So there is no easy way in MATLAB to achieve my goal? I have never used
the bootstrap before :( I will try...


Thanks.


Pierpa wrote:
>
>
> From the on-line help:
>
> "The Kolmogorov-Smirnov test requires that cdf be predetermined. It
> is not accurate if cdf is estimated from the data. To test X
> against
> a normal distribution without specifying the parameters, use
> lillietest instead"
>
> An easy workaround is to bootstrap your data, fit your parametric
> model (maximum log-like, method of moments, whatever...), evaluate
> the KS-metric between the empirical distribution function (i.e. the
> distrib. that assigns prob 1/N to each of your N data points) and
> the
> fitted CDF, repeat to find the quantile needed to build the 1-alpha
> KS-like test. Note that there're different methods to estimate this
> quantile and a good text about bootstrap should enlight your
> bootstrapped-way to testing statistical hypotheses (e.g. Davison &
> Hinkley, Bootstrap Methods and Their Application).
>
> As usual this is not the only way (and it's not even the neater
> one!)
> but, at least, it's simple...although, at first glance, a bit
> "hysteric" (...to test the adaptation we're using the KS-metric
> that
> is quite different from the one used to fit the model).
>
> HTH

Subject: KStest---statistical test

From: Peter Perkins

Date: 12 Jun, 2003 11:23:42

Message: 5 of 11

> So there is no easy way in Matlab to realize my goal?I never use
> bootstrap before:(I will try...

Hi Yueping -

If I might restate your goal: you are trying to do a goodness-of-fit test of
data against a specific parametric family of distributions.

The lack of simple procedures is not a MATLAB limitation, but rather a
limitation of statistical theory. The p-values for the K-S test assume that
either (1) you are testing a dataset against a _fully-specified_ distribution,
or (2), you are comparing two datasets to see if they come from the same
(unspecified) distribution. It is annoying that the test does not allow you
to test against a _family_ of distributions, but there you have it.

The problem is more or less that the estimated distribution is "closer" to the
data than the data are to their true distribution. That would seem to make
the "estimated" test more stringent, but I believe that that is not always the
case.
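
The "closer to the data" effect can be seen in a small simulation. The following is only a sketch under stated assumptions: the exponential family, the sample size n = 15, the number of replicates, and all variable names are illustrative choices, not something taken from the K-S documentation. It computes the K-S statistic by hand for the same simulated samples, once against the true CDF and once against the CDF with the mean estimated from the sample.

```matlab
% Null distribution of the K-S statistic for exponential data:
% known mean versus mean estimated from each sample.
rng(0);                          % fixed seed so the run is repeatable
n = 15; reps = 2000; mu = 1;     % illustrative values (assumptions)
upper = (1:n)'/n;                % ECDF just after each sorted point
lower = (0:n-1)'/n;              % ECDF just before each sorted point
Dtrue = zeros(reps,1);
Dest  = zeros(reps,1);
for rep = 1:reps
    x = sort(exprnd(mu,n,1));
    Ftrue = expcdf(x,mu);        % fully specified null CDF
    Fest  = expcdf(x,mean(x));   % null CDF with estimated mean
    Dtrue(rep) = max(max(upper - Ftrue), max(Ftrue - lower));
    Dest(rep)  = max(max(upper - Fest),  max(Fest  - lower));
end
critTrue = prctile(Dtrue,95);    % 95% point, parameters known
critEst  = prctile(Dest,95);     % 95% point, parameters estimated
```

In this particular setting critEst should come out smaller than critTrue, so comparing an "estimated" statistic with the standard K-S critical value would reject less often than the nominal level. Whether the same direction holds for other families and sample sizes is not something this sketch establishes.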

I'm not sure that bootstrapping will help you here. You need to find the
sampling distribution of the "estimated" K-S statistic (the value you get by
first estimating parameters and then computing the usual K-S stat) under the
_null hypothesis_, that is, for data that come from the family of
distributions. Unless I misunderstood Pierpa's suggestion, resampling from
your data does not give you that, because your data may or may not be from
that family.

You'd need to do something more like

1) pick some fixed parameter values
2) parametrically generate some data
3) fit a distribution
4) compute the usual K-S stat
5) repeat 2-4, and estimate the critical value by the 95% point of the replicates

That would give you the null distribution of the "estimated" K-S statistic for
those fixed parameter values. Now do the same for some others. For a
location-scale (or scale) family such as the normal or exponential, the null
distribution shouldn't change for other parameter values, and I think that's
more or less what Lilliefors did. But for other distributions it will.
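
For the beta distribution from the original question, steps 1-5 might look like the sketch below. It assumes the Statistics Toolbox functions betarnd, betafit and betacdf; the parameter values a0 and b0, the sample size, and the number of replicates are hypothetical choices for illustration only, and the K-S statistic is computed by hand.

```matlab
% Monte Carlo critical value for the "estimated" K-S statistic under a
% beta null, for one fixed pair of parameter values.
rng(0);                          % fixed seed so the run is repeatable
n = 50; reps = 500; alpha = 0.05;
a0 = 2; b0 = 5;                  % step 1: hypothetical fixed parameters
upper = (1:n)'/n;                % ECDF just after each sorted point
lower = (0:n-1)'/n;              % ECDF just before each sorted point
D = zeros(reps,1);
for rep = 1:reps
    x = sort(betarnd(a0,b0,n,1));            % step 2: generate data
    phat = betafit(x);                       % step 3: fit, phat = [a b]
    F = betacdf(x,phat(1),phat(2));
    D(rep) = max(max(upper - F), max(F - lower));   % step 4: K-S stat
end
critVal = prctile(D,100*(1-alpha));          % step 5: 95% point
```

Because the beta is not a location-scale family, critVal applies only to the chosen (a0, b0); rerunning with other values shows how much it moves.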

Some references for all this are

    D'Agostino, R.B. and Stephens, M.A., Goodness-of-Fit Techniques
    Linhart, H. and Zucchini, W., Model Selection.

Hope this helps.

- Peter Perkins
   The MathWorks, Inc.

Subject: KStest---statistical test

From: Pierpa

Date: 12 Jun, 2003 13:11:55

Message: 6 of 11

> I'm not sure that bootstrapping will help
> you here.
[snip]


> You'd need to something more like
> 1) pick some fixed parameter values
> 2) parametrically generate some data
[snip]


uhmmmm...this is usually called parametric bootstrap...funny, isn't
it?


Btw my suggestion was much more in the spirit of the following paper:


Babu, G. J.; Rao, C. R. Goodness-of-fit tests when parameters are
estimated.


that you can find here:


 <http://www.stat.psu.edu/~babu/pdfpaper/estim.pdf>


HTH

Subject: KStest---statistical test

From: Pierpa

Date: 12 Jun, 2003 14:05:25

Message: 7 of 11

> Btw my suggestion was much more in the
[snip]


oooopsss...a bit too egocentric *^_^*...substitute "my" with "your" =
Peter.


sorry!


--P

Subject: KStest---statistical test

From: Yueping Xu

Date: 12 Jun, 2003 14:11:25

Message: 8 of 11

Thank you very much for all your good suggestions. As there is no
direct and easy solution in MATLAB, the only thing I can do is to
read the papers and books you suggested here before I can really
continue :-)

Subject: KStest---statistical test

From: Peter Perkins

Date: 12 Jun, 2003 14:28:17

Message: 9 of 11

> uhmmmm...this is usually called parametric bootstrap

As I understand the term, a "parametric bootstrap" is a simulation where one
estimates a distribution from data, and then generates data from that
estimated distribution. The whole point that I was trying to make is that
resampling from the data, whether parametrically or non-parametrically, is not
the right thing to do, because you need the sampling distribution of the K-S
statistic under the null hypothesis, and the data may not come from the null
hypothesis. An estimated distribution is part of the null, but it's only one
possible member. Thus, I was suggesting a MC simulation, but not a bootstrap,
to explore the sampling distribution of the K-S stat under the various members
of the null.


> Babu, G. J.; Rao, C. R. Goodness-of-fit tests when parameters are
> estimated.

Thanks, that looks like an interesting paper.

- Peter

Subject: KStest---statistical test

From: Peter Perkins

Date: 12 Jun, 2003 14:48:19

Message: 10 of 11

> Thank you very much for all your good suggestions.As there is no
> direct and easy solution from matlab,the only thing I can do is to
> read papers or books you suggested here before I can really
> continue:-)

The Stats Toolbox has the function LILLIETEST, which is the Lilliefors test for
normality. He also wrote a paper (J. Am. Stat. Assoc. circa 1969) describing
a similar test for exponentiality. It should be simple enough to reproduce
his results in MATLAB, to calculate the critical value for such a test.
Something like:

% Critical values for the exponential
n = 15; % length of your data
alpha = .05;
reps = 10000;
KSstat = zeros(reps,1);
mu = 1; % any value will do, exponential is a scale family
for rep = 1:length(KSstat)
     xrep = exprnd(mu,n,1);
     [yCDF,xCDF] = ecdf(xrep);
     [h,p,KSstat(rep)] = kstest(xrep, [xCDF expcdf(xCDF, mean(xrep))]);
end
critVal = prctile(KSstat,100*(1-alpha))

The actual test is then to compute the observed K-S statistic (using the
estimated distribution), and compare it to the simulated critical value:

[yCDF,xCDF] = ecdf(x);
[h,p,KSstat] = kstest(x, [xCDF expcdf(xCDF, mean(x))]);
h = KSstat > critVal

You should definitely check this against his paper, though.

- Peter

Subject: KStest---statistical test

From: Yueping Xu

Date: 12 Jun, 2003 14:57:06

Message: 11 of 11

Hello,


Does no one like the chi-squared test? I find it easy to understand
and I am implementing it in MATLAB (on my desk there are only
materials about the chi-squared test)... but I don't know if it is
appropriate in this case. At least it is suitable for all
distributions...


Thanks.


Yueping
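
A chi-squared goodness-of-fit test with an estimated parameter could be sketched as follows. This is a minimal illustration, not a recommendation: the exponential family, the placeholder data, the number of bins k, and the variable names are all assumptions. The degrees of freedom are reduced by one for the estimated mean; note that when the parameter is estimated by maximum likelihood from the raw (ungrouped) data, the chi-squared reference distribution is only approximate.

```matlab
% Chi-squared goodness-of-fit test against a fitted exponential,
% using k equal-probability bins.
rng(0);                          % fixed seed so the run is repeatable
x = exprnd(2,200,1);             % placeholder data; use your own sample
n = numel(x);
k = 10;                          % number of bins (an assumption)
muhat = mean(x);                 % estimated exponential mean
u = expcdf(x,muhat);             % fitted CDF values, in [0,1)
bin = min(floor(u*k) + 1, k);    % bin index 1..k, equal fitted probability
observed = accumarray(bin,1,[k 1]);
expected = (n/k)*ones(k,1);
chi2stat = sum((observed - expected).^2 ./ expected);
df = k - 1 - 1;                  % bins - 1 - number of estimated parameters
p = 1 - chi2cdf(chi2stat,df);    % approximate p-value
```

The result depends on the choice of k, which is one reason binned tests are usually less sensitive than K-S type tests for continuous data.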
