Thread Subject:
More on K-S Test...

Subject: More on K-S Test...

From: Linda

Date: 18 Jun, 2004 04:25:34

Message: 1 of 7

Hi!

I have tried using kstest as shown below:

x = gamrnd(1, 2, 100, 1);
kstest(x, [x chi2cdf(x, 1)])
kstest(x, [x chi2cdf(x, 2)])
kstest(x, [x expcdf(x, 1)])
kstest(x, [x gamcdf(x, 1, 2)])
kstest(x, [x logncdf(x, 1, 2)])
kstest(x, [x ncx2cdf(x, 1, 2)])
kstest(x, [x normcdf(x, 1, 2)])
kstest(x, [x raylcdf(x, 1)])
kstest(x, [x raylcdf(x, 2)])
kstest(x, [x unifcdf(x, 1, 2)])
kstest(x, [x wblcdf(x, 1, 2)])
ans =
     1
ans =
     0
ans =
     1
ans =
     0
ans =
     1
ans =
     1
ans =
     1
ans =
     1
ans =
     1
ans =
     1
ans =
     1

Results show that both the chi-squared and the gamma distribution pass the KS test.
If this situation happens with some experimental data, what other test
should I perform to choose the most appropriate distribution?

Linda

Subject: More on K-S Test...

From: tax

Date: 18 Jun, 2004 05:21:16

Message: 2 of 7

> chi2 and gamma distribution pass the KS
> test. what should I do?
> Linda

First quick solution: look at the p-values, not just at the 0/1 result
(choose the model that passed the test with the highest p-value).
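
For example (a minimal sketch, not in the original post), using the two-column
CDF syntax from the first message and assuming x is the data vector:

% capture the p-values (second output), not just the 0/1 decision
[h_chi2, p_chi2] = kstest(x, [x chi2cdf(x, 2)]);
[h_gam,  p_gam ] = kstest(x, [x gamcdf(x, 1, 2)]);

% prefer the candidate that passed with the larger p-value
if p_gam > p_chi2
    disp('the gamma model is the less contradicted one')
else
    disp('the chi-squared model is the less contradicted one')
end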

Pretty often, however, you have to estimate the parameters of the
distribution family under the null. In that case the usual KS test is
not reliable, and you might want to look at the following thread:

"KStest---statistical test"
Date: 2003-06-11 11:32:55

Now if, for example, you want to test against a normal with parameters
estimated from the data, then the (parametric) bootstrap procedure
described in the paper cited there goes, more or less, as follows
(assuming you have the Statistics Toolbox):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% X contains the data

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% perform the usual KS-test
% based on the estimates.
% We need the observed KS statistic, ksObs

[foo1, foo2, ksObs] = kstest(X, [X normcdf(X, muH, sigmaH)], alpha);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Parametric-bootstrap

ksBoot = zeros(nBoot,1);
for k=1:nBoot,
    % sample from the
    % null distribution
    Xboot = normrnd(muH, sigmaH, [nSample,1]);
    
    % calculate the estimates
    [muHboot, sigmaHboot] = normfit(Xboot);

    % KStest on the sample
    [foo1, foo2, ksBoot(k)] = kstest(Xboot, [Xboot normcdf(Xboot, muHboot, sigmaHboot)]);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% calculate the p-value

% smooth out the density
% of the KS-statistics
ksX = sort([linspace(0,max(ksBoot)+1,250), ksObs]);
idx = find(ksX == ksObs);
ksSmooth = ksdensity(ksBoot, ksX, 'support', 'positive');

% approximate the tail
Pvalue = trapz(ksX(idx:end), ksSmooth(idx:end));

% actual decision rule
if Pvalue < alpha,
   Reject = 1;
else
   Reject = 0;
end
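
For reference (not part of the original post), a minimal sketch of the setup
the snippet above assumes; the values are illustrative:

% assumed setup for the snippet above (illustrative values)
alpha   = 0.05;              % test level
nBoot   = 500;               % bootstrap replicates ("500 usually works fine", see the P.S. later in the thread)
nSample = numel(X);          % each bootstrap sample has the same size as the data
[muH, sigmaH] = normfit(X);  % null parameters estimated from the data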

HTH

Subject: More on K-S Test...

From: Peter Perkins

Date: 18 Jun, 2004 09:40:38

Message: 3 of 7

> Results show that both chi2 and gamma distribution pass the KS test.
> If this sistuation happens in some experiemental data, waht other test
> should I perform to choose the most appropriate distribution?

Hi Linda -

The chi-squared distribution is a special case of the gamma distribution,
where the shape and scale parameters are "tied" together. You can fit a
chi-squared to data, but the gamma is more general, and it is the one you should
be using unless there are theoretical reasons why your data really are chi-squared.
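
A quick numerical check of this relationship (not from the original reply): a
chi-squared distribution with nu degrees of freedom is a gamma distribution
with shape nu/2 and scale 2.

% chi-squared(nu) coincides with gamma(shape = nu/2, scale = 2)
nu = 3;
x  = linspace(0, 10, 5);
max(abs(chi2cdf(x, nu) - gamcdf(x, nu/2, 2)))   % essentially zero (round-off only)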

- Peter Perkins
   The MathWorks, Inc.

Subject: More on K-S Test...

From: Linda

Date: 18 Jun, 2004 11:59:15

Message: 4 of 7

Hi!

Thanks very much for your suggestions.
Do you mean that when I use the parametric bootstrap, I can test for any type of
distribution?

Actually, I have some measurement data.
I do not know which distribution describes the data best.
So, I need to test several distributions (e.g. lognormal, Rayleigh, gamma
and Weibull).
And I would like to use some kind of hypothesis test, such as the K-S test, to
determine which distribution gives the best fit (or the highest passing rate).

So, this is what I do:

% (1) Gamma distribution
gam_para = gamfit(data);
p_gam = kstest(data,[data gamcdf(data,gam_para(1),gam_para(2))]);

% (2) Lognormal distribution
logn_para = lognfit(data);
p_logn = kstest(data,[data logncdf(data,logn_para(1),logn_para(2))]);

% (3) Rayleigh distribution
rayl_para = raylfit(data);
p_rayl = kstest(data,[data raylcdf(data,rayl_para)]);

% (4) Weibull distribution
wbl_para = wblfit(data);
p_wbl = kstest(data,[data wblcdf(data,wbl_para(1),wbl_para(2))]);


However, I notice that the "p" values from all the distributions show "1",
i.e. the distribution is rejected, even though when I plot the empirical CDF
of the data against the estimated CDFs (using the estimated parameter sets),
they look pretty close.

Could it be that the technique I am using is wrong?

Thanks.

Linda


"tax" <onemorenoisy@day.today> wrote in message
news:eee08f4.0@webx.raydaftYaTP...
> > chi2 and gamma distribution pass the KS
> > test. what should I do?
> > Linda
>
> First quick solution: look at the p-values not just at the 0/1 result
> (choose the model that passed the test with the highest p-value)
>
> Pretty often, however, you have to estimate the parameters of the
> distribution family under the null. In this case the usual KStest is
> not reliable and you might wanna look at the following thread:
>
> "KStest---statistical test"
> Date: 2003-06-11 11:32:55
>
> Now if, for example, you wanna test against a normal with parameter
> estimated from the data, then the (parametric) bootstrap procedure
> described in the paper cited there goes, more or less, as follow
> (assuming you have the stat TBX):
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % X contains the data
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % perform the usual KS-test
> % based on the estimates.
> % We need the ksObs
>
> [foo1, foo2, ksObs] = kstest(X, [X normcdf(X, muH, sigmaH)], alpha);
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % Parametric-bootstrap
>
> ksBoot = zeros(nBoot,1);
> for k=1:nBoot,
> % sample from the
> % null distribution
> Xboot = normrnd(muH, sigmaH, [nSample,1]);
>
> % calculate the estimates
> [muHboot, sigmaHboot] = normfit(Xboot);
>
> % KStest on the sample
> [foo1, foo2, ksBoot(k)] = kstest(Xboot, [Xboot normcdf(Xboot,
> muHboot, sigmaHboot)]);
> end
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % calculate the p-value
>
> % smooth out the density
> % of the KS-statistics
> ksX = sort([linspace(0,max(ksBoot)+1,250), ksObs]);
> idx = find(ksX == ksObs);
> ksSmooth = ksdensity(ksBoot, ksX, 'support', 'positive');
>
> % approximate the tail
> Pvalue = trapz(ksX(idx:end), ksSmooth(idx:end));
>
> % actual decision rule
> if bPvalue < alpha,
> Reject = 1;
> else
> Reject = 0;
> end
>
> HTH

Subject: More on K-S Test...

From: Linda

Date: 18 Jun, 2004 12:03:17

Message: 5 of 7

BTW, what should the values of "nBoot" and "ksObs" be?

Thanks.

Linda


"tax" <onemorenoisy@day.today> wrote in message
news:eee08f4.0@webx.raydaftYaTP...
> > chi2 and gamma distribution pass the KS
> > test. what should I do?
> > Linda
>
> First quick solution: look at the p-values not just at the 0/1 result
> (choose the model that passed the test with the highest p-value)
>
> Pretty often, however, you have to estimate the parameters of the
> distribution family under the null. In this case the usual KStest is
> not reliable and you might wanna look at the following thread:
>
> "KStest---statistical test"
> Date: 2003-06-11 11:32:55
>
> Now if, for example, you wanna test against a normal with parameter
> estimated from the data, then the (parametric) bootstrap procedure
> described in the paper cited there goes, more or less, as follow
> (assuming you have the stat TBX):
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % X contains the data
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % perform the usual KS-test
> % based on the estimates.
> % We need the ksObs
>
> [foo1, foo2, ksObs] = kstest(X, [X normcdf(X, muH, sigmaH)], alpha);
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % Parametric-bootstrap
>
> ksBoot = zeros(nBoot,1);
> for k=1:nBoot,
> % sample from the
> % null distribution
> Xboot = normrnd(muH, sigmaH, [nSample,1]);
>
> % calculate the estimates
> [muHboot, sigmaHboot] = normfit(Xboot);
>
> % KStest on the sample
> [foo1, foo2, ksBoot(k)] = kstest(Xboot, [Xboot normcdf(Xboot,
> muHboot, sigmaHboot)]);
> end
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % calculate the p-value
>
> % smooth out the density
> % of the KS-statistics
> ksX = sort([linspace(0,max(ksBoot)+1,250), ksObs]);
> idx = find(ksX == ksObs);
> ksSmooth = ksdensity(ksBoot, ksX, 'support', 'positive');
>
> % approximate the tail
> Pvalue = trapz(ksX(idx:end), ksSmooth(idx:end));
>
> % actual decision rule
> if bPvalue < alpha,
> Reject = 1;
> else
> Reject = 0;
> end
>
> HTH

Subject: More on K-S Test...

From: tax

Date: 18 Jun, 2004 18:53:57

Message: 6 of 7

> I need to test several distributions
> (e.g. Lognormal, Rayleigh...
> So, this is how I do:
>
> % Gamma distribution
> gam_para = gamfit(data);
> p_gam = kstest(data,[data gamcdf...
> (data,gam_para(1),gam_para(2))]);
[snip]

This is exactly what I was saying: you can *not* use the plain KS test
when you *estimate* the parameters, as you are doing here! The KS test
assumes that the parameters are fixed in advance and do NOT depend on
the data in any way. In fact, if this is not the case, the asymptotic
distribution of the test statistic under the null can be pretty
different from the usual one (...the one kstest.m is based on). The
(parametric) bootstrap is a way to build an approximation to this
distribution.

> Do you mean when I use the parametric-
> boostrap, I can test for any
> types of distributions?

Hmmm, it is not clear what you mean. In both cases you MUST fix the
FAMILY (gamma, Gaussian, etc.), but by bootstrapping you can estimate
the parameters and still have a meaningful test.

Now, two final remarks:
1) Once you have selected, via KS (+ bootstrap), a certain number of
families, and *if* looking at the p-values is not enough for you, you
can always build a sequence of likelihood ratio tests!

2) Remember that you are using the data multiple times to perform a
number of tests => the overall level of your testing procedure is
NOT the usual one => you should consider Bonferroni-like adjustments
(see the sketch below) or more refined techniques (e.g. tube formulas
or False Discovery Rate).
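
As an illustrative sketch (not part of the original post), a Bonferroni-style
adjustment simply divides the nominal level by the number of tests:

% Bonferroni-like adjustment when testing m candidate families
alpha    = 0.05;        % desired overall (family-wise) level
m        = 4;           % e.g. gamma, lognormal, Rayleigh and Weibull
alphaAdj = alpha / m;   % level to use in each individual KS test
% compare each (bootstrap) p-value against alphaAdj instead of alpha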

HTH

P.S.:
nBoot = number of bootstrap samples (the higher the better... 500 usually
works fine)

ksObs = observed KS statistic (the p-value associated with this quantity
determines the result of the test)

Subject: More on K-S Test...

From: tax

Date: 18 Jun, 2004 19:00:59

Message: 7 of 7

> However, I notice that "p" values from
> all distributions show "1"
[snip]

[H,P,KSSTAT] = KSTEST(...)

The p-value is NOT the first output of KSTEST! H is just the binary decision:
   H = 1 -> rejected;
   H = 0 -> not rejected.
The p-value is the SECOND output, P.
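
For example (a sketch, not in the original reply), the gamma check from
message 4 would become:

% capture the p-value (second output), not just the reject flag H
gam_para = gamfit(data);
[h_gam, p_gam] = kstest(data, [data gamcdf(data, gam_para(1), gam_para(2))]);
% note: with parameters estimated from the data this p-value is only
% approximate -- see the parametric bootstrap earlier in the thread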

HTH
