Thread Subject: Statistical Best-Fit Testing

Subject: Statistical Best-Fit Testing

From: Linda

Date: 8 Jun, 2004 01:59:27

Message: 1 of 6

Hi!

I have some empirical data from real-time measurements, and I would like to
fit the data to an appropriate distribution.
I have been using the distribution fitting functions in the Statistics
Toolbox (gamfit, lognfit, normfit, raylfit and wblfit) to estimate the
relevant parameters (such as mean and standard deviation). Then I use the
corresponding random number generators (gamrnd, lognrnd, normrnd, raylrnd
and wblrnd) to generate a set of random numbers so that I can plot their
CDFs on top of the empirical CDF.

However, this way I can only judge the best fit by looking at the series of
graphs and seeing which one looks closest to the empirical CDF.
My question is: how do I compare the goodness of fit numerically?
Are there any functions in MATLAB that let me do that?

Many thanks.

Regards,
Linda

Subject: Statistical Best-Fit Testing

From: Polsak

Date: 8 Jun, 2004 03:34:18

Message: 2 of 6

try help kstest

kstest = kolmogorov-smirnov test

Subject: Statistical Best-Fit Testing

From: David

Date: 8 Jun, 2004 11:40:49

Message: 3 of 6

On Tue, 8 Jun 2004 01:59:27 -0400, "Linda" <lindah74uk@yahoo.co.uk>
wrote:

>However by doing such a way, I can only compare the best-fit by just looking
>a these series of graphs, which one looks closest to the empirical CDF.
>My question is how do I compare the goodness of fit numerically?
>Any functions in Matlab allows me to do that?
>
>Many thanks.
You are using the wrong tool. Curve fitting toolbox is the right tool.
"cftool" from "curve fitting toolbox" is a complete data importing,
preprocessing, fitting, postprocessing GUI.


regards
David

Subject: Statistical Best-Fit Testing

From: Tom Lane

Date: 8 Jun, 2004 10:12:17

Message: 4 of 6

Linda, others have given you two pieces of advice:

1. Use the Curve Fitting toolbox. I don't recommend this. That toolbox is
intended to fit curves to 2-dimensional scatter plots, not to 1-dimensional
distributions. While some might try to use a curve fitting technique on a
histogram or other 2-dimensional representation of a distribution, in my
opinion it's not the best way.

2. Use kstest. This is better as long as you don't take the p-value
seriously. You could use kstest just to get the Kolmogorov-Smirnov distance
KSSTAT, and pick the one that is smallest. You shouldn't use the p-value,
because that requires that you test against a pre-determined distribution
without estimating the parameters.
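
A minimal sketch of the "pick the smallest KS distance" idea, assuming the data are positive and stored in a column vector x. The data below are simulated placeholders, not real measurements, and the distance is computed by hand from the fitted CDFs rather than through kstest, so the result does not depend on which kstest calling syntax a given release accepts.

```matlab
% Placeholder data standing in for the measurements (an assumption;
% replace with your own positive-valued column vector).
rng(0);
x = gamrnd(2, 3, 500, 1);

% Fit each candidate family.
pg = gamfit(x);                 % [shape scale]
pl = lognfit(x);                % [mu sigma] of log(x)
[mu, sigma] = normfit(x);
pr = raylfit(x);                % scale
pw = wblfit(x);                 % [scale shape]

% Fitted CDFs as function handles.
names = {'gamma', 'lognormal', 'normal', 'rayleigh', 'weibull'};
cdfs  = {@(t) gamcdf(t, pg(1), pg(2)), ...
         @(t) logncdf(t, pl(1), pl(2)), ...
         @(t) normcdf(t, mu, sigma), ...
         @(t) raylcdf(t, pr), ...
         @(t) wblcdf(t, pw(1), pw(2))};

% KS distance: largest gap between the empirical CDF steps and the
% fitted CDF, checked just before and just after each jump.
xs = sort(x);
n  = numel(xs);
ks = zeros(1, numel(cdfs));
for k = 1:numel(cdfs)
    F = cdfs{k}(xs);
    ks(k) = max(max((1:n)'/n - F), max(F - (0:n-1)'/n));
end

[~, best] = min(ks);
for k = 1:numel(cdfs)
    fprintf('%-10s  KS distance = %.4f\n', names{k}, ks(k));
end
fprintf('Smallest distance: %s\n', names{best});
```

The distances are only a ranking device here; as noted above, they should not be converted to p-values when the parameters were estimated from the same data.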

Another alternative would be for you to evaluate the likelihood function and
compute AIC, BIC, or some similar thing that adjusts for the number of
parameters you estimated. These quantities are not computed directly by
Statistics Toolbox functions, but they should be relatively simple to
calculate.
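
As a rough illustration of that calculation, the sketch below evaluates the negative log-likelihood of each fitted family from its pdf and then forms AIC = 2k + 2*NLL and BIC = k*log(n) + 2*NLL, where k is the number of estimated parameters. The data are simulated placeholders, and the variable names are only for illustration.

```matlab
% Placeholder data standing in for the measurements (an assumption).
rng(0);
x = gamrnd(2, 3, 500, 1);
n = numel(x);

% Maximum likelihood fits.
pg = gamfit(x);                 % [shape scale]
pw = wblfit(x);                 % [scale shape]
pr = raylfit(x);                % scale
% As I understand it, normfit and lognfit report the unbiased sigma for
% uncensored data, so the exact ML estimates are computed directly here.
muN = mean(x);       sigN = std(x, 1);
muL = mean(log(x));  sigL = std(log(x), 1);

% Negative log-likelihood of each fitted model.
names = {'gamma', 'lognormal', 'normal', 'rayleigh', 'weibull'};
nll = [ -sum(log(gampdf(x, pg(1), pg(2)))), ...
        -sum(log(lognpdf(x, muL, sigL))), ...
        -sum(log(normpdf(x, muN, sigN))), ...
        -sum(log(raylpdf(x, pr))), ...
        -sum(log(wblpdf(x, pw(1), pw(2)))) ];

% Number of estimated parameters per family.
k = [2 2 2 1 2];

aic = 2*k + 2*nll;
bic = k*log(n) + 2*nll;

for i = 1:numel(names)
    fprintf('%-10s  AIC = %9.2f   BIC = %9.2f\n', names{i}, aic(i), bic(i));
end
```

Smaller values indicate a better trade-off between fit and number of parameters; only differences between models fitted to the same data are meaningful.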

One more alternative would be to abandon the idea of fitting a named
distribution, and just use the ksdensity function or a histogram to
represent the distribution.
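
For example, a kernel density estimate can be obtained in a couple of lines. This is only a sketch on simulated placeholder data; the 'support' option is used on the assumption that the measurements are strictly positive.

```matlab
% Placeholder data standing in for the measurements (an assumption).
rng(0);
x = gamrnd(2, 3, 500, 1);

% Kernel density estimate on an automatically chosen grid xi.
% 'support','positive' keeps the estimate from leaking below zero.
[f, xi] = ksdensity(x, 'support', 'positive');

% Location of the highest point of the estimated density.
[~, imax] = max(f);
fprintf('Estimated mode near x = %.3f\n', xi(imax));
```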

Finally, you mention generating random numbers in order to compare a cdf
to the empirical cdf of your data. Actually, you can just compute the
theoretical cdf directly using gamcdf, logncdf, etc. with the parameters you
estimated.
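
A short sketch of that comparison, using simulated placeholder data and only two of the candidate families to keep it brief. The fitted CDFs are evaluated on a grid and drawn over the empirical CDF, with no random number generation involved in the comparison itself.

```matlab
% Placeholder data standing in for the measurements (an assumption).
rng(0);
x = gamrnd(2, 3, 500, 1);

% Fit two candidate families.
pg = gamfit(x);                 % [shape scale]
pw = wblfit(x);                 % [scale shape]

% Empirical CDF and a grid on which to evaluate the fitted CDFs.
[Femp, xe] = ecdf(x);
t  = linspace(0, max(x), 200);
Fg = gamcdf(t, pg(1), pg(2));
Fw = wblcdf(t, pw(1), pw(2));

% Overlay the fitted CDFs on the empirical one.
stairs(xe, Femp, 'k');
hold on
plot(t, Fg, 'b');
plot(t, Fw, 'r');
hold off
legend('empirical', 'gamma fit', 'Weibull fit', 'Location', 'southeast');
xlabel('x');
ylabel('F(x)');
```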

Let me know if you need more information on any of these topics.

-- Tom

Subject: Statistical Best-Fit Testing

From: Peter Perkins

Date: 8 Jun, 2004 10:27:37

Message: 5 of 6

> You are using the wrong tool. Curve fitting toolbox is the right tool.
> "cftool" from "curve fitting toolbox" is a complete data importing,
> preprocessing, fitting, postprocessing GUI.

Actually, the Curve Fitting Toolbox is probably not what the OP wants. He is
(I think) doing "distribution fitting", i.e., fitting a univariate
distribution to a single variable. "Curve fitting" means that you have two
variables, a predictor and a response, and you want to model one as a function
of the other.

But the R14 release of the Stats Toolbox has a new GUI tool, called DFITTOOL,
that is similar to the CFTOOL, but designed for distribution fitting. It has
things like CDF plots and probability plots built in.

Polsak suggested KSTEST, and that's right, but with a caveat. The theory
behind K-S tests only allows (1) testing data against a fully-specified model,
such as Weibull(1,1), and (2) testing two sets of data against each other,
without specifying any model. The first is KSTEST, the second is KSTEST2.
There is no theory in general to test goodness-of-fit of a set of data against
a distribution _family_.

The distributions the OP mentioned are all either location-scale families, or
exponentials of that, except the Gamma. There is a convenient Monte-Carlo
strategy for testing against such a family, and in fact LILLIETEST implements
that for the normal. I can probably provide something more general. Coping
with a family like the Gamma is more difficult.
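
One way to read that Monte-Carlo strategy, sketched here for the Weibull family: compute the KS distance between the data and the fitted Weibull, then repeatedly simulate samples of the same size from the fitted model, refit each one, and record its KS distance. The fraction of simulated distances at least as large as the observed one serves as an approximate p-value. This is an illustrative sketch on simulated placeholder data, not the code behind LILLIETEST; the helper ksdist and the choice of 500 replicates are assumptions.

```matlab
% Placeholder data standing in for the measurements (an assumption).
rng(0);
x = wblrnd(3, 1.5, 200, 1);
n = numel(x);

% KS distance between a sorted sample xs and fitted CDF values F at xs.
ksdist = @(xs, F) max(max((1:numel(xs))'/numel(xs) - F), ...
                      max(F - (0:numel(xs)-1)'/numel(xs)));

% Observed distance, with the parameters estimated from the data.
p    = wblfit(x);               % [scale shape]
xs   = sort(x);
dObs = ksdist(xs, wblcdf(xs, p(1), p(2)));

% Null distribution: simulate from the fitted model and refit each sample,
% so the simulated distances also reflect the parameter estimation step.
nSim = 500;
dSim = zeros(nSim, 1);
for i = 1:nSim
    y = sort(wblrnd(p(1), p(2), n, 1));
    q = wblfit(y);
    dSim(i) = ksdist(y, wblcdf(y, q(1), q(2)));
end

% Approximate p-value: how often a simulated distance reaches the observed one.
pval = mean(dSim >= dObs);
fprintf('KS distance = %.4f, Monte-Carlo p-value = %.3f\n', dObs, pval);
```

Refitting inside the loop is the important design choice: comparing against distances computed with the true simulation parameters would reproduce the ordinary K-S null distribution, which is too conservative when parameters are estimated.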

Hope this helps.

- Peter Perkins
   The MathWorks, Inc.

Subject: Statistical Best-Fit Testing

From: Xiaoxiao Mao

Date: 24 Feb, 2009 09:52:02

Message: 6 of 6

I am facing the same problem, but how do I calculate AIC?
