From: "Tom Lane" <tlane@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: fitting a Gamma cdf to my data
Date: Wed, 8 Oct 2008 15:49:24 -0400
Organization: The MathWorks, Inc
Message-ID: <gcj2s4$b8l$1@fred.mathworks.com>
References: <gcge9p$sfv$1@fred.mathworks.com>


> I'm interested in fitting a Gamma cdf to my data which looks like:
> particle_size=[2 50 ... 2000]
> particle_fraction=[0.08 0.2 ... 1]

Reza, the following demo contains a section showing one way to fit a gamma
distribution:

http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/cdffitdemo.html

The point of this demo is to offer an alternative to maximum likelihood
estimation (MLE), especially in cases where MLE doesn't work.
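The demo's own code isn't reproduced here. As a rough sketch of the same idea (choose the parameters so the fitted cdf passes close to the observed cumulative fractions), you can minimize the sum of squared cdf differences with fminsearch. The sizes and fractions below are simulated stand-ins, since the actual values in the question are elided, and searching over log(shape) and log(scale) is an added assumption to keep both parameters positive, not something taken from the demo.

```matlab
% Sketch: least-squares fit of a gamma cdf to (size, cumulative fraction) pairs.
% sz and Fcum stand in for particle_size and particle_fraction (simulated here).
rng(0);
x = gamrnd(2, 100, 500, 1);                   % simulated raw data: shape 2, scale 100
sz = linspace(20, 800, 15);                   % hypothetical measurement sizes
Fcum = mean(double(bsxfun(@le, x, sz)), 1);   % fraction of data <= each size

% Sum of squared cdf errors, written in terms of t = log([shape scale])
% so that fminsearch can only propose positive parameters.
sse = @(t) sum((gamcdf(sz, exp(t(1)), exp(t(2))) - Fcum).^2);
pLS = exp(fminsearch(sse, log([1, mean(sz)])))   % [shape scale]
```

Unlike the likelihood approach discussed below, this needs no notion of bin centers or sample size; it just treats the cumulative fractions as points on a curve.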

If the variables you describe above are just a summary of some other data,
and if you can get at that other data, I recommend you work with it instead.
Then you can use gamfit to get the maximum likelihood estimates.
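A minimal sketch of the raw-data route, assuming the raw particle sizes are available as a column vector x (simulated here). Besides the estimates, gamfit also returns 95% confidence intervals, and gamlike returns an asymptotic covariance matrix, which gives approximate standard errors.

```matlab
% Sketch: maximum likelihood fit from raw observations.
% x stands in for the raw particle sizes (simulated here).
rng(0);
x = gamrnd(2, 100, 100, 1);
[phat, pci] = gamfit(x);            % phat = [shape scale]; pci rows = lower, upper 95% bounds
[nlogL, acov] = gamlike(phat, x);   % negative log likelihood and asymptotic covariance
se = sqrt(diag(acov))'              % approximate standard errors of [shape scale]
```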

Even if you can't get the raw data, you could maximize the likelihood based on
this summary data.  Here's an example where I generate some gamma data but
retain only the bin centers and cumulative frequencies.  I use fminsearch to
minimize the negative log likelihood, weighted by the bin frequencies.  In the
plot, blue shows the raw data (empirical cdf) and the fit based on them; red
shows the binned data and the fit based on the bin frequencies.

% Raw data with fit
x = gamrnd(2,100,100,1);   % 100 draws with shape 2, scale 100
p1 = gamfit(x)

% Summary data with fit
[n,c] = hist(x);       % n = bin counts, c = bin centers (the "rounded" data)
F = cumsum(n/sum(n));  % F = cumulative proportions
f = diff([0,F]);       % f = proportion in each bin
p2 = fminsearch(@(ab) -sum(f.*log(gampdf(c,ab(1),ab(2)))),[1,mean(c)])

% How does it look?
xx = linspace(0,max(x),200);   % grid for plotting the fitted cdfs
ecdf(x); line(xx,gamcdf(xx,p1(1),p1(2)),'color','b','linestyle',':')
hold on; set(stairs(c,F),'color','r'); hold off
line(xx,gamcdf(xx,p2(1),p2(2)),'color','r','linestyle',':')
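The same weighted fit can start directly from cumulative fractions, which is the form of particle_size and particle_fraction in the question. This is a variation on the code above, not a replacement: it drops empty bins (a zero proportion times the log of an underflowed density would give NaN) and searches over log(shape) and log(scale) so fminsearch cannot step to non-positive parameters. The data are simulated stand-ins because the real values are elided, and treating each size as a bin center is an approximation if the sizes are really bin edges.

```matlab
% Sketch: weighted ML fit starting from cumulative proportions.
% c and F play the roles of particle_size and particle_fraction (simulated here).
rng(0);
x = gamrnd(2, 100, 1000, 1);
[n, c] = hist(x);              % bin counts and bin centers
F = cumsum(n / sum(n));        % cumulative proportions, last entry 1

f = diff([0, F]);              % proportion in each bin
keep = f > 0;                  % drop empty bins (avoids 0*log(0) = NaN)
ck = c(keep);
fk = f(keep);

% Negative weighted log likelihood in terms of t = log([shape scale]).
nll = @(t) -sum(fk .* log(gampdf(ck, exp(t(1)), exp(t(2)))));
% Start at shape 1 with the weighted mean of the bin centers as the scale.
p3 = exp(fminsearch(nll, log([1, sum(fk .* ck)])))   % [shape scale]
```

At the optimum, shape times scale equals the weighted mean of the bin centers, which is a quick sanity check on the result.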

This doesn't give you any measures of uncertainty for the estimates.  But 
I'm not sure how I'd compute those anyway, given that you don't seem to have 
a notion of a "sample size" in your example.

-- Tom