From: "Tom Lane" <tlane@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: fitting the norm CDF
Date: Mon, 7 Jan 2008 17:00:57 -0500
Organization: The MathWorks, Inc.
Message-ID: <flu7ep$hpl$1@fred.mathworks.com>
References: <flu5ld$42j$1@fred.mathworks.com>



>I tried using the glmfit to estimate the parameters for the
> normal distribution. However, the fit didn't look right. I
> presume that I set up GLMFIT wrongly. Any suggestions
> would be appreciated.
> x =
> [0.0246;0.04;0.0585;0.0801;0.0926;0.1019;0.1083;0.1178;0.1243;0.1277;0.1342;0.1376;0.1441;0.1509;0.1544;0.1642;0.1709;0.1806;0.1874;0.194;0.1975;0.204;0.2075;0.217;0.2234;0.2452;0.2577;0.2701;0.2887;0.3103;0.3318;0.3534;0.3749;];
> y =
> [0.0045;0.0045;0.009;0.0224;0.0404;0.0583;0.0852;0.1166;0.148;0.1883;0.2242;0.2601;0.3004;0.3677;0.4126;0.4843;0.5381;0.5964;0.6592;0.7175;0.7623;0.8027;0.843;0.8744;0.9013;0.9372;0.9552;0.9686;0.9821;0.991;1;1;1;];

Pete, the problem as you set it up estimates the parameters for a linear fit 
of y as a function of x, under the assumption that the errors in y come from 
a normal distribution.
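Written out, with b1 and b2 the two coefficients glmfit returns and sigma^2 the error variance, that setup fits a straight line in x, not an S-shaped curve:

```latex
y_i = b_1 + b_2 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```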

When I plot your raw data, the relationship is clearly not linear.  It looks 
like a case where y is the proportion of responses of some kind (call them 
"successes") at each x value, and that proportion rises in the S-shape of a 
normal cdf.  Is that accurate?

If it is, you need to enter your response Y as a two-column matrix, with each 
row giving the number of successes and the number of trials for one 
observation.  For the example below, I set the number of trials to 100 for 
each observation and used your y to compute the number of successes.  It 
turns out glmfit doesn't force the number of successes to be an integer; 
non-integer values may make sense in some situations.  The 'binomial' option 
tells glmfit that the response is a count of successes out of a number of 
trials, and the 'link','probit' option requests a curve in the form of a 
normal cdf.
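As a sketch of what that model says, write s_i for the number of successes out of n_i trials at x_i (n_i = 100 here), b1 and b2 for the two coefficients glmfit returns, and Phi for the standard normal cdf. The second line shows why this is "fitting a normal cdf": the probit curve is the cdf of a normal distribution with mean mu and standard deviation sigma, assuming the slope b2 is positive.

```latex
s_i \sim \operatorname{Binomial}(n_i, p_i), \qquad p_i = \Phi(b_1 + b_2 x_i)

\Phi(b_1 + b_2 x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right),
\qquad \mu = -\frac{b_1}{b_2}, \quad \sigma = \frac{1}{b_2} \quad (b_2 > 0)
```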

Try this and see if it does what you need:

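Y = [y*100, repmat(100,size(y))];        % [successes, trials], 100 trials per x
b = glmfit( x, Y, 'binomial','link','probit');   % b(1) intercept, b(2) slope
refx = linspace( min(x), max(x), 50)';   % column: one row per observation for glmval
Xb = b(1)+ b(2).*refx;                   % linear predictor; yfit equals normcdf(Xb)
yfit = glmval(b,refx,'probit');          % fitted proportions on the grid
figure
hold on
plot( x, y, 'ob')                        % observed proportions
plot( refx, yfit, '--r', 'linewidth', 2.5)   % fitted normal-cdf curve
box on
hold off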
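If what you are after is the mean and standard deviation of the normal distribution, they can be read off the probit coefficients as mu = -b(1)/b(2) and sigma = 1/b(2). The sketch below shows that conversion and checks it numerically. The coefficient values are made up for illustration (they are not the fit to your data), and Phi is written with erfc only so the check runs without the Statistics Toolbox.

```matlab
% Illustrative probit coefficients [b(1); b(2)] = [intercept; slope].
% These are made-up values, not the fit to the data above; substitute
% the b returned by glmfit(x, Y, 'binomial', 'link', 'probit').
b = [-2; 10];

% Mean and standard deviation of the equivalent normal cdf.
% Assumes b(2) > 0, i.e. the fitted proportion increases with x.
mu    = -b(1) / b(2);   % x at which the fitted proportion equals 0.5
sigma =  1    / b(2);   % spread of the S-curve

% Check that Phi(b(1) + b(2)*x) and Phi((x - mu)/sigma) agree on a grid.
% Phi is the standard normal cdf, written with erfc (base MATLAB).
Phi = @(z) 0.5 * erfc(-z ./ sqrt(2));
xg  = linspace(0, 0.4, 50)';
d   = max(abs(Phi(b(1) + b(2) .* xg) - Phi((xg - mu) ./ sigma)));

fprintf('mu = %g, sigma = %g, max difference = %g\n', mu, sigma, d);
```

With your fitted b in place of the illustrative one, normcdf(refx, mu, sigma) from the Statistics Toolbox should trace the same curve as yfit.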

-- Tom