Path: news.mathworks.com!not-for-mail
From: "Tom Lane" <tlane@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: glmfit 'offset' parameter
Date: Wed, 30 Jul 2008 21:43:44 -0400
Organization: The MathWorks, Inc
Lines: 36
Message-ID: <g6r5cg$7hr$1@fred.mathworks.com>
References: <g6qpk6$egh$1@fred.mathworks.com>



> I don't get the purpose/usage of the 'offset' parameter in
> the glmfit function doc.

Joe, in regular regression suppose you want to fit a model like this:

   y = a + x1 + b*x2 + error

In other words, you know the coefficient of x1 (we can assume it is 1 here). 
It's easy enough to fit this by least squares by re-writing

   y - x1 = a + b*x2 + error
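
As a minimal sketch of that rewrite in MATLAB (the simulated data, sample size, and true coefficient values below are assumptions chosen only for illustration), you can subtract the known term from the response and fit the remaining coefficients with backslash:

```matlab
% Simulated data; n and the true values a = 2, b = 0.5 are arbitrary choices
rng(0);
n  = 200;
x1 = randn(n,1);
x2 = randn(n,1);
y  = 2 + x1 + 0.5*x2 + 0.1*randn(n,1);   % coefficient of x1 is known to be 1

% Move the known term to the left-hand side, then fit a and b by least squares
coef = [ones(n,1) x2] \ (y - x1);        % coef(1) estimates a, coef(2) estimates b
```

With these settings coef should land close to [2; 0.5].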

For generalized linear models, the response isn't a simple sum of a linear 
function of the predictors with additive errors, so it's not possible to 
re-write in the same way.

Here's a semi-realistic example where this would be useful.  Suppose the 
number of defects on a surface should be proportional to the surface area, 
or the number of events in an interval of time should be proportional to the 
length of time.  The count of defects or events might reasonably be modeled 
by a Poisson distribution.  If we subtracted or divided off the area or 
time, we'd get a response that might not even be integer valued, so a 
Poisson model would no longer be appropriate for it.  Instead, if 
we model the expected value as

    E[y] = area * exp(a + b*x)

we can take logs to get

   log(E[y]) = log(area) + a + b*x

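The offset parameter allows us to handle the term that doesn't have a 
coefficient to be estimated.  Here you would pass log(area) as the 'offset' 
value, and glmfit adds it to the linear predictor with its coefficient fixed 
at 1 while estimating only a and b.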
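
Here is a rough sketch of the defect-count model above using glmfit from the Statistics Toolbox. The simulated areas, the predictor x, and the true values a = -1 and b = 0.3 are assumptions made up for illustration, not real data:

```matlab
% Simulated data; all sizes and true coefficients here are illustrative assumptions
rng(0);
n    = 500;
area = 1 + 9*rand(n,1);            % surface areas between 1 and 10
x    = randn(n,1);                 % a single predictor
mu   = area .* exp(-1 + 0.3*x);    % E[y] = area * exp(a + b*x) with a = -1, b = 0.3
y    = poissrnd(mu);               % Poisson counts of defects

% The log link is the default for 'poisson'; log(area) enters the linear
% predictor as an offset, so no coefficient is estimated for it
coef = glmfit(x, y, 'poisson', 'offset', log(area));
% coef(1) estimates a, coef(2) estimates b
```

The offset goes in on the scale of the linear predictor, which is why it is log(area) rather than area itself.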

-- Tom