Thread Subject: prior information in mean and variance in regression

Subject: prior information in mean and variance in regression

From: soms

Date: 25 Nov, 2009 00:59:22

Message: 1 of 11

Greetings!

I have a small sample available for regression analysis of the form
y = f(x). As the sample size increases, the regression equation based
on the small sample would no longer be appropriate.

We have prior knowledge about the mean and variance of y from a larger
sample. How can we use this information to update the regression
parameters (coefficients) derived from the small sample?

I am new to Bayesian regression but am trying to get into it. Can
Bayesian regression use such prior information on the mean and
variance of y, rather than on the regression coefficients? If so, is
there a pointer to an example in any article, book, or code
implementing this?

Thanks
Soms

Subject: prior information in mean and variance in regression

From: ImageAnalyst

Date: 25 Nov, 2009 02:15:51

Message: 2 of 11

I'm not sure I understand, but I'm not a statistician. If you know
the "true" values (or at least more accurate ones from a larger
sample), why not just use them if you want to? Otherwise you have the
sample statistics. Which do you want to use? Are you looking for
some kind of equation that gives error as a function of the number of
samples?

I also don't understand your statement "As the sample size increases
the regression equation based on small sample would not be
appropriate." Usually if you've chosen the correct model the model
coefficients get better with increased sample size, not worse.

Subject: prior information in mean and variance in regression

From: soms

Date: 25 Nov, 2009 03:19:14

Message: 3 of 11

Sorry for the unclear statements.

On Nov 24, 6:15 pm, ImageAnalyst <imageanal...@mailinator.com> wrote:
> I'm not sure I understand, but I'm not a statistician.  If you know
> the "true" values (or at least more accurate ones from a larger
> sample), why not just use them if you want to?  Otherwise you have the
> sample statistics.  Which do you want to use?  Are you looking for
> some kind of equation that gives error as a function of the number of
> samples?
>

Yes, I have a small set of (x,y) data for regression. But I know the
distribution of y, i.e. its mean and variance, from a large sample set
(prior knowledge from some source). Now I would like to develop a
regression model for the available small set of (x,y) data in such a
way that its predictions respect the known mean and variance of y.


> I also don't understand your statement "As the sample size increases
> the regression equation based on small sample would not be
> appropriate."  Usually if you've chosen the correct model the model
> coefficients get better with increased sample size, not worse.

Sorry for the unclear statement. The regression equation based on the
small (x,y) sample may be biased and may not give good results for the
larger data set. I mean that a model based on a small sample would not
be a good one.

Thanks

Subject: prior information in mean and variance in regression

From: ImageAnalyst

Date: 25 Nov, 2009 03:34:37

Message: 4 of 11

I still don't understand. The mean of your small sample is what it
is. You can't change it. Like I said, it is what it is. If you make
it something different, then it's not the actual mean anymore. Yes,
it won't equal the mean of your much larger sample set but that's the
way it goes. You just have to live with it. That's just basic
statistics.

And like I hinted at before, there are formulas that can predict how
much different the mean of your small sample set is from the "true"
mean. See "standard error of the mean" on this page:
http://en.wikipedia.org/wiki/Standard_error_(statistics)
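As a minimal illustration of the standard error of the mean that ImageAnalyst points to, the sketch below estimates it as s/sqrt(n), where s is the sample standard deviation. The function name and the data values are hypothetical, chosen only for illustration.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)

# Hypothetical small sample of y values (illustration only).
small_y = [4.1, 5.3, 3.8, 6.0, 4.9]
print(statistics.mean(small_y), standard_error_of_mean(small_y))
```

For a fixed spread, the standard error shrinks like 1/sqrt(n), which is the sense in which a small-sample mean is expected to differ from the "true" mean.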

Subject: prior information in mean and variance in regression

From: Richard Startz

Date: 25 Nov, 2009 04:44:53

Message: 5 of 11

The mean and variance of y are almost irrelevant to regression
analysis. Further, there is no reason why the mean or variance of y
should be the same in different samples. Both depend on f(x).
-Dick Startz
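A small simulation sketch of Dick's point, assuming a hypothetical linear model y = 2 + 0.5*x + e with e ~ N(0, 1). For a linear model, E(y) = a + b*E(x) and Var(y) = b^2*Var(x) + Var(e), so two samples drawn from the same equation but with x in different ranges give very different means and variances of y, while the estimated slope stays close to 0.5. The helper names `ols` and `simulate` are illustrative.

```python
import random
import statistics

def ols(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def simulate(x_low, x_high, n, rng):
    """Draw n points from the hypothetical model y = 2 + 0.5*x + e, e ~ N(0, 1)."""
    x = [rng.uniform(x_low, x_high) for _ in range(n)]
    y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y

rng = random.Random(1)
for lo, hi in [(0, 10), (20, 40)]:
    x, y = simulate(lo, hi, 500, rng)
    a, b = ols(x, y)
    print(f"x in [{lo},{hi}]: mean(y)={statistics.fmean(y):.2f} "
          f"var(y)={statistics.variance(y):.2f} a={a:.2f} b={b:.2f}")
```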

Subject: prior information in mean and variance in regression

From: aruzinsky

Date: 26 Nov, 2009 15:29:02

Message: 6 of 11

In

Yi = A*Xi + B + ei,

if you know the value of B, you would be a damn fool not to subtract
B from Yi before using regression,

Yi' = A*Xi + ei,

where

Yi' = Yi - B,

to find A. If E(Xi) = 0 and E(ei) = 0, then B = E(Yi), your "mean".
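A minimal sketch of aruzinsky's procedure, assuming B is known exactly: subtract B from each Yi, then fit A by least squares through the origin, which gives A = sum(Xi*Yi') / sum(Xi^2). The function name is hypothetical.

```python
def slope_with_known_intercept(x, y, b_known):
    """Least-squares estimate of A in y_i = A*x_i + B + e_i when B is known.

    Subtract B from each y_i, then regress y_i' = y_i - B on x_i through
    the origin: A_hat = sum(x_i * y_i') / sum(x_i ** 2).
    """
    y_shifted = [yi - b_known for yi in y]
    sxy = sum(xi * yi for xi, yi in zip(x, y_shifted))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Hypothetical noise-free data with A = 3 and B = 10.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [4.0, 7.0, 10.0, 13.0, 16.0]
print(slope_with_known_intercept(x, y, b_known=10.0))
```

Fixing B removes one parameter from the fit, so only the slope is estimated from the small sample.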

Subject: prior information in mean and variance in regression

From: Rich Ulrich

Date: 26 Nov, 2009 19:06:32

Message: 7 of 11

On Tue, 24 Nov 2009 16:59:22 -0800 (PST), soms <soms.sharma@gmail.com>
wrote:

>Greetings!
>
>I have a small sample available for regression analysis of the form
>y = f(x). As the sample size increases, the regression equation based
>on the small sample would no longer be appropriate.

As others have mentioned, the meaning here is unclear.

A "small sample" is improved by larger N drawn randomly from
the same population.

>
>We have prior knowledge about the mean and variance of y from a larger
>sample. How can we use this information to update the regression
>parameters (coefficients) derived from the small sample?

Again, unclear. If the "larger sample" does not represent the same
population, then you ought to have great doubts about the validity
of extrapolating. It is *always* hazardous to extrapolate outside
of the *numeric* range of the data that a regression is based on --
and that makes me wonder, "WHY does the larger sample have a
different mean or variance?" But one of the nice things about
regression coefficients is that when they are linear across the
range (as we assume), then a restricted range has the same
coefficient as the full range. That is one reason for preferring
a regression coefficient over a Pearson r for describing
associations.

>
>I am new to Bayesian regression but am trying to get into it. Can
>Bayesian regression use such prior information on the mean and
>variance of y, rather than on the regression coefficients? If so, is
>there a pointer to an example in any article, book, or code
>implementing this?

Well, I don't know what "Bayesian regression" is.

But I don't think you have a sensible parallel in mind.
Even for means and variances, I would be surprised if
there was ever a recommendation to take a subsample
in order to set up Priors before using the full data.

And the means and variances only affect the regression
coefficients when you have failures of your basic assumptions
of linearity and homogeneity -- that is, "bad design".


--
Rich Ulrich
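A simulation sketch of Rich's remark that, when the relationship is linear, restricting the range of x leaves the regression coefficient essentially unchanged while the Pearson r shrinks. The model y = 1 + 0.8*x + e, the sample size, and the cut-off 4 <= x <= 6 are assumptions chosen for illustration; the subset here is selected on x.

```python
import math
import random
import statistics

def slope_and_r(x, y):
    """Return the OLS slope of y on x and the Pearson correlation r."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
# Hypothetical linear model y = 1 + 0.8*x + e, e ~ N(0, 1), x uniform on [0, 10].
x = [rng.uniform(0, 10) for _ in range(2000)]
y = [1 + 0.8 * xi + rng.gauss(0, 1) for xi in x]

# Restrict the range by keeping only points with 4 <= x <= 6.
pairs = [(xi, yi) for xi, yi in zip(x, y) if 4 <= xi <= 6]
xr, yr = zip(*pairs)

print("full range:       slope=%.3f r=%.3f" % slope_and_r(x, y))
print("restricted range: slope=%.3f r=%.3f" % slope_and_r(list(xr), list(yr)))
```

The slope estimate from the restricted subset is noisier (fewer points, less spread in x), but it estimates the same coefficient; r falls because it also depends on the spread of x relative to the error variance.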

Subject: prior information in mean and variance in regression

From: Bruce Weaver

Date: 26 Nov, 2009 19:42:44

Message: 8 of 11

On Nov 24, 11:44 pm, Richard Startz <richardsta...@comcast.net> wrote:

> The mean and variance of y are almost irrelevant to regression
> analysis. Further, there is no reason why the mean or variance of y
> should be the same in different samples. Both depend on f(x).
> -Dick Startz

Dick, I've been trying for a while to understand what you're getting
at in that first sentence, but so far no luck. I don't understand how
the mean and variance of Y can be "almost irrelevant" when linear
regression entails partitioning the sum of the squared deviations
about the MEAN of Y (i.e., the numerator of the variance) into
SS_regression and SS_residual. What am I missing?

Thanks,
Bruce

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."
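Bruce's point can be checked numerically: for an OLS fit that includes an intercept, the total sum of squares about the mean of y splits exactly into SS_regression and SS_residual, and R^2 = SS_regression / SS_total. The function name and data below are hypothetical.

```python
import statistics

def ss_partition(x, y):
    """Fit y = a + b*x by OLS and return (SS_total, SS_regression, SS_residual)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    ss_total = sum((yi - my) ** 2 for yi in y)       # deviations about the mean of y
    ss_reg = sum((fi - my) ** 2 for fi in fitted)    # explained by the fitted line
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_reg, ss_res

# Small hypothetical data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, ssr, sse = ss_partition(x, y)
print(sst, ssr + sse, ssr / sst)  # SS_total, SS_regression + SS_residual, R^2
```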

Subject: prior information in mean and variance in regression

From: Richard Startz

Date: 26 Nov, 2009 20:58:11

Message: 9 of 11

On Thu, 26 Nov 2009 11:42:44 -0800 (PST), Bruce Weaver
<bweaver@lakeheadu.ca> wrote:

>On Nov 24, 11:44 pm, Richard Startz <richardsta...@comcast.net> wrote:
>
>> The mean and variance of y are almost irrelevant to regression
>> analysis. Further, there is no reason why the mean or variance of y
>> should be the same in different samples. Both depend on f(x).
>> -Dick Startz
>
>Dick, I've been trying for a while to understand what you're getting
>at in that first sentence, but so far no luck. I don't understand how
>the mean and variance of Y can be "almost irrelevant" when linear
>regression entails partitioning the sum of the squared deviations
>about the MEAN of Y (i.e., the numerator of the variance) into
>SS_regression and SS_residual. What am I missing?
>
>Thanks,
>Bruce

Bruce:

Let me try to be more clear. The regression models the CONDITIONAL
mean of y, which depends on X. The intercept isn't the mean of y and
the standard error of the regression estimates the standard deviation
of the error term, not of y. That's what I meant by saying the mean
and variance of y are almost irrelevant. (I guess the "almost" because
there might be some way of using the information to place a constraint
on the coefficients, although I can't see how.)

In fact, samples drawn from the same regression equation but with
different X values will have different means and variances of y.

Best,
Dick
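A sketch of Dick's distinction, using hypothetical data. The fitted intercept is generally not the mean of y (it is the predicted y at x = 0), although the OLS line does pass through (mean of x, mean of y). The standard error of the regression, sqrt(SS_residual / (n - 2)), estimates the standard deviation of the error term, which can be much smaller than the standard deviation of y. The function name is illustrative.

```python
import math
import statistics

def fit_summary(x, y):
    """OLS fit of y = a + b*x; return intercept, slope, residual SE and sd(y)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))  # standard error of the regression
    return a, b, s_e, statistics.stdev(y)

# Hypothetical data with x far from zero.
x = [10.0, 12.0, 14.0, 16.0, 18.0]
y = [3.2, 4.1, 4.8, 6.1, 6.9]
a, b, s_e, s_y = fit_summary(x, y)
print(a, statistics.fmean(y))           # intercept vs. unconditional mean of y
print(a + b * statistics.fmean(x))      # fitted value at the mean of x equals mean of y
print(s_e, s_y)                         # residual SE vs. standard deviation of y
```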

Subject: prior information in mean and variance in regression

From: Rich Ulrich

Date: 27 Nov, 2009 23:43:41

Message: 10 of 11

Bruce, Dick,
I was thinking of what Dick wrote when I wrote my post -
I thought that I knew what he was intending, and I tried
to sharpen it a little. But I didn't do a great job of saying
it, either. Dick does somewhat better, above.

Here is another attempt.

If you take a subset based on y, you *should* get the
same intercept and regression line. A test on intercept
is what we use to test for mean-differences in samples;
a test on regression slopes is what we use to test for an
assumption of homogeneity of regressions. Thus, if
assumptions are met - mainly, for linearity of regressions -
you conclude what Dick posted, "The mean and variance of
y are almost irrelevant." True, the tests are affected.

Reduced precision or power is what you get from a reduced
N or a smaller range of y. If you pick extreme y's, you get
more power for a specific N.

I was trying to say all that, in addition to warning against
extrapolating to predicting y's when the new x's are outside
the range of what you have seen before. And trying to
warn that if you don't have an explanation for "different"
means and variances in what is supposed to be a wider
version of your original subsample, you should probably
cast a suspicious eye at your sampling procedures.



--
Rich Ulrich

Subject: prior information in mean and variance in regression

From: Tom Lane

Date: 30 Nov, 2009 16:47:23

Message: 11 of 11

> I am new to Bayesian regression but am trying to get into it. Can
> Bayesian regression use such prior information on the mean and
> variance of y, rather than on the regression coefficients? If so, is
> there a pointer to an example in any article, book, or code
> implementing this?

Soms, I'm not sure this is what you want, but I am aware of research in
which the software gets prior information about the coefficients by asking
questions about the response. I saw a preliminary version of the paper years
ago, but I believe it was published as "Interactive elicitation of opinion
for a normal linear model" by Kadane et al., Journal of the American
Statistical Association, Dec 1980.

-- Tom
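For readers following up on Tom's pointer, the sketch below shows the standard conjugate setup in which the prior is placed on the coefficients themselves: independent normal priors on the intercept a and slope b, with the error variance assumed known. It is not the elicitation procedure of Kadane et al.; turning beliefs about the response into such a prior is the step that paper addresses. The function name and the prior values in the usage line are hypothetical.

```python
def posterior_line(x, y, prior_mean, prior_var, noise_var):
    """Posterior mean and covariance of (a, b) in y = a + b*x + e.

    Assumes e ~ N(0, noise_var) with noise_var known, and independent
    normal priors a ~ N(prior_mean[0], prior_var[0]) and
    b ~ N(prior_mean[1], prior_var[1]).
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Posterior precision = prior precision + X'X / noise_var (X has columns [1, x]).
    p11 = 1.0 / prior_var[0] + n / noise_var
    p12 = sx / noise_var
    p22 = 1.0 / prior_var[1] + sxx / noise_var
    # Prior precision * prior mean + X'y / noise_var.
    r1 = prior_mean[0] / prior_var[0] + sy / noise_var
    r2 = prior_mean[1] / prior_var[1] + sxy / noise_var
    # Posterior covariance is the inverse of the 2x2 precision matrix.
    det = p11 * p22 - p12 * p12
    c11, c12, c22 = p22 / det, -p12 / det, p11 / det
    mean = (c11 * r1 + c12 * r2, c12 * r1 + c22 * r2)
    cov = ((c11, c12), (c12, c22))
    return mean, cov

# Hypothetical data and prior values, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.2, 2.9, 5.1, 6.8]
mean, cov = posterior_line(x, y, prior_mean=(0.0, 2.0), prior_var=(10.0, 0.25), noise_var=0.5)
print(mean)
```

With a very vague prior the posterior mean approaches the ordinary least-squares fit; with a very tight prior it stays at the prior mean, so the small sample is pulled toward whatever the prior encodes.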
