Thread Subject:
Linear Regression without a constant term

Subject: Linear Regression without a constant term

From: Vivek Saxena

Date: 24 Mar, 2010 12:03:05

Message: 1 of 12

Hi,

Is it possible to perform a linear regression in MATLAB with no constant term?

I have data for 9 regressors and I have to fit a multiple linear regression model of Y (the response) on these 9 regressors without an intercept. That is,

Y = x_1*gamma_1 + x_2*gamma_2 + ..... + x_9*gamma_9 + epsilon

I noticed that regstats automatically appends a column of 1s to the X matrix (corresponding to the 0th regression coefficient being the intercept in the usual formulation), whereas regress assumes that the input X matrix already has such a structure. The documentation states that regress will produce an incorrect model if the constant term is not present.

Thanks

Cheers
Vivek.

Subject: Linear Regression without a constant term

From: Peter Perkins

Date: 24 Mar, 2010 12:18:35

Message: 2 of 12

On 3/24/2010 8:03 AM, Vivek Saxena wrote:
> I noticed that regstats automatically appends a column of 1s to the X
> matrix (corresponding to the 0th regression coefficient being the
> intercept in the usual formulation),

It's true that by passing in 'linear' to REGSTATS, you do get an intercept term, but you can specify any model you want using a terms matrix. In your case, you want a linear term for each of 9 predictors, no intercept or interactions, and no higher-order terms, so the terms matrix is just eye(9).
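
A minimal sketch of the terms-matrix call, assuming the Statistics Toolbox is installed. The data and the names X, y, gammaTrue, gammaHat, and se are made up for illustration and are not from the original question.

```matlab
% Hypothetical data: n observations of 9 predictors.
n = 50;
X = randn(n, 9);
gammaTrue = (1:9)';                  % illustrative "true" coefficients
y = X*gammaTrue + 0.1*randn(n, 1);

% eye(9): one row per term, each row picking one predictor to the first power.
% There is no all-zero row, so the model has no constant term.
stats = regstats(y, X, eye(9), {'beta', 'covb'});

gammaHat = stats.beta;               % 9-by-1, no intercept entry
se = sqrt(diag(stats.covb));         % standard errors of the coefficients
```

Only 'beta' and 'covb' are requested here because, as discussed later in this message, the R^2 and F statistics are not meaningful without a constant.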


> whereas regress assumes that the
> input X matrix already has such a structure. The documentation states
> that regress will produce an incorrect model if the constant term is not
> present.

I think you're referring to this:

     X should include a column of ones so that the model contains a constant
     term. The F statistic and p value are computed under the assumption
     that the model contains a constant term, and they are not correct for
     models without a constant. The R-square value is one minus the ratio of
     the error sum of squares to the total sum of squares. This value can
     be negative for models without a constant, which indicates that the
     model is not appropriate for the data.

The model itself, i.e., the estimated coefficients and their CIs, are estimated correctly when the model does not include an intercept. It's only the F statistic and the R^2 that become invalid when there's no intercept. Both of these goodness-of-fit statistics assume that the model y = constant + error is a special case of the model you're fitting, and if there's no intercept, it isn't.
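
To see how R^2 can go negative, here is a small sketch in base MATLAB with made-up numbers (x, y, and the variable names are illustrative). The response is nearly constant, so a line forced through the origin does worse than simply predicting the mean.

```matlab
% Illustrative data: y is nearly constant, so a line through the origin fits poorly.
x = (1:5)';
y = [10; 10.1; 9.9; 10; 10.05];

b = x \ y;                        % least-squares slope with no intercept
sse = sum((y - x*b).^2);          % error sum of squares
sst = sum((y - mean(y)).^2);      % total sum of squares about the mean
r2 = 1 - sse/sst;                 % the R-square formula quoted above

fprintf('slope = %.4f, R^2 = %.1f\n', b, r2);
```

Because the constant-only model is not a special case of y = b*x, nothing forces sse to be smaller than sst, and here it is far larger.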

Another possibility is to use LSCOV.
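
A rough sketch of the LSCOV route, using made-up data (X, y, and the coefficient values are assumptions for illustration). LSCOV is part of base MATLAB and uses the matrix exactly as given, so no constant is added.

```matlab
n = 40;
X = randn(n, 3);                       % hypothetical predictors, no column of ones
y = X*[2; -1; 0.5] + 0.2*randn(n, 1);  % hypothetical response

% With no weights or covariance supplied, lscov does ordinary least squares.
[b, se, mse] = lscov(X, y);            % coefficients, their standard errors, mean squared error
```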

Hope this helps.

Subject: Linear Regression without a constant term

From: Jos (10584)

Date: 24 Mar, 2010 12:22:07

Message: 3 of 12

"Vivek Saxena" <maverick280857@yahoo.com> wrote in message <hocv1p$gep$1@fred.mathworks.com>...
> Hi,
>
> Is it possible to perform a linear regression in MATLAB with no constant term?
>
> I have data for 9 regressors and I have to fit a multiple linear regression model of Y (the response) on these 9 regressors without an intercept. That is,
>
> Y = x_1*gamma_1 + x_2*gamma_2 + ..... + x_9*gamma_9 + epsilon
>
> I noticed that regstats automatically appends a column of 1s to the X matrix (corresponding to the 0th regression coefficient being the intercept in the usual formulation), whereas regress assumes that the input X matrix already has such a structure. The documentation states that regress will produce an incorrect model if the constant term is not present.
>
> Thanks
>
> Cheers
> Vivek.

Construct a regression matrix without a column of ones. Example:

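% data: two regressors and a response with known coefficients
  x1 = cumsum(rand(1,10)) ;
  x2 = cumsum(rand(size(x1))) ;
  CF = [20 50] ;   % true coefficients
  y = CF(1) * x1 + CF(2) * x2 + randn(size(x1))/10 ;   % add a little noise

% engine: regression matrix without a column of ones, solved by least squares
  M = [x1(:) x2(:)]
  fittedCF = M \ y(:)   % should come out close to CF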

hth
Jos

Subject: Linear Regression without a constant term

From: Torsten Hennig

Date: 24 Mar, 2010 12:32:51

Message: 4 of 12

> Hi,
>
> Is it possible to perform a linear regression in
> MATLAB with no constant term?
>
> I have data for 9 regressors and I have to fit a
> multiple linear regression model of Y (the response)
> on these 9 regressors without an intercept. That is,
>
> Y = x_1*gamma_1 + x_2*gamma_2 + ..... + x_9*gamma_9 +
> epsilon
>
> I noticed that regstats automatically appends a
> column of 1s to the X matrix (corresponding to the
> 0th regression coefficient being the intercept in the
> usual formulation), whereas regress assumes that the
> input X matrix already has such a structure. The
> documentation states that regress will produce an
> incorrect model if the constant term is not present.
>
> Thanks
>
> Cheers
> Vivek.

Say you have measurements
(x_1)_i,...,(x_9)_i, y_i (i=1,...,n).
Define a matrix A with n rows and 9 columns by
A(i,j) = (x_j)_i (j=1,...,9; i=1,...,n).
Define a vector b by
b(i) = y_i (i=1,...,n).
Then the MATLAB command
gamma = A\b
gives your regression coefficients gamma_j.

Best wishes
Torsten.

Subject: Linear Regression without a constant term

From: Vivek Saxena

Date: 24 Mar, 2010 12:43:08

Message: 5 of 12

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <hocvur$15d$1@fred.mathworks.com>...
>
> The model itself, i.e., the estimated coefficients and their CIs, are estimated correctly when the model does not include an intercept. It's only the F statistic and the R^2 that become invalid when there's no intercept. Both of these goodness-of-fit statistics assume that the model y = constant + error is a special case of the model you're fitting, and if there's no intercept, it isn't.

Thanks for your reply Peter. Usually when multicollinearity is to be detected and removed, one begins with a unit length model (centered and scaled), which contains no constant term. [At least that is what we have been taught.] Does MATLAB include a command for standardizing the regression model?

Also, if the design matrix input to REGSTATS is of the form [x11, x12, ...; x21, x22, ...], how does REGSTATS know whether or not a constant term exists? You say that the estimated coefficients and their CIs are estimated correctly even when the model does not include an intercept. But, the models are entirely different in the two cases. How do I know that beta(1) is not an intercept, but the regression coefficient for x1?

Subject: Linear Regression without a constant term

From: Vivek Saxena

Date: 24 Mar, 2010 13:16:05

Message: 6 of 12

Torsten Hennig <Torsten.Hennig@umsicht.fhg.de> wrote in message <897885874.433029.1269434001451.JavaMail.root@gallium.mathforum.org>...
> Say you have measurements
> (x_1)_i,...,(x_9)_i, y_i (i=1,...,n).
> Define a matrix A with n rows and 9 columns by
> A(i,j) = (x_j)_i (j=1,...,9 ; i=1,...,n))
> Define a vector b by
> b(i) = y_i (i=1,...,n).
> Then the MATLAB command
> gamma = A\b
> gives your regression coefficients gamma_j.
>
> Best wishes
> Torsten.

Torsten, that is not correct. The regression coefficients are solutions to the least-squares equations, not A^-1 b. The latter approach is simply not applicable because of the presence of error in each measurement (statistical, not deterministic).

Subject: Linear Regression without a constant term

From: Torsten Hennig

Date: 24 Mar, 2010 13:35:10

Message: 7 of 12

> Torsten Hennig <Torsten.Hennig@umsicht.fhg.de> wrote
> in message
> <897885874.433029.1269434001451.JavaMail.root@gallium.
> mathforum.org>...
> > Say you have measurements
> > (x_1)_i,...,(x_9)_i, y_i (i=1,...,n).
> > Define a matrix A with n rows and 9 columns by
> > A(i,j) = (x_j)_i (j=1,...,9 ; i=1,...,n))
> > Define a vector b by
> > b(i) = y_i (i=1,...,n).
> > Then the MATLAB command
> > gamma = A\b
> > gives your regression coefficients gamma_j.
> >
> > Best wishes
> > Torsten.
>
> Torsten, that is not correct. The regression
> coefficients are solutions to the least square
> equation, not A^-1b. The latter approach is simply
> not applicable because of the presence of error in
> each measurement (statistical, not deterministic).

gamma = A\b
is the least-squares solution to the (overdetermined)
linear system A*gamma = b.
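
One way to check this numerically, sketched in base MATLAB with made-up data (the sizes and the names g1 and g2 are illustrative): compare backslash against the textbook normal equations.

```matlab
A = randn(20, 3);                 % 20 equations, 3 unknowns: overdetermined
b = A*[1; 2; 3] + 0.05*randn(20, 1);

g1 = A \ b;                       % backslash on a rectangular system
g2 = (A'*A) \ (A'*b);             % normal equations, the textbook least-squares formula

disp(norm(g1 - g2));              % difference is at rounding-error level
disp(norm(A'*(b - A*g1)));        % residual is orthogonal to the columns of A
```

The two answers agree to rounding error. Backslash is generally preferred because it does not form A'*A, which squares the condition number.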

Best wishes
Torsten.

Subject: Linear Regression without a constant term

From: Vivek Saxena

Date: 24 Mar, 2010 13:52:05

Message: 8 of 12

Torsten Hennig <Torsten.Hennig@umsicht.fhg.de> wrote in message <911486177.433479.1269437740916.JavaMail.root@gallium.mathforum.org>...
> > Torsten Hennig <Torsten.Hennig@umsicht.fhg.de> wrote
> gamma = A\b
> is the least-squares solution to the (overdetermined)
> linear system A*gamma = b.
>
> Best wishes
> Torsten.

Oh, isn't it just A^-1 b? Hmm, I didn't know. Thanks for pointing that out. I use A\b for A^-1 b because MATLAB warns me if I use inv(A)*b. I didn't know it's the least-squares solution.

Subject: Linear Regression without a constant term

From: dpb

Date: 24 Mar, 2010 14:34:36

Message: 9 of 12

Vivek Saxena wrote:
...

> Oh, isn't it just A^-1 b? Hmm, I didn't know. ...

doc mldivide

--

Subject: Linear Regression without a constant term

From: Peter Perkins

Date: 24 Mar, 2010 18:34:17

Message: 10 of 12

On 3/24/2010 8:43 AM, Vivek Saxena wrote:

> Thanks for your reply Peter. Usually when multicollinearity is to be
> detected and removed, one begins with a unit length model (centered and
> scaled), which contains no constant term. [At least that is what we have
> been taught.] Does MATLAB include a command for standardizing the
> regression model?

I don't know about "usually", but you can certainly call ZSCORE on your data before fitting the regression. RIDGE, which does ridge regression, does this automatically for you, but not functions like REGRESS.
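
A minimal sketch of standardizing before the fit, assuming the Statistics Toolbox for zscore. The data and the names Z, yc, bStd, and bRaw are made up for illustration. Note that zscore scales each column to unit sample standard deviation, which differs from unit-length scaling only by a factor of sqrt(n-1) per column.

```matlab
X = bsxfun(@plus, [5 -2 10], randn(30, 3));   % hypothetical predictors with nonzero means
y = 4 + X*[1; 0.5; -1] + 0.1*randn(30, 1);    % hypothetical response

Z = zscore(X);                 % each column: subtract mean, divide by sample std
yc = y - mean(y);              % center the response as well

bStd = Z \ yc;                 % no constant column: every variable has mean zero

% Convert the standardized coefficients back to the original predictor units.
bRaw = bStd ./ std(X)';
```

The rescaled coefficients bRaw should match the slopes from an ordinary fit with an intercept on the raw data.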


> Also, if the design matrix input to REGSTATS is of the form [x11, x12,
> ...; x21, x22, ...], how does REGSTATS know whether or not a constant
> term exists? You say that the estimated coefficients and their CIs are
> estimated correctly even when the model does not include an intercept.
> But, the models are entirely different in the two cases. How do I know
> that beta(1) is not an intercept, but the regression coefficient for x1?

Because REGSTATS does not take a design matrix as an input. It takes a data matrix, and it's the third input that determines how that is turned into a design matrix.

>> help regstats
  REGSTATS Regression diagnostics for linear models.
[snip]
     The optional input MODEL specifies how the design matrix is created
     from DATA. The design matrix is the matrix of term values for each
     observation. MODEL can be any of the following strings:
  
       'linear' Constant and linear terms (the default)
       'interaction' Constant, linear, and interaction terms
       'quadratic' Constant, linear, interaction, and squared terms
       'purequadratic' Constant, linear, and squared terms
  
     Alternatively, MODEL can be a matrix of model terms accepted by the
     X2FX function. See X2FX for a description of this matrix and for
     a description of the order in which terms appear. You can use this
     matrix to specify other models including ones without a constant term.
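
To make the terms matrix concrete, here is a small sketch with X2FX on a made-up data matrix, assuming the Statistics Toolbox (the variable names are illustrative). Each row of the terms matrix is one term, each column gives the power of one predictor, and an all-zero row is the constant.

```matlab
X = [1 2; 3 4; 5 6];                   % hypothetical data matrix: 3 observations, 2 predictors

Dlinear = x2fx(X, 'linear');           % [ones, x1, x2]
Dnoconst = x2fx(X, eye(2));            % [x1, x2], no constant column
Dexplicit = x2fx(X, [0 0; 1 0; 0 1]);  % the all-zero row adds the constant term
```

So beta(1) is an intercept only when the model has that all-zero row. With eye(2), beta(1) is the coefficient of x1.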

Subject: Linear Regression without a constant term

From: Renwen Lin

Date: 11 Sep, 2012 08:12:08

Message: 11 of 12

mdl = LinearModel.fit(X,y,'Intercept',false);
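
Newer releases of the Statistics Toolbox also provide fitlm, which accepts the same name-value pair. A rough sketch with made-up data (X, y, gammaHat, and ci are illustrative names, not from the thread):

```matlab
X = randn(60, 9);                          % hypothetical predictors
y = X*(1:9)' + 0.1*randn(60, 1);           % hypothetical response

mdl = fitlm(X, y, 'Intercept', false);     % requires the Statistics Toolbox
gammaHat = mdl.Coefficients.Estimate;      % 9-by-1 vector, one entry per regressor
ci = coefCI(mdl);                          % 95% confidence intervals, one row per coefficient
```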


"Vivek Saxena" wrote in message <hocv1p$gep$1@fred.mathworks.com>...
> Hi,
>
> Is it possible to perform a linear regression in MATLAB with no constant term?
>
> I have data for 9 regressors and I have to fit a multiple linear regression model of Y (the response) on these 9 regressors without an intercept. That is,
>
> Y = x_1*gamma_1 + x_2*gamma_2 + ..... + x_9*gamma_9 + epsilon
>
> I noticed that regstats automatically appends a column of 1s to the X matrix (corresponding to the 0th regression coefficient being the intercept in the usual formulation), whereas regress assumes that the input X matrix already has such a structure. The documentation states that regress will produce an incorrect model if the constant term is not present.
>
> Thanks
>
> Cheers
> Vivek.

Subject: Linear Regression without a constant term

From: Greg Heath

Date: 19 Sep, 2012 03:30:21

Message: 12 of 12

"Vivek Saxena" wrote in message <hocv1p$gep$1@fred.mathworks.com>...
> Hi,
>
> Is it possible to perform a linear regression in MATLAB with no constant term?
>
> I have data for 9 regressors and I have to fit a multiple linear regression model of Y (the response) on these 9 regressors without an intercept. That is,
>
> Y = x_1*gamma_1 + x_2*gamma_2 + ..... + x_9*gamma_9 + epsilon
>
> I noticed that regstats automatically appends a column of 1s to the X matrix (corresponding to the 0th regression coefficient being the intercept in the usual formulation), whereas regress assumes that the input X matrix already has such a structure. The documentation states that regress will produce an incorrect model if the constant term is not present.

Subtract the mean from Y and from each column of X, then use backslash. If a constant term were included in that fit, its coefficient would be negligible.
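
A sketch of the centering approach in base MATLAB, with made-up data (the offset of 7 and all variable names are illustrative):

```matlab
X = bsxfun(@plus, [3 -1], randn(25, 2));      % hypothetical predictors with nonzero means
y = 7 + X*[2; -4] + 0.1*randn(25, 1);         % hypothetical response with a true offset of 7

Xc = bsxfun(@minus, X, mean(X));              % remove column means
yc = y - mean(y);

bCentered = Xc \ yc;                          % slopes from centered data, no constant column
bWithConst = [ones(25, 1) Xc] \ yc;           % same fit with a constant column added

disp(bWithConst(1));                          % constant coefficient: effectively zero
disp(norm(bWithConst(2:end) - bCentered));    % slopes agree to rounding error
```

Keep in mind that these slopes match those of an intercept model on the raw data, so they will in general differ from a through-the-origin fit on the uncentered data.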

If you use regstats or regress, the resulting R^2 and other summary statistics are not applicable.

Hope this helps.

Greg
