Thread Subject: Regress() command

Subject: Regress() command

From: Shirley Zheng

Date: 22 Mar, 2010 09:30:08

Message: 1 of 2

Sorry, I am very new to MATLAB and I know that it's a very simple question, but I don't really understand: for the regress() command, the documentation says 'If the columns of X are linearly dependent, regress obtains a basic solution by setting the maximum number of elements of b to zero'. Can anybody explain what '..setting the maximum number of elements of b to zero' means?

Thank you very much!

Subject: Regress() command

From: Peter Perkins

Date: 22 Mar, 2010 13:16:21

Message: 2 of 2

On 3/22/2010 5:30 AM, Shirley Zheng wrote:
> Sorry, I am very new to Matlab and I know that it's a very simple
> question but I don't really understand: For the regress() command, it
> says 'If the columns of X are linearly dependent, regress obtains a
> basic solution by setting the maximum number of elements of b to zero'.
> Can anybody explain what is '..setting the maximum number of elements of
> b to zero'?

When X has columns that are linearly dependent, there is no unique solution to the least squares problem; in fact, there are infinitely many solutions (that comes from basic linear algebra). REGRESS is based on MATLAB's backslash operator "\", and out of the infinitely many possible solutions, backslash returns the "basic solution", i.e., the one that has "as many zero coefficients as possible". By setting some of the coefs to zero, backslash in effect ignores the corresponding columns of X, and it can do that because they provide no additional information beyond that given by the other columns: the ignored columns can be constructed as linear combinations of the others. If X has m columns and only q of them are linearly independent, then m-q coefs in b can be set to zero.
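To make the "infinite number of solutions" point concrete, here is a minimal sketch. The data and the variable names (N, bAlt, fitGap) are made up for illustration. It shifts the backslash solution along the null space of X and checks that the fitted values do not change.

```matlab
% Hypothetical data: three independent columns plus two dependent ones.
n = 7;
X1 = [ones(n,1) randn(n,2)];
X  = [X1, X1(:,1)+X1(:,3), X1(:,2)+X1(:,3)];
y  = randn(n,1);

m = size(X,2);     % total number of columns (5)
q = rank(X);       % number of linearly independent columns (3)
N = null(X);       % orthonormal basis for the null space of X

b    = X \ y;                        % basic solution; expect a rank-deficiency warning
bAlt = b + N*randn(size(N,2),1);     % move along the null space: another solution

% Both coefficient vectors give the same fitted values, up to roundoff.
fitGap = norm(X*b - X*bAlt);
fprintf('m = %d, q = %d, zeros in b = %d, fit gap = %g\n', ...
        m, q, nnz(b == 0), fitGap);
```

Any multiple of a null-space vector can be added to b without changing X*b, which is why the least squares problem alone cannot pick out one answer.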

For example, construct a full column rank X, and a random y, and regress y on X:

>> n = 7;
>> X1 = [ones(n,1) randn(n,2)];
>> y = randn(n,1);
>> b1= regress(y,X1)
b1 =
       -1.1733
      -0.21145
      -0.65243

Now add two columns to X that are linearly dependent on the existing columns, and regress y on that:
>> X2 = [X1 X1(:,1)+X1(:,3) X1(:,2)+X1(:,3)];
>> b2 = regress(y,X2)
Warning: X is rank deficient to within machine precision.
> In regress at 82
b2 =
       -1.1733
      -0.21145
      -0.65243
             0
             0

Notice that REGRESS has returned exact zeros as the coefficients associated with those two new columns, i.e., it has ignored those two columns. Thus, the remaining coefs are the same as in the first regression. That usually does _not_ happen; the zeros do not generally land on the columns you added. This is more typical:

>> X1 = [ones(n,1) randn(n,2)];
>> b1= regress(y,X1)
b1 =
      -0.47449
       0.98807
       -0.1507
>> X2 = [X1 X1(:,1)+X1(:,3) X1(:,2)+X1(:,3)];
>> b2 = regress(y,X2)
Warning: X is rank deficient to within machine precision.
> In regress at 82
b2 =
       0.66429
             0
             0
       -1.1388
       0.98807

This time, REGRESS has ignored columns 2 and 3. The choice of where the zeros go is not based on anything statistically meaningful; it is simply a choice that backslash makes based on numerical concerns.
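For a feel of how a "numerical" choice of columns can come about, here is a rough sketch that builds a basic solution from a QR factorization with column pivoting. This is an illustration under my own assumptions (in particular the rank tolerance tol), not a statement of exactly what REGRESS or backslash does internally.

```matlab
% Hypothetical data, built the same way as in the examples above.
n = 7;
X1 = [ones(n,1) randn(n,2)];
X2 = [X1, X1(:,1)+X1(:,3), X1(:,2)+X1(:,3)];
y  = randn(n,1);

% Economy-size QR with column pivoting: X2(:,perm) = Q*R.
[Q,R,perm] = qr(X2,0);

% Estimate the rank from the diagonal of R (the tolerance is an assumption).
d   = abs(diag(R));
tol = max(size(X2)) * eps(d(1));
q   = sum(d > tol);

% Keep the first q pivoted columns and leave the other coefficients at zero.
keep = perm(1:q);
b = zeros(size(X2,2),1);
b(keep) = R(1:q,1:q) \ (Q(:,1:q)' * y);

disp(sort(keep));   % which columns were retained
disp(b');           % basic solution with at most q nonzero coefficients
```

The pivoting order depends on column norms and roundoff, not on which predictors matter statistically, so which columns end up in keep can change from one data set to the next.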

In either case, if you multiply X1*b1 and X2*b2, you'll find that the fitted values are the same.
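A quick way to check that claim yourself, as a sketch with made-up random data (maxDiff is just a name chosen here; REGRESS requires the Statistics Toolbox):

```matlab
% Hypothetical data, built the same way as in the examples above.
n = 7;
X1 = [ones(n,1) randn(n,2)];
y  = randn(n,1);
X2 = [X1, X1(:,1)+X1(:,3), X1(:,2)+X1(:,3)];

b1 = regress(y,X1);
b2 = regress(y,X2);   % expect the rank-deficiency warning here

% Largest difference between the two sets of fitted values.
maxDiff = max(abs(X1*b1 - X2*b2));
fprintf('largest difference in fitted values: %g\n', maxDiff);
```

The difference should be on the order of roundoff, even though b1 and b2 have different lengths and different entries.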

There are other possible ways to deal with collinearity. One is to use PINV to get what's known as the "minimum norm solution". Another is to choose which columns of X to ignore based on criteria that are more statistical, as STEPWISEFIT does, for example.
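As a small sketch of the PINV route (hypothetical data again, and bBasic and bMin are names made up here), you can compare the minimum norm solution with the basic solution from backslash:

```matlab
% Hypothetical rank-deficient design matrix, built as in the examples above.
n = 7;
X1 = [ones(n,1) randn(n,2)];
X2 = [X1, X1(:,1)+X1(:,3), X1(:,2)+X1(:,3)];
y  = randn(n,1);

bBasic = X2 \ y;         % basic solution: some coefficients exactly zero
bMin   = pinv(X2) * y;   % minimum norm solution: typically no exact zeros

% Same fitted values, but bMin has the smallest 2-norm of all solutions.
fprintf('norm(bBasic) = %g, norm(bMin) = %g\n', norm(bBasic), norm(bMin));
fprintf('difference in fitted values: %g\n', norm(X2*bBasic - X2*bMin));
```

The minimum norm solution spreads the weight across the dependent columns rather than dropping any of them, so it is no more "statistically meaningful" than the basic solution; it is just a different convention for picking one answer.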

Hope this helps.
