No BSD License  

Highlights from
mregress

4.0

4.0 | 13 ratings Rate this file 13 Downloads (last 30 days) File Size: 7.35 KB File ID: #202

mregress

by Tony Reina

 

25 Jan 1999

Performs multiple linear regression.

| Watch this File

File Information
Description

Performs multiple linear regression. Includes option for setting the
y-intercept to zero. Returns the F-statistic, p-value for the F,
t-distribution for the coefficients, and covariance matrix for the
regression.

Acknowledgements
This submission has inspired the following:
QSAR/QSPR Search Algorithms Toolbox
MATLAB release MATLAB 5.2 (R10)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (15)
11 Jan 2002 tjeerd boonstra

I couldn't get it to work, because of unknown function tcdf.

27 Apr 2003 M First

Same problem - also fcdf

20 Oct 2003 Gerardo Rosiles

If you just need a good regression function that returns slope and intercept, edit the function and take out most all of the code lines towards the end. Edit the function output parameters and that's it. Most of the output parameters are confidence measures on the slope and intercept parameters. So if the data you are trying to fit looks pretty linear yo can do with most of the other stuff.

It would be good if the author provided its own versions of the missing functions.

07 Apr 2004 Jeroen Hogema

The fcdf and tcdf are F and t cumulative distributions from the Statistics
Toolbox.
You can replace these functions by the corrresponding ones from http://www.maths.lth.se/matstat/stixbox/

05 Oct 2004 hasan taherkhani  
28 Mar 2005 Boris Bernhardt

An adjusted multiple coefficient of determination is missing. It is better for assessing overall model
utility than a non-adjusted R^2 since
it takes into account the relationship
between number of data points and
number of predictors.
  
However, it can easily be added. Just paste
the following code below the formula for R_sq:

 R_sq_ad = 1 - ((n-1)/(n-(k+1))) * (1 - R_sq)

12 Jul 2005 Ned Dochtermann

Almost perfect for what I needed.

However how does one work in interactions into the model statement?
I'm a low skill novice matlab user so my lone solution failed.
Thanks

13 Feb 2006 sp plan

This is what I am looking for. It is different output from the Matlab Statistics Toolbox function called 'regress.m'. So thanks, for sharing the codes.

22 Jan 2007 ananda sarkar

qsar analysis

30 Aug 2007 Teresa Zhuo

thank you!

28 Feb 2008 LQ Jiang

Great code. Thank you so much!

10 Apr 2008 Matan Beery

Saved me a lot of time. Thank you so much!

15 May 2008 Mustafa A.

Perfect Thanks alot

14 Jun 2009 Muhammad Tauha Ali

Thanks alot for the formulae

14 Jun 2009 John D'Errico

Ok. I had to look here. The problem is, the formulas are poor. Her is a paste of part of the code:

===========================================

% Solve for the regression coefficients using ordinary least-squares
% ------------------------------------------------------------------

   if (INTCPT == 1)
     X = [ ones(n,1) X ] ;
   end
 
    XTXI = inv(X' * X);
    Coefficients = XTXI * X' * Y ;

===========================================

See that the normal equations are used, as well as (shudder) inv.

This is the WRONG way to solve this problem. It is a mistake that many novices to regression make. It is a mistake that many people who have done regression for years make, simply because they have never been told they are doing it the wrong way. It is a mistake that many authors of statistics texts make, simply because nobody has ever told them either.

The normal equations, are

   C = inv(X' * X) * X' * Y

are ONE way to solve the over-determined problem

   X*C = Y

They are a POOR solution. First of all, this formula uses inv. Somewhat better would be to write it as

   C = (X' * X) \ (X' * Y)

The use of backslash here avoids the use of inv. But it still fails because the normal equations are still involved, just in another form.

Look at a better solution.

   C = X \ Y;

This uses the built-in backslash to solve the problem. How would backslash solve it internally? (I'll assume that no pivoting is done to make things simple.) Suppose we choose to form a QR factorization of X.

   X = Q*R

Here, Q is an orthogonal matrix, R is upper triangular. (Again, I'll ignore any pivoting considerations.) Our problem is now to solve

  Q*R*C = Y

Since Q is orthogonal, then we can left multiply by the transpose of Q, to reduce the problem to

  R*C = Q' * Y

Remember, Q was orthogonal, so that (Q' * Q) was an identity matrix.

We now solve for C as

  C = R \ (Q' * Y)

Backslash is smart here. It knows that R is upper triangular, so backslash will do the solution as a back-substitution, an O(n^2) operation. No explicit inverse is required at all.

Again, backslash does all of this internally. But it is a far better solution than the normal equations. What happens in the normal equations to make that solution poor? The problem is the condition number of the matrix. If your matrix is at all poorly conditioned (something that is very common in multiple regression problems) then when you form (X'*X), the condition number of the matrix will be SQUARED. So the normal equations are a terribly poor way to solve the system posed.

We can get the condition number of a matrix from the cond function. It is defined as the ratio of the largest to the smallest singular value of the matrix. When this number is large, then you should expect to loose accuracy in the solution of your equations. Look at a simple, random matrix. (By the way, the condition number will be small here. But most regression problems have a much larger condition number.)

X = rand(10,5);
svd(X)
ans =
          3.70360600537412
          1.16305761722647
         0.819599668934314
          0.77149157109882
         0.359840732709897

See that the largest and smallest singular values would have a ratio of a wee bit over 10.

cond(X)
ans =
          10.2923478881418

What happens when I form X'*X?

cond(X'*X)
ans =
          105.932425050538

Yes. As expected, the condition number was squared. Try this again with a more typical regression problem. Here, just a simple 10th order polynomial in one variable.

x = rand(20,1);
X = bsxfun(@power,x,[10 9 8 7 6 5 4 3 2 1 0]);

cond(X)
ans =
          32218698.6067888

cond(X'*X)
ans =
      1.03682639151296e+15

See that the condition number of (X' * X) is roughly 1e15. In fact, if we use the normal equations to solve this problem, the coefficients will have few correct digits in them, because (X' * X) is nearly a numerically singular matrix.

Now try an 11th order polynomial. Make some random data too. Since the actual numbers don't matter, just see if there is a difference in the solution.

X = bsxfun(@power,x,[11 10 9 8 7 6 5 4 3 2 1 0]);
Y = rand(20,1);

X\Y
ans =
          213864.116016404
         -1151194.24570325
          2686678.25926054
         -3564335.59383269
          2961378.38376311
         -1600304.51887482
          565978.359556758
          -128793.21499217
          18153.6837583098
         -1482.77560777251
          58.7870441102109
        0.0478989431356096

Now compare that to the normal equations. See that inv complains here, because the system is now effectively singular.

inv(X'*X)*X'*Y
Warning: Matrix is close to singular or badly scaled.
         Results may be inaccurate. RCOND = 8.102118e-18.
ans =
          223829.921190437
         -1205573.39200913
          2815035.41413421
         -3736079.41837729
          3104808.97033064
         -1677906.49043969
          593313.714534699
         -134931.578398688
          18988.6654796978
         -1545.18491246855
           60.905800953616
        0.0268631240254516

See that many of the resulting coefficients do not even agree in a single digit of the result. The problem is when we formed (X' * X).

Please login to add a comment or rating.
Tag Activity for this File
Tag Applied By Date/Time
statistics Tony Reina 22 Oct 2008 06:33:53
probability Tony Reina 22 Oct 2008 06:33:53
multiple Tony Reina 22 Oct 2008 06:33:53
linear Tony Reina 22 Oct 2008 06:33:53
regression Tony Reina 22 Oct 2008 06:33:53

Contact us at files@mathworks.com