Rank: 3195 based on 11 downloads (last 30 days) and 2 files submitted
photo

Tony Reina

E-mail

Personal Profile:
Professional Interests:

 

Watch this Author's files

 

Files Posted by Tony View all
Updated   File Tags Downloads
(last 30 days)
Comments Rating
05 Nov 2001 calculate_eccentricity.m Calculates the eccentricity of an ellipse Author: Tony Reina approximation, interpolation, eccentricity, ellipse, calculate, tilt 3 2
  • 4.0
4.0 | 2 ratings
25 Jan 1999 mregress Performs multiple linear regression. Author: Tony Reina statistics, probability, multiple, linear, regression 8 14
  • 4.0
4.0 | 13 ratings
Comments and Ratings on Tony's Files View all
Updated File Comment by Comments Rating
14 Jun 2009 mregress Performs multiple linear regression. Author: Tony Reina D'Errico, John

Ok. I had to look here. The problem is, the formulas are poor. Her is a paste of part of the code:

===========================================

% Solve for the regression coefficients using ordinary least-squares
% ------------------------------------------------------------------

   if (INTCPT == 1)
     X = [ ones(n,1) X ] ;
   end
 
    XTXI = inv(X' * X);
    Coefficients = XTXI * X' * Y ;

===========================================

See that the normal equations are used, as well as (shudder) inv.

This is the WRONG way to solve this problem. It is a mistake that many novices to regression make. It is a mistake that many people who have done regression for years make, simply because they have never been told they are doing it the wrong way. It is a mistake that many authors of statistics texts make, simply because nobody has ever told them either.

The normal equations, are

   C = inv(X' * X) * X' * Y

are ONE way to solve the over-determined problem

   X*C = Y

They are a POOR solution. First of all, this formula uses inv. Somewhat better would be to write it as

   C = (X' * X) \ (X' * Y)

The use of backslash here avoids the use of inv. But it still fails because the normal equations are still involved, just in another form.

Look at a better solution.

   C = X \ Y;

This uses the built-in backslash to solve the problem. How would backslash solve it internally? (I'll assume that no pivoting is done to make things simple.) Suppose we choose to form a QR factorization of X.

   X = Q*R

Here, Q is an orthogonal matrix, R is upper triangular. (Again, I'll ignore any pivoting considerations.) Our problem is now to solve

  Q*R*C = Y

Since Q is orthogonal, then we can left multiply by the transpose of Q, to reduce the problem to

  R*C = Q' * Y

Remember, Q was orthogonal, so that (Q' * Q) was an identity matrix.

We now solve for C as

  C = R \ (Q' * Y)

Backslash is smart here. It knows that R is upper triangular, so backslash will do the solution as a back-substitution, an O(n^2) operation. No explicit inverse is required at all.

Again, backslash does all of this internally. But it is a far better solution than the normal equations. What happens in the normal equations to make that solution poor? The problem is the condition number of the matrix. If your matrix is at all poorly conditioned (something that is very common in multiple regression problems) then when you form (X'*X), the condition number of the matrix will be SQUARED. So the normal equations are a terribly poor way to solve the system posed.

We can get the condition number of a matrix from the cond function. It is defined as the ratio of the largest to the smallest singular value of the matrix. When this number is large, then you should expect to loose accuracy in the solution of your equations. Look at a simple, random matrix. (By the way, the condition number will be small here. But most regression problems have a much larger condition number.)

X = rand(10,5);
svd(X)
ans =
          3.70360600537412
          1.16305761722647
         0.819599668934314
          0.77149157109882
         0.359840732709897

See that the largest and smallest singular values would have a ratio of a wee bit over 10.

cond(X)
ans =
          10.2923478881418

What happens when I form X'*X?

cond(X'*X)
ans =
          105.932425050538

Yes. As expected, the condition number was squared. Try this again with a more typical regression problem. Here, just a simple 10th order polynomial in one variable.

x = rand(20,1);
X = bsxfun(@power,x,[10 9 8 7 6 5 4 3 2 1 0]);

cond(X)
ans =
          32218698.6067888

cond(X'*X)
ans =
      1.03682639151296e+15

See that the condition number of (X' * X) is roughly 1e15. In fact, if we use the normal equations to solve this problem, the coefficients will have few correct digits in them, because (X' * X) is nearly a numerically singular matrix.

Now try an 11th order polynomial. Make some random data too. Since the actual numbers don't matter, just see if there is a difference in the solution.

X = bsxfun(@power,x,[11 10 9 8 7 6 5 4 3 2 1 0]);
Y = rand(20,1);

X\Y
ans =
          213864.116016404
         -1151194.24570325
          2686678.25926054
         -3564335.59383269
          2961378.38376311
         -1600304.51887482
          565978.359556758
          -128793.21499217
          18153.6837583098
         -1482.77560777251
          58.7870441102109
        0.0478989431356096

Now compare that to the normal equations. See that inv complains here, because the system is now effectively singular.

inv(X'*X)*X'*Y
Warning: Matrix is close to singular or badly scaled.
         Results may be inaccurate. RCOND = 8.102118e-18.
ans =
          223829.921190437
         -1205573.39200913
          2815035.41413421
         -3736079.41837729
          3104808.97033064
         -1677906.49043969
          593313.714534699
         -134931.578398688
          18988.6654796978
         -1545.18491246855
           60.905800953616
        0.0268631240254516

See that many of the resulting coefficients do not even agree in a single digit of the result. The problem is when we formed (X' * X).

14 Jun 2009 mregress Performs multiple linear regression. Author: Tony Reina Ali, Muhammad Tauha

Thanks alot for the formulae

15 May 2008 mregress Performs multiple linear regression. Author: Tony Reina A., Mustafa

Perfect Thanks alot

10 Apr 2008 mregress Performs multiple linear regression. Author: Tony Reina Beery, Matan

Saved me a lot of time. Thank you so much!

28 Feb 2008 mregress Performs multiple linear regression. Author: Tony Reina Jiang, LQ

Great code. Thank you so much!

Top Tags Applied by Tony
approximation, calculate, eccentricity, ellipse, interpolation
Files Tagged by Tony View all
Updated   File Tags Downloads
(last 30 days)
Comments Rating
05 Nov 2001 calculate_eccentricity.m Calculates the eccentricity of an ellipse Author: Tony Reina approximation, interpolation, eccentricity, ellipse, calculate, tilt 3 2
  • 4.0
4.0 | 2 ratings
25 Jan 1999 mregress Performs multiple linear regression. Author: Tony Reina statistics, probability, multiple, linear, regression 8 14
  • 4.0
4.0 | 13 ratings

Contact us at files@mathworks.com