See that the normal equations are used, as well as (shudder) inv.
This is the WRONG way to solve this problem. It is a mistake that many novices to regression make. It is a mistake that many people who have done regression for years make, simply because they have never been told they are doing it the wrong way. It is a mistake that many authors of statistics texts make, simply because nobody has ever told them either.
The normal equations, are
C = inv(X' * X) * X' * Y
are ONE way to solve the over-determined problem
X*C = Y
They are a POOR solution. First of all, this formula uses inv. Somewhat better would be to write it as
C = (X' * X) \ (X' * Y)
The use of backslash here avoids the use of inv. But it still fails because the normal equations are still involved, just in another form.
Look at a better solution.
C = X \ Y;
This uses the built-in backslash to solve the problem. How would backslash solve it internally? (I'll assume that no pivoting is done to make things simple.) Suppose we choose to form a QR factorization of X.
X = Q*R
Here, Q is an orthogonal matrix, R is upper triangular. (Again, I'll ignore any pivoting considerations.) Our problem is now to solve
Q*R*C = Y
Since Q is orthogonal, then we can left multiply by the transpose of Q, to reduce the problem to
R*C = Q' * Y
Remember, Q was orthogonal, so that (Q' * Q) was an identity matrix.
We now solve for C as
C = R \ (Q' * Y)
Backslash is smart here. It knows that R is upper triangular, so backslash will do the solution as a back-substitution, an O(n^2) operation. No explicit inverse is required at all.
Again, backslash does all of this internally. But it is a far better solution than the normal equations. What happens in the normal equations to make that solution poor? The problem is the condition number of the matrix. If your matrix is at all poorly conditioned (something that is very common in multiple regression problems) then when you form (X'*X), the condition number of the matrix will be SQUARED. So the normal equations are a terribly poor way to solve the system posed.
We can get the condition number of a matrix from the cond function. It is defined as the ratio of the largest to the smallest singular value of the matrix. When this number is large, then you should expect to loose accuracy in the solution of your equations. Look at a simple, random matrix. (By the way, the condition number will be small here. But most regression problems have a much larger condition number.)
X = rand(10,5);
svd(X)
ans =
3.70360600537412
1.16305761722647
0.819599668934314
0.77149157109882
0.359840732709897
See that the largest and smallest singular values would have a ratio of a wee bit over 10.
cond(X)
ans =
10.2923478881418
What happens when I form X'*X?
cond(X'*X)
ans =
105.932425050538
Yes. As expected, the condition number was squared. Try this again with a more typical regression problem. Here, just a simple 10th order polynomial in one variable.
x = rand(20,1);
X = bsxfun(@power,x,[10 9 8 7 6 5 4 3 2 1 0]);
cond(X)
ans =
32218698.6067888
cond(X'*X)
ans =
1.03682639151296e+15
See that the condition number of (X' * X) is roughly 1e15. In fact, if we use the normal equations to solve this problem, the coefficients will have few correct digits in them, because (X' * X) is nearly a numerically singular matrix.
Now try an 11th order polynomial. Make some random data too. Since the actual numbers don't matter, just see if there is a difference in the solution.
X = bsxfun(@power,x,[11 10 9 8 7 6 5 4 3 2 1 0]);
Y = rand(20,1);
Now compare that to the normal equations. See that inv complains here, because the system is now effectively singular.
inv(X'*X)*X'*Y
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 8.102118e-18.
ans =
223829.921190437
-1205573.39200913
2815035.41413421
-3736079.41837729
3104808.97033064
-1677906.49043969
593313.714534699
-134931.578398688
18988.6654796978
-1545.18491246855
60.905800953616
0.0268631240254516
See that many of the resulting coefficients do not even agree in a single digit of the result. The problem is when we formed (X' * X).