Given a set of x,y,z coordinates, find the best planar fit to the points via a least squares regression. Testing script "t_fitNormal" is also included to verify functionality and demonstrate usage.
Thank you for your quick reply. Now I can understand your logic.
By the way, Niloy Mitra used the third principal component of point clouds as the surface normal. The logic is that the first two represent the biggest variation which is on the plane. and the third one is the surface normal.
But I doubt that the robustness check for determinant is necessary. I used this before, due to the precision in Maltab, it will be very rare to have exact 'zero'. I also use some range to define 'zero', but it is data sensitive because that would be related the scaling (range) of data.
Yeah, with low noise, any method works reasonably well. With higher noise, the points on the high end that have error that brings them in have less leverage than those points that get moved out. This tends to introduce a bias.
Of course, it is your choice which method you choose. Definitely use backslash for any linear regressions instead of the normal equations and avoid det, and I will be happier.
You might offer both methods as an option, because whenever I make a choice that limits my code, I always find someone who would have preferred I made the alternative choice.
Anyway, I'm always willing to review submissions when they have been improved.
Agreed, additional context for error handling is beneficial. I'll add that in.
Thanks for the key word suggestions.
SVD vs linear regression:
Thanks for the high noise example illustrating the bias. The comparisons I run previously had relatively low noise ~ 0.1%. I used the following to compare results between SVD and linear regression:
function test_normals(N)
N = N/norm(N);
data = randn(1000,3);
data = data - (data*N)*N';
data = data + randn(size(data))/2;
%plot3(data(:,1),data(:,2),data(:,3),'o')
%input("Press Enter")
end
for i = 1:10
N = [1 1 1]';
% or
N = rand(3,1);
% or
N = [0 0 1]';
test_normals(N)
end
For the worst case scenario for linear regressions N=[1 1 1]', SVD shows a significant improvement with ~ 1 degree error vs ~ 16 degree error for the regression. On the other hand, the best case scenario for regressions such as N=[0 0 1]' the regression techniques shows a modest improvement in accuracy over SVD ( ~1.1 degree error for regression vs ~1.3 degree error for svd )
Going back to the main reason I wrote this module. In the spectrum of brute force to elegant techniques SVD is clearly a more elegant and robust technique. For students and programmers that are early on in their mathematics or have been a while since they have broken out their TI's a more rudimentary technique using simple matrix operations may be easier to comprehend. I'm planning on incorporating many of the notes you suggested, but will stick with the regression method as I believe code sample should be available for multiple points along the spectrum of mathematics. I do like the idea of taking the opportunity to reference SVD and it's advantages over regression and will be adding notes to the description and comments to that point.
The logic behind better error checks is that when it does throw an error, you can tell the user explicitly what you expected, and why their call failed. It seems a more friendly way to handle it, rather than some obscure error that they will then need to figure out.
The H1 line is not too bad as it is. An H1 line is a big help to help someone find your code when they cannot remember the name or where they put it a few months after downloading it. So some other words I might have considered are: regression, least squares, total least squares. The last one only if you use the SVD approach I suggested.
Really, the issues I have are the code. DET is just a bad thing to use. It is never a better choice than rank or cond to determine the singularity status of a matrix. The fact is, it is never even a remotely good choice.
But the fact, is, you never need to compute the determinant, or even to use rank or cond! This is because SVD does not care. But first, let me expand on the issues here.
One of the problems with errors in variables estimates when you do a basic fit with a simple linear regression is that they are biased estimators. For example, consider the simple linear regression estimator to fit a straight line to some data, where the data has errors in both variables.
x = randn(100,1);
y = 2*x + 3;
Now, I'll add noise to both variables.
x = x + randn(size(x))/2;
y = y + randn(size(x))/2;
What happens when I do a simple linear regression fit to this data? I'll use the model
y = a*x + b + error
So now a simple linear regression fit:
ab = [x,ones(size(x))]\y
ab =
1.59690079069019
3.01312808997413
See that the larger the noise is, the more bias we will see. The slope will be biased low. A way to understand why this happens is because the points in x that are near the ends of the data have more leverage than the points near the center. So imagine one data point near the top end of the set. If that point has a positive error in x, then it has MORE leverage than it deserves. It tends to drag the slope down. Likewise, the points near the bottom end with negative errors in x also tend to bias the slope to be lower than it should have been.
So when you fit regression models to data where the errors are in the independent variables, you tend to get biased results.
mu = [mean(x),mean(y)];
[U,S,V] = svd([x-mu(1),y-mu(2)]);
If we scale it so that the coefficient of y was 1 to see how we did compared to the biased estimator, we did quite nicely:
y = 2.13417*x + 3.04655
The error in the slope term is considerably lower than it was before.
You also asked about a case where singularity becomes important. One of the things that people do not appreciate about the normal equations (the scheme you used) is they are only an issue when the problem is ill-conditioned. Here ill-conditioning will occur when the points all fall VERY close to a straight line, and you are trying to fit a plane to that line. Clearly the plane will be poorly defined. But use of the normal equations will be worse.
The example I showed in my last response was not designed to show any difference. And, the fact is, it takes a some effort to make double precision fail. Here is an example of how forming X'*X causes issues though
x = linspace(0,1,100)';
xyz = [x,x,x] + randn(100,3)/10000;
cond(xyz)
ans =
10944.616334719
cond(xyz'*xyz)
ans =
119784627.10645
What does the condition number tell us? One thing it tells us is that any noise in the data when you solve a linear system may be essentially amplified by that condition number. So a condition number of 1e8 is far more a problem than is 1e4. The time to seriously worry is when you start seeing condition numbers that verge on 1/eps.
You may think that nobody will ever throw data that bad at your code, but believe me, I've seen some incredibly bad sets of data passed into my various codes on the file exchange. People don't realize how degenerate their data is. Hey, computers can solve anything, right?
I'll stop here for now, because my guess is I will overload the site limits on the size of my response if I go any further.
Thanks for taking the time to put together such detailed notes. It seems like there were 5 main points.
Summary
------------------------------------------------------
1) H1 line
* good, but would prefer keywords
2) Error checking
* data array should be checked for Nx3
* show_graph should be checked for boolean type
3) In code comments and variable names
* approved
4) Check for determinant = 0
* Useless with floating point error
* should use cond() or rank()
5) Algorithm
* A/b is better than x = inv(A'*A)*A'*b
* SVD is the way it should be done
------------------------------------------------------
1) H1 line
Thank you. I'll look into adding some keywords. Any suggestions?
2) Error checking
I'll admit, allowing the code to throw an exception and expect it to be handled by the calling function is a personal preference, but I can see the merit in explicitly checking prior to use.
3) In code comments and variable names
Thanks again
4) Check for determinant = 0
I agree that this check is largely gratuitous as float values and noise will prevent the determinant from reaching zero. This check does prevent against errors in some edge cases though such as (if I can borrow from your sample)
N = [2 0 0]';
N = N/norm(N);
data = randn(10,3) * 1000;
data = data - (data*N)*N';
where there is no variation along the x axis and the determinant does evaluate to zero exactly.
I do like your suggestion of cond or rank vs det and will incorporate the change.
5) Algorithm
A cursory skim of your comment gave the impression that fitNormal was returning incorrect results. I compared results between the fitNormal and the SVD method you provided (below) and the returned vectors were equivalent.
mu = mean(data,1);
[U,S,V] = svd(bsxfun(@minus,data,mu));
normal_vector = V(:,3)
Do you have a sample data set where the results do not agree?
I believe what your main point is that the SVD method is a far more efficient method for calculating this this fit. To that, I wholeheartedly agree. The reason I stuck with the regression method, although less efficient, is that it only requires basic knowledge of matrix algebra (product, inverse, transpose, ...) that could be verified by hand by high school algebra students. Since I believe this module to be mainly used in exploration of data rather than in high performance situations I erred on the side of a less efficient method to allow a wider audience of coders that took the time to read the source to be able to verify the results themselves.
With that said, I am always in favor of exposing people to more sophisticated and rigorous mathematical techniques. I'll modify the comments in the header to clarify the circuitous approach used and refer them to the SVD method if greater efficiency is required.
Better is to use tools like rank or cond to tell if a matrix is numerically singular. LEARN TO USE THEM! Learn about the condition number of a matrix. Small numbers are good here. Bad condition numbers in double precision will be on the order of 1e15 or larger.
In this case, see that scaling the data array does not change the condition number at all.
cond(data)
ans =
3.9491
cond(data*10000)
ans =
3.9491
Next, Dan uses what are known as the normal equations, which try to solve the problem
A*x = b
using the form
x = inv(A'*A)*A'*b
This form squares the condition number of A. So if A was nearly singular before we started, then A'*A is definitely singular. It is simply bad linear algebra. Again, yeah, I know your teacher or mentor taught you that. I had people teach me that trick too, 35 years ago. I'm sorry, but it is the wrong thing to do. Use backslash here:
x = A\b;
The backslash operator uses good numerical methods to solve the system, while never explicitly squaring the condition number.
Regardless, it is STILL wrong to do that in the first place, because the approach of solving three different problems is a bad one. Instead, use SVD. The general idea is that IF our data falls on a plane in 3 dimensions, then the SVD will tell us that, and will do it intelligently. In fact, SVD will give us the plane that fits the data that minimizes the total sum of squares, using a nice trick.
Lets first create some fake data that falls on a plane, adding some noise to it. Here the expected normal vector will be [1;2;3]. Of course, I can scale a normal vector by any arbitrary non-zero factor.
N = [1;2;3];
N = N/norm(N);
data = randn(10,3);
data = data - (data*N)*N';
data = data + randn(size(data))/100;
You can even plot the data, then rotate it around to see that it falls roughly on a plane.
plot3(data(:,1),data(:,2),data(:,3),'o')
The added noise is pretty small. Now lets do the fit.
Subtract off the mean of our data. This centers the problem, so that the plane passes through that mean point. This will always be our goal for any orthogonal regression.
mu = mean(data,1);
Next, compute the SVD. We want to use the third singular vector to give us the normal to the plane. This is the vector in the direction of least variability. So we can think of it in terms of an ellipsoid that would contain the data. The ellipsoid will have a very small axis length in the direction perpendicular to the plane of interest.
A simple test will show that we did indeed recover the original normal vector to within some noise.
normal_vector/normal_vector(1)
ans =
1
2.0302
3.0751
We can also look at the singular values of the mean subtracted data matrix, to see that the third singular value is significantly smaller than those before. This is a good thing if the data really does fall nearly on a plane.
diag(S)
ans =
4.1098
1.6883
0.019637
So the fitted plane passes through the point mu in 3-d. It has the given normal vector to the plane. And it took me about 3 lines of code to compute, though a bit more in terms of comments.
We can write the fitted plane in a simple form. A plane equation is generally (in any number of dimensions)
dot(X - P,N) = 0
Here P is a vector of length 3 that defines a point on the plane, and N is the normal vector. We can expand that to be
dot(X,N) = dot(P,N)
In our case, the point on the plane is mu, and the computed normal vector is normal_vector.
format long g
normal_vector
normal_vector =
-0.261906153571535
-0.531730910491868
-0.805398910819262
dot(mu,normal_vector)
ans =
-0.00553142802240343
So the fitted plane equation, created by a numerically valid method is:
-0.26191*x - 0.53173*y - 0.8054*z = -0.0055314
(Be careful to not use such truncated values in real life. But for display purposes, short numbers are just easier to read.)
In the end, I would give the code 5 stars for the help.
I would give it about 3.5 stars because it has internal comments and at least a default value for the second parameter, although no error checks. Error checks on the inputs make for friendly code.
However, and this is important, I would give it only 1 star for the code itself, and in my opinion, that is a terribly important factor. So replace much of the code in fitNormal with a just few lines, and add some error checks, and my rating would go up to 5 stars.
I'm sorry, but this submission has some serious problems. The replacement code is a simple one too, that would take virtually only a couple of lines of code to fit a plane, and would do it in far better numerical fashion.
So sit down and relax as I go into some lengthy discourse about just what is bad about this. I'll start out and say what I like, because there are some good things done here.
I like the help. It has an H1 line, that first line that enables lookfor, with a few keywords, and describes the code. I might wish that line included a couple more good keywords, but I won't complain.
The help is quite nice. It tells me what the parameters are, what shape they should be, what I need to pass in. It tells me the second parameter is optional.
I'd like to see more complete error checks. For example, it should test that the data array is an Nx3 array. And since the second parameter is indicated as a logical flag, it should test that it really is a scalar value. It might even check for 0 or 1.
I am happy to see comments internal to the code, and intelligently chosen variable names. These things make your code easier to read and understand.
So what is it that I don't like? The code, and its basis on some numerical methods that are just plain poor. The author has made an effort here, but sadly, is using some terribly poor algorithms. So, what is he doing? Essentially, he fits three models to the data, apparently trying to fit planes of the form z(x,y), x(y,z), and y(x,z) using a poor choice of regression technique.
The above general idea is itself a poor one, and does not compute the plane which minimizes a total sum of squares. I'll give those equations later. But first, he uses the determinant, testing det(X'*X)==0 to see if a matrix is singular.
Sorry, but this is just BAD numerical methods. Yeah, I know that he read it somewhere. Maybe even his teacher told him that, or a mentor of his. The determinant is just a bad way to test for singularity. Worse, once you form X'*X, you square the condition number of the matrix, so testing for singularity that way is still poor thing. I am sorry, but people read these things in old textbooks, written by authors who don't know any better. Or they learn it from a mentor who sadly knows no better either. In any case, it is simply poor numerics. You can get ANY value for a determinant that you want, simply by arbitrarily scaling your matrix.
For example, suppose I start with a simple, 3x3 matrix, that has a fairly reasonable determinant.
XPX = (100*data)'*(100*data);
det(XPX)
ans =
4.0564e+12
XPX = (0.01*data)'*(0.01*data);
det(XPX)
ans =
4.0564e-12
So without changing anything significant about the singularity of the matrix, I can make the determinant as large or small as I want it. And all that was with data that really will never generate a matrix that is close to singular!
In fact, you will almost never see an exactly zero determinant given data that is in floating point, unless the data is composed of only small integers. So testing for ANY such computed value to be exactly zero is just a bad thing to do in MATLAB. You need to use a tolerance. And since I just showed that an arbitrary scaling can make that determinant as small as I want, using the determinant is just a bad thing here.
(I've stopped here, as such a long comment seems to hang the site. I'll add second comment to finish up.)
That first `for i = 1:3` section generates the solutions for the least squares regression via matrix algebra for the three equations of x=f(y,z)+c, y=f(x,z)+c, and z=f(x,y)+c.
The standard way the equation is written is:
b =(X'X)-1 X'y
where X is a (n by m) matrix of input values and y is a (n by 1) vector.
Since the inverse of a matrix with a determinant of zero doesn't exist, checking for `det(X_m)==0` prevents an error being thrown at `coeff = (X_m)^-1 * X' * data(:,i);`
You are correct the regression generates a normal vector that describes the plane. Each of these vectors (for i=1,2,3) is later used to quantify the residuals from the data points and determine which is the best fit.
Hi Dan,
I think you replace each column and try to solve a least square problem from x, y, z respectively and select a best one?
But I am still not sure about the equation
Sorry I am not very good at Maths. But could you explain to me that why you replace each dimension by 1 and calculate the determinant of (X'*X) to see whether or not it is solvable?
And the 'coeff' seems like a surface normal but what does the equation mean there?
for i = 1:3
X = data;
X(:,i) = 1;
X_m = X' * X;
if det(X_m) == 0
can_solve(i) = 0;
continue
end
can_solve(i) = 1;
% Construct and normalize the normal vector
coeff = (X_m)^-1 * X' * data(:,i);
c_neg = -coeff;
c_neg(i) = 1;
coeff(i) = 1;
n(:,i) = c_neg / norm(coeff);
No Problem. Thanks for the feedback. A new version was just submitted and should be accepted soon. If you need to you can also pull it from git https://github.com/MathYourLife/Matlab-Tools
Thank for the feedback Amir. It's likely that your data doesn't fit in the window set by the code. It's hard coded now for [-1, 1] as you can see on this line.
I'll see if I can get that updated so that it jumps to wherever you data set is. I'll probably keep the range of all the axes equal though or it will make the plot appear as though the solution does not come out to be normal to the plane.
Thanks nicely does the job, I checked the normal on my data and it is correct. I just got an empty axis for plot feature which I do not have any idea why! Any thing to take care of, that I missed?
Thanks for the question. Could you elaborate a little on the angle you are measuring? Is it between the normal and one of the coordinate axes? or the angle between multiple normal vectors? Since the planes being fit have two normal vectors the function will produce only one of them. If possible, could you provide a sample data set that you would expect to give you an angle greater than 135 degrees?
This utility uses a least squares regression in which the formula minimizes the sum of the squares of the residuals. The residual is the difference between what the model expects and the actual measured value. So in most cases this would be the difference between the predicted z value and the actual z value.
That is all fine and dandy until your data starts to have a steep slope. In the worse case, for a plane with normal in the x-y plane the residuals of z_meas - z_exp trend to +-infinity.
The way the script handles this issue is to perform the regression three times. Each time using residuals measured in the x, y, and z directions (You'll see those as the for i=1:3 loops in the source). After the residuals are performed, if we run into a case similar to the one just described the z residual value will be very large and therefore not trustworthy.
The decision on which residual to use is made on this line.
best_fit = find(residual_sum == min(residual_sum));