Code covered by the BSD License

Highlights from Total Least Squares Method

4.66667
4.7 | 6 ratings Rate this file 56 Downloads (last 30 days) File Size: 14.8 KB File ID: #31109 Version: 1.2

Total Least Squares Method

Ivo Petras (view profile)

19 Apr 2011 (Updated )

Mathematical method known as total least squares or orthogonal regression or error-in-variables.

File Information
Description

We present a Matlab toolbox which can solve basic problems related to the Total Least Squares (TLS) method in the modeling. By illustrative examples we show how to use the TLS method for solution of:
- linear regression model
- nonlinear regression model
- fitting data in 3D space
- identification of dynamical system

This toolbox requires another two functions, which are already published in Matlab Central File Exchange. Those functions will be installed to computer via supporting install package 'requireFEXpackage' included in TLS package. For more details see ReadMe.txt file.

Authors: Ivo Petras, Dagmar Bednarova, Tomas Skovranek, and Igor Podlubny
(Technical University of Kosice, Slovakia)

Detailed description of the functions, examples and demos can be found at the link:

Ivo Petras and Dagmar Bednarova: Total Least Squares Approach to Modeling: A Matlab Toolbox, Acta Montanistica Slovaca, vol. 15, no. 2, 2010, pp. 158-170.
(http://actamont.tuke.sk/pdf/2010/n2/8petras.pdf)

Acknowledgements
Required Products Statistics and Machine Learning Toolbox
MATLAB release MATLAB 7.11 (R2010b)
21 Apr 2016 Zhonghua Hu

Zhonghua Hu (view profile)

Undefined function or variable "Yhat".

Error in ==> fit_2D_data at 49
alpha = atan(abs((Yhat-YData)./(Xhat-XData)));

for this problem, it was caused by the 'B' is -Inf,which means 'ky.*sx.*sy-sxy' is zero.
For example, x =[ 1,2,3], y=[1,2,1], run the fit_2D_data will encounter this problem.

28 Oct 2015 Eugen

Eugen (view profile)

For dynamic identification the modelstructure must agree with the canonical form (see equation 11 in the paper provided by the autor).

- dynamic identification only works with step(!) and impulsresponse(!) together with a provided model structure.

I hope I could help...

Greetings, Eugen

27 Sep 2015 arnold

arnold (view profile)

I can't seem to find a way to simply replace the fitting function by a custom function.

any hint?

thanks
Arnold

Comment only
14 Aug 2015 Eric

Eric (view profile)

Arggh...I got the comment wrong:

Lines 41-42 should be:

l1=V(1,2);
l2=V(2,2);

This consistent with, for example, Algorithm 4.1 at http://www2.math.ethz.ch/education/bachelor/lectures/hs2014/other/linalg_INFK/svdneu.pdf and the Wikipedia TLS example code.

Comment only
14 Aug 2015 Eric

Eric (view profile)

I believe the code in fit_2D_data_SVD.m is wrong.

Lines 41-42 are written:

l1=V(1,1);
l2=V(1,2);

and should be:

l1=V(2,1);
l2=V(2,2);

Comment only
17 Apr 2014 haifeng zhang

haifeng zhang (view profile)

Extremely useful!

15 Jan 2014 Chi-Fu

Chi-Fu (view profile)

Hi John, very appreciate for your detailed explain and pointing out my misunderstanding of R^2 in regression. Your suggestions really helps me, thanks you very much.
Chi-Fu

Comment only
15 Jan 2014 John D'Errico

John D'Errico (view profile)

I'm sorry, but you seem to think that even for a normal regression, that R^2 offers some "objective" measure of the regression, that it can offer a pass/fail rule for the regression. This is a classic misimpression I think.

There is no simple measure that I know of (including R^2) that will reliably tell you that a regression model is adequate for a general set of data. This will still be true for the TLS case. Nothing changes in that respect. That applies to a general model, and for a completely arbitrary set of data.

Nothing stops you from constructing some simple measure however, based on a subset of your many sets of data. For example, extract some subset of size N of those sets at random, and perform a careful analysis of the resulting models. Look at the residual plots for every one of those models. Compute various measures of fit (I'll offer a couple of example measures later.)

Choose a large enough subset of models such that there will be at least a few of the regression models that would be deemed inadequate. Now find a decision point for some parameter P, such that it reasonably predicts when the regression would have failed, in YOUR opinion, based on that subset of analyses.

Essentially I m suggesting that you choose some parameter P and a set point such that it offers a way to discriminate between "good" and "bad" models.

I can't tell you in advance that any given R^2 analogue, if it is greater than perhaps 0.975 (or any magic number) that it will offer the discriminant you desire. But the point is that you can choose some parameter, and essential calibrate things so that it might indicate when there MIGHT be a problem. Even so, any single number can be fooled by lack of fit masquerading as noise, but you seem insistent on having a magic measure.

So two such measures that would be appropriate for a TLS model might be:

1. Root Mean Square Total Error.

2. One minus the ratio of singular values. (Or maybe squared singular values. It is early in the morning for me, and my brain is a bit foggy now about which would be the proper analogue.)

So the first measure is simply an analogue of mean square error, so you compute the squared normal distance to the line for each point. Compute the sqrt of the mean of all of those squared distances, so RMSTE.

The second measure I listed is based on the idea of how the SVD works to generate a TLS model itself. You can think of the singular values as measuring the size of the point cloud in each of two directions, along the regression line, and normal to that line. So the ratio of the smaller singular value divided by the larger one will yield a number between 0 and 1. Subtract that from 1, and we get a number that will mimic R^2. A visual way of thinking about it is as if we are measuring the ratio of the axis lengths of an ellipse around your data. If that ellipse around your data formed a perfect circle, then this measure would return 0. If the data fell on a perfect straight line with no error, then this measure would return 1.

I'm now done consulting in the comments, with my apologies to the authors of this submission.

Comment only
14 Jan 2014 Chi-Fu

Chi-Fu (view profile)

Hi John,
Thanks for letting me know to trust my eye more than a number. My problem is I have more than ten thousands curves and I want to rule out some data that fit really bad before I start to select data manually. So if there is any objective value that represent the goodness of fitting in TLS, it would be useful to me.
Best,
Chi-Fu

Comment only
14 Jan 2014 John D'Errico

John D'Errico (view profile)

@Chi-Fu: The classic R^2 formula is not appropriate for TLS, since it fails to utilize the errors in both variables.

You COULD cobble up a scheme that mimics R^2, in the sense that as it approaches 1, the fit becomes perfect, and at zero the fit is random. However, IF you are a chronic user of R^2 such that you rely on it to know when your fit is adequate, any such measure will not satisfy your internal calibration for the resulting number. You will simply get a number out, but without any meaning behind that number. For example, is 0.9 a good enough value for R^2? For some systems, maybe any R^2 over 0.5 might be as good as we can expect. For others, any R^2 under 0.99 might reflect a worthless model.

Is the error in the model lack of fit? Or is it noise? Looking at the residuals can help you learn things like this, but a simple number will be completely fooled.

In the end, DON'T rely on a simple measure like R^2 anyway. It is a crutch. Instead, look at the fit. Look at the residuals. If the result seems adequate for your purposes, then it is. Your eyes and your brain will provide a far better measure of the fit than any single number.

Look at it another way. I (or any random person) might fit your set of data with a line. Without knowing your goals, without your knowledge of the process, your knowledge of the noise structure, I might deem the fit adequate. Hey, it might look like a reasonable fit to me! Only you know how good of a fit you need to obtain. WHAT ARE YOUR GOALS FOR THIS FIT? Only you can use your knowledge of the system to know if the fit is adequate.

Of course, if you have no knowledge of the process that generated your data, if you have no understanding of that system, if you are just pushing numbers through a formula to get a number out, then you are the wrong person to be doing the job! Don't use a number to provide a substitute for good judgment. If you lack that skill or any basis for judgment, then find someone who has the necessary knowledge.

So throw away the crutch of R^2. Please.

I'll get off my soapbox now, end my rant.

Comment only
14 Jan 2014 Chi-Fu

Chi-Fu (view profile)

Great code! it works perfectly to me.
One little thing need help. After fitting, how to evaluate the goodness of fitting? I am dividing the err by number of data points (like MSE). I would like to calculate R-square value, can someone give me a hint how to do that? thanks in advance!

22 May 2013 Scarlett

Scarlett (view profile)

Fantastic code, thank you! I am running into a bug when I run it for my model. If you could help that would be amazing.

??? Subscripted assignment dimension mismatch.

Error in ==> numerFminS>calculation at 80
points(:,2)=y';

Error in ==> fminsearchbnd>intrafun at 259
fval =
feval(params.fun,reshape(xtrans,params.xsize),params.args{:});

Error in ==> fminsearch at 205
fv(:,1) = funfcn(x,varargin{:});

Error in ==> fminsearchbnd at 229
[xu,fval,exitflag,output] =
fminsearch(@intrafun,x0u,options,params);

Error in ==> numerFminS at 63
[a,fval,exitflag,output] =
fminsearchbnd(@calculation,a0,LBa,UBa,options);
%#ok<*ASGLU>

Error in ==> inFxnEst at 257
[err, fun] =
numerFminS(@expHandle,2,[0,0],[maxODabs+1,2],(0:length(A1(:,3))-1),A1(:,3));

The input parameters are the same size so I'm not sure what the error would be. Any help is greatly appreciated, thank you!

Comment only
25 Mar 2012 Alex Vaughan

Alex Vaughan (view profile)

Lovely code. I found one bug, which I believe is the same as the one Martin points out; his fix didn't work for me, though. In numerFminS, the 'fun' parameter is never used, and the fit is always evaluated based on whatever is in model.m. This can be fixed as:

- 65: [yy]=model(xdata, a);
+ 65: [yy] = @(xdata,a)fun(xdata,a)

Following this, I can then call the function without using model.m as:

fun = @(a,x)(a(1) .* ( x.^a(2) ) ./ (x.^a(2) + a(3).^a(2))); %or whatever.
[ErrTLS,P_TLS] = numerFminS( fun , 3 , lowerBounds, upperBounds, x, y)

Cheers,
Alex

10 Jan 2012 Martin

Martin (view profile)

Hi Ivo!

I used numerFminS.m which works very well.

I just want to suggest a minor correction: When you replace 'model' with 'fun' in lines 65 and 77, you can make use of the 'fun' parameter and need no longer rely on the 'model.m'-file. I think that's what you intended anyway, when you created the 'fun' parameter.

Cheers,

Martin

Comment only
08 Oct 2011 trison

trison (view profile)

Thank you very much for you sharing with your package on TLS.
I tested it, and found some mistakes on function [Err, P] = orm(X, Y, order).
They were listed the following:
line 21 kx=length(X); instead of kx=length(Xdata);
line 22 ky=length(Y); instead of ky=length(Ydata);
line 28 P = R/(Q'.*X.^order); When matlab run it, matlab always sent an Error message"
Matrix dimensions must agree".
And the followed line 29 30 may meet the same Error. But I can not get your mean and have not any advise.

Comment only
27 Aug 2011 Charles Nelatury

Charles Nelatury (view profile)

Works great!

10 Jun 2011 Oleg Nazarevych

Oleg Nazarevych (view profile)

??? Undefined function or variable "Yhat".

Error in ==> fit_2D_data at 49
alpha = atan(abs((Yhat-YData)./(Xhat-XData)));

Comment only
09 Oct 2011 1.1

Updated function orm(). Fixed error discovered by 'Trison'.

11 Apr 2013 1.2

Fixed bug discovered by Martin and Alex.