File Exchange

## Total Least Squares Method

version 1.2.0.0 (14.8 KB) by Ivo Petras

### Ivo Petras (view profile)

Mathematical method known as total least squares or orthogonal regression or error-in-variables.

Updated 11 Apr 2013

We present a Matlab toolbox which can solve basic problems related to the Total Least Squares (TLS) method in the modeling. By illustrative examples we show how to use the TLS method for solution of:
- linear regression model
- nonlinear regression model
- fitting data in 3D space
- identification of dynamical system

This toolbox requires another two functions, which are already published in Matlab Central File Exchange. Those functions will be installed to computer via supporting install package 'requireFEXpackage' included in TLS package. For more details see ReadMe.txt file.

Authors: Ivo Petras, Dagmar Bednarova, Tomas Skovranek, and Igor Podlubny
(Technical University of Kosice, Slovakia)

Detailed description of the functions, examples and demos can be found at the link:

Ivo Petras and Dagmar Bednarova: Total Least Squares Approach to Modeling: A Matlab Toolbox, Acta Montanistica Slovaca, vol. 15, no. 2, 2010, pp. 158-170.
(http://actamont.tuke.sk/pdf/2010/n2/8petras.pdf)

### Cite As

Ivo Petras (2019). Total Least Squares Method (https://www.mathworks.com/matlabcentral/fileexchange/31109-total-least-squares-method), MATLAB Central File Exchange. Retrieved .

eashe

### eashe (view profile)

How do you incorporate the uncertainty in the data, which is the purpose of using an errors-in-variable framework?

James Whittaker

### James Whittaker (view profile)

Is there a limit of the number of parameters you can solve for in the nonlinear model? I have a 3 parameter problem you see and the example is in 2 parameters.

Thanks

Jiayue Wang

### Jiayue Wang (view profile)

Hi, Ivo. I have download your file but I don't know how to use it in Matlab. I use matlab2016b, but I can not use ADD-ON explorer to add it. Can you tell me how to use it?

jasongrig

### jasongrig (view profile)

Hi Ivo,

I re-downloaded your package and the fit is fine now. I think the problem was that the first time I took the files from a computer with a different OS and an older version of matlab.

Iason

jasongrig

### jasongrig (view profile)

Hi Ivo,

I downloaded the codes, ran the "demoNRM.m" and I get a black line that is shifted upwards compared to what I see in Fig. 2 of "Petráš, I., & Bednárová, D. (2010). Total least squares approach to modeling: a Matlab toolbox. Acta Montanistica Slovaca, 15(2), 158."
[ http://actamont.tuke.sk/pdf/2010/n2/8petras.pdf ]

Is there a bug?

Thanks,
Iason

jasongrig

### jasongrig (view profile)

*correction
"With my data, RMSE gives me on average 64% larger values than RMSTE"

jasongrig

### jasongrig (view profile)

Hi John,

thank you for sharing your work and knowledge.

i) you recommend using RMSTE and not RMSE and I agree. To calculate RMSTE I used Petras' function "dist_dsearch":

for v=1:length(Ydata)
point= Ydata(v);
[min_dist(v),~]=dist_dsearch(Yhat, point,'off');
end
RMSTE=sqrt( sum( min_dist.^2 )/length(Ydata) );

Is it correct? With my data, RMSE gives me on average 64% larger values than RMSE. It seemed reasonable to me.

ii) you mention "You can think of the singular values as measuring the size of the point cloud in each of two directions, along the regression line, and normal to that line.". I do not understand what you mean by "singular value". Is it the "min_dist" like above?

Thanks a lot,
Iason

Zhonghua Hu

### Zhonghua Hu (view profile)

Undefined function or variable "Yhat".

Error in ==> fit_2D_data at 49
alpha = atan(abs((Yhat-YData)./(Xhat-XData)));

for this problem, it was caused by the 'B' is -Inf,which means 'ky.*sx.*sy-sxy' is zero.
For example, x =[ 1,2,3], y=[1,2,1], run the fit_2D_data will encounter this problem.

Eugen

### Eugen (view profile)

For dynamic identification the modelstructure must agree with the canonical form (see equation 11 in the paper provided by the autor).

- dynamic identification only works with step(!) and impulsresponse(!) together with a provided model structure.

I hope I could help...

Greetings, Eugen

arnold

### arnold (view profile)

I can't seem to find a way to simply replace the fitting function by a custom function.

any hint?

thanks
Arnold

Eric

### Eric (view profile)

Arggh...I got the comment wrong:

Lines 41-42 should be:

l1=V(1,2);
l2=V(2,2);

This consistent with, for example, Algorithm 4.1 at http://www2.math.ethz.ch/education/bachelor/lectures/hs2014/other/linalg_INFK/svdneu.pdf and the Wikipedia TLS example code.

Eric

### Eric (view profile)

I believe the code in fit_2D_data_SVD.m is wrong.

Lines 41-42 are written:

l1=V(1,1);
l2=V(1,2);

and should be:

l1=V(2,1);
l2=V(2,2);

haifeng zhang

### haifeng zhang (view profile)

Extremely useful!

Chi-Fu

### Chi-Fu (view profile)

Hi John, very appreciate for your detailed explain and pointing out my misunderstanding of R^2 in regression. Your suggestions really helps me, thanks you very much.
Chi-Fu

John D'Errico

### John D'Errico (view profile)

I'm sorry, but you seem to think that even for a normal regression, that R^2 offers some "objective" measure of the regression, that it can offer a pass/fail rule for the regression. This is a classic misimpression I think.

There is no simple measure that I know of (including R^2) that will reliably tell you that a regression model is adequate for a general set of data. This will still be true for the TLS case. Nothing changes in that respect. That applies to a general model, and for a completely arbitrary set of data.

Nothing stops you from constructing some simple measure however, based on a subset of your many sets of data. For example, extract some subset of size N of those sets at random, and perform a careful analysis of the resulting models. Look at the residual plots for every one of those models. Compute various measures of fit (I'll offer a couple of example measures later.)

Choose a large enough subset of models such that there will be at least a few of the regression models that would be deemed inadequate. Now find a decision point for some parameter P, such that it reasonably predicts when the regression would have failed, in YOUR opinion, based on that subset of analyses.

Essentially I m suggesting that you choose some parameter P and a set point such that it offers a way to discriminate between "good" and "bad" models.

I can't tell you in advance that any given R^2 analogue, if it is greater than perhaps 0.975 (or any magic number) that it will offer the discriminant you desire. But the point is that you can choose some parameter, and essential calibrate things so that it might indicate when there MIGHT be a problem. Even so, any single number can be fooled by lack of fit masquerading as noise, but you seem insistent on having a magic measure.

So two such measures that would be appropriate for a TLS model might be:

1. Root Mean Square Total Error.

2. One minus the ratio of singular values. (Or maybe squared singular values. It is early in the morning for me, and my brain is a bit foggy now about which would be the proper analogue.)

So the first measure is simply an analogue of mean square error, so you compute the squared normal distance to the line for each point. Compute the sqrt of the mean of all of those squared distances, so RMSTE.

The second measure I listed is based on the idea of how the SVD works to generate a TLS model itself. You can think of the singular values as measuring the size of the point cloud in each of two directions, along the regression line, and normal to that line. So the ratio of the smaller singular value divided by the larger one will yield a number between 0 and 1. Subtract that from 1, and we get a number that will mimic R^2. A visual way of thinking about it is as if we are measuring the ratio of the axis lengths of an ellipse around your data. If that ellipse around your data formed a perfect circle, then this measure would return 0. If the data fell on a perfect straight line with no error, then this measure would return 1.

I'm now done consulting in the comments, with my apologies to the authors of this submission.

Chi-Fu

### Chi-Fu (view profile)

Hi John,
Thanks for letting me know to trust my eye more than a number. My problem is I have more than ten thousands curves and I want to rule out some data that fit really bad before I start to select data manually. So if there is any objective value that represent the goodness of fitting in TLS, it would be useful to me.
Best,
Chi-Fu

John D'Errico

### John D'Errico (view profile)

@Chi-Fu: The classic R^2 formula is not appropriate for TLS, since it fails to utilize the errors in both variables.

You COULD cobble up a scheme that mimics R^2, in the sense that as it approaches 1, the fit becomes perfect, and at zero the fit is random. However, IF you are a chronic user of R^2 such that you rely on it to know when your fit is adequate, any such measure will not satisfy your internal calibration for the resulting number. You will simply get a number out, but without any meaning behind that number. For example, is 0.9 a good enough value for R^2? For some systems, maybe any R^2 over 0.5 might be as good as we can expect. For others, any R^2 under 0.99 might reflect a worthless model.

Is the error in the model lack of fit? Or is it noise? Looking at the residuals can help you learn things like this, but a simple number will be completely fooled.

In the end, DON'T rely on a simple measure like R^2 anyway. It is a crutch. Instead, look at the fit. Look at the residuals. If the result seems adequate for your purposes, then it is. Your eyes and your brain will provide a far better measure of the fit than any single number.

Look at it another way. I (or any random person) might fit your set of data with a line. Without knowing your goals, without your knowledge of the process, your knowledge of the noise structure, I might deem the fit adequate. Hey, it might look like a reasonable fit to me! Only you know how good of a fit you need to obtain. WHAT ARE YOUR GOALS FOR THIS FIT? Only you can use your knowledge of the system to know if the fit is adequate.

Of course, if you have no knowledge of the process that generated your data, if you have no understanding of that system, if you are just pushing numbers through a formula to get a number out, then you are the wrong person to be doing the job! Don't use a number to provide a substitute for good judgment. If you lack that skill or any basis for judgment, then find someone who has the necessary knowledge.

So throw away the crutch of R^2. Please.

I'll get off my soapbox now, end my rant.

Chi-Fu

### Chi-Fu (view profile)

Great code! it works perfectly to me.
One little thing need help. After fitting, how to evaluate the goodness of fitting? I am dividing the err by number of data points (like MSE). I would like to calculate R-square value, can someone give me a hint how to do that? thanks in advance!

Scarlett

### Scarlett (view profile)

Fantastic code, thank you! I am running into a bug when I run it for my model. If you could help that would be amazing.

??? Subscripted assignment dimension mismatch.

Error in ==> numerFminS>calculation at 80
points(:,2)=y';

Error in ==> fminsearchbnd>intrafun at 259
fval =
feval(params.fun,reshape(xtrans,params.xsize),params.args{:});

Error in ==> fminsearch at 205
fv(:,1) = funfcn(x,varargin{:});

Error in ==> fminsearchbnd at 229
[xu,fval,exitflag,output] =
fminsearch(@intrafun,x0u,options,params);

Error in ==> numerFminS at 63
[a,fval,exitflag,output] =
fminsearchbnd(@calculation,a0,LBa,UBa,options);
%#ok<*ASGLU>

Error in ==> inFxnEst at 257
[err, fun] =
numerFminS(@expHandle,2,[0,0],[maxODabs+1,2],(0:length(A1(:,3))-1),A1(:,3));

The input parameters are the same size so I'm not sure what the error would be. Any help is greatly appreciated, thank you!

Alex Vaughan

### Alex Vaughan (view profile)

Lovely code. I found one bug, which I believe is the same as the one Martin points out; his fix didn't work for me, though. In numerFminS, the 'fun' parameter is never used, and the fit is always evaluated based on whatever is in model.m. This can be fixed as:

- 65: [yy]=model(xdata, a);
+ 65: [yy] = @(xdata,a)fun(xdata,a)

Following this, I can then call the function without using model.m as:

fun = @(a,x)(a(1) .* ( x.^a(2) ) ./ (x.^a(2) + a(3).^a(2))); %or whatever.
[ErrTLS,P_TLS] = numerFminS( fun , 3 , lowerBounds, upperBounds, x, y)

Cheers,
Alex

Martin

### Martin (view profile)

Hi Ivo!

I used numerFminS.m which works very well.

I just want to suggest a minor correction: When you replace 'model' with 'fun' in lines 65 and 77, you can make use of the 'fun' parameter and need no longer rely on the 'model.m'-file. I think that's what you intended anyway, when you created the 'fun' parameter.

Cheers,

Martin

trison

### trison (view profile)

Thank you very much for you sharing with your package on TLS.
I tested it, and found some mistakes on function [Err, P] = orm(X, Y, order).
They were listed the following:
line 21 kx=length(X); instead of kx=length(Xdata);
line 22 ky=length(Y); instead of ky=length(Ydata);
line 28 P = R/(Q'.*X.^order); When matlab run it, matlab always sent an Error message"
Matrix dimensions must agree".
And the followed line 29 30 may meet the same Error. But I can not get your mean and have not any advise.

Charles Nelatury

Works great!

Oleg Nazarevych

### Oleg Nazarevych (view profile)

??? Undefined function or variable "Yhat".

Error in ==> fit_2D_data at 49
alpha = atan(abs((Yhat-YData)./(Xhat-XData)));