Non-constant statistical weights in non-linear regression

Hello everyone, hope this message finds you and your families safe and healthy.
I am trying to perform weighted non-linear regression, where the weight factors are not contant. To clarify everything consider the following example:
exp_data = importdata('data.txt');
xdata = exp_data(:,1);
ydata = [exp_data(:,2),exp_data(:,3)];
x0 = [1 1 1 1];
fitfcn = @(c,xdata)test_function(c,xdata);
[lb] = [1 1 1 1]; ub = [10 10 10 10];
problem = createOptimProblem('lsqcurvefit','x0',x0,'objective',fitfcn,...
'lb',lb,'ub',ub,'xdata',xdata,'ydata',ydata);
ms = MultiStart('PlotFcns',@gsplotbestf);
[xmulti,errormulti] = run(ms,problem,500)
Normally, when a constant weight, w, is used one can simply multiply ydata and the expression in the function handle with w:
exp_data = importdata('data.txt');
xdata = exp_data(:,1);
ydata = [exp_data(:,2),exp_data(:,3)];
w = 1./sqrt(((ydata(:,1).^2)+(ydata(:,2).^2)));
ydata2 = ydata.*w;
x0 = [1 1 1 1];
fitfcn = @(c,xdata)test_function(c,xdata).*w;
[lb] = [1 1 1 1]; ub = [10 10 10 10];
problem = createOptimProblem('lsqcurvefit','x0',x0,'objective',fitfcn,...
'lb',lb,'ub',ub,'xdata',xdata,'ydata',ydata2);
ms = MultiStart('PlotFcns',@gsplotbestf);
[xmulti,errormulti] = run(ms,problem,500)
In my case I have two options. Either calculating w by using the experimental (imported) values or the calculated ones. The first case is straightforward (since the weight vector is defined and constant) and can be solved as above. However, due to the existence of random errors and an increased possibility of bias, I have to resort to the second case which is more complicated (at least for me), since the weight vector changes each time the c vector changes (from the x0 guess values) to produce an output based on the objective function. The objective function produces as an ouput a matrix of n x 2 dimensions (the real and imaginary part of a complex number). My goal is to use as weight the inverse of the modulus of the produced (calculated) values, i.e. the ydata in w above would be the calculated ydata for each c.
My question is how I can assign these calculated values to w.
Any help/suggestions would be highly appreciated.
Thank you in advance for your time.
Regards

Answers (1)

When I have this type of problems (fitting of functions to photon-counting-data where the weights should be calculated from the function - Poisson-statistics using the expected values), I typically get good enough results with your first approach (the weights calculated from the data doesn't differ much from the weights calculated from the fitted function). For the next-level attempt I've thought about running this iteratively, first use the data-weights, then repeat with weights calculated from the first solution - which hopefully should be close to the optimal solution.
If that is not good enough, you might have to do this "properly" and calculate the weights as you should - you might have to be careful and make sure that the weights don't blow up/down. If your data doesn't have normal-distributed noise you might have to use the proper "norm" to minimize - this might force you away from any lsq-function - unfortunately.
HTH

3 Comments

Hey Bjorn,
Many thanks for your reply. This iterative approach of yours makes total sense. In fact it is a quite good idea. I could give it a try. Thank you for your suggestion. I am trying to find a solution to the above problem especially for these data that is noisy due to instabilities and non-linear effects (I am dealing with dielectric constants and impedance data) which sometimes results to bias. I ve been struggling the past couple of days to find a work around it but nothing yet. If I manage to think of sth that actually works I will let you know, since you are also dealing with similar problems (maybe in the meantime though one of the many experts in the community will add sth).
Once again thank you for your time and your reply.
Regards
What you want to do is to get a maximum-likelihood solution, right? This requires you to have some knowledge/grasp/estimate of the statistical characteristics of your data (if all data can be assumed to be samples from some normal-distributions we get the least-square-solution, weighted or not. If you have some other noise-characteristics, for example Cauchy-distribution (with much longer tails) you have to maximize another likelihood-function). If you think that you have data that is reasonably normal-distributed except for some fraction of outliers you could take a look at Huber-like norms, that's one way to reduce the impact of outliers.
Thank you very much for your illuminating comments. Electrical impedance measurements are usually measured by the application of a perturbation signal (sinuosidal excitation signal). The excitation and response signals (voltage and current) are recorded using a constant sample rate and then are transferred to the frequency domain (by FT) to calculate the impedance of the system. For an impedance spectrum, several frequencies are applied. The noise component of the signal at each frequency (isolated by FT) results in a Gaussian distribution around the true value of the impedance. I believe that this is why the least-square-solution is implemented (as you pointed out). However, I will totally have in mind the different distributions you mentioned and try to see if they can be applied to my case (especially the Huber function).

Sign in to comment.

Products

Release

R2018b

Asked:

on 6 May 2020

Edited:

on 7 May 2020

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!