Curve fitting returns different parameter values every time

Hello,
I'm trying to fit some data with the following equation, which is a modified error function and contains 4 unknown parameters (I'm mainly interested in par4):
Y = par1+(par2*(erf((X-par3)/ par4)))
If I set some sensible start points, the data points are fitted reasonably well (R-square ~ 0.95), but the error intervals on some of the fitted parameters are very wide.
Below is an image of the data to fit, the fitted curve and the parameters found.
par4 = 0.9109 (-52.89, 54.71)
par2 = 0.02647 (0.02421, 0.02872)
par3 = 38.15 (8.128, 68.18)
par1 = 0.8647 (0.8624, 0.867)
If I change the start points even slightly (e.g. for par4, from 0.5 to 0.1), the final fitted values change. Similarly, if I make the TolFun value larger, the fit, the final parameters and their errors all change. In some cases, some parameters are reported with no error intervals at all. For instance, if I set the par4 start point to 3 and leave the others unchanged, I get:
par4 = 0.1712
par2 = 0.02674
par3 = 38.63
par1 = 0.8649 (0.8625, 0.8672)
As you can imagine, I'm no expert in the maths of the fitting process but I thought that the different results mean that the data is poor (no data points in the centre of the curve) and many curves can reproduce it with a relatively good R-square. Also, the large error intervals may mean that the spread in the data is large. Are my thoughts correct?
Also, what does the option TolFun control? And, I know the error intervals are calculated on 95% (2sigma) criterion; can I change that to 68% (1sigma) to have a narrower error interval?
Many thanks,
Giuseppe

Accepted Answer

Matt J on 23 Oct 2015
Edited: Matt J on 23 Oct 2015
"but I thought that the different results mean that the data is poor (no data points in the centre of the curve) and many curves can reproduce it with a relatively good R-square."
Yes, I would say that that's a big part of it. When you have no data near the non-flat portions of your curve, the least squares cost function is essentially locally constant as a function of the unknown parameters. Small perturbations in the parameters produce negligible improvement in cost and the iteration stopping criteria get triggered very early.
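The flatness of the cost function can be checked numerically. The sketch below is an illustration in Python rather than MATLAB, and the data are hypothetical: noise-free points generated from the fitted parameters quoted in the question, placed only on the two plateaus with nothing between X = 30 and X = 46. The names `model` and `sse` are made up for this example.

```python
import math

def model(x, p1, p2, p3, p4):
    """Modified error function from the question: p1 + p2*erf((x - p3)/p4)."""
    return p1 + p2 * math.erf((x - p3) / p4)

def sse(xs, ys, params):
    """Sum of squared residuals, i.e. the least squares cost."""
    return sum((y - model(x, *params)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: plateaus only, no samples in the transition region
true_params = (0.8647, 0.02647, 38.15, 0.9109)
xs = [float(x) for x in range(0, 31)] + [float(x) for x in range(46, 77)]
ys = [model(x, *true_params) for x in xs]

# Vary only the width par4 and record the cost for each trial value
widths = [0.1, 0.5, 0.9109, 3.0]
costs = {w: sse(xs, ys, (0.8647, 0.02647, 38.15, w)) for w in widths}
for w, c in costs.items():
    print(f"par4 = {w:<6} SSE = {c:.3e}")

# For comparison: a 10% change in the step height par2
print(f"par2 +10%     SSE = {sse(xs, ys, (0.8647, 0.02647 * 1.1, 38.15, 0.9109)):.3e}")
```

Under these assumptions, every trial value of par4 gives a cost that is practically zero, while a modest change in par2 is clearly penalised. That is consistent with the narrow intervals on par1 and par2 and the very wide ones on par3 and par4.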
Also, it would be prudent to express X in smaller units so that the curved portion of the data varies more gradually. Otherwise, the sensitivity of the curve to par3 and par4 is much greater than to par1 and par2, and you get ill-conditioning.
"Also, what does the option TolFun control?"
Some normalized threshold on iteration-to-iteration changes in the cost function and its gradient. There is no precise and detailed documentation on this, unfortunately. You should tune TolFun using simulations, although making your model comparably sensitive to all parameters as I mention above should make the optimization less sensitive to the specific choice of TolFun.
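To see why a looser tolerance can change the answer, here is a minimal Python sketch of a generic tolerance-based stopping rule. It is not MATLAB's actual TolFun test, which is not documented in detail; the toy cost function, the fixed-step gradient descent, and the names `cost`, `grad` and `minimize` are all assumptions made for illustration.

```python
def cost(p):
    # Toy cost with a very flat minimum at p = 2 (hypothetical)
    return (p - 2.0) ** 4

def grad(p):
    # Derivative of the toy cost
    return 4.0 * (p - 2.0) ** 3

def minimize(p0, tol_fun, step=0.05, max_iter=100_000):
    """Fixed-step gradient descent that stops once the change in cost between
    iterations drops below a normalized threshold. Generic rule for
    illustration only, not MATLAB's implementation."""
    p, f = p0, cost(p0)
    for it in range(1, max_iter + 1):
        p_new = p - step * grad(p)
        f_new = cost(p_new)
        if abs(f - f_new) < tol_fun * (1.0 + abs(f)):
            return p_new, it
        p, f = p_new, f_new
    return p, max_iter

p_loose, n_loose = minimize(0.0, tol_fun=1e-3)
p_tight, n_tight = minimize(0.0, tol_fun=1e-9)
print(f"tol 1e-3: p = {p_loose:.4f} after {n_loose} iterations")
print(f"tol 1e-9: p = {p_tight:.4f} after {n_tight} iterations")
```

In this toy case the looser tolerance stops earlier and further from the minimum, because progress per iteration becomes tiny wherever the cost is flat. The same mechanism would explain why the fitted parameters move when TolFun is changed.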
  3 Comments
Giuseppe Bugatti on 26 Oct 2015
I'm not sure I fully understand. Your code lets me change the errorbar series (values and appearance), but how does that relate to the error intervals for the parameters found by the fit function?
Matt J on 27 Oct 2015
It wasn't clear which error intervals you were talking about. Whatever they are, I'm pretty sure their width is linear in sigma, so if you want 1 sigma instead of 2, you would just halve the half-width of each interval the function originally gives you, keeping it centred on the fitted value.
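As a rough sketch of that rescaling, the Python snippet below converts the 95% interval quoted for par4 in the question to a 68% one. It assumes a symmetric interval and the normal approximation, so the scale factor is slightly more than one half. The function name `rescale_interval` is made up for this example. If the fit came from the Curve Fitting Toolbox, `confint` also accepts a confidence level as a second argument, which should avoid the manual step.

```python
from statistics import NormalDist

def rescale_interval(estimate, lower, upper, old_level=0.95, new_level=0.68):
    """Rescale a symmetric confidence interval to a new level, assuming the
    normal approximation (half-width proportional to the standard error)."""
    z_old = NormalDist().inv_cdf(0.5 + old_level / 2.0)
    z_new = NormalDist().inv_cdf(0.5 + new_level / 2.0)
    half_width = (upper - lower) / 2.0 * (z_new / z_old)
    return estimate - half_width, estimate + half_width

# par4 estimate and 95% bounds as quoted in the question
lo, hi = rescale_interval(0.9109, -52.89, 54.71)
print(f"par4 68% interval: ({lo:.2f}, {hi:.2f})")
```

The narrower interval still spans zero, so changing the confidence level does not make par4 any better determined by the data.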
