# Least Square Curve Fitting, finding the initial start values in lsqcurvefit function in Matlab

Dritan Nikolla on 20 Oct 2016
Commented: Yasin Islam on 23 Jan 2021
First of all, thank you in advance for your help. My question is about non-linear least-squares curve fitting in MATLAB; I am familiar with neither MATLAB nor curve fitting.
I have this type of data:
x = [600, 800, 1000, 1200, 1400];
y = [0, 0.2, 0.4, 0.7, 1];
I am trying to use the following code:
f = @(p,x) (p(3)-p(4))./(1+exp(-(x-p(2))/p(1)))+p(4);
opts = optimset('Display','off','MaxFunEvals',1000);
sigfit = lsqcurvefit(f, starting_value, intervals, problong, [], [], opts); % intervals = x data, problong = y data
bisection_point= sigfit(2)-sigfit(1)*log((sigfit(3)-0.5)/(0.5-sigfit(4)))
The only problem I have is the starting value passed to lsqcurvefit (the starting_value argument above). What would be good starting values given the numbers above? Any help would be greatly appreciated.
The above code is based on the lsqcurvefit function in MATLAB. Here is the link: http://uk.mathworks.com/help/optim/ug/lsqcurvefit.html
The X vector is time intervals in milliseconds, whereas the Y vector represents the responses a participant made: whether those intervals were perceived as closer to a short (400 ms) or a long (1600 ms) reference interval. I don't understand what starting points mean. Ultimately, what I need to do is find the 0.5 point on the Y axis and the corresponding value on the X axis. The solution will be somewhere between 600 ms and 1400 ms, probably around the 1200 ms mark. I have put the starting value in as a vector from 600 to 1400, but I have no idea whether that is right or what it means. I was hoping someone better equipped than me could help answer this problem precisely :).
Thank you again, Dritan
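[Editor's note] For readers without MATLAB, the whole fit, including the 0.5-crossing ("bisection point") formula from the last line of the posted code, can be sketched in Python with scipy.optimize.curve_fit, a rough analogue of lsqcurvefit. The starting values p0 here are illustrative guesses, not part of the original post:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([600, 800, 1000, 1200, 1400], dtype=float)
y = np.array([0, 0.2, 0.4, 0.7, 1], dtype=float)

# Same 4-parameter sigmoid as the MATLAB anonymous function f(p, x)
def sigmoid(x, p1, p2, p3, p4):
    return (p3 - p4) / (1 + np.exp(-(x - p2) / p1)) + p4

# Illustrative starting guesses: slope scale ~ quarter of the x-range,
# midpoint ~ mean of x, upper/lower asymptotes ~ max/min of y
p0 = [200.0, 1000.0, 1.0, 0.0]
p, _ = curve_fit(sigmoid, x, y, p0=p0, maxfev=5000)
p1, p2, p3, p4 = p

# Solve sigmoid(x) = 0.5 for x -- the same algebra as the
# bisection_point line in the question
bisection_point = p2 - p1 * np.log((p3 - 0.5) / (0.5 - p4))
print(p, bisection_point)
```

By construction, the fitted sigmoid evaluates to exactly 0.5 at bisection_point, and with these data the crossing lands inside the 600-1400 ms range, consistent with the poster's expectation.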

dpb on 21 Oct 2016
Starting points means initial "guesses" for the values of the parameters; for nonlinear equations, figuring out how to get reasonable estimates for these can be difficult. One should always begin something like this by plotting the data to visualize what the response looks like and to spot obvious problems. In your case, the data are pretty much scatter shot with no real pattern overall, and certainly nothing that would suggest a sigmoid model is particularly appropriate...plus, you've only got five data points and you're trying to estimate four parameters--you're seriously short of spare degrees of freedom here.
That said, let's give it a shot--
>> x = [600, 800, 1000, 1200, 1400];
y = [0, 02, 04, 0.7, 1];
>> plot(x,y,'*-')
>> f = @(p,x) (p(3)-p(4))./(1+exp(-(x-p(2))/p(1)))+p(4);
>> p0=[mean(x) mean(x) 1 1];
>> p=lsqcurvefit(f,p0,x,y)
Local minimum possible.
lsqcurvefit stopped because the final change in the sum of squares relative to
its initial value is less than the default value of the function tolerance.
<stopping criteria details>
p =
1.0e+04 *
0.9053 -1.0463 0.0006 -0.0013
>> yhat=f(p,x)
yhat =
1.3960 1.4689 1.5409 1.6121 1.6823
>> hold all
>> plot(x,yhat,':')
>> legend('data','fit')
>> ylim([-1 5])
>>
Not too unexpectedly, the fitted result over the range of x is essentially a straight line with a slightly positive slope, reflecting the general linear trend in the values over x, with the obvious aberration between the 3rd and 4th points. There's not enough data to say anything meaningful about what that might mean, if anything.
OK, back to the original question--how did I get the p0 values I chose? Mostly by simply looking at the functional form as you wrote it and considering what the terms would have to be to have any shot at all of being "in the neighborhood" of the response. With the exponential exp(-(x-p(2))/p(1)), I just figured subtracting the mean(x) from x and normalizing would be a reasonably small argument such that exp() wouldn't "blow up". The '1' values for the other two were based on the scale of the y values.
You can try various alternative starting values; I tried a couple of others and, while the results were not identical, the differences in the fitted curves were negligible, as shown by the fitted points on the response line.
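[Editor's note] dpb's "look at the functional form" approach can be made mechanical. A minimal Python sketch of such a heuristic (the function name guess_p0 and the exact rules are illustrative, not from the thread): take the asymptotes from min/max of y, the midpoint from where y is closest to its half-range, and the slope scale from a fraction of the x-range:

```python
import numpy as np

def guess_p0(x, y):
    """Heuristic starting values for (p3-p4)/(1+exp(-(x-p2)/p1)) + p4."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p4 = y.min()                    # lower asymptote
    p3 = y.max()                    # upper asymptote
    half = (p3 + p4) / 2
    # x value where y is closest to its half-range: rough midpoint p2
    p2 = x[np.argmin(np.abs(y - half))]
    # slope scale: a fraction of the x-range keeps exp() well-behaved
    p1 = (x.max() - x.min()) / 4
    return [float(p1), float(p2), float(p3), float(p4)]

x = [600, 800, 1000, 1200, 1400]
y = [0, 0.2, 0.4, 0.7, 1]
print(guess_p0(x, y))   # -> [200.0, 1000.0, 1.0, 0.0]
```

For these data the heuristic lands close to dpb's hand-picked p0, which is usually all a local optimizer like lsqcurvefit needs.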
dpb on 21 Oct 2016
Well, that does make some difference on the shape!!! :)

Alex Sha on 26 Dec 2019
If the second and third numbers of y are indeed 0.2 and 0.4 (i.e. y = [0, 0.2, 0.4, 0.7, 1], rather than the mistyped [0, 02, 04, 0.7, 1] that was fitted above), then the result is much better:
Root of Mean Square Error (RMSE): 0.0103862568043762
Sum of Squared Residual: 0.000539371652032255
Correlation Coef. (R): 0.99957319083783
R-Square: 0.999146563841721
Determination Coef. (DC): 0.999146563841721
Chi-Square: 0.00489464660589835
F-Statistic: 390.244601747778
Parameter Best Estimate
---------- -------------
p1 -543.133091644427
p2 1587.68753263344
p3 -0.502866494727889
p4 3.12808542388122

Yasin Islam on 23 Jan 2021
Which function gives all of the above-mentioned values as output? It's not from lsqcurvefit, is it?
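[Editor's note] lsqcurvefit itself returns only the fitted parameters plus, optionally, resnorm (the sum of squared residuals) and the residual vector; the other figures in Alex Sha's table are standard goodness-of-fit statistics computed from the residuals. A hedged Python sketch of the usual formulas (note this RMSE divides by n; some tools divide by n - k degrees of freedom, so values can differ slightly):

```python
import numpy as np

def fit_stats(y, yhat):
    """Standard goodness-of-fit measures from data y and fitted values yhat."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    resid = y - yhat
    ssr = np.sum(resid**2)               # sum of squared residuals
    rmse = np.sqrt(ssr / y.size)         # root mean square error
    sst = np.sum((y - y.mean())**2)      # total sum of squares
    r2 = 1 - ssr / sst                   # R-square / determination coef.
    r = np.corrcoef(y, yhat)[0, 1]       # correlation coefficient
    return {"RMSE": rmse, "SSR": ssr, "R": r, "R2": r2}

# A perfect fit gives SSR = 0 and R-square = 1
print(fit_stats([0, 0.2, 0.4, 0.7, 1], [0, 0.2, 0.4, 0.7, 1]))
```

In MATLAB one would compute the same quantities from the residual output of lsqcurvefit in exactly this way.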