Why does Curve Fitting Toolbox produce a good exponential fit graph with a wrong equation?

Hello everyone,
I have created a graph for my data points using a two-term exponential model (a * exp(bx) + c * exp(dx)). You can see it in the first picture. The graph looks good to me. However, I need the equation of this curve, and that is where the problem arises.
In the second picture you can see the constants for my equation, with unusually large 95% confidence bounds, even though the graph shows these bounds to be very small, almost insignificant. Also, if I plug the same value of x into the equation and read it off the graph, for example x = 25, the results are very different:
Equation: -1.297 * exp(-2.486 * 25) + 1.325 * exp(-2.48 * 25) = 2.47 * 10^-20
Graph: at x = 25, y = 0.183 (shown on the first picture)
You can find my variables by clicking on this link and downloading the excel file.
https://drive.google.com/file/d/1YVOYX0Mno68Ab5pr53ytZEC5_GuEKC6H/view?usp=sharing
The value on the graph is correct. My question is what is the problem and how can I solve it?

7 Comments

Why describe it with two exponential terms? I think a single exponential term should do the job for you as well.
I think Birdman has made a good point. Both of your exponents are similar.
Suppose that the 'true' equation should be 0.027*exp(-2.483 * t).
If you request a fit of the form A*exp(B*t), then the respective confidence intervals on A and B should be small.
But if you request a fit of the form A*exp(B*t) + C*exp(D*t), then valid fits include:
  • (0.027)*exp(-2.483 * t) + (0.000)*exp(-2.483 * t)
  • (1.027)*exp(-2.483 * t) + (-1.000)*exp(-2.483 * t)
  • (100.027)*exp(-2.483 * t) + (-100.000)*exp(-2.483 * t)
  • (-0.973)*exp(-2.483 * t) + (+1.000)*exp(-2.483 * t)
  • (0.000)*exp(-2.483 * t) + (0.027)*exp(-2.483 * t)
  • et cetera
Those are all mathematically identical! Therefore the C.I's on A and C would be very, very wide. (In your case, due to the specific numeric details you also obtained wide C.I's on B and D.)
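To check this numerically, here is a minimal MATLAB sketch. The evaluation grid t is an arbitrary assumption; the coefficient pairs are the ones listed above. It compares each two-term combination against the single-term curve:

```matlab
% Assumed evaluation grid; any t would do.
t = linspace(0, 2, 50)';

% (A, C) pairs from the list above; B = D = -2.483 in every case.
AC = [  0.027    0.000;
        1.027   -1.000;
      100.027 -100.000;
       -0.973    1.000;
        0.000    0.027];

ref = 0.027 * exp(-2.483 * t);      % the single-term curve
maxDiff = zeros(size(AC, 1), 1);
for k = 1:size(AC, 1)
    y = AC(k, 1) * exp(-2.483 * t) + AC(k, 2) * exp(-2.483 * t);
    maxDiff(k) = max(abs(y - ref)); % largest deviation from the reference
end
disp(maxDiff)
```

Every entry of maxDiff should be at floating-point round-off level, which is why the data cannot pin down A and C individually.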
Lastly, the erroneous value of 2.47 * 10^-20 that you calculated from the equation is almost certainly arising due to rounding error, as the parameter estimates are only printed in your output with a maximum of 3 decimal places. Presumably this can be adjusted to see more decimal places. Try the format long command, perhaps.
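As a rough illustration of how rounding can wreck a sum of nearly cancelling terms, here is a small MATLAB sketch. The coefficient values are hypothetical (they are not the ones from the question), chosen so that the two terms almost cancel:

```matlab
% Hypothetical coefficients of a two-term fit whose terms nearly cancel.
A = -100.0004;  C = 100.0274;   % assumed values, for illustration only
B = -0.02;      D = -0.02;
x = 25;

% Model evaluated with the full-precision coefficients.
yFull = A * exp(B * x) + C * exp(D * x);

% Keep only 4 significant digits, as a short display would show.
Ar = round(A, 4, 'significant');
Cr = round(C, 4, 'significant');
yRounded = Ar * exp(B * x) + Cr * exp(D * x);

fprintf('full precision: %.6g\nrounded:        %.6g\n', yFull, yRounded);
```

With these assumed numbers the rounded coefficients cancel completely, so the rounded model returns 0 while the full-precision model returns roughly 0.0164. Note that format long only changes how numbers are displayed, not how they are stored.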
—DIV
Well, the curve with one exponential term is far less accurate than the one with two exponential terms.
Nonetheless, the problem still persists even with one exponential term.
Can you share your fitting data in a mat file?
Ok, I can increase the number of decimal places, no problem. But what is that 'format long' command you are talking about? I am sorry, but I am not very familiar with MATLAB.
I cannot attach the mat file with my variables, because I am struggling to save it. As I said, I am new here. So here is the link for the Excel file. Just create two variables, X and Y, and paste the values from the excel file
https://drive.google.com/file/d/1YVOYX0Mno68Ab5pr53ytZEC5_GuEKC6H/view?usp=sharing
Why is it a problem to save it?
save mymatfile X Y
That will save a .mat file in your current directory, assuming you have write access to that directory. So WTP?
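If write access to the current directory is the obstacle, one possible workaround is to save into the system temporary folder instead. The sketch below uses placeholder data (an assumption, standing in for the values pasted from the spreadsheet) and then loads the file back to confirm the round trip:

```matlab
% Placeholder data; replace with the values pasted from the spreadsheet.
X = (0:10:240)';
Y = exp(-0.02 * X);

% Save to a folder that should be writable (tempdir), then load back.
matFile = fullfile(tempdir, 'mymatfile.mat');
save(matFile, 'X', 'Y');

S = load(matFile);      % S.X and S.Y hold the saved variables
disp(isequal(S.X, X) && isequal(S.Y, Y))
```

The resulting mymatfile.mat can then be attached to a comment.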


 Accepted Answer

NEVER use 4 or 5 digit approximations to coefficients. In fact, even if you display the coefficients in the command window using format long, STILL don't copy them out, because you won't get the exact coefficient!
I don't have your data, so I can't show results for your data. I'll just make up some garbage data.
x = rand(10,1);
y = rand(10,1);
F = fit(x,y,'exp1')
F =
General model Exp1:
F(x) = a*exp(b*x)
Coefficients (with 95% confidence bounds):
a = 0.6954 (0.1274, 1.263)
b = -0.7068 (-2.599, 1.186)
Now, NEVER just copy those coefficients.
Instead, use coeffvalues to extract them.
C = coeffvalues(F)
C =
0.69543 -0.70679
Those are not their true values though, only a rounded display. The stored numbers are closer to this:
format long g
coeffvalues(F)
ans =
0.695433813207908 -0.706794760999295
So C has the final set of coefficients used for the model.
Again however, try not to use 0.695433813207908 as the number. Instead, the model is:
y = C(1)*exp(C(2)*x)
Use it that way.
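Putting those steps together, a minimal sketch might look like this. It assumes Curve Fitting Toolbox is installed and uses synthetic data invented for illustration. confint returns the 95% bounds in full precision, and the fit object itself can be evaluated like a function:

```matlab
% Requires Curve Fitting Toolbox. Synthetic data, assumed for illustration.
rng(0);
x = linspace(0, 1, 20)';
y = 0.7 * exp(-0.7 * x) + 0.01 * randn(20, 1);

F = fit(x, y, 'exp1');

C  = coeffvalues(F);     % full-precision coefficients [a b]
CI = confint(F);         % 2-by-2: rows are lower/upper 95% bounds

xq = 0.5;
yFromCoeffs = C(1) * exp(C(2) * xq);   % model written out with C
yFromObject = F(xq);                   % the fit object evaluated directly
fprintf('%.15g\n%.15g\n', yFromCoeffs, yFromObject);
```

Both evaluations should agree to round-off, because neither one passes through a rounded, hand-copied number.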
Ok, now to what I think is the more important part of your question. Why are those bounds so wide?
LOOK AT YOUR CURVE!!!!!!
Your curve basically looks like a negative exponential, with a minor tweak at the beginning. So the fit with one exponential term in there was probably not terribly bad. But sums of exponentials are moderately difficult things to estimate. Why do I say that?
Think of it as arising from the fact that exp(x) looks a lot like exp(2*x), or exp(k*x) in general. So the nonlinear solver can have some problems.
Now, look at the coefficients that it found! Don't just build a model and then assume it is correct! In fact, the model you estimated was probably complete crap! (Sorry, but true.)
LOOK AT THE COEFFICIENTS.
b = -2.486, with bounds [-56,51]
d = -2.48 with bounds of [-55,50]
Both exponentials are almost identical in shape, with the exception that the coefficients out front had opposite signs. This is a sign that the fit gave you lower error, but it is probably meaningless. Those coefficients have no value at all. Were the noise in the data to change by a small amount, you would get different numbers.
Again, those hugely wide bounds on the coefficients suggest that the model is terribly poorly estimated. (I say poorly because you really have no clue as to the actual values of those coefficients, if that was indeed the true model for this process.) Your data is simply insufficient to estimate a two term exponential fit. The noise is way too high, and you have way too little data for this model.
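One way to put a number on "poorly estimated" is the condition number of the model's Jacobian, that is, the matrix of derivatives of the model with respect to each coefficient. The sketch below uses the coefficients as printed in the question; the x grid is an assumption, since the scaling actually used in the fit is not known here:

```matlab
% Coefficients as printed in the question; the x grid is an assumption.
a = -1.297;  b = -2.486;  c = 1.325;  d = -2.48;
x = linspace(0, 3, 25)';

% Jacobian of a*exp(b*x) + c*exp(d*x) with respect to [a b c d].
J2 = [exp(b*x), a*x.*exp(b*x), exp(d*x), c*x.*exp(d*x)];

% Jacobian of the one-term model a*exp(b*x) with respect to [a b].
J1 = J2(:, 1:2);

fprintf('cond(one term)  = %.3g\n', cond(J1));
fprintf('cond(two terms) = %.3g\n', cond(J2));
```

The two-term condition number should come out larger by many orders of magnitude. A huge condition number means that very different coefficient sets reproduce the data almost equally well, which is exactly what the wide confidence bounds are reporting.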
This is not a problem of the curve fitting toolbox, or of MATLAB at all, but in your choice of model for this data.
Is there a better choice for the model? Perhaps. But I don't know why you chose the model you did. Perhaps you have some reason for thinking it should be a sum of two exponentials. Honestly, I doubt that is true. Most of the time, people use an exponential model, then decide that since the curve is basically a negative exponential, a sum of two of them MUST be better. Sorry, but that does not follow.

2 Comments

Alright, I see your point. Should not have tried exponents in the first place.
Anyway, before settling on the two-term exponential I used the half-life equation (a + b/2^(x/c)). It is still an exponential equation, but MATLAB gave correct values for a, b and c and a good r^2. Then I tried the two-term exponential, which gave an even better r^2. I guess I had better use the half-life equation for my data.
Thank you for your answer! It was very detailed and well explained!
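For reference, the half-life form a + b/2^(x/c) is a single exponential plus a constant offset, with decay rate log(2)/c. A short sketch with hypothetical parameter values (not fitted to the data in this thread) confirms the equivalence:

```matlab
% Hypothetical parameter values, for illustration only.
a = 0.01;  b = 0.25;  c = 30;
x = (0:10:240)';

yHalfLife = a + b ./ 2.^(x ./ c);   % half-life form
k = log(2) / c;                     % equivalent decay rate
yExp = a + b * exp(-k * x);         % offset plus one exponential

disp(max(abs(yHalfLife - yExp)))    % difference between the two forms
```

So the half-life model has three parameters, one fewer than the two-term exponential, and each of them has a direct interpretation (offset, amplitude, half-life).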
I've often argued that you should use a model because you have some physical reason for the choice, based on valid physical principles.
For example, suppose you were fitting the growth of a population of bacteria in a petri dish. Here, an exponential model makes a lot of sense, as a population of bacteria will arguably grow exponentially until something counters the growth. So you can now talk about growth rate.
So that model is based on basic physical principles. In some cases, we have what I call a metaphorical model. A good example of that is the sales of a product. Here, a disease propagation model makes sense, which can also be formulated in exponential terms. A disease propagates by transmission between members of the society it targets. But sales of a product can also be thought of similarly. People talk to each other, and they pass the "disease" on to other members. So product sales can be quite nicely modeled by disease propagation models, as long as word of mouth is the main mode of transmission. This is what I would call a metaphorical model. Again, if you had data for such a process, you would fit the appropriate model to the data, and get coefficients out that could be interpreted in a consistent sense.
But when you really have no good reason for choosing one model over another, I often suggest a spline model as making as much sense. Using my SLM toolbox,
X = [0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240];
Y = [0.268 0.223810336 0.196870203 0.1720023879 0.1450622548 0.1077605322 0.08703735289 0.06838649156 0.05388026608 0.03522940474 0.02901245096 0.02279549719 0.01657854341 0.01243390756 0.01036158963 0.008289271704 0.006216953778 0.004144635852 0.004144635852 0.004144635852 0.002072317926 0.002072317926 0.002072317926 0 0];
slm = slmengine(X,Y,'knots',8,'decreasing','on','minvalue',0,'plot','on');
You can evaluate it at any point as simply:
slmeval(85,slm)
ans =
0.0433105080236424
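SLM is a File Exchange download rather than part of MATLAB. If it is not installed, a rough stand-in (a swapped-in technique, not the SLM fit shown above) is base MATLAB's pchip, a shape-preserving piecewise cubic interpolant. Unlike the constrained least-squares spline, it passes through every data point, so it does not smooth out noise:

```matlab
% Same data as above.
X = 0:10:240;
Y = [0.268 0.223810336 0.196870203 0.1720023879 0.1450622548 ...
     0.1077605322 0.08703735289 0.06838649156 0.05388026608 ...
     0.03522940474 0.02901245096 0.02279549719 0.01657854341 ...
     0.01243390756 0.01036158963 0.008289271704 0.006216953778 ...
     0.004144635852 0.004144635852 0.004144635852 0.002072317926 ...
     0.002072317926 0.002072317926 0 0];

% Shape-preserving interpolation; no new bumps between data points.
y85 = pchip(X, Y, 85);
fprintf('interpolated value at x = 85: %.6g\n', y85);
```

Because pchip preserves monotonicity, the value at x = 85 lies between the data values at x = 80 and x = 90.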
