Why does Curve Fitting Toolbox produce a good exponential fit graph with a wrong equation?

Question


Hello everyone,

I have fitted my data points with a second-order (two-term) exponential model, a*exp(b*x) + c*exp(d*x). You can see the result on the first picture. It looks good to me, and there is no problem with the graph. However, I need the equation of this curve, and that is where the problem arises.

The second picture shows the coefficients of my equation, which have unusually large 95% confidence bounds, yet the bounds drawn on the graph are very narrow, almost insignificant. Also, if I plug a value of x (for example x = 25) into both the equation and the graph, the results are very different:

Equation: -1.297 * exp(-2.486 * 25) + 1.325 * exp(-2.48 * 25) = 2.47 * 10^-28

Graph: at x = 25, y = 0.183 (shown on the first picture)
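To see why the printed equation and the graph disagree, it helps to evaluate the equation exactly as printed. A minimal base-MATLAB sketch using the 3-decimal coefficients from the question (the variable names are illustrative): at x = 25 the two terms have nearly the same tiny magnitude and opposite signs, so the result is the small difference of two nearly equal numbers, which amplifies any error in the coefficients.

```matlab
% Coefficients exactly as printed (rounded to 3 decimal places)
a = -1.297;  b = -2.486;
c =  1.325;  d = -2.48;
x = 25;

term1 = a*exp(b*x);   % negative term, about -1.3e-27
term2 = c*exp(d*x);   % positive term, about  1.6e-27
y25 = term1 + term2;  % the two terms largely cancel

fprintf('term1 = %.4g\nterm2 = %.4g\nsum   = %.4g\n', term1, term2, y25);
```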

You can find my variables by clicking on this link and downloading the Excel file.

https://drive.google.com/file/d/1YVOYX0Mno68Ab5pr53ytZEC5_GuEKC6H/view?usp=sharing

The value on the graph is correct. My question is: what is the problem, and how can I solve it?

7 Comments

David Verrelli on 21 Dec 2017

I think Birdman has made a good point. Both of your exponents are similar.

Suppose that the 'true' equation is 0.027*exp(-2.483 * t).

If you request a fit of the form A*exp(B*t), then the respective confidence intervals on A and B should be small.

But if you request a fit of the form A*exp(B*t) + C*exp(D*t), then valid fits include:

(0.027)*exp(-2.483 * t) + (0.000)*exp(-2.483 * t)
(1.027)*exp(-2.483 * t) + (-1.000)*exp(-2.483 * t)
(100.027)*exp(-2.483 * t) + (-100.000)*exp(-2.483 * t)
(-0.973)*exp(-2.483 * t) + (+1.000)*exp(-2.483 * t)
(0.000)*exp(-2.483 * t) + (0.027)*exp(-2.483 * t)
et cetera

Those are all mathematically identical! Therefore the C.I.s on A and C would be very, very wide. (In your case, due to the specific numeric details, you also obtained wide C.I.s on B and D.)
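The identical-fits argument can be checked numerically. A minimal base-MATLAB sketch, assuming the 'true' curve 0.027*exp(-2.483*t) from the comment and an arbitrary grid of t values: every (A, C) pair with A + C = 0.027 reproduces the same curve, and the partial derivatives of the model with respect to A and C are identical columns, so that matrix has rank 1 and nothing in the data can separate A from C.

```matlab
% Assumed "true" model from the comment: 0.027*exp(-2.483*t)
t = linspace(0, 2, 50)';   % arbitrary sample points (assumption)
B = -2.483;
yRef = 0.027*exp(B*t);

% Candidate (A, C) pairs from the comment; every row sums to 0.027
AC = [  0.027     0.000;
        1.027    -1.000;
      100.027  -100.000;
       -0.973     1.000;
        0.000     0.027];

maxDiff = zeros(size(AC, 1), 1);
for k = 1:size(AC, 1)
    % Two-term model A*exp(B*t) + C*exp(D*t) with D equal to B
    yk = AC(k, 1)*exp(B*t) + AC(k, 2)*exp(B*t);
    maxDiff(k) = max(abs(yk - yRef));
end
disp(maxDiff)              % all entries are at round-off level

% Partial derivatives of the model with respect to A and C
J = [exp(B*t), exp(B*t)];
rank(J)                    % 1: the two columns are identical
```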

Lastly, the erroneous value of 2.47 * 10^-28 that you calculated from the equation almost certainly arises from rounding error, as the parameter estimates are printed in your output with at most 3 decimal places. Presumably the display can be adjusted to show more decimal places; try the format long command, perhaps.

—DIV

Boleslav Tiunkov on 21 Dec 2017

I cannot attach the MAT-file with my variables, because I am struggling to save it. As I said, I am new here. So here is the link to the Excel file. Just create two variables, X and Y, and paste in the values from the Excel file.

https://drive.google.com/file/d/1YVOYX0Mno68Ab5pr53ytZEC5_GuEKC6H/view?usp=sharing

John D'Errico on 21 Dec 2017


Why is it a problem to save it?

save mymatfile X Y

That will save a .mat file in your current directory, assuming you have write access to that directory. So what's the problem?
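For completeness, a minimal save/load round-trip sketch. The data values are placeholders (the first few posted points, rounded), and the file is written to a temporary location from tempname so that write access to the current directory does not matter.

```matlab
% Placeholder data; substitute your own X and Y
X = 0:10:40;
Y = [0.268 0.224 0.197 0.172 0.145];

% Save both variables to a MAT-file in a temporary location
fname = [tempname '.mat'];
save(fname, 'X', 'Y');

% Load them back into a struct and confirm nothing changed
S = load(fname);
roundTripOK = isequal(S.X, X) && isequal(S.Y, Y)
delete(fname);   % clean up the temporary file
```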


Accepted Answer

John D'Errico on 21 Dec 2017

Edited: John D'Errico on 21 Dec 2017


NEVER use 4- or 5-digit approximations to coefficients. In fact, even if you display the coefficients in the command window using format long, STILL don't copy them out, because you won't get the exact coefficients!

I don't have your data, so I can't show results for your data. I'll just make up some garbage data.

x = rand(10,1);
y = rand(10,1);
F = fit(x,y,'exp1')
F = 
     General model Exp1:
     f(x) = a*exp(b*x)
     Coefficients (with 95% confidence bounds):
       a =      0.6954  (0.1274, 1.263)
       b =     -0.7068  (-2.599, 1.186)

Now, NEVER just copy those coefficients.

Instead, use coeffvalues to extract them.

C = coeffvalues(F)
C =
      0.69543     -0.70679

Those are not their full-precision values, though. The stored numbers are closer to this:

format long g
coeffvalues(F)
ans =
         0.695433813207908        -0.706794760999295

So C has the final set of coefficients used for the model.
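Why isn't the format long g display exact either? It shows about 15 significant digits, while a double can need up to 17 to be reproduced exactly. A small base-MATLAB sketch, with 2/3 standing in for a fitted coefficient (an arbitrary choice): retyping the 15-digit display gives a slightly different number, whereas 17 digits round-trip.

```matlab
c = 2/3;                            % stands in for a fitted coefficient
shown  = sprintf('%.15g', c)        % 15 significant digits, as format long g shows
copied = str2double(shown);         % what you get by retyping the display
sameAsCopied = isequal(copied, c)   % false: the last bits were lost
full17 = str2double(sprintf('%.17g', c));
sameAs17 = isequal(full17, c)       % true: 17 digits round-trip a double
```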

Again, however, try not to use 0.695433813207908 as the number. Instead, write the model as:

y = C(1)*exp(C(2)*x)

Use it that way.
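If the Curve Fitting Toolbox fit object F is still available, it can also be evaluated directly as F(x) (or feval(F, x)), which avoids copying numbers altogether. The sketch below uses plain MATLAB so it runs without the toolbox: the coefficient vector is typed in from the display above only as a stand-in for coeffvalues(F), and the resulting model is compared with one built from the 4-digit values. For this well-conditioned one-term model the discrepancy is small; in a model whose terms nearly cancel, as in the question, the same kind of error is amplified.

```matlab
% Stand-in for C = coeffvalues(F); in a real session take C straight from the fit
C = [0.695433813207908, -0.706794760999295];

% Model built from the stored coefficient vector (preferred)
model = @(x) C(1)*exp(C(2)*x);

% Same model built from 4-digit values copied off the screen (not recommended)
modelTyped = @(x) 0.6954*exp(-0.7068*x);

x = linspace(0, 1, 11);
maxErr = max(abs(model(x) - modelTyped(x)))
```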

Ok, now to what I think is the more important part of your question. Why are those bounds so wide?

LOOK AT YOUR CURVE!!!!!!

Your curve basically looks like a negative exponential, with a minor tweak at the beginning. So the fit with one exponential term in there was probably not terribly bad. But sums of exponentials are moderately difficult things to estimate. Why do I say that?

Think of it as arising from the fact that exp(x) looks a lot like exp(2*x), or exp(k*x) in general. So the nonlinear solver can have some problems.
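One way to see why similar rates cause trouble: in a*exp(b*x) + c*exp(d*x), the sensitivities to a and c are the columns exp(b*x) and exp(d*x). As d approaches b, these columns become nearly parallel and the condition number of the matrix they form grows rapidly, so a and c can trade off against each other with almost no change in the fitted curve. A minimal base-MATLAB sketch; the x range and the rates are arbitrary choices for illustration.

```matlab
x = linspace(0, 3, 100)';          % arbitrary sample points
b = -1;                            % fixed rate of the first term
dvals = [-2, -1.5, -1.1, -1.01];   % second rate, moving toward b

kappa = zeros(size(dvals));
for k = 1:numel(dvals)
    % Columns: partial derivatives of the model with respect to a and c
    J = [exp(b*x), exp(dvals(k)*x)];
    kappa(k) = cond(J);
    fprintf('d = %6.2f   cond(J) = %8.3g\n', dvals(k), kappa(k));
end
```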

Now, look at the coefficients that it found! Don't just build a model and then assume it is correct! In fact, the model you estimated was probably complete crap! (Sorry, but true.)

LOOK AT THE COEFFICIENTS.

b = -2.486, with bounds [-56,51]
d = -2.48 with bounds of [-55,50]

Both exponentials are almost identical in shape, except that the coefficients out front have opposite signs. The fit gave you a slightly lower error, but the result is probably meaningless. Those coefficients have no value at all: were the noise in the data to change by a small amount, you would get very different numbers.
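Factoring out the first exponential makes the near-cancellation explicit. This is a first-order expansion that assumes (d - b)x is small, using only the coefficients quoted above (a = -1.297, b = -2.486, c = 1.325, d = -2.48):

$$
a\,e^{bx} + c\,e^{dx} = e^{bx}\left(a + c\,e^{(d-b)x}\right) \approx e^{bx}\left[(a + c) + c\,(d - b)\,x\right]
$$

$$
a + c = 0.028, \qquad c\,(d - b) = 1.325 \times 0.006 \approx 0.00795
$$

So, to this order, the fitted curve behaves like $(0.028 + 0.00795\,x)\,e^{-2.486x}$: a single exponential times a straight line. The data can only pin down combinations such as $a + c$ and $c(d - b)$, not $a$, $c$ and $d$ individually, which is consistent with the enormous bounds.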

Again, those hugely wide bounds on the coefficients suggest that the model is terribly poorly estimated. (I say poorly because you really have no clue as to the actual values of those coefficients, if that was indeed the true model for this process.) Your data is simply insufficient to estimate a two-term exponential fit: the noise is way too high, and you have way too little data for this model.

This is not a problem of the curve fitting toolbox, or of MATLAB at all, but in your choice of model for this data.

Is there a better choice for the model? Perhaps. But I don't know why you chose the model you did. Perhaps you have some reason for thinking it should be a sum of two exponentials. Honestly, I doubt that is true. Most of the time, people fit an exponential model, see that the curve is basically a negative exponential, and then decide that a sum of two of them MUST be better. Sorry, but that does not follow.

2 Comments

Boleslav Tiunkov on 21 Dec 2017

Alright, I see your point. I should not have tried the sum of exponentials in the first place.

Anyway, before settling on the exponentials, I used a half-life equation (a + b/2^(x/c)). It is still an exponential equation, but MATLAB gave correct values for a, b and c and a good r^2. Then I tried the second-order exponential, which gave an even better r^2. I guess I had better use the half-life equation for my data.
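For reference, the half-life model can also be fitted without the Curve Fitting Toolbox. A minimal sketch that swaps in base MATLAB's fminsearch (Nelder-Mead minimization of the sum of squared residuals) in place of the toolbox fit; the data are the X and Y values posted further down this thread, and the starting guess p0 is an assumption. Note that a + b/2^(x/c) = a + b*exp(-log(2)*x/c), i.e. a single exponential plus a constant offset, so there is no pair of near-duplicate terms to trade off.

```matlab
% Data from the posted Excel file (as listed later in the thread)
X = 0:10:240;
Y = [0.268 0.223810336 0.196870203 0.1720023879 0.1450622548 ...
     0.1077605322 0.08703735289 0.06838649156 0.05388026608 0.03522940474 ...
     0.02901245096 0.02279549719 0.01657854341 0.01243390756 0.01036158963 ...
     0.008289271704 0.006216953778 0.004144635852 0.004144635852 0.004144635852 ...
     0.002072317926 0.002072317926 0.002072317926 0 0];

% Half-life model y = a + b/2^(x/c), with p = [a b c]
halfLife = @(p, x) p(1) + p(2) ./ 2.^(x ./ p(3));

% Objective: sum of squared residuals
sse = @(p) sum((Y - halfLife(p, X)).^2);

% Starting guess (assumption): no offset, amplitude = first data value, half-life 40
p0 = [0, 0.268, 40];
[pFit, sseFit] = fminsearch(sse, p0);

fprintf('a = %.4g, b = %.4g, c = %.4g, SSE = %.3g\n', pFit(1), pFit(2), pFit(3), sseFit);
```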

Thank you for your answer! It was very detailed and well explained!

John D'Errico on 21 Dec 2017

Edited: John D'Errico on 21 Dec 2017


I've often argued that you should use a model because you have some physical reason for it, based on valid physical principles.

For example, suppose you were fitting the growth of a population of bacteria in a petri dish. Here, an exponential model makes a lot of sense, as the bacteria will arguably grow exponentially until something counters the growth. So you can now talk about a growth rate.
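To make the growth-rate point concrete: if N(t) = N0*exp(r*t), the coefficient r has a direct physical meaning, and the doubling time is log(2)/r. A minimal sketch on hypothetical, noise-free data; fitting a straight line to log(N) with polyfit is just one simple way to recover r, used here for illustration.

```matlab
% Hypothetical noise-free growth data: N(t) = N0*exp(r*t)
N0 = 100;          % initial population (hypothetical)
r  = 0.3;          % growth rate per hour (hypothetical)
t  = 0:0.5:10;
N  = N0*exp(r*t);

% Taking logs turns the model into a straight line: log(N) = log(N0) + r*t
p = polyfit(t, log(N), 1);
rHat  = p(1);                  % estimated growth rate
N0Hat = exp(p(2));             % estimated initial population
doublingTime = log(2)/rHat     % time for the population to double
```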

So that model is based on basic physical principles. In some cases, we have what I call a metaphorical model. A good example of that is the sales of a product. Here, a disease propagation model makes sense, which can also be formulated in exponential terms. A disease propagates by transmission between members of the society it targets. But sales of a product can also be thought of similarly. People talk to each other, and they pass the "disease" on to other members. So product sales can be quite nicely modeled by disease propagation models, as long as word of mouth is the main mode of transmission. This is what I would call a metaphorical model. Again, if you had data for such a process, you would fit the appropriate model to the data, and get coefficients out that could be interpreted in a consistent sense.
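As a sketch of the kind of model meant here: the simplest disease-propagation model, the SI model, is the logistic equation dI/dt = beta*I*(1 - I/K), whose early phase is exponential with rate beta. The parameter values below are hypothetical, and the snippet only checks numerically that the closed-form logistic curve satisfies that equation; in a sales setting, I would be the cumulative number of buyers and K the market size.

```matlab
% SI epidemic (logistic) model: dI/dt = rate*I*(1 - I/K)
% Parameter values are hypothetical, chosen for illustration
rate = 0.5;    % transmission ("word of mouth") rate, beta in the SI model
K    = 1000;   % total population / market size
I0   = 10;     % initially "infected" people / early adopters

% Closed-form solution of the logistic equation
I = @(t) K ./ (1 + (K - I0)/I0 * exp(-rate*t));

% Check the solution against the ODE using a central difference
t = linspace(0, 30, 301);
h = 1e-4;
dIdt = (I(t + h) - I(t - h)) / (2*h);
residual = max(abs(dIdt - rate*I(t).*(1 - I(t)/K)))
```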

But when you really have no good reason for choosing one model over another, I often suggest that a spline model makes just as much sense. Using my SLM toolbox (Shape Language Modeling, on the File Exchange):

X = [0  10  20  30  40  50  60  70  80  90  100  110  120  130  140  150  160  170  180  190  200  210  220  230  240];
Y = [0.268  0.223810336  0.196870203  0.1720023879  0.1450622548  0.1077605322  0.08703735289  0.06838649156  0.05388026608  0.03522940474  0.02901245096  0.02279549719  0.01657854341  0.01243390756  0.01036158963  0.008289271704  0.006216953778  0.004144635852  0.004144635852  0.004144635852  0.002072317926  0.002072317926  0.002072317926  0  0];
slm = slmengine(X,Y,'knots',8,'decreasing','on','minvalue',0,'plot','on');

You can evaluate it at any point as simply:

slmeval(85,slm)
ans =
      0.0433105080236424
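If a toolbox is not an option, base MATLAB's pchip gives a shape-preserving piecewise cubic through the same data: for monotone data the interpolant stays monotone between the points. This is a different technique from the SLM fit above: pchip interpolates every point, noise included, rather than fitting a smooth least-squares spline with a few knots, so its value at x = 85 need not match slmeval.

```matlab
X = 0:10:240;
Y = [0.268 0.223810336 0.196870203 0.1720023879 0.1450622548 ...
     0.1077605322 0.08703735289 0.06838649156 0.05388026608 0.03522940474 ...
     0.02901245096 0.02279549719 0.01657854341 0.01243390756 0.01036158963 ...
     0.008289271704 0.006216953778 0.004144635852 0.004144635852 0.004144635852 ...
     0.002072317926 0.002072317926 0.002072317926 0 0];

% Shape-preserving piecewise cubic Hermite interpolation, evaluated at x = 85
v = pchip(X, Y, 85)
```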
