Finding Coefficients and Curve of best fit for a power function using polyfit!

8 views (last 30 days)
Im trying to find the coefficients A and n from the function y=A.*Re.^n, which is a power function. The graphs looks like this:
and i'm not sure how to find the line of best fit with polyfit, do i log them both?

Accepted Answer

John D'Errico
John D'Errico on 15 May 2015
Edited: John D'Errico on 15 May 2015
It is clear that the given power function won't fit your data. Wanting it to work is not sufficient. In fact, it appears to be a pair of discontinuous models. A SINGLE power function won't suffice here. How do I know that?
Let me start at the beginning. Suppose you have a log-log plot of some data, where the model is
y = a*x^b
then if we were to plot using a log-log plot, you are implicitly preforming the following transformation:
log(y) = log(a) + b*log(x)
so if in a log-log plot, the curve is now very nicely linear, with a slope of b, then we can identify the model pretty well.
The problem is, you obviously have two segments in that curve, with VERY different slopes. Worse, the model seems to be not even continuous. There is a discontinuity in those curves in the log-log plot.
So my recommendation is to solve this as a function that lives on two domains in the dependent variable, with a break point that falls roughly at Re = 2500. (I cannot know exactly where that break point lies, because you have no data between roughly Re = 2000 and Re = 3000.)
So extract the data points that lie BELOW Re == 2500, and do one fit. Then extract the remainder of the set, and do a second fit. Any single, global power fit to the entire set will fail here. I'm sorry, but it will fail to give you something that is a viable model with any kind of predictive power.
The next question is whether the modeling should be done in a log domain by polyfit, or in the original un-logged domain. Star was unequivocal in the claim that it should be done in the original domain, but I'd be happy to argue with that point. In fact, we don't yet KNOW where the modeling should be done. Star might be correct there, or he might be wrong. I lack sufficient information to know, and so do you until you do some investigation.
When you fit a model of the form
y = a*x^b
you implicitly assume an ADDITIVE error structure. That is, you are assuming the errors are of the form
y = a*x^b + e
where e is probably normally distributed error, with a mean of zero and some unknown variance.
However it is often the case that we don't have additive error. In fact, if your errors are proportional, so bigger errors will lie on bigger function values, then you essentially have a multiplicative error structure. In this case, when you log the model, that multiplicative error turns into additive error. But if you tried to log the expression (a*x^b+e), the log of a sum of terms is nothing simple.
A multiplicative error (proportional error) model would look like this:
y = a*x^b * e
where the e term is generally log-normally distributed error. So the log of that expression looks like this:
log(y) = log(a) + b*log(x) + log(e)
So if e is lognormally distributed noise, then log(e) is normally distributed.
So it is often enough the case that it IS correct to build your model in the log domain. How can we know which is which? A very strong clue can come from looking at the residuals. You should always plot the residuals anyway for a model fit. So plot them. That takes little more than
plot(x,y - yhat,'o')
Look to see if they tend to be more uniformly distributed in one form of the model versus the other. That will tell you how to fit the model.
So Star has already shown you how to do a model fit in the nonlinear form. In the case of a linearized model, do this
ind = Re < 2500;
x_1 = Re(ind);
x_2 = Re(~ind);
y_1 = y(ind);
y_2 = y(~ind);
ab_1 = polyfit(log(x_1),log(y_1),1);
ab_2 = polyfit(log(x_2),log(y_2),1);
Now, we can recover the coefficients from these fits easily enough.
A_1 = exp(ab_1(2));
n_1 = ab_1(1);
A_2 = exp(ab_2(2));
n_2 = ab_2(1);
yhat_1 = A_1*x_1.^n_1;
yhat_2 = A_2*x_2.^n_2;
resid_1 = y_1 - yhat_1;
resid_2 = y_2 - yhat_2;
So now plot the residuals. If they look to be nicely well-behaved, random looking residuals, then the transformation seems to make sense, and you have a simple way to solve the problem. If the residuals have a pattern to them, say they look roughly u-shaped, or they appear to be larger at one end of the curve then the other, then the log-domain was a bad choice in which fit the model.
Another way to think about the proportional error issue is to just think about how the data was measured, how it is created, and how the noise will enter in. Some data just logically has noise that is proportional to the size of the dependent variable. If so, then logging the model makes sense to fit it. Another clue is often just in the numbers themselves. If you log the data because it spans many orders of magnitude, there is a good chance that your noise structure is proportional.
But this is something we won't know unless you try it. The only thing I can absolutely guarantee? It is that a single power function fit, IN ANY domain, is a bad choice here. That has been proven (in advance) by your plot.
  2 Comments
Shao Wen Lim
Shao Wen Lim on 16 May 2015
blue dots are the initial points and the red ones are the attempted best fit... doesn't look too right but i get ur concept! thanks !!
John D'Errico
John D'Errico on 16 May 2015
Edited: John D'Errico on 16 May 2015
But I don't know what you are plotting. My first guess might be that the red dots you have plotted are the residuals, not the fit itself. Since I lack your data, I cannot test the code I've written (though it looks correct) to show what the result should look like.
You might decide by the way to plot it using a log scale for x, thus to use semilogx. Of course, you cannot use a log scale on the residuals, since they will be both positive and negative.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!