P-values on model coefficients: further explanation of the "P-value from nlinfit" question

I am analyzing some data, much of which can be modeled linearly, though some of it will need a more complex model. I am trying to understand how "glmfit"/"glmval" and "nlinfit" followed by "linhyptest" can be used to determine the significance of my fit. From the "P-value from nlinfit" post I saw how to compare a model against a horizontal line. I would now like to increase the model complexity and test whether the extra term is significant. The example below should make my question clearer.
%% Actual data, where a linear fit does a pretty good job
x = [0 4.446 8.892 13.338 17.784 22.23 26.676 31.122 35.568 40.014 44.46]';
y = [0 1.3420442 1.832596 2.3950097 3.0239183 3.7037954 4.4469934 5.1660109 5.8499245 6.549648 7.2603489]';
%% Using glmfit
[b,dev,stats] = glmfit(x,y);                      % default: normal distribution, identity link
[yhat,dylo,dyhi] = glmval(b,x,'identity',stats);  % fitted values and 95% confidence half-widths
%% Plot observations vs. the linear model with confidence interval (not necessary, just so you can see the data)
figure
hold on
plot(x,y,'-ob');
plot(x,yhat,'--+k')
plot(x,yhat+dyhi,'--r')
plot(x,yhat-dylo,'--','color',[1 0.58 0])
legend('observation','regression','upper 95% confidence bound','lower 95% confidence bound')
glmfit returns 2 p-values (in the stats structure) that are way less than 0.05. If they express the significance of the values in b, this makes sense to me, since a linear model is expected.
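For reference, here is a minimal sketch of pulling those per-coefficient statistics out of the stats structure returned above (the coefficient labels are just for display):
% Per-coefficient statistics from the glmfit call above:
% stats.se - standard errors, stats.t - t statistics, stats.p - p-values
coefNames = {'intercept','slope'};   % labels are illustrative; order matches b
for k = 1:numel(b)
    fprintf('%-9s  b = %7.4f   t = %7.3f   p = %.3g\n', ...
        coefNames{k}, b(k), stats.t(k), stats.p(k));
end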
%% Linear model: I expect to see similar results with linhyptest and glmfit
% My assumption is verified by comparing the values of "b" from glmfit
% with "beta" from nlinfit with regard to the equation coefficients
mdl = @(a,x)(a(2)*x + a(1));   % straight line: intercept a(1), slope a(2)
a0 = [y(6); .0545];            % initial guesses for [a(1); a(2)]
[beta,r,J,cov,mse] = nlinfit(x,y,mdl,a0);
dfe = length(y)-1;             % note: the usual error df for a 2-parameter fit is length(y)-2
H = [1 0; 0 1];                % test both coefficients jointly...
c = [0;0];                     % ...against zero
[p,t,r] = linhyptest(beta,cov,c,H,dfe);
Using this method results in a single small p-value. Does this value represent the joint significance of both coefficients, compared against the null hypothesis of a horizontal line with a y-intercept of zero?
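For comparison, a single-row H restricts the test to one coefficient. Here is a minimal sketch testing just the slope against zero, reusing beta, cov, and dfe from the fit above (the variable names are only for illustration):
% H0: slope = 0, with the intercept left unconstrained
H_slope = [0 1];   % selects only the second coefficient (the slope)
c_slope = 0;
[p_slope,t_slope,r_slope] = linhyptest(beta,cov,c_slope,H_slope,dfe);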
Now, for verification, I would like to apply a 2nd-order (or higher, for that matter) polynomial as my model and see that the higher-order terms are not significant.
%% 2nd order
mdl = @(a,x)(a(3)*x.^2 + a(2)*x + a(1));   % quadratic: a(1) + a(2)*x + a(3)*x^2
a0 = [0.0959; .0545; 0.5];                 % initial guesses for [a(1); a(2); a(3)]
[beta,r,J,cov,mse] = nlinfit(x,y,mdl,a0);
dfe = length(y)-1;                         % again, length(y)-3 would be the usual error df here
H = eye(3);                                % test all three coefficients jointly
c = [0;0;0];
[p,t,r] = linhyptest(beta,cov,c,H,dfe);
With this, p is still very small because (I'm assuming) it is still comparing the 2nd-order equation against a horizontal line. If I alter c and H to focus the hypothesis test on the 2nd-order term alone:
[p,t,r] = linhyptest(beta,cov,c(3),H(3,1:3),dfe);
% For clarity, this is the same as
[p,t,r] = linhyptest(beta,cov,0,[0 0 1],dfe);
the p-value jumps up to 0.8365, well above the common 0.05 threshold for a 95% confidence level. Here's the question: was that last step correct? Am I actually finding the significance of the highest-order term, or am I just getting coincidentally good results?
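As a cross-check, the same hypothesis can also be framed as the classic extra-sum-of-squares F-test between the nested linear and quadratic fits. A minimal sketch, refitting both models and comparing residual sums of squares (the helper names are illustrative):
% Extra-sum-of-squares F-test: does the quadratic term improve the fit?
mdlLin  = @(a,x)(a(2)*x + a(1));               % reduced model (2 parameters)
mdlQuad = @(a,x)(a(3)*x.^2 + a(2)*x + a(1));   % full model (3 parameters)
[~,rLin]  = nlinfit(x,y,mdlLin,[y(6); .0545]);
[~,rQuad] = nlinfit(x,y,mdlQuad,[0.0959; .0545; 0.5]);
sseLin  = sum(rLin.^2);
sseQuad = sum(rQuad.^2);
n = length(y);
F = ((sseLin - sseQuad)/(3 - 2)) / (sseQuad/(n - 3));
pExtra = 1 - fcdf(F, 1, n - 3)   % should be close to the focused linhyptest p-value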
Thanks for any input, -Hannon

Accepted Answer

Tom Lane on 28 Apr 2012
It looks like you have the correct interpretation. Using your own idea of comparing with glmfit, you can see that the p-value around 0.8 matches the p-value for the coefficient of the squared term only:
>> [b,dev,st] = glmfit([x,x.^2],y);
>> st.p(end)
ans =
    0.8375
If you have any opportunity to use the LinearModel, NonLinearModel, and GeneralizedLinearModel features in R2012a, you may find them easier to work with.
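For example, a minimal sketch using LinearModel.fit from those releases (later releases also offer fitlm) to read per-coefficient p-values directly:
% Fit the quadratic as a linear model in [x, x.^2] and inspect the
% coefficient statistics (estimates, standard errors, t stats, p-values).
lm = LinearModel.fit([x, x.^2], y);   % model: y ~ 1 + x1 + x2
lm.Coefficients                       % includes a pValue column per coefficient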
