GRANGER_CAUSE performs a Granger causality test. The null hypothesis is that y does not Granger-cause x. The user specifies the two series, x and y, along with the significance level and the maximum number of lags to consider. The function chooses the optimal lag lengths for x and y using the Bayesian Information Criterion (BIC). It returns the F-statistic for the test along with the corresponding critical value. We reject the null hypothesis that y does not Granger-cause x if the F-statistic is greater than the critical value. Type help granger_cause to learn more.
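For readers who want to see the mechanics, here is a minimal sketch of the textbook F-test described above, with the lag lengths fixed rather than chosen by BIC. It is not the granger_cause source: the names ols_rss and granger_f are hypothetical, the regressions include an intercept, and the data are simulated so that y drives x by construction.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(x, y, x_lag, y_lag):
    """F-statistic for H0: lags of y do not help predict x (assumed form)."""
    start = max(x_lag, y_lag)
    times = range(start, len(x))
    targets = x[start:]
    # Restricted model: intercept plus own lags of x.
    restricted = [[1.0] + [x[t - i] for i in range(1, x_lag + 1)]
                  for t in times]
    # Unrestricted model: the same regressors plus lags of y.
    unrestricted = [row + [y[t - j] for j in range(1, y_lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    df1 = y_lag                                   # number of restrictions
    df2 = len(targets) - (x_lag + y_lag + 1)      # residual degrees of freedom
    f_stat = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return f_stat, df1, df2

# Simulated example: x depends on its own lag and on lagged y.
rng = random.Random(0)
n = 200
y = [rng.gauss(0, 1) for _ in range(n)]
x = [0.0]
for t in range(1, n):
    x.append(0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.gauss(0, 1))

f_stat, df1, df2 = granger_f(x, y, x_lag=1, y_lag=1)
print(f_stat, df1, df2)
```

The statistic would then be compared with the critical value of the F(df1, df2) distribution at the chosen significance level; that lookup is omitted here because it needs a statistics library.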
Wonderful code, I learned a lot about Granger Causality from
studying the code.
Replies to comments:
RK - I am pretty sure T changes as the lag length changes.
LangS - I agree that granger_cause rejects the null hypothesis too often.
There seem to be two things happening. (1) Using BIC to search over the various lagged models to find the best one is a form of multiple comparison that changes the distribution of F under the null hypothesis.
This can be seen by re-running your Monte Carlo simulation with longer and longer lagged models
and noting that the probability of F exceeding the critical value increases.
(2) If there are excess rejections of F with x lag = y lag = 1, then there is an error in calculating the degrees of freedom or some other error in the model.
I started off a few years ago chasing down this error and wound up rewriting the code (granger_cause_1), which I uploaded to the File Exchange.
To be able to use BIC for lag selection, the number of observations needs to be constant across specifications, which here means fixing the sample according to max_lag. Right now the number of observations varies with the lag order, so it's not an apples-to-apples comparison. Also, the BIC shouldn't use T as is: T needs to be adjusted for the observations dropped, so that the number of observations is the same in every specification.
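A rough sketch of what a common-sample comparison could look like is given below. It is an illustration under assumptions, not the granger_cause code: select_lags_bic and ols_rss are hypothetical names, the lags are searched jointly over a grid, and the BIC is taken as n_eff*log(RSS/n_eff) + k*log(n_eff), where n_eff is the number of observations left after dropping the first max_lag points in every specification.

```python
import math
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def select_lags_bic(x, y, max_lag):
    """Pick (x_lag, y_lag) by BIC, using the same sample for every model."""
    n_eff = len(x) - max_lag          # identical for all specifications
    targets = x[max_lag:]             # always drop the first max_lag points
    best = None
    for p in range(1, max_lag + 1):
        for q in range(1, max_lag + 1):
            rows = [[1.0]
                    + [x[t - i] for i in range(1, p + 1)]
                    + [y[t - j] for j in range(1, q + 1)]
                    for t in range(max_lag, len(x))]
            rss = ols_rss(rows, targets)
            k = p + q + 1             # slopes plus intercept
            bic = n_eff * math.log(rss / n_eff) + k * math.log(n_eff)
            if best is None or bic < best[0]:
                best = (bic, p, q)
    return best[1], best[2], n_eff

# Simulated example: the true model has one lag of x and one lag of y.
rng = random.Random(0)
n = 200
y = [rng.gauss(0, 1) for _ in range(n)]
x = [0.0]
for t in range(1, n):
    x.append(0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.gauss(0, 1))

p, q, n_eff = select_lags_bic(x, y, max_lag=4)
print(p, q, n_eff)
```

Because every candidate model is fitted to the same n_eff observations, differences in BIC reflect only fit and parameter count, not a changing sample.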
Dear users, has anyone confirmed the comments of Weihai, Langs and Jangyu?
Thanks for your help and best of luck.
Hi Chandler, I compared this code with granger.test() in the R package MSBVAR using the same data sets, and I got very different F-statistics. This confuses me.
Hi Chandler, I have a question about max_lag. Does this number relate to the frequency of my data? That is, if I specify max_lag=1 and my input is daily data, does this mean I am allowing a 1-day lag? Thanks.
How should I determine the significance level?
Thank you for your great contribution.
Yet I have a question concerning the calculation of the F-statistic:
the parameters for the numerator and denominator at the end of the function
are y_lag and x_lag+y_lag+1.
Wikipedia says "the simple linear model y = mx + b has p=2". If Wikipedia
is right, then the parameters should be "y_lag+1 and x_lag+y_lag+2".
Could you please explain this in a simple way? I am quite confused.
This function uses the Matlab function "regress", which assumes a constant term (intercept) in the linear regression and therefore violates the definition of Granger causality (see Granger's original paper, Econometrica 37:3 424-438, 1969).
I applied this function to two random number series (10 time points, lag = 2), repeated this 1000 times (with two different series each time), and set the alpha value to 0.05. Among the 1000 tests, 400 were found to have "Granger causality"! So I think the result is not reliable.
Dear Chandler, I have gratefully downloaded your script. Would it be possible to also include a stationarity check in the routine, or would you recommend checking the stationarity of the series beforehand? Many thanks, Wolfgang
William, there was a small error in the selection of the lag lengths using the BIC. It should be fixed as of 03/18/2010.
This appears to calculate the same x and y lags regardless of the actual lags present between the two vectors. In addition, the F statistic generated by the function only appears valid for specific (small) lags. The test I used involved a simple pair of sinusoids phase-shifted to various degrees with and without a small amount of Gaussian noise.
I like how the author uses the Bayesian Information Criterion, which is a consistent model selection criterion, to select the lag length.
There was an error in the calculation of the lengths of the BIC. It is now fixed. I would like to thank Mads for pointing out the bug.
There was an error in selecting the lag length for the BIC. It is now fixed. I would like to thank Mads for pointing out the bug.
A correction was made in the calculation of the critical value from the F-distribution.