Code covered by the BSD License

# Symbolic Derivatives for Econometric Tests

02 Nov 2009 (Updated 08 Feb 2010)

How to calculate derivatives required by Econometric Toolbox tests via Symbolic Math Toolbox

Symbolic Derivatives for Econometric Tests

# Symbolic Derivatives for Econometric Tests

## Introduction to Derivatives for Econometric Tests

Two Econometric Toolbox™ model misspecification tests, lmtest and waldtest, require you to provide expressions for derivatives as inputs. While most of us learned to compute derivatives in school, these computations can be tedious and error-prone.

The Symbolic Math Toolbox™ jacobian function calculates derivatives of symbolic expressions. matlabFunction converts symbolic expressions into function handles or files that calculate numerically as if substituting values for symbolic variables. Between these two functions you can automate the calculation of derivatives and use the resulting expressions in econometric applications. This example shows how to include data in your symbolic setup, and how to use the results in lmtest and waldtest.

## Example: Data, Hypothesis, and Basic Regression Analysis

This example models the real GNP data series from the Nelson-Plosser data set as a loglinear series plus Gaussian noise. The question is whether the long-term growth rate might, in fact, be less than 2.9%, when the fitted model shows it is about 3.1%.

There are several ways of answering this question. The simplest is probably to use the regress function from Statistics Toolbox™. Load the Nelson-Plosser dataset and extract the real GNP series. Find the best linear fit and a 95% confidence interval for the coefficients of the fit.

load Data_NelsonPlosser % your computer might have to use
gnp = Dataset.GNPR;
gnpr = log(gnp(not(isnan(gnp)))); % log scaled
datesr = dates(not(isnan(gnp))); % Date series
X = [ones(size(datesr)) datesr]; % include constant term in fit
alpha = 0.05;
[betahat,Ibeta] = regress(gnpr,X,alpha);
fprintf('Best slope = %4.3g\n',betahat(2))
fprintf('95%% confidence interval = [%4.3g %4.3g]',Ibeta(2,:))

Best slope = 0.031
95% confidence interval = [0.0291 0.0329]

The 95% confidence interval does not contain the slope 0.029. Therefore, this test indicates rejection of the hypothesis that the true slope is 2.9%.

To see the results of calculating the best fitting line with slope 0.029, regress (gnpr - 0.29*datesr) on a constant:

x = ones(size(datesr));
[xhat Ix] = regress(gnpr-0.029*datesr,x,alpha);
gnpx = xhat + 0.029*datesr;
hold on
plot(datesr,gnpr,'bo');lsline % data points and least squares fit
plot(datesr,gnpx,'r-') % 0.029 line
legend('Data','Best linear fit','0.029 Linear fit','Location','nw')
hold off


## Analysis using Symbolic Math Toolbox and Econometric Functions

We now show how to use lmtest and waldtest to examine the hypothesis that the true slope is 2.9%. lmtest requires three inputs: the 'score' (gradient) at the restricted maximizer of the loglikelihood, a covariance estimated at this restricted maximizer, and the number of restrictions. waldtest requires the restriction functions and their Jacobian evaluated at the unrestricted maximizer, and also requires a covariance estimated at the unrestricted maximizer. To calculate the score and covariance, you need to calculate derivatives.

There are several covariance estimates in general use. One, the outer product of gradients (OPG), is not very accurate for small data sets, but is in widespread use because it requires only first derivatives (gradients). This example shows how to compute an OPG estimate for lmtest. Another estimator, , is often more accurate than the OPG estimate, but requires a second derivative. This example shows how to compute this estimator for waldtest.

Now begin the process of evaluating whether the 0.029 slope is statistically distinguishable from the 0.031 slope.

## Write a symbolic expression for the loglikelihood function

Assume independent normal zero-mean deviates, with standard deviation . The loglikelihood of an observation when the expected value is is

syms a b x y real
syms sigma positive % 'real' and 'positive' aid simplification
indeps = [a b sigma]; % the decision variables, or parameters
mydata = [x y]; % symbolic variables for the data
L = 1/sqrt(sym(2)*pi*sigma^2) * exp(-(a*x+b-y)^2/(2*sigma^2));
logL = simplify(log(L)); % simplify leads to shorter expressions that compute faster


## Call matlabFunction to generate a function handle for evaluating the loglikelihood

Since MATLAB® optimization functions minimize, rather than maximize, use matlabFunction on the negative of logL.

% take negative loglikelihood since solvers minimize
llfcn0 = matlabFunction(-logL,'vars',{indeps,mydata});

% Take the sum of the loglikelihoods for the function evaluated
% at the data points
llfcn = @(x)sum(llfcn0(x,[datesr,gnpr]));


## Call fmincon to evaluate the maximizers of the restricted and unrestricted loglikelihood

Let the fitted regression line be . Find the maximum-likelihood regression (MLR) with set to 0.029 (restricted) and with allowed to vary (unrestricted). The MLR is the same as the least-squares regression line, so there are two ways of computing this line: least squares with regress as above, or maximum likelihood with fmincon. For generality we show how to use fmincon for these calculations.

Take an initial point [.03,–54,.03], because the least-squares solution has slope 0.031 and intercept –54.5. The starting value for , 0.03, is simply a guess. The guess is not very accurate, since the unrestricted maximum value turns out to be 0.13.

For both the restricted and unrestricted maximum loglikelihoods, bound the variable below by 0. For the restricted problem, bound the variable above and below by 0.029; this restricts the variable to be 0.029 without having to rewrite the objective function.

opts = optimset('Algorithm','interior-point','Display','off');
unrestricted = fmincon(llfcn,[.03 -54 .03],...
[],[],[],[],[-inf -inf 0],[],[],opts)
restricted = fmincon(llfcn,[.029 -54 .03],...
[],[],[],[],[.029 -inf 0],.029,[],opts)

unrestricted =

0.0309  -54.4034    0.1318

restricted =

0.0290  -50.6860    0.1365



## Call jacobian to calculate the gradient and Hessian of the loglikelihood

Now that we have the restricted and unrestricted maximization points, we calculate the gradient of the loglikelihood function and the Hessian. The gradient is calculated only for the independent variables, not for the data.

grad = jacobian(logL,indeps); % Gradient loglikelihood
hessLL = jacobian(grad,indeps); % Symbolic Hessian


## Call matlabFunction to generate a function handle for evaluating the score and covariance

Recall that we need the score at the restricted maximum for lmtest, and a covariance estimate at both the restricted maximum (lmtest) and the unrestricted maximum (waldtest). We use the OPG estimator for lmtest, and the observed Hessian estimator for waldtest.

gradnle = matlabFunction(grad,'vars',{indeps,mydata});
% gradients evaluated at the data points, restricted maximum
scorer = sum(totgradr); % the 'score' at the restricted max
% of the Hessian
hessLLhandle = matlabFunction(hessLL,'vars',{indeps,mydata});
% Calculates the Hessian for input parameters
hessum = zeros(length(indeps));
for ii = 1:length(datesr)
hessum = hessum + hessLLhandle(unrestricted,[datesr(ii) gnpr(ii)]);
% Adds the Hessians at the unrestricted maximum
% evaluated at all the data points
end
escov = -inv(hessum); % The unrestricted covariance estimate


## Test the hypothesis with lmtest and waldtest

We now have all the inputs for lmtest. For waldtest, we still need the value of the restriction function and its gradient at the unrestricted maximum. The restriction is given by the function , where . Clearly r = unrestricted(1) – 0.029, and the Jacobian of the gradient is . For more complex restriction functions, you could use jacobian and matlabFunction to obtain these inputs.

r = unrestricted(1) - 0.029;
R = [1 0 0]; % independent variables a,b,sigma
[hlm,pValue,stat,criticalValue] = lmtest(scorer,OLLestr,1)
[hwald,pValue,stat,criticalValue]  = waldtest(r,R,escov)

hlm =

1

pValue =

0.0022

stat =

9.3759

criticalValue =

3.8415

hwald =

1

pValue =

0.0405

stat =

4.1965

criticalValue =

3.8415



Since hlm and hwald are both 1, both lmtest and waldtest reject the hypothesis that the growth rate of GNP in the Nelson-Plosser data is only 2.9%. Note that the p-value for waldtest is almost 5%, meaning waldtest just barely rejects the hypothesis. This close call reflects the initial confidence interval that just barely missed a slope of 0.029.

## Conclusion

Symbolic Math Toolbox functions give straightforward ways of incorporating derivatives into MATLAB calculations. They speed the tedious, error-prone process of calculating derivatives, and reduce the possibility of error. They ease the use of some econometric functions.