Resolving a discrepancy between numerical and analytical gradients

Dear all,
I am trying to implement an analytical gradient for the likelihood function L given in the attached pdf, where I also give the gradients I have calculated.
My MATLAB implementation of the negative log-likelihood function and its gradient is given below, and I use fminunc to minimize it. I have run the optimization on two different datasets (attached): the derivative check reports differences between the analytical and numerical gradients on the order of 1.8e-6 on one dataset (set ex = 2) but not on the other (set ex = 1). In addition, my parameter estimates do not match those obtained from running equivalent regressions in STATA; the differences are around 5% of the parameter value on average.
I am assuming I have an error in the implementation of the beta components of the analytical gradient vector. If anyone can help me out here and spot the error, I would be very grateful. I will of course provide additional information as needed. The function and script needed to reproduce the behavior are below.
Thanks very much,
Peter
Here is the implementation of the log-likelihood function and its gradient; the data files are attached.
function [negLogLik, grad] = truncReg_cost(Z, theta, params, c)
    sigma = sqrt(params(1));        % standard dev of trunc normal (params(1) is sigma^2)
    beta = params(2:end);           % regression parameter vector
    nObs = size(Z,1);               % number of observations n
    c = ones(nObs,1)*c;             % vectorize truncation point
    err = theta - Z*beta;           % residuals
    sse = err'*err;                 % sum of squared errors
    cdfArg = (c - Z*beta)/sigma;    % standardized truncation points
    quotDens = normpdf(cdfArg,0,1)./(ones(nObs,1) - normcdf(cdfArg,0,1)); % inverse Mills ratio
    negLogLik = nObs*log(sigma*sqrt(2*pi)) + 0.5*(1/sigma^2)*sse + ...
        sum(log(ones(nObs,1) - normcdf(cdfArg,0,1)));
    % gradient component w.r.t. sigma^2
    partialSigma = 0.5*((nObs/sigma^2) - (1/sigma^4)*sse + ...
        (1/sigma^3)*(cdfArg*sigma)'*quotDens);
    % gradient components w.r.t. beta
    partialBeta = -(1/sigma^2)*Z'*err + ...
        (1/sigma)*Z'*quotDens;
    grad = [partialSigma; partialBeta];
end
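For cross-checking outside MATLAB, the same cost function and a central-difference gradient check can be sketched in Python/NumPy (a direct translation of the function above, using scipy.stats.norm; the tiny dataset and parameter values at the bottom are made up purely for illustration, not taken from the attached files):

```python
import numpy as np
from scipy.stats import norm

def trunc_reg_cost(Z, theta, params, c):
    """Negative log-likelihood and gradient of the left-truncated normal
    regression; params[0] is the variance sigma^2, params[1:] is beta."""
    sigma = np.sqrt(params[0])
    beta = params[1:]
    n = Z.shape[0]
    err = theta - Z @ beta                       # residuals
    sse = err @ err                              # sum of squared errors
    cdf_arg = (c - Z @ beta) / sigma             # standardized truncation points
    quot_dens = norm.pdf(cdf_arg) / (1.0 - norm.cdf(cdf_arg))  # inverse Mills ratio
    neg_log_lik = (n * np.log(sigma * np.sqrt(2 * np.pi))
                   + 0.5 * sse / sigma**2
                   + np.sum(np.log(1.0 - norm.cdf(cdf_arg))))
    partial_sigma2 = 0.5 * (n / sigma**2 - sse / sigma**4
                            + (cdf_arg * sigma) @ quot_dens / sigma**3)
    partial_beta = -(Z.T @ err) / sigma**2 + (Z.T @ quot_dens) / sigma
    return neg_log_lik, np.concatenate(([partial_sigma2], partial_beta))

def central_diff_grad(f, x, h=1e-6):
    """Numerical gradient by central differences, one component at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e)[0] - f(x - e)[0]) / (2.0 * h)
    return g

# Made-up data: intercept plus one regressor, truncation point c = 0.
Z = np.array([[1.0, 0.5], [1.0, -0.3], [1.0, 1.2], [1.0, 0.1]])
theta = np.array([1.0, 0.4, 2.1, 0.9])
params = np.array([1.0, 0.5, 0.8])               # [sigma^2; beta]
_, g_analytic = trunc_reg_cost(Z, theta, params, c=0.0)
g_numeric = central_diff_grad(lambda p: trunc_reg_cost(Z, theta, p, 0.0), params)
print(np.max(np.abs(g_analytic - g_numeric)))
```

On this toy data the two gradients agree to finite-difference precision; the same comparison can be run at any parameter vector where the truncation probabilities stay away from one.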
To reproduce the issue you can run this script:
% load example data
ex = 1;
if ex == 1
    data = csvread('trExample.csv');
else
    data = csvread('trExample2.csv');
end
if ex == 1
    c = 0;              % truncation point
    truncDir = 'l';     % truncation from the left
    indVar = [ones(size(data,1),1) data(:,3:end)];
    depVar = data(:,2);
else
    c = 40;
    truncDir = 'l';
    indVar = [ones(size(data,1),1) data(:,2:end)];
    depVar = data(:,1);
end
% truncate, i.e. throw away all observations where the truncation is violated
if truncDir == 'l'
    indVar = indVar(depVar>c,:);
    depVar = depVar(depVar>c,:);
else
    indVar = indVar(depVar<c,:);
    depVar = depVar(depVar<c,:);
end
% initial parameter estimates: OLS for beta, variance of depVar for params(1)
betaInit = pinv(indVar'*indVar)*indVar'*depVar;
sigmaInit = var(depVar);   % params(1) is sigma^2, so no sqrt here
params = [sigmaInit; betaInit];
costfunc = @(params) truncReg_cost(indVar,depVar,params,c);
% 'GradObj' supplies the analytical gradient and 'DerivativeCheck' compares it
% against finite differences; note that 'interior-point' is an fmincon
% algorithm and 'Display','on' is not a valid value, so neither is set here
options = optimset('GradObj','on','Display','iter','DerivativeCheck','on');
[outParams,LL,ef,out,grad] = fminunc(costfunc,params,options);
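The setup steps of the script (truncating the sample and building OLS start values) can be sketched in Python as follows; the data here are synthetic stand-ins, since the attached CSV files are not reproduced in this post:

```python
import numpy as np

# Synthetic stand-in data; indVar/depVar and the truncation point c
# mirror the roles they play in the MATLAB script above.
rng = np.random.default_rng(0)
n, c, trunc_dir = 200, 0.0, 'l'
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # indVar with intercept
y = X @ np.array([0.5, 1.0, -0.7]) + rng.normal(size=n)     # depVar

# Truncate, i.e. throw away all observations where the truncation is violated.
keep = y > c if trunc_dir == 'l' else y < c
X, y = X[keep], y[keep]

# Initial parameter estimates: OLS via the pseudo-inverse, and the sample
# variance of the dependent variable as the starting value for sigma^2.
beta_init = np.linalg.pinv(X.T @ X) @ X.T @ y
sigma2_init = np.var(y, ddof=1)        # MATLAB's var() also uses ddof = 1
params0 = np.concatenate(([sigma2_init], beta_init))
print(params0)
```

Note that OLS on the truncated sample is biased toward zero, which is exactly why it serves only as a starting value for the maximum-likelihood step.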

Accepted Answer

Peter on 27 Nov 2013
Hi all,
another thought: I just modified the code to use Mark Schmidt's minFunc instead of fminunc. Not only is it 8-9 times faster than fminunc, it also gives me estimates for the regressors identical to those I obtained in STATA. minFunc does not carry out a derivative check, so a faulty gradient implementation may still be an issue, but I am starting to think that the problem was the solver used and not the code.
Best,
Peter

More Answers (2)

Matt J on 26 Nov 2013
To me, partialSigma looks like it should be
partialSigma = nObs/sigma - sse/sigma^3 - quotDens'*(c-Z*beta)/sigma^2
  2 Comments
Matt J on 26 Nov 2013
You said there was supposed to be a pdf attached? I don't see it.
Matt J on 27 Nov 2013
Edited: Matt J on 27 Nov 2013
One thing you could do is take the solution that STATA, or minFunc, gave you and compute the value of your analytical gradient there. It should be very close to zero if the solution is genuine and your gradient calculation is correct.
Also, you could feed the fminunc solution as an initializer to the other solvers (and vice versa) to see whether the solvers quit after zero iterations. That would be a way of checking whether you have multiple solutions. You should also compare the objective values at which the different solvers terminate, to see which one reaches the lowest value.
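That first check is a one-liner in any language; as a generic sketch (in Python here, where costfunc stands for any function returning the pair [negLogLik, grad], like truncReg_cost does):

```python
import numpy as np

def grad_norm_at(costfunc, params_hat):
    """Evaluate the analytical gradient at a candidate solution; at a
    genuine interior minimizer it should be close to zero."""
    _, g = costfunc(np.asarray(params_hat, dtype=float))
    return np.linalg.norm(g, ord=np.inf)

# Toy usage with a quadratic whose minimizer is the origin.
quad = lambda p: (p @ p, 2.0 * p)
print(grad_norm_at(quad, [0.0, 0.0]))   # prints 0.0
```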
Finally, even after you clarified that you are minimizing w.r.t. sigma^2, I do not see how you obtain the term
(1/sigma^3)*(cdfArg*sigma)'*quotDens
or equivalently
(1/sigma^2)*(c-Z*beta)'*quotDens
The derivative of cdfArg w.r.t. sigma^2 is
-0.5*(c-Z*beta)*sigma^(-3)



Peter on 26 Nov 2013
Hi Matt,
thanks for the rapid answer. There is certainly the possibility that I have a bug in the algebra rather than in the code. However, I don't think the result you suggest for partialSigma is the solution. To make things more transparent, here is the log-likelihood written in terms of my code:
LL = -nObs*log(sigma*sqrt(2*pi)) - 1/(2*sigma^2)*sse - sum(log(ones(nObs,1)-normcdf(cdfArg)))
It's probably my nomenclature that is misleading: the variable is called partialSigma, but I am actually taking the derivative w.r.t. sigma^2. Given this, the derivative of the second term cannot have an odd power of sigma in the denominator. Also, the discrepancy enters into the beta components of the gradient vector.
Any other ideas?
Thanks,
Peter
