Resolving a discrepancy between numerical and analytical gradients

Dear all,
I am trying to implement an analytical gradient for the likelihood function L given in the attached pdf, where I also give the gradients I have calculated.
My MATLAB implementation of the negative log-likelihood function and its gradient is given below, and I use fminunc to minimize it. I have run the optimization on two different datasets (attached): the derivative check reports differences between the analytical and numerical gradients on the order of 1.8e-6 on one dataset (set ex = 2) but not on the other (set ex = 1). In addition, my parameter estimates do not match those obtained from running equivalent regressions in STATA; the differences are around 5% of the parameter value on average.
I am assuming I have an error in the implementation of the beta components of the analytical gradient vector. If anyone can help me out here and spot the error, I would be very grateful. I will of course provide additional information as needed. The function and script needed to reproduce the behavior are below.
Thanks very much,
Peter
Here is the implementation of the log-likelihood function and its gradient; the data files are attached.
function [negLogLik, grad] = truncReg_cost(Z, theta, params, c)
    sigma = sqrt(params(1));        % standard dev of trunc normal (params(1) is sigma^2)
    beta = params(2:end);           % regression parameter vector
    nObs = size(Z,1);               % number of observations n
    c = ones(nObs,1)*c;             % vectorize truncation point
    err = theta - Z*beta;           % residuals
    sse = err'*err;                 % sum of squared errors
    cdfArg = (c - Z*beta)/sigma;    % standardized truncation points
    quotDens = normpdf(cdfArg,0,1)./(ones(nObs,1) - normcdf(cdfArg,0,1)); % inverse Mills ratio
    negLogLik = nObs*log(sigma*sqrt(2*pi)) + 0.5*(1/sigma^2)*sse + ...
        sum(log(ones(nObs,1) - normcdf(cdfArg,0,1)));
    % gradient component w.r.t. sigma^2
    partialSigma = 0.5*((nObs/sigma^2) - (1/sigma^4)*sse + ...
        (1/sigma^3)*(cdfArg*sigma)'*quotDens);
    % gradient components w.r.t. beta
    partialBeta = -(1/sigma^2)*Z'*err + ...
        (1/sigma)*Z'*quotDens;
    grad = [partialSigma; partialBeta];
end
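For cross-checking outside MATLAB, the same cost function and a central-difference gradient check can be sketched in Python/NumPy (a direct translation of the function above, using scipy.stats.norm; the tiny dataset and parameter values at the bottom are made up purely for illustration, not taken from the attached files):

```python
import numpy as np
from scipy.stats import norm

def trunc_reg_cost(Z, theta, params, c):
    """Negative log-likelihood and gradient of the left-truncated normal
    regression; params[0] is the variance sigma^2, params[1:] is beta."""
    sigma = np.sqrt(params[0])
    beta = params[1:]
    n = Z.shape[0]
    err = theta - Z @ beta                       # residuals
    sse = err @ err                              # sum of squared errors
    cdf_arg = (c - Z @ beta) / sigma             # standardized truncation points
    quot_dens = norm.pdf(cdf_arg) / (1.0 - norm.cdf(cdf_arg))  # inverse Mills ratio
    neg_log_lik = (n * np.log(sigma * np.sqrt(2 * np.pi))
                   + 0.5 * sse / sigma**2
                   + np.sum(np.log(1.0 - norm.cdf(cdf_arg))))
    partial_sigma2 = 0.5 * (n / sigma**2 - sse / sigma**4
                            + (cdf_arg * sigma) @ quot_dens / sigma**3)
    partial_beta = -(Z.T @ err) / sigma**2 + (Z.T @ quot_dens) / sigma
    return neg_log_lik, np.concatenate(([partial_sigma2], partial_beta))

def central_diff_grad(f, x, h=1e-6):
    """Numerical gradient by central differences, one component at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e)[0] - f(x - e)[0]) / (2.0 * h)
    return g

# Made-up data: intercept plus one regressor, truncation point c = 0.
Z = np.array([[1.0, 0.5], [1.0, -0.3], [1.0, 1.2], [1.0, 0.1]])
theta = np.array([1.0, 0.4, 2.1, 0.9])
params = np.array([1.0, 0.5, 0.8])               # [sigma^2; beta]
_, g_analytic = trunc_reg_cost(Z, theta, params, c=0.0)
g_numeric = central_diff_grad(lambda p: trunc_reg_cost(Z, theta, p, 0.0), params)
print(np.max(np.abs(g_analytic - g_numeric)))
```

On this toy data the two gradients agree to finite-difference precision; the same comparison can be run at any parameter vector where the truncation probabilities stay away from one.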
To reproduce the issue you can run this script:
% load example data
ex = 1;
if ex == 1
    data = csvread('trExample.csv');
else
    data = csvread('trExample2.csv');
end
if ex == 1
    c = 0;              % truncation point
    truncDir = 'l';     % truncation from the left
    indVar = [ones(size(data,1),1) data(:,3:end)];
    depVar = data(:,2);
else
    c = 40;
    truncDir = 'l';
    indVar = [ones(size(data,1),1) data(:,2:end)];
    depVar = data(:,1);
end
% truncate, i.e. throw away all observations where the truncation is violated
if truncDir == 'l'
    indVar = indVar(depVar>c,:);
    depVar = depVar(depVar>c,:);
else
    indVar = indVar(depVar<c,:);
    depVar = depVar(depVar<c,:);
end
% initial parameter estimates: OLS for beta, variance of depVar for params(1)
betaInit = pinv(indVar'*indVar)*indVar'*depVar;
sigmaInit = var(depVar);   % params(1) is sigma^2, so no sqrt here
params = [sigmaInit; betaInit];
costfunc = @(params) truncReg_cost(indVar,depVar,params,c);
% 'GradObj' supplies the analytical gradient and 'DerivativeCheck' compares it
% against finite differences; note that 'interior-point' is an fmincon
% algorithm and 'Display','on' is not a valid value, so neither is set here
options = optimset('GradObj','on','Display','iter','DerivativeCheck','on');
[outParams,LL,ef,out,grad] = fminunc(costfunc,params,options);
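The setup steps of the script (truncating the sample and building OLS start values) can be sketched in Python as follows; the data here are synthetic stand-ins, since the attached CSV files are not reproduced in this post:

```python
import numpy as np

# Synthetic stand-in data; indVar/depVar and the truncation point c
# mirror the roles they play in the MATLAB script above.
rng = np.random.default_rng(0)
n, c, trunc_dir = 200, 0.0, 'l'
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # indVar with intercept
y = X @ np.array([0.5, 1.0, -0.7]) + rng.normal(size=n)     # depVar

# Truncate, i.e. throw away all observations where the truncation is violated.
keep = y > c if trunc_dir == 'l' else y < c
X, y = X[keep], y[keep]

# Initial parameter estimates: OLS via the pseudo-inverse, and the sample
# variance of the dependent variable as the starting value for sigma^2.
beta_init = np.linalg.pinv(X.T @ X) @ X.T @ y
sigma2_init = np.var(y, ddof=1)        # MATLAB's var() also uses ddof = 1
params0 = np.concatenate(([sigma2_init], beta_init))
print(params0)
```

Note that OLS on the truncated sample is biased toward zero, which is exactly why it serves only as a starting value for the maximum-likelihood step.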

Accepted Answer

Peter on 27 Nov 2013
Hi all,
another thought: I just modified the code to use Mark Schmidt's minFunc instead of fminunc. Not only is it 8-9 times faster than fminunc, it also gives me estimates for the regressors identical to those I obtained in STATA. minFunc does not carry out a derivative check, so a faulty gradient implementation may still be an issue, but I am starting to think that the problem was the solver used and not the code.
Best,
Peter

More Answers (2)

Matt J on 26 Nov 2013
To me, partialSigma looks like it should be
partialSigma = nObs/sigma - sse/sigma^3 - quotDens'*(c-Z*beta)/sigma^2
  2 Comments
Matt J on 26 Nov 2013
You said there was supposed to be a pdf attached? I don't see it.
Matt J on 27 Nov 2013
Edited: Matt J on 27 Nov 2013
One thing you could do is take the solution that STATA, or minFunc, gave you and compute the value of your analytical gradient there. It should be very close to zero if the solution is genuine and your gradient calculation is correct.
Also, you could feed the fminunc solution as an initializer to the other solvers (and vice versa) to see whether the solvers quit after zero iterations. That would be a way of checking whether you have multiple solutions. You should also compare the objective values at which the different solvers terminate, to see which one reaches the lowest value.
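That first check is a one-liner in any language; as a generic sketch (in Python here, where costfunc stands for any function returning the pair [negLogLik, grad], like truncReg_cost does):

```python
import numpy as np

def grad_norm_at(costfunc, params_hat):
    """Evaluate the analytical gradient at a candidate solution; at a
    genuine interior minimizer it should be close to zero."""
    _, g = costfunc(np.asarray(params_hat, dtype=float))
    return np.linalg.norm(g, ord=np.inf)

# Toy usage with a quadratic whose minimizer is the origin.
quad = lambda p: (p @ p, 2.0 * p)
print(grad_norm_at(quad, [0.0, 0.0]))   # prints 0.0
```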
Finally, even after you clarified that you are minimizing w.r.t. sigma^2, I do not see how you obtain the term
(1/sigma^3)*(cdfArg*sigma)'*quotDens
or equivalently
(1/sigma^2)*(c-Z*beta)'*quotDens
The derivative of cdfArg w.r.t. sigma^2 is
-0.5*(c-Z*beta)*sigma^(-3)



Peter on 26 Nov 2013
Hi Matt,
thanks for the rapid answer. There is certainly the possibility that I have a bug in the algebra rather than in the code. However, I don't think the result you suggest for partialSigma is the solution. To make things more transparent, here is the log-likelihood written in terms of my code:
LL = -nObs*log(sigma*sqrt(2*pi)) - 1/(2*sigma^2)*sse - sum(log(ones(nObs,1)-normcdf(cdfArg)))
It's probably my nomenclature that is misleading: the variable is called partialSigma, but I am actually taking the derivative w.r.t. sigma^2. Given this, the derivative of the second term cannot have an odd power of sigma in the denominator. Also, the discrepancy enters into the beta components of the gradient vector.
Any other ideas?
Thanks,
Peter
