Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Univariate probit ok, bivariate normal distribution ok, Need help with bivariate probit!

Subject: Univariate probit ok, bivariate normal distribution ok, Need help with bivariate probit!

From: Barry Shaw

Date: 13 Sep, 2011 19:55:29

Message: 1 of 6

Hi All

I was wondering could you help me out please? I think there is probably a simple solution to my question.

I have a simple data set (two binary response (dependent) variables and 3 continuous exploratory variables) . I want to fit the bivariate probit model to the data (using fminsearch). Below is my code. I was wondering if someone could point out what I have done wrong, or what I am misunderstanding?

function LL = bivarprobit(guess)
load SchoolData.txt
data = SchoolData
x=[data(:,3) data(:,4) data(:,5)] %my exploratory variables
y=[data(:,1) data(:,2)] %my binary response variables

probit_y1 = guess(1)+guess(2)*x(:,1)+ guess(3)*x(:,2)+guess(4)*x(:,3)
probit_y2 = guess(5)+guess(6)*x(:,1)+ guess(7)*x(:,2)+guess(8)*x(:,3)
probit_y1y2 =[probit1_y1 probit2_y2]
mu =[guess(9) guess(10)] %i think mu should =[0 0] (as standard normal)

Covar = [1 guess(11); guess(11) 1] %my covariance matrix
prob_of_y1 = normcdf(probit_y1) %Call CDF of y1
prob_of_y2 = normcdf(probit_y2) %Call CDF of y2
prob_of_y1y2 = mvncdf(probit_y1y2, mu, Covar) %Call CDF of y1y2

LL = -sum(y(1).*y(2).*log(prob_of_y1y2)
       + y(1).*(1-y(2)).*log(prob_of_y1 - prob_of_y1y2)
       + (1-y(1)).*y(2).*log(prob_of_y2 - prob_of_y1y2)
       + (1-y(1)).*(1-y(2)).*log(1-prob_of_y1-prob_of_y2-prob_of_y1y2))

My fminsearch is not converging at all.


For reference, I have successfully fitted a univariate probit model to some data with 1 binary response variable (verified results using glmfit), and also a standard normal bivariate distribution to 2 variables where the data was initially generated from N(0,1,0.8). The codes for each parts are below:

function LL = fitunivarprobit(guess)
load Lee_Cancer_Data.txt
data = Lee_Cancer_Data
x=[data(:,5)]
y=[data(:,1)]
probit = guess(1)+guess(2)*x(:,1)
prob_of_y = normcdf(probit)
LL = -sum(y.*log(prob_of_y)+(1-y).*log(1-prob_of_y))


function LL = fitbivarnorm(guess)
load Simdata_bivarnormal_p08_10000samples.txt
data = Simdata_bivarnormal_p08_10000samples;
y=data;
mu=guess(1)
covar=[1 guess(2);guess(2) 1]
prob = mvnpdf(data,guess(1),covar);
LL = -sum(log(prob))


Any help would be appreciated.
Regards,
Barry

Subject: Univariate probit ok, bivariate normal distribution ok, Need help with bivariate probit!

From: Tom Lane

Date: 13 Sep, 2011 21:11:03

Message: 2 of 6

> y=[data(:,1) data(:,2)] %my binary response variables
...
> LL = -sum(y(1).*y(2).*log(prob_of_y1y2) +
> y(1).*(1-y(2)).*log(prob_of_y1 - prob_of_y1y2) +
> (1-y(1)).*y(2).*log(prob_of_y2 - prob_of_y1y2) +
> (1-y(1)).*(1-y(2)).*log(1-prob_of_y1-prob_of_y2-prob_of_y1y2))
>
> My fminsearch is not converging at all.

Barry, it looks to me like you intended to use y(:,1) rather than y(1).

I haven't tried running your code, but I hope this gets you pointed in the
right direction.

-- Tom

Subject: Univariate probit ok, bivariate normal distribution ok, Need help with bivariate probit!

From: Barry Shaw

Date: 14 Sep, 2011 09:30:31

Message: 3 of 6

"Tom Lane" <tlane@mathworks.nospam.com> wrote in message <j4ogt7$3lg$1@newscl01ah.mathworks.com>...
> > y=[data(:,1) data(:,2)] %my binary response variables
> ...
> > LL = -sum(y(1).*y(2).*log(prob_of_y1y2) +
> > y(1).*(1-y(2)).*log(prob_of_y1 - prob_of_y1y2) +
> > (1-y(1)).*y(2).*log(prob_of_y2 - prob_of_y1y2) +
> > (1-y(1)).*(1-y(2)).*log(1-prob_of_y1-prob_of_y2-prob_of_y1y2))
> >
> > My fminsearch is not converging at all.
>
> Barry, it looks to me like you intended to use y(:,1) rather than y(1).
>
> I haven't tried running your code, but I hope this gets you pointed in the
> right direction.
>
> -- Tom

Thanks for your reply Tom. You are correct - that was a stupid mistake on my part. I've corrected, however, there is still something wrong with my code and I can't see/understand what. I'm getting complex number likelihoods (e.g. LL = 1.7548e+002 -6.2832e+000i).

If it helps, the data I'm trying to fit the model to is here:
http://shazam.econ.ubc.ca/intro/school.htm
I'm assuming column 1 = YESVM, column 9 = PRIV (my 2 response variables, PRIV = y(:,1) and YESVM = y(:,2)) and columns 5, 7 and 8 are YEARS, LOGINC and PTCON (my 3 explatorary variables).

The "answers" I'm trying to work towards are on page 285 here:
http://www.stata-journal.com/sjpdf.html?articlenum=st0045.

My 2 lines I enter into my command window are
guess=[-4 -0.01 -0.1 0.37 -0.5 -0.016 -1.29 0.99 -.3];
fminsearch('bivarprobit',guess)


Note, I have changed my function file slightly and just set mu = [0 0], as I think this should be the case, as the bivariate probit uses the inverse of the bivariate STANDARD normal distribution. so my updated code is:
function LL = bivarprobit(guess)
load SchoolData.txt
data = SchoolData
x=[data(:,3) data(:,4) data(:,5)] %my exploratory variables
y=[data(:,1) data(:,2)] %my binary response variables
 
probit_y1 = guess(1)+guess(2)*x(:,1)+ guess(3)*x(:,2)+guess(4)*x(:,3)
probit_y2 = guess(5)+guess(6)*x(:,1)+ guess(7)*x(:,2)+guess(8)*x(:,3)
probit_y1y2 =[probit1_y1 probit2_y2]
mu =[0 0] %i think mu should =[0 0] (as standard normal)
 
Covar = [1 guess(9); guess(9) 1] %my covariance matrix
 prob_of_y1 = normcdf(probit_y1) %Call CDF of y1
 prob_of_y2 = normcdf(probit_y2) %Call CDF of y2
 prob_of_y1y2 = mvncdf(probit_y1y2, mu, Covar) %Call CDF of y1y2
 
LL = -sum(y(:,1).*y(:,2).*log(prob_of_y1y2)
       + y(:,1).*(1-y(:,2)).*log(prob_of_y1 - prob_of_y1y2)
       + (1-y(:,1)).*y(:,2).*log(prob_of_y2 - prob_of_y1y2)
       + (1-y(:,1)).*(1-y(:,2)).*log(1-prob_of_y1-prob_of_y2-prob_of_y1y2))
 
If you, or anyone, have a few spare minutes and can help that would be great.
Thanks
Barry

Subject: Univariate probit ok, bivariate normal distribution ok, Need help with bivariate probit!

From: Aaron

Date: 14 Sep, 2011 22:04:29

Message: 4 of 6

"Tom Lane" <tlane@mathworks.nospam.com> wrote in message <j4ogt7$3lg$1@newscl01ah.mathworks.com>...
> > y=[data(:,1) data(:,2)] %my binary response variables
> ...
> > LL = -sum(y(1).*y(2).*log(prob_of_y1y2) +
> > y(1).*(1-y(2)).*log(prob_of_y1 - prob_of_y1y2) +
> > (1-y(1)).*y(2).*log(prob_of_y2 - prob_of_y1y2) +
> > (1-y(1)).*(1-y(2)).*log(1-prob_of_y1-prob_of_y2-prob_of_y1y2))
> >
> > My fminsearch is not converging at all.
>
> Barry, it looks to me like you intended to use y(:,1) rather than y(1).
>
> I haven't tried running your code, but I hope this gets you pointed in the
> right direction.
>
> -- Tom


There may be an issue with your probabilities. ie somehow your function is telling matlab to take the log of a negative number yielding imaginary numbers...

Subject: Univariate probit ok, bivariate normal distribution ok, Need help with bivariate probit!

From: Tom Lane

Date: 15 Sep, 2011 21:50:44

Message: 5 of 6

Barry, I don't have your data. I just typed in some numbers to demonstrate
what might be going wrong. Try this:

probit_y1 = 1;
probit_y2 = 2;
probit_y1y2 = [probit_y1 probit_y2]; % not [probit1_y1 probit2_y2];
mu = [0 0];
Covar = eye(2);

prob_of_y1 = normcdf(probit_y1) %Call CDF of y1
prob_of_y2 = normcdf(probit_y2) %Call CDF of y2
prob_of_y1y2 = mvncdf(probit_y1y2, mu, Covar) %Call CDF of y1y2

1-prob_of_y1-prob_of_y2-prob_of_y1y2 % you take the log of this

-- Tom

Subject: Univariate probit ok, bivariate normal distribution ok, Need help with bivariate probit!

From: Barry Shaw

Date: 16 Sep, 2011 11:24:11

Message: 6 of 6

"Tom Lane" <tlane@mathworks.nospam.com> wrote in message <j4trvk$i0$1@newscl01ah.mathworks.com>...
> Barry, I don't have your data. I just typed in some numbers to demonstrate
> what might be going wrong. Try this:
>
> probit_y1 = 1;
> probit_y2 = 2;
> probit_y1y2 = [probit_y1 probit_y2]; % not [probit1_y1 probit2_y2];
> mu = [0 0];
> Covar = eye(2);
>
> prob_of_y1 = normcdf(probit_y1) %Call CDF of y1
> prob_of_y2 = normcdf(probit_y2) %Call CDF of y2
> prob_of_y1y2 = mvncdf(probit_y1y2, mu, Covar) %Call CDF of y1y2
>
> 1-prob_of_y1-prob_of_y2-prob_of_y1y2 % you take the log of this
>
> -- Tom


Aaron and Tom - thank you for your replies.
I think I've got to the bottom of my problem - at least I'm now able to replicate a few different set of results that I've found online. In case any one else has the same problem and stumbles across this thread, I post below my solution (I think the inital code above was fairly close to being correct, but below is bit of a different approach).

function LL = bivarprobit_v5(guess)

load autodata.txt
data = autodata;
x=[data(:,3) data(:,4)];
y=[data(:,1) data(:,2)];

[r c] = size(y);
for i = 1 : r
    for j = 1 : c
            q(i,j) = 2*y(i,j)-1;
    end
end

for i = 1 : r
    z(i,1) = guess(1)+guess(2)*x(i,1)+guess(3)*x(i,2); %probits (probit 1)
    z(i,2) = guess(4)+guess(5)*x(i,1)+guess(6)*x(i,2); %probits (probit 2)
end

for i = 1 : r
    for j = 1 : c
            w(i,j) = q(i,j) * z(i,j);
    end
end

for i = 1 : r
    rho(i) = q(i,1) * q(i,2) *guess(7); %correlations
end

%Prob(Y1=yi1 and Y2=yi2 | x1 and x2) = cdf_2(wi1,wi2,rhoi)
for i = 1 : r
    Prob(i) = mvncdf([w(i,1) w(i,2)],[0 0],[1 rho(i);rho(i) 1]);
end

LL = -sum(log(Prob))

The above is replicating the results for the 1978 auto data from stata. Note, the inital guestimates of the parameters have to be pretty close to the "answers" to get convergance. Else the parameters "shoot off" to more extreme values and the convergence fails. I think this was part of my inital problem, as pointed out by Aaron and Tom.

Barry

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us