|
Hi there,
I have the response y, and the covariate x that is suffering from censoring. Knowing the distribution of each of them, I want to build the likelihood and fit the regression line .
y=b0+b1*x.
The likelihood will be of the form
Π (f(y|x)*f(x))^(1-d(i)) * (int(f(y|x)f(x)),x(i),Inf)^(d(i)),
where d is the status indicator.
The log likelihood would be the sum of the logarithms.
The code below is running correctly. My problem is speed especially in the function "censreg" at the line where I say: "L=@(g) -sum(((cellfun(@(f) f(g), cont))));". It takes a lot of time.
For each datum inside the for loop i create its contribution (cont(i)) as a cell (according to the status). Then I sum all the elements of the cell array in order to get the log likelihood, and afterwards I convert the cell array to a function handle, in order to pass to fminsearch and maximize it.
I think that everything would be faster if I could just sum up function handles and not needing to work with cells at all.
Can you tell me anything that would speed things up? The time that the for loop need is fine. The problem is in the next line. Maybe building the likelihood with a different way? Can it be vectorized?I think that the integral is not allowing this.
Thanx in advance for any answers.
n=200;
x=exprnd(3,n,1);
c=exprnd(3,n,1);
status=(x>c);
xcen=min(x,c);
s=1;
e=normrnd(0,s,n,1);
y=5-0.2.*x+e;
% I want to estimate the regression line y=b0+b1*x via maximum likelihood
% knowing that the response is normal and the covariate
% exponentially distributed:
% i.e. I want b0, b1, and going to estimate s(=std), m (of exponential) also
pars=censreg(y,xcen,status)
function pars=censreg(y,xcen,status)
%estimates intercept, slope, parameter (mean) of exponential, std
%when y~Normal, x~exponential and censored.
sizex=size(xcen); sizey=size(y); sizes=size(status);
if sizex(1)==1; xcen=xcen';end
if sizey(1)==1; y=y';end
if sizes(1)==1; status=status';end
%----start to find initial values:
xcc=xcen(status==0);
ycc=y(status==0);
ncc=length(xcc);
Xcc=[ones(ncc,1) xcc];
bcc=inv(Xcc'*Xcc)*Xcc'*ycc;
s=(ycc-Xcc*bcc)'*(ycc-Xcc*bcc)/(ncc-length(bcc));
m=expfit(xcen,0.05,status);
init=[bcc(1) bcc(2) s m];
%----- initial values found
%--Start building the likelihood:
for i=1:length(status)
if status(i)==0
cont(i)={@(g) log(1./(g(3).*sqrt(2*pi)).*exp(-(y(i)-(g(1)+g(2).*xcen(i))).^2./(2.*g(3).^2)).*1./g(4).*exp(-xcen(i)./(g(4))) )};
elseif status(i)==1
cont(i)={@(g) log(quadgk(@(x) (1./(g(3).*sqrt(2*pi)).*exp(-(y(i)-(g(1)+g(2).*x)).^2./(2.*g(3).^2)).*1./g(4).*exp(-x./g(4)) ),xcen(i),Inf))};
end
end
L=@(g) -sum(((cellfun(@(f) f(g), cont))));
pars=fminsearch(L,[init]);
end
|