Edric M Ellis <eellis@mathworks.com> wrote in message <ytwvbx5kb9l.fsf@ukeellis0l.dhcp.mathworks.com>...
> "Christian " <proechri@umich.edu> writes:
>
> > I have compiled a mex file using Coder on a Windows machine. Running
> > this file on CUDA gives me an 'Undefined function' error. The mex
> > file itself doesn't contain any CUDA code and its inputs are no
> > gpuArrays. The only reason why I want this mex file to run on CUDA is
> > because another section of my code runs much faster on CUDA.
>
> Can you post a simple example of something you tried that doesn't work?
>
> Cheers,
>
> Edric.
Hi Edric,
I've figured out the problem: I was trying to run a Windows MEX file on a Linux machine. I'm now using a Windows machine with CUDA, and the MEX file works. However, it runs much slower there than on a standard Windows machine. That's a bummer, because CUDA speeds up other parts of my algorithm. Here's a small version of my file, which I compiled to MEX using Coder on a standard Windows machine:
%%%%%%%%%%%%
% some arbitrary inputs for the function
x=linspace(-5,5,100);
yp=randn(100,3,3);
w=randn(100,3);
TMz=randn(3,3);
% function itself
function wn=myfunction(x,yp,w,TMz) %#codegen
n1=size(yp,1);
n2=size(yp,2);
wn = zeros(n1,n2);
xind=NaN(n1,n2,n2);
% index of the grid point in x closest to each yp(i,j,jp)
for j=1:n2
    for jp=1:n2
        for i=1:n1
            [~, xind(i,j,jp)] = min(abs(yp(i,j,jp)-x));
        end
    end
end
xval = x(xind);
% relative distance of yp from its nearest grid point towards the lower/upper neighbour
etadown = (yp-xval)./(x(max(xind-1,1))-xval);
etaup = (yp-xval)./(x(min(xind+1,n1))-xval);
for j=1:n2
    for jp=1:n2
        z=zeros(1,n1);
        for i = 1:n1
            if xind(i,j,jp)==1 && yp(i,j,jp)<xval(i,j,jp)
                % below the lowest grid point: all weight goes to the first bin
                z(1) = z(1) + w(i,j);
            elseif xind(i,j,jp)==n1 && yp(i,j,jp)>xval(i,j,jp)
                % above the highest grid point: all weight goes to the last bin
                z(n1)=z(n1)+w(i,j);
            else
                % split the weight between the nearest grid point and its neighbour
                if xval(i,j,jp)>yp(i,j,jp)
                    z(xind(i,j,jp)) = z(xind(i,j,jp)) + w(i,j)*(1-etadown(i,j,jp));
                    z(xind(i,j,jp)-1) = z(xind(i,j,jp)-1) + w(i,j)*etadown(i,j,jp);
                else
                    z(xind(i,j,jp)) = z(xind(i,j,jp)) + w(i,j)*(1-etaup(i,j,jp));
                    z(xind(i,j,jp)+1) = z(xind(i,j,jp)+1)+w(i,j)*etaup(i,j,jp);
                end
            end
        end
        wn(:,jp)=wn(:,jp)+z'*TMz(j,jp);
    end
end
%%%%%%%%%
So there are a lot of indexing operations and if statements. I've also tried to vectorize the code, but that seemed infeasible. What is the best way to deal with this code on the CUDA machine? Using arrayfun? Or building the MEX file on the CUDA machine itself (I've never done that, and I know that the Coder toolbox is not available on my CUDA machine)?
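In case it helps with reproducing the slowdown, this is roughly how the file can be built and timed. It's only a sketch: it assumes the function is saved as myfunction.m on the path, relies on Coder's default output name myfunction_mex, and the repetition count nrep is an arbitrary choice.

```matlab
% Sketch only: assumes myfunction.m is on the path and MATLAB Coder is installed.
x   = linspace(-5,5,100);
yp  = randn(100,3,3);
w   = randn(100,3);
TMz = randn(3,3);

% Build the MEX file from example inputs (default output name: myfunction_mex).
codegen myfunction -args {x,yp,w,TMz}

% Time repeated calls and report the mean; nrep is an arbitrary choice.
nrep = 100;
tic
for k = 1:nrep
    wn = myfunction_mex(x,yp,w,TMz);
end
tmex = toc/nrep;
fprintf('mean time per call: %g s\n', tmex);
```

Running the timing part on both Windows machines with the same MEX file should show how large the difference really is.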
Thanks for your help,
Christian
