How to compile a MEX file containing CUDA code, without using "mexcuda"

I'm a 2013b user and I'm trying to compile my .cu file containing CUDA code and a MEX gateway function. The code takes an mxArray as input and outputs another mxArray, after internally moving data to and from the GPU for computation. I'm aware of the built-in function "mexcuda", which purports to do exactly what I want, but it isn't available in my release. Is anyone aware of another method of achieving this?
A secondary question: I've considered using both Nvidia syntax (e.g. "cudaMalloc()") and MATLAB syntax (e.g. "mxGPUCreateGPUArray()"). Can anyone let me know if one is preferable, and why? I've never seen an example of the Nvidia syntax in a MEX file.
Thanks, Elliot

Answers (1)

Joss Knight
Joss Knight on 9 Sep 2015
Edited: Joss Knight on 9 Sep 2015
mexcuda was released in R2015b. Before this, compiling GPU MEX code was perfectly possible, but you needed to follow the process explained in the documentation for earlier versions. This was the process in R2013b: http://www.mathworks.com/help/releases/R2013b/distcomp/run-mex-functions-containing-cuda-code.html.
Note that the process changed slightly in R2015a, and then moved over to use of mexcuda in R2015b, so do read the doc for your version.
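For reference, the R2013b workflow boiled down to two steps, sketched below. This is a hedged reconstruction from memory, not the authoritative procedure; the exact options-file name and path vary by platform and release, so follow the linked documentation for your version.

```shell
# Hypothetical sketch of the R2013b GPU MEX build (Linux shown; the
# win64/maci64 folders hold the equivalents for other platforms).
# $MATLABROOT is assumed to point at your MATLAB installation.

# 1. Copy the GPU-enabled mex options file shipped with
#    Parallel Computing Toolbox into the build directory:
cp "$MATLABROOT/toolbox/distcomp/gpu/extern/src/mex/glnxa64/mexopts.sh" .

# 2. Compile the .cu file with mex, pointing it at that options file:
mex -f mexopts.sh -largeArrayDims myMexFunction.cu
```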
There's no reason for you to return data as CPU arrays. Assuming you have Parallel Computing Toolbox, you pass data in and out as gpuArrays, so you don't need to continually copy on and off the device. Follow the documentation for how to do this (or read this blog article).
mxGPUCreateGPUArray creates an mxGPUArray object, which wraps a CUDA memory allocation. This can be queried for its dimensions, class, and other properties as described in the mxGPU API, which mirrors the MEX API. Or you can get its raw device pointer. It can also be easily converted into an mxArray in order to be passed back to MATLAB as a gpuArray object. In particular, if you do not create an mxGPUArray this way and instead choose to do a raw cudaMalloc, you cannot then pass the device data back to MATLAB without first copying it.
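To make that concrete, here is a minimal gateway in the spirit of the shipped mexGPUExample: it accepts a double gpuArray, doubles every element on the device, and returns a gpuArray, with no host/device copies at the interface. Treat it as a sketch; the kernel and function names are made up for illustration, and error checking is omitted.

```cu
/* Hypothetical sketch: y = 2*x for a double gpuArray input. */
#include "mex.h"
#include "gpu/mxGPUArray.h"

__global__ void timesTwo(double const *in, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0 * in[i];
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, mxArray const *prhs[])
{
    mxInitGPU();  /* must run before any other mxGPU call */

    /* Wrap the incoming gpuArray; no data is copied off the device. */
    mxGPUArray const *in = mxGPUCreateFromMxArray(prhs[0]);
    double const *d_in = (double const *)mxGPUGetDataReadOnly(in);

    /* Allocate the output directly in device memory. */
    mxGPUArray *out = mxGPUCreateGPUArray(mxGPUGetNumberOfDimensions(in),
                                          mxGPUGetDimensions(in),
                                          mxGPUGetClassID(in),
                                          mxGPUGetComplexity(in),
                                          MX_GPU_DO_NOT_INITIALIZE);
    double *d_out = (double *)mxGPUGetData(out);

    int n = (int)mxGPUGetNumberOfElements(in);
    timesTwo<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

    /* Hand the device data back to MATLAB as a gpuArray - no copy. */
    plhs[0] = mxGPUCreateMxArrayOnGPU(out);

    mxGPUDestroyGPUArray(in);
    mxGPUDestroyGPUArray(out);
}
```

From MATLAB you would then call it as, say, `y = myMexFunction(gpuArray(x))`, and `y` comes back as a gpuArray.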
  2 Comments
Elliot Gray
Elliot Gray on 10 Sep 2015
Thanks for your response. I was able to compile some simple code. I would like some clarification on two points, however:
1) the output array from my function needs to be saved on the hard drive. Does it not need to be retrieved from the GPU, either with the gather function, or by converting it into (and outputting it as) an mxArray within the mex file? Would it not be faster to do the latter?
2) I want to use the Nvidia syntax because it appears to give a wider range of options, for example, syntax to use shared memory instead of global memory. I don't see any mention of shared memory in the Parallel Computing Toolbox documentation. Will it be possible for me to achieve this control without Nvidia syntax?
Joss Knight
Joss Knight on 12 Oct 2015
1) You can't write data directly from GPU memory onto disk. You'll have to get it into CPU memory first. It's difficult to recommend anything because I don't know your application. I personally would do the computation in MEX, return gpuArrays to MATLAB, and then do all the file IO in MATLAB code.
2) The mxGPU API serves a different purpose to the CUDA runtime. Its job is to allow you to read and write to gpuArrays, i.e. the objects that MATLAB understands. You only use it at the interface layer - to get data from MATLAB, and back to MATLAB at the end. If you want to launch kernels using shared memory, create multiple streams, or do other funky stuff you can't do from the MATLAB command line, you're going to need to use the CUDA runtime API, driver API, or tools from another library.
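As an illustration of that split, the kernel below is a standard block-level reduction using `__shared__` memory - plain CUDA runtime code that knows nothing about MATLAB. Only the pointers fed to it would come from the mxGPU interface layer. The names and launch parameters here are hypothetical, assuming a power-of-two block size.

```cu
/* Hypothetical sketch: per-block partial sums via dynamic shared memory.
   d_in/d_out would be device pointers obtained from mxGPUGetData* calls. */
__global__ void blockSum(double const *in, double *out, int n)
{
    extern __shared__ double sdata[];          /* sized at launch time */
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0;        /* stage into shared memory */
    __syncthreads();

    /* Tree reduction within the block (blockDim.x assumed a power of 2). */
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];  /* one partial sum per block */
}

/* Launched from inside mexFunction with the third <<< >>> argument
   setting the dynamic shared-memory size, e.g.:
   blockSum<<<numBlocks, 256, 256 * sizeof(double)>>>(d_in, d_partial, n); */
```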

