Why does OpenMP not work in CUDA code compiled in Matlab (as a MEX file)?
Moein Mozaffarzadeh
on 20 Sep 2021
Hi, I'm trying to use OpenMP in my CUDA code, compile and use it in Matlab. However, it does not work. Here is a simple example:
#include <cuda_runtime.h>
#include "device_launch_parameters.h"
#include <stdio.h>
#include "cuda.h"
#include "mex.h"
#include "omp.h"
void mexFunction(int nlhs, mxArray* plhs[],
int nrhs, const mxArray* prhs[])
{
#pragma omp parallel
{
int ID = omp_get_thread_num();
printf("Hello(%d)", ID);
printf("World(%d) \n", ID);
}
}
I use
mexcuda OpenMPTest.cu
to compile this code in Matlab. When I run it in Matlab, all I get is "Hello(0)World(0) ". So something is wrong. Maybe the problem is with the way I compile? Please help.
Moein.
20 Comments
Walter Roberson
on 20 Sep 2021
Moein Mozaffarzadeh
on 20 Sep 2021
Edited: Walter Roberson
on 20 Sep 2021
Hi Walter,
I have already seen that post. It does not provide any compiling solution.
Could you please let me know how I can add liomp5 to my build line?
Searching the web, I found the following so far, but nothing changed. The code compiles fine, but the thread ID is always zero.
mexcuda C:\ProgramData\MATLAB\SupportPackages\R2021a\3P.instrset\mingw_w64.instrset\lib\gcc\x86_64-w64-mingw32\6.3.0\libgomp.a -v CFLAGS="$CFLAGS -fopenmp" -LDFLAGS="$LDFLAGS -fopenmp" Xcompiler="-fopenmp" OpenMPTest.cu
Walter Roberson
on 20 Sep 2021
-liomp5 is probably the flag needed; you might also need a -L flag to point to the directory the library is in.
The -l part is the flag and iomp5 is its argument; by convention the linker prefixes the name with lib and tries the usual library extensions when searching for it. You can also give a specific file name and extension.
I did not realize that OpenMP could be used with CUDA, so this is not something I have experience with myself.
Joss Knight
on 20 Sep 2021
Sorry, I'm not an OpenMP expert. Are you sure you've specified that more than one thread should be created? Don't you have to compile with -fopenmp, which you might need to add to the CXXFLAGS variables when invoked using MEX?
Walter Roberson
on 20 Sep 2021
Ah, add the -fopenmp to the CXXFLAGs instead of the current CFLAGS location ?
Moein Mozaffarzadeh
on 20 Sep 2021
I added -fopenmp to the CXXFLAGs instead of the current CFLAGS, but still the threadID is zero.
I cannot find iomp5 anywhere. Any idea at what directory this could be found?
Moein Mozaffarzadeh
on 20 Sep 2021
I just tried the following .cpp code to see whether the problem is with my OpenMP setup or with the CUDA compiler.
#include <stdio.h>
#include "mex.h"
#include "omp.h"
void mexFunction(int nlhs, mxArray* plhs[],
int nrhs, const mxArray* prhs[])
{
omp_set_num_threads(4);
#pragma omp parallel
{
printf("Hello(%d), ", omp_get_thread_num());
printf("World(%d) \n", omp_get_thread_num());
}
}
I used the following command
mex C:\ProgramData\MATLAB\SupportPackages\R2021a\3P.instrset\mingw_w64.instrset\lib\gcc\x86_64-w64-mingw32\6.3.0\libgomp.a -v ...
CXXFLAGS="$CXXFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" Xcompiler="-fopenmp" OpenMPTest.cpp
to compile. Unfortunately, it does not even compile. I get the following error:
... OpenMPTest.cpp:1:0: sorry, unimplemented: 64-bit mode not compiled in
Any idea what the problem is now?
Bruno Luong
on 20 Sep 2021
Edited: Bruno Luong
on 20 Sep 2021
cc = cc(loc);
if contains(cc.ShortName,'MSVC')
opmopt = {'COMPFLAGS="$COMPFLAGS /openmp"'};
elseif contains(cc.ShortName,'INTEL')
opmopt = {'COMPFLAGS="$COMPFLAGS /MD /Qopenmp"'};
elseif contains(cc.ShortName,{'gcc' 'mingw64'}) % not tested
opmopt = {'CFLAGS="$CFLAGS -fopenmp"'
'LDFLAGS="$LDFLAGS -fopenmp"'};
else
fprintf('Error: Not known compiler\n')
fprintf('You might want to edit %s if you know how to setup openmp options\n', me)
return
end
I have no idea how to compile this with CUDA (or whether it is compatible).
Note that OpenMP in MSVS has some limitations.
Moein Mozaffarzadeh
on 20 Sep 2021
I have just tried to compile outside of Matlab environment to see if the problem is with the compiler or Matlab. I used
g++ OpenMPTest.c -o OpenMPTest.exe -fopenmp
at the Windows command prompt to compile the following code.
#include <stdio.h>
//#include "mex.h"
#include "omp.h"
#include <unistd.h>
//void mexFunction(int nlhs, mxArray* plhs[],
// int nrhs, const mxArray* prhs[])
int main ()
{
#pragma omp parallel
{
printf("Hello(%d), ", omp_get_thread_num());
printf("World(%d) \n", omp_get_thread_num());
}
system("pause");
return 0;
}
It works well. So there is nothing wrong with the compiler installation on my laptop.
I then changed the code back to the version that builds a MEX file, as follows:
#include <stdio.h>
#include "mex.h"
#include "omp.h"
#include <unistd.h>
void mexFunction(int nlhs, mxArray* plhs[],
int nrhs, const mxArray* prhs[])
//int main ()
{
#pragma omp parallel
{
printf("Hello(%d), ", omp_get_thread_num());
printf("World(%d) \n", omp_get_thread_num());
}
system("pause");
// return 0;
}
and, following your suggestion, I now compile it with:
mex -v CFLAGS="$CFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" OpenMPTest.c
The code compiles, but Matlab crashes as soon as I run the MEX file. Please let me know what I am missing.
Moein.
Bruno Luong
on 20 Sep 2021
I never use "printf" in MEX, but rather mexPrintf(); however, such functions are not suitable for parallel calls (they are not reentrant).
You should test with something simple.
Joss Knight
on 21 Sep 2021
It certainly could be a problem that you're using printf, which MEX overrides with mexPrintf in order to output to the command window; thus you may be attempting to call into MATLAB from two different threads, which won't be supported.
Try using fprintf(stderr, ...) or just put #undef printf at the top of your file.
I'm guessing attempting a system call in a mexFunction may be inadvisable too, not sure though.
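One way to avoid printing from several threads at once is to have each thread record its result in its own slot and to print only after the parallel region has ended. The following is a minimal sketch of that idea in plain C, not code from this thread: the function names collect_thread_ids and print_thread_ids are hypothetical, and the stub definitions are an assumption added so the file still builds when OpenMP is not enabled. In a MEX file, the serial printing step could use mexPrintf instead of printf.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs (assumption): let the same code build without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Each thread writes its ID into its own slot; no I/O happens in parallel.
   Returns the number of threads that took part (at most `capacity`). */
int collect_thread_ids(int *ids, int capacity)
{
    int count = 0;
    if (capacity < 1) {
        return 0;
    }
    #pragma omp parallel num_threads(capacity)
    {
        int id = omp_get_thread_num();
        ids[id] = id;              /* distinct slot per thread, no race */
        #pragma omp single
        count = omp_get_num_threads();
    }
    return count;
}

/* Serial reporting, called by one thread after the parallel region. */
void print_thread_ids(const int *ids, int count)
{
    for (int i = 0; i < count; ++i) {
        printf("Hello(%d), World(%d)\n", ids[i], ids[i]);
    }
}
```

If the returned count is 1 even though more threads were requested, that suggests the OpenMP pragmas were not honoured by the compiler that built the file.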
Joss Knight
on 21 Sep 2021
By the way if you were already adding -fopenmp to CFLAGS then that's fine, it won't make any difference adding it to CXXFLAGS. All that really matters is the compiler command you see in the verbose output ( -v ) . As long as the right compiler switches appear in that line (or those lines) you know the right thing is happening. MEX is basically just a MAKE system which helps you invoke your compiler with the right commands and with the correct environment variables set on the system. With a bit of clever interpretation you can reproduce its behaviour using system calls (but getting all the library and include paths right is hard).
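Besides reading the verbose output, a source file can also report whether its compiler really enabled OpenMP, because an OpenMP-enabled compiler defines the _OPENMP macro. The sketch below is only an illustration under that assumption; the function names openmp_compiled_version and openmp_team_size are hypothetical and do not come from MEX or CUDA.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the value of _OPENMP (a yyyymm date of the supported OpenMP
   specification) if OpenMP was enabled for this file, otherwise 0. */
long openmp_compiled_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Returns how many threads actually execute a parallel region.
   Stays at 1 when the pragma is ignored by the compiler. */
int openmp_team_size(void)
{
    int team = 1;
    #pragma omp parallel
    {
#ifdef _OPENMP
        #pragma omp single
        team = omp_get_num_threads();
#endif
    }
    return team;
}
```

A MEX function could return both values to Matlab as outputs. A version of 0 would mean the pragmas were skipped at compile time, whatever flags were intended on the command line.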
Moein Mozaffarzadeh
on 21 Sep 2021
Thank you @Joss Knight and @Bruno Luong for your help. You were right about printf. So I changed the code as follows:
#include <stdio.h>
#include "mex.h"
#include "omp.h"
void mexFunction(int nlhs, mxArray* plhs[],
int nrhs, const mxArray* prhs[])
//int main ()
{
int* Output;
plhs[0] = mxCreateNumericMatrix(1, 6, mxINT32_CLASS, mxREAL);
Output = (int*)mxGetData(plhs[0]);
omp_set_num_threads(6);
#pragma omp parallel
{
int ID = omp_get_thread_num();
Output[ID] = ID+10;
}
}
As you may have noticed, this is C code. I compiled it with the following command in Matlab, and it works fine:
mex CFLAGS="$CFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" OpenMPTest.c
Now, here is the same code, but in .cu format:
#include <cuda_runtime.h>
#include "device_launch_parameters.h"
#include <stdio.h>
#include "cuda.h"
#include "mex.h"
#include "omp.h"
void mexFunction(int nlhs, mxArray* plhs[],
int nrhs, const mxArray* prhs[])
{
int* Output;
plhs[0] = mxCreateNumericMatrix(1, 6, mxINT32_CLASS, mxREAL);
Output = (int*)mxGetData(plhs[0]);
omp_set_num_threads(6);
#pragma omp parallel
{
int ID = omp_get_thread_num();
Output[ID] = ID + 10;
}
}
Following your comments, I compiled this code with:
mexcuda -v CFLAGS="$CFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" OpenMPTest.cu
When I run this, the output should be "10 11 12 13 14 15" (or something like this in different orders), but I get 10 0 0 0 0 0. So OpenMP is not working.
Following page 25 of this LINK (which says to add the "-Xcompiler -fopenmp" flag and the -lgomp flag), I see that I need to use
mexcuda -v CFLAGS="$CFLAGS -Xcompiler -fopenmp" LDFLAGS="$LDFLAGS -Xcompiler -fopenmp" -lgomp OpenMPTest.cu
However, I get an error for -lgomp: Matlab cannot find it. I also looked in the Matlab/MinGW base folder, but could not find anything with such a name.
Could you please let me know what I am missing here? Is the compile command wrong?
Moein.
Joss Knight
on 21 Sep 2021
I can't comment on exactly why nvcc might have trouble building host-side code with OpenMP syntax. You may be better off keeping your host-side code in a .c or .cpp file and putting your device-side code in a separate .cu file. That way, MEXCUDA will invoke gcc for the host code instead of nvcc.
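As a rough illustration of that split, the OpenMP part of the test above could live in a plain C file of its own. This is a sketch only: the file name omp_host.c and the function name fill_offsets are hypothetical, and how the two files are built and linked together is not shown here. The .cu file would only need a matching declaration, for example extern "C" void fill_offsets(int *out, int n);.

```c
/* omp_host.c (hypothetical file name): host-side code only, no CUDA headers,
   so it can be handed to the host compiler together with -fopenmp. */

/* Fills out[i] with i + 10 for i in [0, n).
   The loop index is used instead of the thread ID, so the result is the
   same whether the loop runs on one thread or on several. */
void fill_offsets(int *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = i + 10;
    }
}
```

Indexing by the loop counter rather than by omp_get_thread_num() is a deliberate choice here: a result such as 10 0 0 0 0 0 can no longer occur just because fewer threads were started than expected, so a wrong build shows up as a slow run rather than as missing data.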
Answers (1)
俊凯 王
on 24 Sep 2021
Edited: 俊凯 王
on 24 Sep 2021
So far I have not been able to find the correct compile command to build a .cu project with OpenMP enabled.
But there are other ways to handle this problem. In my case, I only use OpenMP to build a binary tree on the CPU, and I want to make the tree root (a struct in C) accessible to the .cu project in Matlab.
In fact, by using intptr_t, an integer type that can hold a pointer value, Matlab can accept the pointer as an integer parameter. In the next step, we pass this integer as an input parameter to the .cu function called from Matlab.
All in all, a pointer can be cast to an integer parameter for Matlab, and Matlab passes this parameter on to the .cu project. Any variable in C/C++ can be referred to through a pointer, so this method may be useful.
.cpp (mex64): pointer -> int; Matlab: int; .cu (project): int -> pointer
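The pointer round trip described above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the answerer's actual code: the TreeNode struct and both function names are hypothetical, and uint64_t is chosen here only because a 64-bit integer is easy to store in a Matlab numeric array.

```c
#include <stdint.h>

/* Hypothetical tree node, standing in for the struct built on the CPU. */
typedef struct TreeNode {
    int key;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;

/* First MEX file: turn the pointer into an integer handle for Matlab. */
uint64_t handle_from_pointer(const TreeNode *node)
{
    return (uint64_t)(uintptr_t)node;
}

/* Second MEX file: turn the integer handle back into a pointer. */
TreeNode *pointer_from_handle(uint64_t handle)
{
    return (TreeNode *)(uintptr_t)handle;
}
```

The handle is only meaningful inside the same process and only while the memory it refers to is still allocated. It is also a host pointer, so the data would still have to be copied to the GPU before a kernel could use it.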