Hello, I am testing a PTX-compiled kernel in MATLAB on a GTX 560 Ti (CUDA compute capability 2.1). As the CUDA programming guide says:
"The same on-chip memory is used for both L1 and shared memory: It can be configured as 48 KB of shared memory and 16 KB of L1 cache or as 16 KB of shared memory and 48 KB of L1 cache, using cudaFuncSetCacheConfig()/cuFuncSetCacheConfig()"
I can't find any information on whether I can set this configuration from MATLAB, or with some option to nvcc when compiling the PTX.
The reason I want to do this is that my kernel does not use any shared memory, and I want to see whether I get any speedup with a bigger L1 cache (the whole kernel reads and computes on double-precision data).
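For reference, this is roughly what I would do in a plain CUDA C host program outside MATLAB. It is only a minimal sketch: the kernel name myKernel and its body are placeholders, not my real kernel, and I do not know whether MATLAB exposes an equivalent setting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): double-precision work, no shared memory.
__global__ void myKernel(const double *in, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0 * in[i];
}

int main()
{
    // Request the 48 KB L1 / 16 KB shared memory split for this kernel.
    // This is a preference; the runtime may pick another configuration
    // if the kernel needs more shared memory than the split allows.
    cudaError_t err = cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);
    if (err != cudaSuccess) {
        std::printf("cudaFuncSetCacheConfig failed: %s\n",
                    cudaGetErrorString(err));
        return 1;
    }

    // ... allocate buffers and launch myKernel<<<grid, block>>>(...) as usual ...
    return 0;
}
```

What I am looking for is the equivalent of that cudaFuncSetCacheConfig() call when the kernel is loaded from a PTX file in MATLAB.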
Thanks for any solution!
Gaszton