I'm trying to use accumarray on large gpuArrays, but I get the error 'radix_sort: failed to get memory buffer'.
This is a minimal example that reproduces the error:
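a = randi(intmax, 2^28-2048, 1, 'gpuArray');        % values: random integers created directly on the GPU
b = gpuArray(randi(3, 2^28-2048, 3, 'uint16'));     % subscripts: three columns, each entry in 1..3
c = accumarray(b,a);                                % this call raises the radix_sort error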
When I do the same with arrays of size [2^28-2047 1] and [2^28-2047 3] it works.
This is my gpuDevice after creating a and b:
CUDADevice with properties:
Name: 'GeForce GTX 1080 Ti'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
Shouldn't this be enough memory for this kind of operation?
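For reference, here is a rough estimate of how much memory the two inputs occupy. It assumes a is stored as double (the default class returned by randi when no type is given) and b as uint16, as in the example above. It only counts the inputs; any temporary buffers that accumarray or its internal sort allocate are unknown and not included.

```matlab
% Back-of-the-envelope size of the inputs in the failing example.
n = 2^28 - 2048;                 % number of rows in a and b

bytesA = n * 1 * 8;              % a: n-by-1, assumed double (8 bytes per element)
bytesB = n * 3 * 2;              % b: n-by-3 uint16 (2 bytes per element)

fprintf('a: %.2f GiB, b: %.2f GiB, inputs total: %.2f GiB\n', ...
    bytesA / 2^30, bytesB / 2^30, (bytesA + bytesB) / 2^30);
```

So the inputs alone come to roughly 2 GiB plus 1.5 GiB, before counting whatever working space the operation needs.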
I'm running version 9.5.0.944444 (R2018b) on Linux.
I can work around this problem but I'd like to understand it so I can adapt my code accordingly.
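One possible workaround, shown only as a sketch: accumulate the rows in chunks so that each accumarray call sees a smaller input, then add the partial results. The chunk size chunkRows and the output size sz are assumptions chosen for illustration, and the small CPU arrays below are stand-ins for the large gpuArrays so the snippet runs anywhere. Whether chunking actually avoids the radix_sort error on a given GPU has not been verified here.

```matlab
% Stand-in data (hypothetical sizes); in the real case a and b are the
% large gpuArrays from the example.
n = 1000;
a = randi(100, n, 1);
b = randi(3, n, 3, 'uint16');

sz = [3 3 3];                    % assumed upper bound of the subscripts in b
chunkRows = 256;                 % arbitrary chunk size, tune to available memory

c = zeros(sz, 'like', a);        % same class (and device) as a
for first = 1:chunkRows:n
    last = min(first + chunkRows - 1, n);
    % Accumulate one block of rows into a fixed-size result and add it in.
    c = c + accumarray(b(first:last, :), a(first:last), sz);
end
```

Passing sz explicitly keeps every partial result the same size, so the chunks can be summed even if a chunk happens not to contain the largest subscript.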