This is (hopefully) a simple question about using a reduction variable to accumulate the results of parallel GPU operations into a single value. I have read the tutorial on stencil processing and frankly do not understand why this does not work.
A simple example is below (not intended for actual use; it is a stand-in for more complicated operations). Here, I take a vector, use a GPU arrayfun to get the difference between neighbors, and then try to sum those differences into a single variable. Since the difference operation is order independent and the results are summed into a single variable, I figured a combination of arrayfun + a reduction variable using nested functions would be the best way to start.
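function v = reductionVariableLoopTest()
    x = gpuArray.rand(100,1);
    v = gpuArray.zeros(1);              % intended reduction variable
    function d = difFun(ind)
        d = x(ind+1) - x(ind);          % difference between neighbors
    end
    function sumFun(ind)
        v = v + difFun(ind);            % accumulate into the parent variable v
    end
    vect = gpuArray.colon(1,length(x)-1);
    arrayfun(@sumFun,vect);             % this call raises the error below
end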
However, this gives the error: Assignment of parent function variable(s): 'v' by 'sumFun' is not allowed.
Now, I know I could get around this by simply using
y = arrayfun(@difFun,vect);
v = sum(y);
but this misses the whole point of using a reduction variable. The order-independent, on-GPU difFun should be extremely fast, and accumulating into the shared variable v should be both fast and memory friendly, since no intermediate array is needed.
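For what it's worth, here is a quick sanity check of the value v should end up with. It is only a sketch for the toy difference operation above, run on the CPU with plain rand instead of gpuArray so it works without a GPU; the name expected is just for illustration. Because the neighbor differences telescope, their sum should equal x(end) - x(1) up to round-off.

```matlab
x = rand(100,1);                  % CPU stand-in for the gpuArray data
expected = sum(diff(x));          % sum of the differences between neighbors

% The differences telescope, so the sum matches x(end) - x(1) up to round-off.
assert(abs(expected - (x(end) - x(1))) < 1e-12);
```

Any working reduction-variable version should reproduce this value for the same x.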