Is there a good way to compute column-wise dot products of two matrices of the same size when they are stored on the GPU? Specifically, if A and B are n (rows) by m (columns) matrices, I want the 1-by-m row vector of dot products of corresponding columns of A and B (i.e. the diagonal of A'*B). The following minimal working example shows the computation:
n = 10;
m = 100;
A = randn(n,m);
B = randn(n,m);
C = bsxfun(@dot,A,B);
However, if A and B are gpuArrays, then this is not possible:
C = bsxfun(@dot,gpuArray(A),gpuArray(B));
% Error using gpuArray/bsxfun
% Use of 'dot' is not supported
Similarly, using the anonymous function @(a,b) a'*b instead of @dot does not work. In my application n is roughly 100 and m roughly 500000. I hope someone knows how to compute C directly on the GPU without a for loop, since a loop was very slow at this problem size.
Ahmed on 5 Sep 2013
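A possible solution (I hope it will be helpful to someone): multiply element-wise and sum down each column. Passing the dimension explicitly keeps the result a 1-by-m row vector even when n = 1:
C = sum(gpuArray(A).*gpuArray(B), 1);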
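A slightly fuller sketch of the same idea, assuming the Parallel Computing Toolbox and a supported GPU are available. The sizes are reduced from the question's (n ~ 100, m ~ 500000) so it runs quickly, and the spot check against dot on the CPU is added here only for illustration:

```matlab
% Sketch: column-wise dot products on the GPU.
% Assumes Parallel Computing Toolbox and a supported GPU.
rng(0);                 % reproducible random data
n = 100;
m = 1000;               % reduced from ~500000 for a quick run
A = randn(n, m);
B = randn(n, m);

% Transfer the data to the GPU once.
gA = gpuArray(A);
gB = gpuArray(B);

% Element-wise product, then sum down each column (dimension 1).
gC = sum(gA .* gB, 1);

% Copy the 1-by-m result back to host memory.
C = gather(gC);

% Spot-check a few columns against dot on the CPU.
idx = [1, 2, m];
ref = zeros(1, numel(idx));
for k = 1:numel(idx)
    ref(k) = dot(A(:, idx(k)), B(:, idx(k)));
end
maxErr = max(abs(C(idx) - ref));
```

On many consumer GPUs double-precision arithmetic is much slower than single precision, so converting the inputs with single(...) before gpuArray may help if the reduced precision is acceptable.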

