Hi Chuck37,
You're not doing it wrong, and it (mostly) isn't the fault of your GPU either! Unfortunately, SUM performs inefficiently in this code. If you transpose the problem and sum along the rows instead of down the columns, you will get the same result but with significantly better performance. This is something we will look into. On my machine the timings are 0.42 seconds for the original and 0.046 seconds when transposed.
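A rough sketch of the transposed version, assuming the points are stored one per row (N-by-3) so that the sum runs along dimension 2. The variable names follow the original post; N, b, and the explicit wait(gpuDevice) call are illustrative additions, the latter so that toc measures completed GPU work:

```matlab
% Sketch only: requires Parallel Computing Toolbox and a supported GPU.
N = 1e6;
A = randn(N,3);              % a million points in 3D, one point per row
b = randn(1,3);              % the single point (hypothetical name)
B = repmat(b,N,1);           % replicated to match A, as in the original post

Ag = gpuArray(A);
Bg = gpuArray(B);

tic
dg = sqrt(sum((Ag-Bg).^2, 2));   % sum along the rows (dimension 2)
wait(gpuDevice);                 % block until the GPU has finished
toc

d = sqrt(sum((A-B).^2, 2));      % CPU reference for comparison
```

The result is an N-by-1 column of distances rather than a 1-by-N row; use gather(dg) to bring it back to host memory.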
The GPU you are using is optimized for single-precision calculation. If you can live with lower accuracy, you might try changing all the data to single, at least for this part of the calculation. You would get much faster computation and need half the storage space, both of which will help performance. Only the Tesla family of GPUs is designed explicitly for double-precision work.
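A minimal sketch of the single-precision variant, assuming the same data layout as the original post. The names Ags, Bgs, and dgs are made up for illustration, and wait(gpuDevice) is added so that toc measures completed GPU work:

```matlab
% Sketch only: requires Parallel Computing Toolbox and a supported GPU.
A = randn(3,1e6);                 % a million points in 3D
B = repmat(randn(3,1),1,1e6);     % the single point

Ags = gpuArray(single(A));        % convert to single before transfer
Bgs = gpuArray(single(B));        % half the bytes of the double version

tic
dgs = sqrt(sum((Ags-Bgs).^2));    % same calculation, now in single precision
wait(gpuDevice);                  % block until the GPU has finished
toc

d = sqrt(sum((A-B).^2));          % double-precision CPU reference
```

Comparing gather(dgs) against d shows how much accuracy is given up; whether that loss is acceptable depends on the application.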
Ben
"Chuck37" wrote in message <kn0v9u$pgh$1@newscl01ah.mathworks.com>...
> I have the parallel toolbox and wanted to get a sense of what could be done using a GPU. I have a GT 520 GPU, which I think is pretty weak for general processing, but I figured I'd give it a shot anyway.
>
> A typical problem for me might be to find the distance of a whole bunch of points from another point. I tried this:
>
> A = randn(3,1e6); % A million points in 3D
> B = repmat(randn(3,1),1,1e6); % The single point
>
> Ag = gpuArray(A);
> Bg = gpuArray(B);
>
> tic
> d = sqrt(sum((A-B).^2));
> toc
>
> tic
> dg = sqrt(sum((Ag-Bg).^2));
> toc
>
>
> The result is that the GPU way takes about 10x the time. Is it because I'm doing it wrong, or just because my GPU is not up to it?
