GPU: Iteratively adding 2D gaussians to a 2D array
Hi everyone.
First I'd like to thank you for taking the time to consider my problem.
I am currently building a novel microscope for imaging samples (cells, nano-particles, graphene, etc.) at sub-diffraction-limited resolution. I won't go into too much detail here, so this is what you need to know. I have three columns of time-dependent information:
1) Px(t) = A pointer to the x-coordinate of the image (ranging from 1 to 2048) [float]
2) Py(t) = A pointer to the y-coordinate of the image (ranging from 1 to 2048) [float]
3) I(t) = The measured intensity [float]
The objective here is to produce a 2D gaussian centred around (Px,Py) with intensity I for each point in time. Here is what I have written so far:
% Two pixel arrays and one intensity array
Px_t = [x1, x2, ..., xn];
Py_t = [y1, y2, ..., yn];
I_t = [I1, I2, ..., In];
% Define the span of the gaussian in x & y
x = linspace(1,2048,2048);
y = linspace(1,2048,2048);
% Let stdev = 1 for simplicity
sgma = 1;
% Initialize the image space
image = zeros(2048);
% Loop through time data
for t = 1:n
    x0 = Px_t(t);
    y0 = Py_t(t);
    I = I_t(t);
    % Generate a new gaussian spread around x0 & y0
    gx = exp(-(x-x0).^2/(2*sgma^2));
    gy = exp(-(y-y0).^2/(2*sgma^2));
    gxy = gx'*gy;
    % Add the intensity-scaled gaussian to the image array
    image = image + I*gxy;
end
My question is whether there is any way to perform this operation in parallel on the GPU. It is impossible to store n 2048x2048 arrays in memory and sum them afterwards, so I've resorted to this iterative approach.
All the best,
- Luke
Accepted Answer
Joss Knight
on 2 Dec 2015
You are already getting a lot of data parallelism out of what you have - you are performing multiple element-wise operations on 4 million elements at once. Just stick your input data x, y and image in a gpuArray and see what happens.
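For instance, a minimal sketch of that drop-in change (variable names follow the code in the question; the existing loop body runs unchanged):

```matlab
% Move the coordinate vectors and the accumulator onto the GPU; the
% element-wise exp, subtraction, and addition in the loop then execute
% on the device automatically.
x = gpuArray.linspace(1, 2048, 2048);
y = gpuArray.linspace(1, 2048, 2048);
image = gpuArray.zeros(2048);

% ... run the existing for-loop over t unchanged ...

% Bring the final result back to host memory when the loop is done
image = gather(image);
```

Keeping `image` on the device for the whole loop matters: transferring the 2048x2048 accumulator back and forth every iteration would swamp any gain from the parallel arithmetic.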
Two improvements come to mind. First, you might get a further speedup on the GPU if you use arrayfun to package the contents of your loop into a single kernel:
function im = addGaussian(im, xin, yin, x0in, y0in, intensity, sgma)
    gx = exp(-(xin-x0in).^2/(2*sgma^2));
    gy = exp(-(yin-y0in).^2/(2*sgma^2));
    gxy = gx*gy;
    im = im + intensity*gxy;
end

sgma = 1;  % pass sgma in explicitly; it isn't in the function's scope
[x,y] = ndgrid(gpuArray.linspace(1,2048,2048));
image = gpuArray.zeros(2048);
for t = 1:n
    image = arrayfun(@addGaussian, image, x, y, Px_t(t), Py_t(t), I_t(t), sgma);
end
Secondly, can you generate one Gaussian image that is twice as big in each dimension and then index into it inside the loop to crop the portion you need centered on the pixel of interest? That would save a lot of repeated computation of the same values.
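A sketch of that second idea, assuming the centres can be rounded to the nearest pixel (note the original code places them with sub-pixel accuracy, which this approximation gives up):

```matlab
N = 2048;
sgma = 1;
% One large Gaussian centred at index (N, N) in a (2N-1)-by-(2N-1)
% array; it is separable, so an outer product of one 1D profile suffices.
g = exp(-((1:2*N-1) - N).^2 / (2*sgma^2));
G = gpuArray(g' * g);
image = gpuArray.zeros(N);

for t = 1:n
    % Round each centre to the nearest pixel and crop the 2048x2048
    % window of G whose centre lands at (x0, y0). Rows track Px and
    % columns track Py, matching the gxy = gx'*gy orientation above.
    x0 = round(Px_t(t));
    y0 = round(Py_t(t));
    image = image + I_t(t) * G(N+1-x0 : 2*N-x0, N+1-y0 : 2*N-y0);
end
```

Since x0 and y0 lie in [1, N], the cropped index ranges always stay inside the (2N-1)-by-(2N-1) array, so no boundary checks are needed.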
2 Comments
Joss Knight
on 3 Dec 2015
When you say "the rate limiting step was the cumulative addition of the image matrix" - in my suggestion I moved the addition inside the arrayfun function, so you wouldn't then be able to distinguish the cost of creating the gaussians from the cost of the addition. There's no particular reason why the addition would be any slower than the gaussian creation steps when run in parallel, except that it has to write to global memory.
However, the second option does seem more efficient.