Fast subarray access when using GPU matrices

I need to optimize my GPU code, and the slowest line is the one that adds many subarrays into one large matrix:
for ii = 1:Npos
large_array(ROI{ii,:}) = large_array(ROI{ii,:}) + smaller_array(:,:,ii);
end
Npos is about 500, large_array is about 2000x2000, each page of smaller_array is about 256x256, and the ROIs are contiguous subregions of large_array.
Do you have any idea how to write this faster and remove the for-loop?
The main issue is the huge overhead of calling subsref many times.

Answers (1)

Edric Ellis on 7 Apr 2015
Edited: Edric Ellis on 7 Apr 2015
I think the best way to proceed is to construct a single indexing expression into smaller_array, so that the whole update becomes a single operation:
large_array = large_array + smaller_array(idx);
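The trick is calculating idx, which must have the same size as large_array and hold, for each element, the linear index of the matching element of smaller_array. This depends on the layout of the "pages" of smaller_array. If the pages tile large_array without overlap and are in the correct order in a column-major sense, here's how you could come up with idx for the case where large_array is 4-by-4 and smaller_array is 2-by-2-by-4: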
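idx_0 = reshape(1:4, 2, 2);              % [1, 3; 2, 4]: linear indices within one page
idx_1 = repmat(idx_0, 2, 2);             % 2-by-2 grid of [1,3;2,4]
idx_2 = 2 * 2 * kron(idx_0, ones(2,2));  % page number of each element, times elements per page
idx = idx_1 + (idx_2 - (2*2));           % add the zero-based page offset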
which gives
idx =
     1     3     9    11
     2     4    10    12
     5     7    13    15
     6     8    14    16
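The same construction can be written for other tile sizes. The following is only a sketch under stated assumptions: the sizes h, w, gr and gc are made up for illustration, the pages are assumed to tile large_array exactly with no overlap, and they are assumed to be ordered column-major over the tile grid, as in the 4-by-4 example. It also rebuilds the original loop for that layout to check that both give the same result. Plain arrays are used so that it runs without a GPU; on the GPU, large_array and smaller_array would be gpuArray objects.

```matlab
% Hypothetical sizes for illustration only; not taken from the question.
h = 2;  w = 3;      % height and width of one page of smaller_array
gr = 3; gc = 2;     % number of tiles down and across large_array
Npos = gr * gc;

smaller_array = rand(h, w, Npos);
large_array   = rand(h*gr, w*gc);

% Linear index within one page, repeated over the whole tile grid.
idx_0 = reshape(1:h*w, h, w);
idx_1 = repmat(idx_0, gr, gc);

% Page number of every element of large_array (column-major tile order).
page = kron(reshape(1:Npos, gr, gc), ones(h, w));

% Linear index into smaller_array(:) for every element of large_array.
idx = idx_1 + (page - 1) * (h*w);

% Reference result: the loop from the question, with ROIs for this layout.
expected = large_array;
for ii = 1:Npos
    [r, c] = ind2sub([gr, gc], ii);
    rows = (r-1)*h + (1:h);
    cols = (c-1)*w + (1:w);
    expected(rows, cols) = expected(rows, cols) + smaller_array(:,:,ii);
end

% Single vectorized update.
result = large_array + smaller_array(idx);
max_err = max(abs(result(:) - expected(:)));
```

Because idx depends only on the layout, it can be computed once and reused for every update. If the ROIs overlap, each element of large_array receives more than one contribution, and a single index matrix of this kind cannot express the sum.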
