limiting broadcast variables in parfor loops

I have some 3D or 4D arrays that I want to run a block processing on. To do this I create a cell object that I can call that returns matricies of position values.
For example, ind_object{1} = [1 2 3; 1 2 3; 1 2 3], ind_object{2} = [1 2 3; 1 2 3; 2 3 4], (i.e. they can be overlapping blocks).
I then call this index object in a parfor loop to pull out blocks of my image (sz_block(n) is the size of the block, on this example [3,3,3])
parfor n=1:numel(index_object)
block_im = pad_im(index_object{n}(1,1:sz_block(1)),index_object{n}(2,1:sz_block(2)),index_object{n}(3,1:sz_block(3)),:);
out(n) = run_some_function(block_im);
end
And the @run_some_fucntion() is usually some sort of optimization problem, for example it could be using fmincon.
The problem with this is that my "pad_im" array ends up being a broadcast variable and it's rather large 512x512x300 so the memory swapping really slows down the calculation.
One solution to this would be to break the "pad_im" array into smaller arrays
pad_im1 = pad_im(:,:,1:149);
pad_im2 = pad_im(:,:,150:end);
then run this parfor loop twice, once on the pad_im1 and another time on the pad_im2, then I could do something smart to stitch where the break was made. However, I don't think this is a very good solution and there has to be a more eligant/inteligent way to do this; something like a que?
Or does anyone know how other programs (like C/C++ or python) do n-dimensional block processing?
Thanks for the help!!

2 Comments

Have you considered using distributed arrays for this?
Alternatively, what function are you trying to run on those image blocks? Check out functions in the Image Processing Toolbox, there could be one that already does what you want and has built-in parallelization.
It's a pretty specific use case. I have a loss function that I'm trying to minimize on the different blocks. These loss functions can change and sometimes I need to maintain the spatial information which is why I keep it as a 3D array instead of vectorizing the blocks. Also, I'm sometimes mapping to different dimensions. For example, sometimes I go from R^4 --> R othertimes I go from R^6 --> R^5. It really depends what data I want to look at.

Sign in to comment.

 Accepted Answer

Matt J
Matt J on 24 Jan 2023
Edited: Matt J on 24 Jan 2023
The block decomposition that you are doing needs more explanation. If the blocks are just fixed-sized, non-overlapping tiles of the original array, then it would be a simple matter to just reshape pad_im into pages consisting of the blocks. Then it can be passed to the parfor loop as a sliced variable. A simple way to do that reshaping is with this FEX download,
sz=[3,3,3]; %block size
N=blkNumel(pad_im,sz);
Pages=blkReshape(pad_im,sz,[1,1,1,N]);
parfor n=1:N
block_im=Pages(:,:,:,n);
out(n) = run_some_function(block_im);
end

4 Comments

Matt J
Matt J on 24 Jan 2023
Edited: Matt J on 24 Jan 2023
BTW, I hope that run_some_function() is not some trivial operation like the mean or sum over the block. If it is, you can do that very efficiently (and without parfor), using sepblockfun(), to be downloaded from,
Hey @Matt J,
Thanks for these answers! Both of those file exchange functions seem very useful and I think I'll download them for future use!
I should have updated the question with more information but just to reiterate;
My run_function() is an optimization, usually using fmincon, so it's not seperable like max,min, fft, ect. I also want to be able to make my blocks overlap (I think the DSP world uses the terminology stride; sometimes I want to have a stride as little as 1).
I think that @Alvaro has the right idea using a distributed array but I'll need to look into it more. I got a recommendation that if I was doing this in C++ I could define the pad_im as a read-only global variable and use the MPI library to run the parallel loop, and I think that the distributed array is doing something similar to this.
Alvaro
Alvaro on 24 Jan 2023
Edited: Alvaro on 24 Jan 2023
I couldn't find a straightforward way to create overlapping blocks using codistributed arrays though you might be able to split your image first and then build the overlapping blocks as a codistributed array.
I think the codistributed array is the method I'll have to use. It's basically the same consept as to what I said in the original question but might make the code a little nicer/more readable. I'll post some file-exchange function if I ever get around to implementing it.
Thanks for all the help @Matt J @Alvaro

Sign in to comment.

More Answers (0)

Categories

Asked:

on 9 Jan 2023

Commented:

on 30 Jan 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!