Sudden drop in speed for large matrices. RAM full -> Prevention?

Dear all,
I am doing image analysis with a toolbox called "PIVlab". At work, we recently bought a new camera with 25 MP resolution (previously only 2 MP). That is very nice, but it also means that the sizes of several matrices increase substantially, and suddenly memory management becomes very important for keeping the processing speed high. I identified the operation that seems to slow the code down with very large matrices, and wrote a minimal code example to reproduce the behaviour:
clear all %#ok<CLALL>
close all
clc
% Usually, in PIV, we have input image A and input image B which are
% captured with a short pause in between. These images are then cut into
% small pieces of e.g. 64x64 pixels. Each of these "interrogation areas"
% in image A is then cross-correlated with the same part from image B.
% With the cross-correlation code in PIVlab, it is possible to do this for
% a 3D matrix (saves a lot of time, but apparently runs into RAM problems
% when matrices are large).
counter=0;
stack_sizes=1000:10000:200000 ;%these numbers are fine to demonstrate the effect on a 16 GB RAM laptop
for size_of_the_stack=stack_sizes
%% generate some quick + dirty "particle" image pairs, arranged in a 3D matrix
A=rand(64,64,size_of_the_stack);
A(A<0.98)=0;
A = imgaussfilt(A,0.9);
B = circshift(A,round(rand([1,1])*10)); %displace the second image
%% Quick fix: converting to single already saves 50% of memory, apparently without negative effects.
A=single(A);
B=single(B);
%% do the cross-correlation with FFT
result_conv = zeros(size(A),'single'); %same starting conditions for all repetitions of the loop %#ok<PREALL> %
tic
% The following line of code eats up all the RAM
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
counter=counter+1
calc_time(counter)=toc;
time_per_subimage(counter)=calc_time(counter)/size_of_the_stack;
end
%% plot results
bar(stack_sizes,time_per_subimage*1000)
grid on;
xlabel('stack size')
ylabel('time per correlation in ms')
These are my results. The speed suddenly drops.
A quick look at the task manager shows that the speed drops at the moment when the cross-correlation consumes all the RAM and virtual memory has to be used (indicated by a sudden increase in SSD activity).
It is obvious that this line of code is the cause for the drop in speed with large matrices:
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2);
How could I prevent the RAM from filling up and the speed from dropping? Is there a way to anticipate the problem, and then maybe split the arrays into smaller chunks, saving them temporarily to the hard disk...?
The problem is slightly more complicated in the real application, because there is a lot of other processing happening, and everything runs in parallel with the Parallel Computing Toolbox... But finding a solution for the above minimal example might already put me on the right track... Thanks for your input!

13 Comments

Is it worth mentioning that installing more RAM would be the most direct way to handle this problem?
Installing more RAM is not possible on the laptop that we bought for processing PIV data; the RAM is soldered to the mainboard. I chose a laptop with a fast CPU based on my past experience with PIVlab, but the CPU is not really stressed when running the code in parallel, because RAM is the bottleneck :( So the only option to improve the performance (other than spending 2000 Euros on a new laptop with more RAM) is to improve the code...
Does it help to try to reuse the memory?
% result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2);
B(:) = fft2(B);
B(:) = conj(fft2(A)) .* B;
B(:) = ifft2(B);
result_conv = fftshift(real(B), 1);
result_conv(:) = fftshift(result_conv, 2);
@Jan: Your suggestion increases the time for smaller matrices, but has little effect on larger matrices...
When I quickly split the cross-correlation operation into two pieces (hopefully done correctly...), everything seems to run a bit faster...? So the speed improvement from vectorization seems to have a maximum matrix size? Or am I just doing something wrong...
clear all %#ok<CLALL>
close all
clc
% Usually, in PIV, we have input image A and input image B which are
% captured with a short pause in between. These images are then cut into
% small pieces of e.g. 64x64 pixels. Each of these "interrogation areas"
% in image A is then cross-correlated with the same part from image B.
% With the cross-correlation code in PIVlab, it is possible to do this for
% a 3D matrix (saves a lot of time, but apparently runs into RAM problems
% when matrices are large).
counter=0;
stack_sizes=1000:10000:200000 ;%these numbers are fine to demonstrate the effect on a 16 GB RAM laptop
for size_of_the_stack=stack_sizes
%% generate some quick + dirty "particle" image pairs, arranged in a 3D matrix
A=rand(64,64,size_of_the_stack);
A(A<0.98)=0;
A = imgaussfilt(A,0.9);
B = circshift(A,round(rand([1,1])*10)); %displace the second image
%% converting to single already saves 50% of memory, apparently without negative effects.
A=single(A);
B=single(B);
bla=memory;
%% do the cross-correlation with FFT
result_conv_1_of_2 = zeros([size(A,1:2) size_of_the_stack/2],'single'); %same starting conditions for all repetitions of the loop %#ok<PREALL> %
result_conv_2_of_2 = result_conv_1_of_2; %same starting conditions for all repetitions of the loop %#ok<PREALL> %
tic
result_conv_1_of_2 = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,1:size_of_the_stack/2))).*fft2(B(:,:,1:size_of_the_stack/2)))), 1), 2); %cross-correlation code in PIVlab
result_conv_2_of_2 = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,size_of_the_stack/2+1:end))).*fft2(B(:,:,size_of_the_stack/2+1:end)))), 1), 2); %cross-correlation code in PIVlab
%result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
counter=counter+1
calc_time(counter)=toc;
time_per_subimage(counter)=calc_time(counter)/size_of_the_stack;
avail_mem(counter)= bla.MemAvailableAllArrays;
end
%% plot results
bar(stack_sizes,time_per_subimage*1000)
grid on;
xlabel('stack size')
ylabel('time per correlation in ms')
... and if I split it into 4 parts, then it becomes even better?
%% do the cross-correlation with FFT
result_conv_1_of_4 = zeros([size(A,1:2) size_of_the_stack/4],'single'); %same starting conditions for all repetitions of the loop %#ok<PREALL> %
result_conv_2_of_4 = result_conv_1_of_4; %same starting conditions for all repetitions of the loop %#ok<PREALL> %
result_conv_3_of_4 = result_conv_1_of_4; %same starting conditions for all repetitions of the loop %#ok<PREALL> %
result_conv_4_of_4 = result_conv_1_of_4; %same starting conditions for all repetitions of the loop %#ok<PREALL> %
tic
result_conv_1_of_4 = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,1:size_of_the_stack/4))).*fft2(B(:,:,1:size_of_the_stack/4)))), 1), 2); %cross-correlation code in PIVlab
result_conv_2_of_4 = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,size_of_the_stack/4+1:size_of_the_stack/4*2))).*fft2(B(:,:,size_of_the_stack/4+1:size_of_the_stack/4*2)))), 1), 2); %cross-correlation code in PIVlab
result_conv_3_of_4 = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,size_of_the_stack/4*2+1:size_of_the_stack/4*3))).*fft2(B(:,:,size_of_the_stack/4*2+1:size_of_the_stack/4*3)))), 1), 2); %cross-correlation code in PIVlab
result_conv_4_of_4 = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,size_of_the_stack/4*3+1:end))).*fft2(B(:,:,size_of_the_stack/4*3+1:end)))), 1), 2); %cross-correlation code in PIVlab
%result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
counter=counter+1
calc_time(counter)=toc;
Another test with the modified code and larger stack sizes shows that the problem has only been moved to larger matrices. But I think this approach might have some potential... The larger the input matrices (A & B) are, the more they should be split and processed in series to avoid problems. But how to decide when to split the matrices and when not to?
Is it true that the speed benefit from vectorization is limited by RAM and array size? I still think I am overlooking something X-)
Since you only do the cross-correlation with fft2, you shouldn't "obviously" gain much "except calling-overhead" by processing the entire stack?
@Bjorn Gustavsson Thanks for the input. I tested it and there is some speed difference (probably explained by the "calling-overhead"). Everything is slightly slower; I am running tests for larger matrices now.
%vectorized
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
% for loop
for i=1:size(A,3)
result_conv(:,:,i) = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,i))).*fft2(B(:,:,i)))), 1), 2); %cross-correlation code in PIVlab
end
(Now I'm outside my competence-zone - typically I'm happy enough when my code slogs through, and patient enough to wait for my results; I don't work with production-line applications where good/optimal efficiency is imperative.)
From here you might be happy enough to make these types of timing investigations and build some kind of look-up table for the best chunking of the image stacks, so that your function can adapt in some clever manner.
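The chunking idea could also be sketched generically: instead of hard-coding a two-way or four-way split, process the stack in fixed-size slices along the third dimension. This is only a sketch; "chunk_size" is an assumed tuning parameter (to be chosen per machine, e.g. from such timing investigations), not something PIVlab defines.

```matlab
% Sketch: chunked cross-correlation to bound peak memory use.
% chunk_size is an assumed, machine-dependent tuning parameter.
chunk_size = 20000;                     % images per chunk (tune to RAM)
n = size(A,3);
result_conv = zeros(size(A),'single');  % preallocate the full output once
for start_idx = 1:chunk_size:n
    idx = start_idx : min(start_idx+chunk_size-1, n);
    % same PIVlab correlation line, applied to one slice of the stack
    result_conv(:,:,idx) = fftshift(fftshift(real(ifft2( ...
        conj(fft2(A(:,:,idx))).*fft2(B(:,:,idx)))), 1), 2);
end
```

With this structure, the temporary complex arrays created by fft2/ifft2 only ever have chunk_size images, regardless of how large the full stack is.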
Yes, I am currently evaluating this code portion (still taking some time to complete...):
if numel(A)>577536000 %large matrices become slow due to limited RAM, serial processing is faster
disp('serial')
for i=1:size(A,3)
result_conv(:,:,i) = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,i))).*fft2(B(:,:,i)))), 1), 2); %cross-correlation code in PIVlab
end
else %smaller matrices benefit from parallelization
disp('vectorized')
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
end
The number "577536000" was found by trial and error. It most likely depends on the available RAM, and maybe also on the MATLAB version (where the functions that I use might use RAM differently). It would be cool to find this number automatically somehow, but I currently have no clue how to do this efficiently.
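One way to avoid the hard-coded number might be to derive the threshold from the currently available RAM, as already queried via memory (Windows-only) earlier in the test script. This is a hypothetical sketch: "overhead_factor" is an assumed rough estimate of how many same-sized temporaries the correlation line needs at its peak (fft2 outputs are complex, i.e. twice the size of the single input, and several intermediates coexist), not a measured value.

```matlab
% Sketch: choose serial vs. vectorized path from available RAM.
% overhead_factor is an assumed estimate of peak temporary usage.
mem = memory;                       % Windows-only MATLAB call
bytes_per_element = 4;              % single precision
overhead_factor = 8;                % complex FFT temporaries (assumed)
max_elements = mem.MemAvailableAllArrays / (bytes_per_element*overhead_factor);
if numel(A) > max_elements
    disp('serial')                  % fall back to the per-image loop
else
    disp('vectorized')              % whole-stack correlation fits in RAM
end
```

Calibrating overhead_factor once (e.g. against the empirically found 577536000 on a 16 GB machine) would let the same code adapt to other machines and MATLAB versions.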
Ok, so this seems to work, now I have the best of both worlds: Fast speed for smaller matrices due to vectorization, and still ok performance for really large matrices with a for-loop:
clear all %#ok<CLALL>
close all
clc
% Usually, in PIV, we have input image A and input image B which are
% captured with a short pause in between. These images are then cut into
% small pieces of e.g. 64x64 pixels. Each of these "interrogation areas"
% in image A is then cross-correlated with the same part from image B.
% With the cross-correlation code in PIVlab, it is possible to to this for
% a 3D matrix (saves a lot of time, but apparently runs into RAM problems
% when matrices are large).
counter=0;
stack_sizes=1000:20000:400000 ;%these numbers are fine to demonstrate the effect on a 16 GB RAM laptop
for size_of_the_stack=stack_sizes
%% generate some quick + dirty "particle" image pairs, arranged in a 3D matrix
A=rand(64,64,size_of_the_stack,'single');
B=A;
%% do the cross-correlation with FFT
result_conv = zeros(size(A),'single'); %same starting conditions for all repetitions of the loop %#ok<PREALL> %
tic
% The following line of code eats up all the RAM
if numel(A)>577536000 %large matrices become slow due to limited RAM, serial processing is faster
disp('serial')
for i=1:size(A,3)
result_conv(:,:,i) = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,i))).*fft2(B(:,:,i)))), 1), 2); %cross-correlation code in PIVlab
end
else %smaller matrices benefit from parallelization
disp('vectorized')
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
end
counter=counter+1
calc_time(counter)=toc;
time_per_subimage(counter)=calc_time(counter)/size_of_the_stack;
end
%% plot results
bar(stack_sizes,time_per_subimage*1000)
grid on;
xlabel('stack size')
ylabel('time per correlation in ms')
You might cut off some time for the smaller cases if you move the pre-allocation inside the "large-stack" part of the if-condition (you might even remove it if you can run the loop from the last image in the stack to the first). When you assign the output from the fftshift(...) correlation calculation in the small-stack case, it doesn't utilize the pre-allocation (if I've understood things properly...).
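The suggestion above could look roughly like this: preallocation only happens in the loop branch, since the vectorized branch overwrites result_conv wholesale anyway ("threshold" stands for the trial-and-error number from earlier).

```matlab
% Sketch: preallocate only where in-place slice assignment needs it.
if numel(A) > threshold                      % serial path for large stacks
    result_conv = zeros(size(A),'single');   % needed for indexed assignment
    for i = 1:size(A,3)
        result_conv(:,:,i) = fftshift(fftshift(real(ifft2( ...
            conj(fft2(A(:,:,i))).*fft2(B(:,:,i)))), 1), 2);
    end
else                                         % vectorized path
    % full assignment allocates its own array; preallocation is wasted here
    result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2);
end
```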


Answers (0)

Release: R2019b

Asked: on 14 Sep 2021

Commented: on 15 Sep 2021
