Likely error when doing high dimensional permute of big gpu array
The following code reproduces what appears to be a bug in permute: I permute an array that contains only positive integers, but the permuted output contains zeros.
Triggering the error seems to require three things:
(i) a big array, say >4 GB (it does not trigger on a 3.5 GB array)
(ii) a GPU array (it does not trigger for a CPU array)
(iii) a high-dimensional permute (it triggers with 6-D, but not 2-D)
Following is code to reproduce the issue, and below that a copy-paste of what is printed to command window.
[I use Matlab R2025b, my gpu is detailed in the output.]
%% permutetest.m
% There is an error when permuting BIG GPU arrays with HIGH dimensions.
%
% 1. Doing a permute on a 6D gpu array that is big (roughly >4gb) GIVES ERRORS, specifically this is the 'Case 1' below.
% The error is that many elements get replaced with zeros.
%
% 2. Doing the same, but on a smaller (<4gb) gpu array does not error (Case 2 below)
% 3. Redoing first two cases, but on the cpu does not error (Cases 3 and 4)
% 4. Redoing first two cases, but only having permute across 2D does not error (Cases 5 and 6).
%
% The matrices being permuted contain integers between 1 and 100, so it seems
% unlikely that the actual contents are relevant to the error, but I have
% not attempted to test this.
clear all
disp(gpuDevice) % show GPU model + available memory
% Sizes via [4, X, 15, 5, 3, 100]: elements = 90000*X, bytes(double) = 720000*X.
Xmid = 5220; % 469,800,000 elems = 3,758,400,000 bytes = 3.5003 GiB (below 4 GiB)
Xhigh = 5966; % 536,940,000 elems = 4,295,520,000 bytes = 4.0005 GiB (just above 4 GiB = 2^32 bytes)
% mode '6D' => permute(A,[1,2,4,5,3,6]) on the array as-is
% mode '2D' => reshape A to a 2-D matrix [4*X, 22500] then permute(A,[2,1])
%%
cases = {
'~4.0 GB, GPU, 6-D permute', [4 Xhigh 15 5 3 100], 'gpu', '6D'; % big, gpu, high dim
'~3.5 GB, GPU, 6-D permute', [4 Xmid 15 5 3 100], 'gpu', '6D'; % small, gpu, high dim
'~4.0 GB, CPU, 6-D permute', [4 Xhigh 15 5 3 100], 'cpu', '6D'; % big, cpu, high dim
'~3.5 GB, CPU, 6-D permute', [4 Xmid 15 5 3 100], 'cpu', '6D'; % small, cpu, high dim
'~4.0 GB, GPU, 2-D transpose', [4 Xhigh 15 5 3 100], 'gpu', '2D'; % big, gpu, low dim
'~3.5 GB, GPU, 2-D transpose', [4 Xmid 15 5 3 100], 'gpu', '2D'; % small, gpu, low dim
};
%% Do each case and give feedback on results
for ii=1:size(cases,1)
    label = cases{ii,1}; sz = cases{ii,2}; dev = cases{ii,3}; mode = cases{ii,4};
    % Decide the actual array shape and the permutation order for this case
    if strcmp(mode,'2D')
        asize = [sz(1)*sz(2), prod(sz(3:end))]; % e.g. [4*X, 22500]
        perm = [2,1];
    else
        asize = sz;
        perm = [1,2,4,5,3,6];
    end
    nel = prod(asize); nbytes = nel*8; % double = 8 bytes/element
    fprintf('\n===== %s | array=[%s] | perm=[%s] =====\n', label, num2str(asize), num2str(perm));
    fprintf(' %d elements | %d bytes | %.4f GiB | device=%s\n', nel, nbytes, nbytes/2^30, dev);
    %% Pre-permute
    % Build with every element >= 1 (so any 0 in the output == corruption), on the chosen device
    if strcmp(dev,'gpu')
        A = randi([1 100], asize, 'gpuArray'); % double, on GPU
    else
        A = randi([1 100], asize); % double, on CPU
    end
    sumA = gather(sum(A(:))); % computed BEFORE any permute -> trustworthy
    nzA = sum(A(:)==0);
    % Double-check the count of zeros pre-permute [passes every time]
    if nnz(A)~=(numel(A)-nzA)
        error('Zero count seems wrong')
    end
    fprintf(' input : min=%g #zeros=%d sum(A)=%.0f\n', gather(min(A(:))), gather(nzA), sumA);
    %% Permute
    B = permute(A, perm);
    nnz1=nnz(B);
    Bc = gather(B); % bring to host for trustworthy checks (no-op if already on CPU)
    nnz2=nnz(Bc);
    % Double-check that gather() is innocent [it is innocent, this never triggers]
    if nnz1~=nnz2
        error('gather() is not innocent')
    end
    %% Post-permute
    nzBc = sum(Bc(:)==0);
    minB = min(Bc(:));
    sumB = sum(Bc(:));
    % Double-check the count of zeros post-permute [passes every time]
    if nnz(Bc)~=(numel(Bc)-nzBc)
        error('Zero count seems wrong')
    end
    fprintf(' output: size=[%s] min=%g #zeros=%d sum(B)=%.0f\n', num2str(size(Bc)), minB, nzBc, sumB);
    fprintf(' sum preserved? %d (lost %.1f%% of total)\n', sumA==sumB, 100*(sumA-sumB)/sumA);
    if nzBc>0 || sumA~=sumB
        fprintf(' >>> PERMUTE CORRUPTED THIS ARRAY (a faithful permute is impossible here) <<<\n');
    else
        fprintf(' OK: permute faithful (no zeros introduced, sum preserved).\n');
    end
    % Clean everything up between runs, to make sure nothing corrupts across runs
    clear B Bc
    clear A
    reset(gpuDevice);% free GPU memory before the next allocation
end
fprintf('\nDone.\n');
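The script above detects corruption through zero counts and sums. A stricter check, sketched here as a suggestion and not part of the original script, is to compare the GPU result element by element against a CPU permute of the same data. The small default X is a hypothetical value so the snippet runs anywhere; X = 5966 matches the failing case. Note one difference that may matter: this sketch builds A on the CPU and copies it to the GPU, whereas the script creates it directly on the GPU.

```matlab
% Sketch: element-wise comparison of GPU permute against a CPU reference.
X    = 50;                       % hypothetical small size; set X = 5966 to match the failing case
sz   = [4 X 15 5 3 100];
perm = [1 2 4 5 3 6];

A   = randi([1 100], sz);        % built on the CPU (unlike the script, which builds on the GPU)
ref = permute(A, perm);          % CPU reference result

if canUseGPU()
    B   = gather(permute(gpuArray(A), perm));
    bad = find(B(:) ~= ref(:));  % linear indices where GPU and CPU disagree
    if isempty(bad)
        fprintf('GPU permute matches the CPU reference.\n');
    else
        fprintf('%d mismatches, first at linear index %d, last at %d\n', ...
            numel(bad), bad(1), bad(end));
    end
else
    fprintf('No usable GPU; only the CPU reference was computed.\n');
end
```

If mismatches do appear, the first and last linear indices may help show whether the damage starts at a particular offset.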
When this is run, the following is output to Command Window
>>permutetest
CUDADevice with properties:
Name: 'NVIDIA RTX 4000 Ada Generation'
Index: 1 (of 1)
ComputeCapability: '8.9'
DriverModel: 'N/A'
TotalMemory: 20991901696 (20.99 GB)
AvailableMemory: 20807024640 (20.81 GB)
DeviceAvailable: true
DeviceSelected: true
Show all properties.
===== ~4.0 GB, GPU, 6-D permute | array=[4 5966 15 5 3 100] | perm=[1 2 4 5 3 6] =====
536940000 elements | 4295520000 bytes | 4.0005 GiB | device=gpu
input : min=1 #zeros=0 sum(A)=27115243076
output: size=[4 5966 5 3 15 100] min=0 #zeros=500387064 sum(B)=1845835241
sum preserved? 0 (lost 93.2% of total)
>>> PERMUTE CORRUPTED THIS ARRAY (a faithful permute is impossible here) <<<
===== ~3.5 GB, GPU, 6-D permute | array=[4 5220 15 5 3 100] | perm=[1 2 4 5 3 6] =====
469800000 elements | 3758400000 bytes | 3.5003 GiB | device=gpu
input : min=1 #zeros=0 sum(A)=23724420469
output: size=[4 5220 5 3 15 100] min=1 #zeros=0 sum(B)=23724420469
sum preserved? 1 (lost 0.0% of total)
OK: permute faithful (no zeros introduced, sum preserved).
===== ~4.0 GB, CPU, 6-D permute | array=[4 5966 15 5 3 100] | perm=[1 2 4 5 3 6] =====
536940000 elements | 4295520000 bytes | 4.0005 GiB | device=cpu
input : min=1 #zeros=0 sum(A)=27115350572
output: size=[4 5966 5 3 15 100] min=1 #zeros=0 sum(B)=27115350572
sum preserved? 1 (lost 0.0% of total)
OK: permute faithful (no zeros introduced, sum preserved).
===== ~3.5 GB, CPU, 6-D permute | array=[4 5220 15 5 3 100] | perm=[1 2 4 5 3 6] =====
469800000 elements | 3758400000 bytes | 3.5003 GiB | device=cpu
input : min=1 #zeros=0 sum(A)=23724663007
output: size=[4 5220 5 3 15 100] min=1 #zeros=0 sum(B)=23724663007
sum preserved? 1 (lost 0.0% of total)
OK: permute faithful (no zeros introduced, sum preserved).
===== ~4.0 GB, GPU, 2-D transpose | array=[23864 22500] | perm=[2 1] =====
536940000 elements | 4295520000 bytes | 4.0005 GiB | device=gpu
input : min=1 #zeros=0 sum(A)=27115442092
output: size=[22500 23864] min=1 #zeros=0 sum(B)=27115442092
sum preserved? 1 (lost 0.0% of total)
OK: permute faithful (no zeros introduced, sum preserved).
===== ~3.5 GB, GPU, 2-D transpose | array=[20880 22500] | perm=[2 1] =====
469800000 elements | 3758400000 bytes | 3.5003 GiB | device=gpu
input : min=1 #zeros=0 sum(A)=23724054242
output: size=[22500 20880] min=1 #zeros=0 sum(B)=23724054242
sum preserved? 1 (lost 0.0% of total)
OK: permute faithful (no zeros introduced, sum preserved).
Done.
>>
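The byte counts printed above can be set against 2^32 bytes (4 GiB) with a little arithmetic. This is only a calculation on the sizes from the post; whether a 32-bit limit is actually involved is an assumption that nothing in this thread confirms.

```matlab
% Arithmetic on the sizes used in the post: [4, X, 15, 5, 3, 100] doubles.
perX      = 4*15*5*3*100;        % 90000 elements per unit of X
bytesPerX = 8*perX;              % 720000 bytes per unit of X
limit     = 2^32;                % 4294967296 bytes = 4 GiB

X      = [5220 5966];            % Xmid and Xhigh from the script
nbytes = bytesPerX * X;
for k = 1:numel(X)
    fprintf('X=%d: %d bytes (%.4f GiB), above 2^32 bytes: %d\n', ...
        X(k), nbytes(k), nbytes(k)/2^30, nbytes(k) > limit);
end

% Smallest X for which the array exceeds 2^32 bytes
Xmin = floor(limit/bytesPerX) + 1;
fprintf('Smallest X above 2^32 bytes: %d\n', Xmin);
```

With these dimensions, X = 5966 is the smallest value that puts the array over 2^32 bytes, and X = 5220 stays well under it, which is consistent with the failing and passing GPU cases.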
3 Comments
Matt J
5 minutes ago
Yep. I see it as well. I hope you've reported it. Interestingly also, the error is only triggered (for me at least) when perm(1)=1. All 600 other permutation orders work fine.
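For anyone wanting to repeat that sweep over permutation orders, a minimal sketch follows. It is a guess at how such a test could be set up, not the code actually used for the comment above. It needs a GPU with room for two arrays of about 4.3 GB each, and 720 large permutes will take a while.

```matlab
% Sketch: try every 6-D permutation order on the big gpuArray from the post.
sz     = [4 5966 15 5 3 100];    % the failing size from the script
orders = perms(1:6);             % all 720 orderings, one per row
nFirstFixed = nnz(orders(:,1) == 1);   % 120 orders have perm(1) == 1
fprintf('%d orders in total, %d with perm(1)==1, %d without\n', ...
    size(orders,1), nFirstFixed, size(orders,1) - nFirstFixed);

if canUseGPU()
    A   = randi([1 100], sz, 'gpuArray');   % every element >= 1
    bad = false(size(orders,1), 1);
    for k = 1:size(orders,1)
        B = permute(A, orders(k,:));
        bad(k) = gather(nnz(B)) ~= numel(B);  % any zero means corruption
        clear B
    end
    fprintf('%d of %d orders produced zeros\n', nnz(bad), numel(bad));
    disp(orders(bad,:))                       % list the failing orders
else
    fprintf('No usable GPU; sweep skipped.\n');
end
```

Of the 720 orders, 120 keep dimension 1 in place and 600 do not, which matches the count of 600 working orders mentioned above.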
Stephen23
17 minutes ago
Report bugs here:
Robert Kirkby
34 minutes ago
Answers (0)