How can I use matfile to slice a very large structure inside parfor without getting a broadcast variable warning?

I have a very large structure with an array in one of its fields that is approximately 15,000,000 x 20. Currently, I access the file with the matfile command in the hope of loading only windows of data to which I can apply a simple function. I am receiving a broadcast variable warning for the structure, even though I thought I had written the code so that it slices the dataset. I have access to a remote cluster; however, it is running extremely slowly. I fear that the overhead of the broadcast variable is the cause. I will have to run this same routine on 50+ files and would welcome any performance improvements.
Is there a way to load only slices of the array using the matfile command, so as to speed up this process and avoid broadcasting this very large array? Any help would be appreciated. Sample code showing exactly what I am doing is below:
%Locate the file without loading it into memory
m = matfile('filename.mat');
%Note, m is a large structure with the following:
% m.A = {'Information'}
% m.B = {'Information'}
% m.C = [15,000,000 x 1]
% m.D = [15,000,000 x 20]
%Extract the size of the array
Info = whos(m,'C');
Length = Info.size(1);
%Using parallel processing, step through the data set with Window Size, Win
parfor j = Win:Length
    Result(j,:) = SomeFunc(m.C(j-Win+1,:)); %call function along columns for window size, Win
end
Note: I am using R2017b.

Accepted Answer

Edric Ellis on 3 May 2019
In this case, the warning about broadcasting the matfile object is probably safe to ignore. The point is that the matfile object itself is not large; it simply knows how to load the data on demand. You can easily prove this to yourself using ticBytes and tocBytes:
%% Prepare data
N = 100000;
C = rand(N,1);
fname = tempname();
save('-v7.3', fname, 'C');
clear C
%% Create matfile and pool
pool = gcp();
if isempty(pool)
    pool = parpool('local', 4);
end
m = matfile(fname);
%% Use matfile, check bytes transmitted
Info = whos(m, 'C');
Length = Info.size(1);
t = ticBytes(pool);
parfor j = 1:Length
    out(j) = m.C(j, 1).^2;
end
tocBytes(pool, t);
This gives the output:
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________
    1        23408                 2.163e+05
    2        23408                 2.163e+05
    3        23408                 2.163e+05
    4        23408                 2.163e+05
    Total    93632                 8.6522e+05
One thing I would note though is that reading single elements at a time from a matfile can be slow. It might be more efficient to load larger chunks of data, more like this:
%% Load in chunks
chunkSize = 1024;
numChunks = ceil(Length / chunkSize);
t = ticBytes(pool);
parfor j = 1:numChunks
    firstIdx = 1 + ((j-1) * chunkSize);
    lastIdx = min(Length, firstIdx + chunkSize - 1);
    out2{j} = m.C(firstIdx:lastIdx, 1).^2;
end
tocBytes(pool, t);
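The same chunking idea can probably be adapted to the windowed loop over m.D from the question's follow-up comment. What follows is only a sketch under stated assumptions: the small random D, the values of Win and chunkSize, and the someFunc handle (a column-wise mean) are illustrative stand-ins, not the asker's real data or function. Each iteration reads one contiguous block that includes the Win-1 extra rows needed by the last window in the chunk.

```matlab
%% Sketch: sliding-window processing of a matfile variable in chunks
% All names and sizes here are illustrative assumptions.
N   = 5000;                     % small stand-in for 15,000,000 rows
Win = 50;                       % assumed window length
D   = rand(N, 20);
fname = [tempname() '.mat'];
save(fname, 'D', '-v7.3');      % v7.3 format allows partial loading via matfile
clear D
m = matfile(fname);

someFunc = @(x) mean(x, 1);     % hypothetical stand-in for SomeFunc, returns 1-by-20

numWindows = N - Win + 1;       % one result per complete window
chunkSize  = 1024;              % number of windows handled per iteration
numChunks  = ceil(numWindows / chunkSize);
parts = cell(numChunks, 1);

parfor c = 1:numChunks
    firstWin = 1 + (c-1) * chunkSize;                      % first window in this chunk
    lastWin  = min(numWindows, firstWin + chunkSize - 1);  % last window in this chunk
    % One contiguous read covering every window in the chunk (Win-1 rows of overlap)
    block = m.D(firstWin:(lastWin + Win - 1), :);
    nWin  = lastWin - firstWin + 1;
    res   = zeros(nWin, size(block, 2));
    for k = 1:nWin
        res(k, :) = someFunc(block(k:(k + Win - 1), :));
    end
    parts{c} = res;
end

% Row j holds the result for rows j-Win+1:j; the first Win-1 rows are padded with NaN
Results = [nan(Win-1, 20); vertcat(parts{:})];
```

Each chunk rereads Win-1 rows that the next chunk also needs, so choosing chunkSize much larger than Win keeps that duplicated I/O small. The NaN padding is a choice made for this sketch; the original loop would leave those rows as zeros.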

More Answers (1)

Walter Roberson on 2 May 2019
You do not appear to be using the D array in your parfor, and your C array is less than 120 megabytes. Just copy all of m.C into a local variable in your client and then let it be sliced automatically.
If your code had a typo and really refers to m.D, then that array is about 2 1/4 gigabytes. It might still be worth taking a local copy of it and letting it be sliced.
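A minimal sketch of the local-copy approach, using the single-row indexing from the original question. The small random C, the value of Win, and the someFunc handle are illustrative assumptions. The constant part of the index is stored in a plain scalar (offset) so that C(j - offset, :) has the "loop variable minus constant" form; as far as I understand the parfor slicing rules, that form is what lets C be sliced, whereas j-Win+1 written inline may cause C to be broadcast instead.

```matlab
%% Sketch: copy C to the client once, then let parfor slice it
% All names and sizes here are illustrative assumptions.
N   = 10000;                    % small stand-in for 15,000,000 rows
Win = 100;                      % assumed window size
C   = rand(N, 1);
fname = [tempname() '.mat'];
save(fname, 'C', '-v7.3');
clear C
m = matfile(fname);

C = m.C;                        % single read of the whole variable into client memory
Length = size(C, 1);
offset = Win - 1;               % plain scalar, so the index below is "j minus a constant"
someFunc = @(x) x.^2;           % hypothetical stand-in for SomeFunc

Result = zeros(Length, 1);
parfor j = Win:Length
    Result(j, :) = someFunc(C(j - offset, :));
end
```

At 8 bytes per double, the real C (15,000,000 x 1) is 120,000,000 bytes and the real D (15,000,000 x 20) is 2,400,000,000 bytes, which matches the sizes quoted above.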
  1 Comment
Adam on 3 May 2019
Walter-
I apologize for the typo. The parfor loop should actually read:
parfor j=Win:Length
    Results(j,:) = SomeFunc(m.D(j-Win+1:j,:));
end
where each slice has "Win" rows. The function needs to operate on an array with "Win" rows, which is why I am trying to slice it this way.
Can you elaborate on your answer with this new information please?
