Cannot iterate over a table in a parfor loop to speed up saving on multiple workers in a cluster.

% remove all 0 activities
subject101(subject101.activityID == 0, :) = [];
% divide table to sub-tables by labels
groups = unique(subject101{:, 2});
parfor i = 1:length(groups)
    str = [ 'Working on activity ' num2str(groups(i)) ];
    disp(str);
    T = subject101(subject101.activityID == groups(i), :); % HERE IS THE WARNING
    str = [ 'subject101_' num2str(groups(i)) '.csv' ];
    writetable(T, str);
end
Hello,
I get the warning "The entire array or structure 'subject101' is a broadcast variable. This might result in unnecessary communication overhead."
'subject101' is a 249957x12 table.
How can I solve it? Please help.
Thanks.

 Accepted Answer

MATLAB's optimizer is not advanced enough to trace through that code, prove that each row will be used by only one worker, and then split the table for you automatically. Because the loop body indexes the whole table with a logical mask, the entire table has to be sent to each worker.
To avoid that, you need to split the table yourself before the parfor loop, for example with findgroups() and splitapply():
[G, groups] = findgroups(subject101.activityID);
grouped_tables = splitapply(@(varargin) {table(varargin{:}, 'VariableNames', subject101.Properties.VariableNames)}, subject101, G);
parfor i = 1 : length(grouped_tables)
    T = grouped_tables{i};
    % do whatever is needed
end

9 Comments

You're running R2015a. findgroups was introduced in R2015b.
Good point, I did not pay attention to the version.
[groups, ~, G] = unique(subject101.activityID);
grouped_tables = splitapply(@(varargin) {table(varargin{:}, 'VariableNames', subject101.Properties.VariableNames)}, subject101, G);
parfor i = 1 : length(grouped_tables)
    T = grouped_tables{i};
    % do whatever is needed
end
Now, I get an error:
Undefined function or variable 'splitapply'.
splitapply was introduced in R2015b too.
In that case, you would probably have to fall back to an ordinary for loop that does the equivalent of
T = subject101(subject101.activityID == groups(i), :); % HERE IS THE WARNING
except that each result is saved in a cell array, after which you would parfor over the indices of that cell array.
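A minimal sketch of that approach, using only functions available in R2015a. The small table, its variable names other than activityID, and the output folder outDir are made up for illustration; the real subject101 would be used instead.

```matlab
% Hypothetical stand-in for subject101 (the real table is 249957x12).
subject101 = table((1:6).', [1; 2; 0; 1; 3; 2], rand(6, 1), ...
    'VariableNames', {'timestamp', 'activityID', 'heartRate'});
outDir = tempdir;   % assumption: any writable folder will do

subject101(subject101.activityID == 0, :) = [];   % remove all 0 activities
groups = unique(subject101.activityID);

% Serial pre-split: one sub-table per activity, stored in a cell array.
grouped_tables = cell(numel(groups), 1);
for k = 1:numel(groups)
    grouped_tables{k} = subject101(subject101.activityID == groups(k), :);
end

% grouped_tables and groups are indexed only by the loop variable, so they
% are sliced and each worker receives just the sub-table it has to write.
parfor i = 1:numel(groups)
    T = grouped_tables{i};
    writetable(T, fullfile(outDir, ['subject101_' num2str(groups(i)) '.csv']));
end
```

The splitting itself still runs serially on the client; only the writing is spread over the workers, which is where the original loop spent its time on file output.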
I guess there would also be a way to do it without looping or unique(), by using accumarray
group_tables = accumarray(subject101.activityID, subject101.activityID, [], @(idx) subject101(idx,:));
Now I get an error
group_tables = accumarray(subject103.activityID, subject103.activityID, [], @(idx) subject103(idx,:));
subject103 is a table.
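Perhaps the following, which wraps each sub-table in a cell (accumarray requires the function to return a scalar) and passes row numbers rather than the activity IDs as the values, since the function uses them to index rows:
group_tables = accumarray(subject103.activityID, (1:height(subject103)).', [], @(idx) {subject103(idx,:)});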
Some representative sample data would help.
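In the absence of real sample data, here is a sketch of the accumarray idea on a small made-up table. The timestamp column and its values are invented; the only assumption carried over from the question is that activityID holds positive integers once the 0 rows are removed.

```matlab
% Made-up sample data standing in for subject103.
subject103 = table([10; 20; 30; 40; 50], [2; 1; 2; 4; 1], ...
    'VariableNames', {'timestamp', 'activityID'});

% Pass row numbers as the values so the function can index the table rows.
rows = (1:height(subject103)).';
group_tables = accumarray(subject103.activityID, rows, [], ...
    @(idx) {subject103(sort(idx), :)});   % sort keeps the original row order

% Activity IDs that never occur (here 3) leave an empty cell; drop them
% before looping over the cell array with parfor.
group_tables = group_tables(~cellfun('isempty', group_tables));
```

The sort call is there because accumarray does not promise the order in which values reach the function unless the subscripts are already sorted.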

Release

R2015a