Why doesn't parfeval(@splitapply) improve splitapply's performance?

I want to use readtable on many HTML files to extract tables, and I wrote a function, extract_sheet, to do just that. I had been using parfor for this task, and it runs decently fast. Then it occurred to me that the HTML files can be grouped according to their folder and filename segments. So I tried splitapply(extract_sheet, input variables, groupNumber), and it works. Then I wanted to see whether parfeval would improve the speed, so I did something like parfeval(@splitapply, extract_sheet, input variables, groupNumber).
For a small test file list, both methods take almost the same elapsed time, around 27.5 +/- 0.1 seconds. My question is: why doesn't parfeval improve the performance?

 Accepted Answer

Matt J on 31 Aug 2023
Edited: Matt J on 31 Aug 2023
It probably means that MATLAB's internal parallelization already does what parfeval does.

6 Comments

I suspect that's the reason as well. My computer's fans would wake up like a tornado when splitapply was running, almost as noisy as when the parfor-loop was.
I just now ran the same data with the parfor-loop and for-loop algorithms. The for-loop took about the same time as splitapply and parfeval(splitapply). So this means that splitapply does not fully utilize the power of the parallel pool. One reason is probably that the data are separated into too many groups: each row in the data table is its own group.
On the other hand, the parfor-loop algorithm took a quarter of the time to finish.
Summary:
  • splitapply increases code readability.
  • When a data set is separated into too many subgroups, splitapply loses its advantage in vectorization speed.
  • A parfor-loop is the fastest in the 'too many subgroups' scenario.
These are just thoughts based on my own personal experience. I hope MathWorks staff can write a tutorial on using splitapply to take advantage of parallel computation.
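A comparison like the one summarized above could be set up roughly as follows. This is only a sketch: X, G, and work are hypothetical stand-ins (work plays the role of extract_sheet and carries an artificial cost), every row is its own group as in the case described, and the timings will depend on the pool size and on how expensive the real function is.

```matlab
% Hypothetical stand-in data: one row per group, as in the case described above
n = 8;
X = (1:n)';
G = (1:n)';
% Stand-in for extract_sheet; the norm term only adds artificial cost
work = @(x) x^2 + 0 * norm(rand(300));

% splitapply: internally a serial loop over groups
tic;
rSplit = splitapply(work, X, G);
tSplit = toc;

% Plain for-loop over rows
rFor = zeros(n, 1);
tic;
for k = 1:n
    rFor(k) = work(X(k));
end
tFor = toc;

% parfor-loop: iterations are distributed across the pool workers
rPar = zeros(n, 1);
tic;
parfor k = 1:n
    rPar(k) = work(X(k));
end
tPar = toc;

fprintf('splitapply %.3f s, for %.3f s, parfor %.3f s\n', tSplit, tFor, tPar);
```

With a workload this small, the parfor overhead may well dominate; the point is only the structure of the comparison, with the real function substituted for work.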
Simon, looking at the splitapply doc page, I do not think there is anything inherently parallel about that function.
When you run parfeval(@splitapply) you are running splitapply on a single worker. If you want to leverage parfeval on multiple workers at the same time, you will need to wrap it in a for-loop.
I think what you could ultimately do is something like
for i = 1:numBins
    % Assumes grouping is a cell array holding one chunk of groups per bin
    f(i) = parfeval(@foo, numOutputs, grouping{i});
end
Your foo function would include the splitapply and additional processing and you would feed different groupings to each parfeval worker, so each worker is processing its own independent chunk.
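To make that idea concrete, here is a minimal sketch under stated assumptions. X, G, func, and numBins are hypothetical stand-ins (func plays the role of extract_sheet), and the anonymous function takes the place of foo. Each parfeval task runs splitapply on its own chunk of groups; findgroups renumbers the group numbers within a chunk so they run from 1 to k, which splitapply requires.

```matlab
% Hypothetical stand-in data: 12 rows split into 4 groups
X = (1:12)';
G = repelem((1:4)', 3);
func = @(x) sum(x);              % stands in for extract_sheet

% Assign each group to one of numBins chunks
numBins = 2;
groupIds = unique(G);
binOf = ceil((1:numel(groupIds))' * numBins / numel(groupIds));

f(1:numBins) = parallel.FevalFuture;   % preallocate the futures
for i = 1:numBins
    rows = ismember(G, groupIds(binOf == i));
    % One task per chunk; group numbers are renumbered inside the task
    f(i) = parfeval(@(x, g) splitapply(func, x, findgroups(g)), 1, ...
                    X(rows), G(rows));
end

% Blocks until all tasks finish, then concatenates the outputs vertically
out = fetchOutputs(f);
```

Because the chunks are built in group order, out lists the per-group results in the same order a single splitapply(func, X, G) call would. Whether this beats a plain parfor-loop depends on how evenly the work divides across chunks.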
@Sam Marshalik, thanks for the answer. I just now looked inside splitapply and found that it uses a for-loop. So indeed there is nothing inherently parallel.
for curGroup = 1:numGroups
    % Find the elements of the group
    groupNums = sgnums(grpStart(curGroup):grpEnd(curGroup));
    dataVars = cell(1,numVars);
    % Extract the group data
    for i = 1:numVars
        dataVars{1,i} = getVarRows(vars{i},groupNums,gdim);
    end
    % Apply the function to the group
    [funOut,nout] = localapply(fun,dataVars,gdim,nout,funOut,curGroup,numVars);
end
@Sam Marshalik, the splitapply doc page does say that it is one of the functions that support multithreading, and it suggests that users read the parfeval doc. That's all I found in the doc. The doc is too stingy with parallelism instructions for MATLAB users (as opposed to software developers).
@Matt J: You bring up a good point that the doc page is lacking information on this topic. I put in an enhancement request to improve that. In the meantime, I would suggest contacting our Technical Support; they can investigate this further and reach out to the relevant Dev team.


More Answers (1)

If you're going to be using PCT functions anyway, I wonder if a parfor loop might do better than splitapply. I.e., instead of,
splitapply(func,X,G)
one might instead do,
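% Collect the element indices of each group (columns keep orientations matched)
I = splitapply(@(x){x}, (1:numel(G))', G(:));
results = cell(size(I));   % preallocate one cell per group
parfor j = 1:numel(I)
    results{j} = func(X(I{j}));
end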

1 Comment

That's a really nice solution. I'll try it. My experience with parfor tells me it's gonna be fast.


Release

R2023a

Asked:

on 31 Aug 2023

Commented:

on 6 Sep 2023
