MATLAB R2018b not using all parfor workers on an Ubuntu server

I use R2018b (9.5.0.944444) 64-bit (glnxa64) and run some code with parfor on an Ubuntu server. From other posts [1], I understand that the number of workers equals the number of physical cores. In fact, running
feature('numcores')
while my main code is running with parfor, I obtain
MATLAB detected: 36 physical cores.
MATLAB detected: 72 logical cores.
MATLAB was assigned: 72 logical cores by the OS.
MATLAB is using: 36 logical cores.
MATLAB is not using all logical cores because hyper-threading is enabled.
I don't plan to use all 72 logical cores, since that is ill-advised [2], but I would like to use all 36 cores that MATLAB reports as in use (the 'workers'). However, when I check what is running with 'top', I get
top - 10:58:19 up 51 days, 22:05, 1 user, load average: 11,09, 11,07, 11,07
Tasks: 772 total, 1 running, 447 sleeping, 0 stopped, 0 zombie
%Cpu(s): 15,3 us, 0,0 sy, 0,0 ni, 84,6 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 65858916 total, 34396420 free, 25277820 used, 6184676 buff/cache
KiB Swap: 67008508 total, 66963128 free, 45380 used. 39908680 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
42075 me 20 0 10,769g 874756 225980 S 100,0 1,3 2127:54 MATLAB
42077 me 20 0 10,711g 893356 225944 S 100,0 1,4 2128:00 MATLAB
42079 me 20 0 10,705g 861136 226612 S 100,0 1,3 2127:46 MATLAB
42081 me 20 0 10,704g 840472 226484 S 100,0 1,3 2127:52 MATLAB
42083 me 20 0 10,712g 856320 227408 S 100,0 1,3 2127:40 MATLAB
42085 me 20 0 10,711g 903280 226572 S 100,0 1,4 2127:50 MATLAB
42087 me 20 0 10,766g 871928 226380 S 100,0 1,3 2127:46 MATLAB
42091 me 20 0 10,706g 971448 225872 S 100,0 1,5 2127:44 MATLAB
42093 me 20 0 10,705g 863472 225972 S 100,0 1,3 2127:45 MATLAB
42095 me 20 0 10,767g 904780 226684 S 100,0 1,4 2127:45 MATLAB
42089 me 20 0 10,768g 849096 226192 S 99,7 1,3 2127:55 MATLAB
5942 me 20 0 42988 4784 3360 R 1,0 0,0 0:01.38 top
42026 me 20 0 10,708g 874944 226500 S 0,3 1,3 26:12.25 MATLAB
42029 me 20 0 10,711g 873740 226648 S 0,3 1,3 25:55.30 MATLAB
42031 me 20 0 10,707g 872984 226104 S 0,3 1,3 26:43.04 MATLAB
42035 me 20 0 10,710g 869180 226996 S 0,3 1,3 26:52.10 MATLAB
42037 me 20 0 10,707g 870420 226072 S 0,3 1,3 26:21.89 MATLAB
42039 me 20 0 10,709g 981948 226356 S 0,3 1,5 29:33.97 MATLAB
42041 me 20 0 10,707g 869492 227508 S 0,3 1,3 61:37.80 MATLAB
42043 me 20 0 10,775g 899368 227492 S 0,3 1,4 77:11.67 MATLAB
42045 me 20 0 10,712g 912452 227276 S 0,3 1,4 154:24.69 MATLAB
42047 me 20 0 10,712g 919104 226272 S 0,3 1,4 152:14.17 MATLAB
42055 me 20 0 10,705g 848856 226424 S 0,3 1,3 447:04.21 MATLAB
42057 me 20 0 10,706g 866580 226292 S 0,3 1,3 449:49.36 MATLAB
42061 me 20 0 10,705g 852156 226460 S 0,3 1,3 450:37.76 MATLAB
42067 me 20 0 10,707g 827552 225740 S 0,3 1,3 1487:57 MATLAB
42073 me 20 0 10,706g 911500 227056 S 0,3 1,4 1502:18 MATLAB
and it seems to me that at the moment only 11 workers are running at full load while the others are nearly idle (I have deleted irrelevant rows).
Why aren't the other processes running at 100%?
Here's a draft of the code that I use:
% Set up some shared variables (nothing really big in terms of memory).
% For instance, parameter_set is an array with (say 3) columns where
% parameter_set(k,:) defines a tuple of parameters that a worker uses
% to run some functions in parfor.
parameter_set = randi(10,100,3);
cost = zeros(size(parameter_set,1),1); % saves output of computation
poolobj = gcp('nocreate'); % gets current pool object if existing
if isempty(poolobj) % check if pool object is empty
    poolobj = parpool([1 Inf]); % starts Parallel Computing Toolbox
end
parfor k=1:size(parameter_set,1)
    a = parameter_set(k,1);
    b = parameter_set(k,2);
    c = parameter_set(k,3);
    % run some user-defined functions, fmincon, etc
    % with parameters a,b,c
    cost(k) = fmincon(@(x) myfun(x,a,b,c));
    disp(['k = ' num2str(k) ' done']);
end
delete(poolobj);
save('myFile');
Note that when starting parpool, MATLAB gives me
Starting parallel pool (parpool) using the 'local' profile ...
connected to 36 workers.
Thanks in advance for any useful reply.
  3 Comments
Nicola Dalla Pozza on 24 Sep 2019
Some iterations take longer than others, depending on the values of a, b, c. Each iteration may last from 350 to 40000 sec.
I will run a dummy workload as soon as the server gets free (may require a couple of days).
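To illustrate how such uneven durations can leave workers idle near the end of a run, here is a minimal simulation that needs no toolbox. It assumes 100 tasks with durations spread uniformly between 350 and 40000 s and a simple "next task goes to the first free worker" rule; both are assumptions, and the rule only approximates how parfor distributes iterations.

```matlab
% Hypothetical simulation; the numbers mirror the post, the dispatch rule is assumed.
rng(0);                                    % fixed seed so the run is repeatable
numWorkers = 36;
numTasks   = 100;
durations  = 350 + (40000 - 350) * rand(numTasks, 1);   % assumed uniform spread

% Greedy dispatch: each task goes to the worker that becomes free first.
freeAt = zeros(numWorkers, 1);             % time at which each worker finishes its last task
for t = 1:numTasks
    [startTime, w] = min(freeAt);
    freeAt(w) = startTime + durations(t);
end

makespan = max(freeAt);                    % wall-clock time of the whole loop

% Workers run without gaps from time 0 until freeAt(w), so a worker is
% still busy at time T exactly when freeAt(w) > T.
probeTime   = 0.9 * makespan;
busyAtProbe = sum(freeAt > probeTime);

% Fraction of the total worker-time that was spent doing useful work.
utilisation = sum(durations) / (numWorkers * makespan);

fprintf('Makespan: %.0f s\n', makespan);
fprintf('Workers still busy at 90%% of the makespan: %d of %d\n', busyAtProbe, numWorkers);
fprintf('Overall utilisation: %.2f\n', utilisation);
```

If the simulated tail shows only a handful of busy workers, a top screen with a few processes at 100% late in the run would be consistent with load imbalance rather than with a pool that failed to start.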
PS Actually, I use parameter_set to queue multiple iterations of the loop, so that as soon as a worker is free a new iteration starts with a new set of parameters. Any ideas on how to make this simpler, cleaner or more efficient are welcome. One improvement would be to run the iterations from the longest to the shortest, but in parfor you cannot control the order of the iterations, so I haven't figured out a way to do it yet. Thanks
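One possible way to control submission order is parfeval, which queues tasks individually instead of leaving the scheduling to parfor. The sketch below is only an illustration: estimatedCost and runOne are hypothetical names, the heuristic is a placeholder, and runOne stands in for the real fmincon work. It also assumes that queued futures are started in the order they were submitted.

```matlab
% Sketch only: estimatedCost and runOne are hypothetical, not from the original code.
parameter_set = randi(10, 100, 3);
numTasks = size(parameter_set, 1);

% Assumption: some cheap proxy predicts which parameter tuples run longest.
estimatedCost = sum(parameter_set, 2);           % placeholder heuristic
[~, order] = sort(estimatedCost, 'descend');     % longest (estimated) first

% Stand-in for the real per-tuple work (fmincon on myfun with a, b, c).
runOne = @(p) sum(p .^ 2);

pool = gcp();                                    % reuse the current pool or start one
futures(1:numTasks) = parallel.FevalFuture;      % preallocate the array of futures
for n = 1:numTasks
    k = order(n);
    futures(n) = parfeval(pool, runOne, 1, parameter_set(k, :));
end

% Collect results as they complete, in whatever order they finish.
cost = zeros(numTasks, 1);
for n = 1:numTasks
    [idx, value] = fetchNext(futures);           % blocks until some task has finished
    cost(order(idx)) = value;                    % map back to the original row index
    fprintf('k = %d done\n', order(idx));
end
```

Whether this helps depends on how well the durations can be predicted from a, b, c; with a poor estimate the ordering is no better than random.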
Nicola Dalla Pozza on 25 Sep 2019
Hi Edric, the code you suggested occupies all the workers. Only in the last few seconds do some workers seem to be used at 25%, but maybe that is because they are shutting down.
I should add that my script also immediately occupies all the available workers. It is only when I check after some hours that top shows a screen like the one reported in the question.
Here's a more detailed version of my code. I wonder whether xOptimal or transfer may be badly sliced and creating unnecessary communication overhead. I will try to profile the code later.
parameter_set = randi(10,100,3);
cost = zeros(size(parameter_set,1),1); % saves output of computation
xOptimal = zeros(size(parameter_set,1),20);
transfer = zeros(size(parameter_set,1), 8, 8);
Aeq = ...
beq = ...
poolobj = gcp('nocreate'); % gets current pool object if existing
if isempty(poolobj) % check if pool object is empty
    poolobj = parpool([1 Inf]); % starts Parallel Computing Toolbox
end
parfor k=1:size(parameter_set,1)
    a = parameter_set(k,1);
    b = parameter_set(k,2);
    c = parameter_set(k,3);
    for r=1:initRuns
        % Fmincon
        options = optimset('Display','off');
        xinit = randi(10, 20,1);
        try
            xTmp = fmincon( @(x) myFun(x, a, b, c), xinit, [], [], Aeq, beq, ...
                -ones(size(xinit)), ones(size(xinit)), [], options);
        catch e
            disp(e)
            disp(['error at k= ' num2str(k) ', xinit= ' num2str(xinit)]);
            quit force
        end
        [costTmp, transferTmp] = myFun(xTmp, a, b, c);
        if costTmp > cost(k)
            cost(k) = costTmp;
            xOptimal(k,:) = xTmp; % badly sliced?
            transfer(k,:,:) = transferTmp; % badly sliced?
        end
    end
    disp(['k = ' num2str(k) ' done']);
end
delete(poolobj);
save('myFile');
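To check the slicing and communication question, one option is to keep the per-iteration best result in temporaries, write each sliced output exactly once per iteration, and wrap the loop in ticBytes/tocBytes to see how much data moves to and from the workers. The following is a sketch under stated assumptions: costFun and transferFun are hypothetical stand-ins for myFun, rand replaces the fmincon call, and initRuns = 3 is an arbitrary choice.

```matlab
% Sketch: costFun and transferFun are hypothetical stand-ins for myFun.
numParams = 100;
initRuns  = 3;                                   % assumed value, not given in the post
parameter_set = randi(10, numParams, 3);
cost     = zeros(numParams, 1);
xOptimal = zeros(numParams, 20);
transfer = zeros(numParams, 8, 8);

costFun     = @(x, a, b, c) a * sum(x .^ 2) + b + c;
transferFun = @(a, b, c) (a + b + c) * ones(8, 8);

pool = gcp();                                    % reuse the current pool or start one
ticBytes(pool);                                  % start counting transferred bytes
parfor k = 1:numParams
    p = parameter_set(k, :);                     % sliced input for this iteration
    % Temporaries: reset at the top of every iteration, local to the worker.
    bestCost     = 0;
    bestX        = zeros(1, 20);
    bestTransfer = zeros(8, 8);
    for r = 1:initRuns
        xTmp    = rand(20, 1);                   % stand-in for the fmincon result
        costTmp = costFun(xTmp, p(1), p(2), p(3));
        if costTmp > bestCost
            bestCost     = costTmp;
            bestX        = xTmp.';
            bestTransfer = transferFun(p(1), p(2), p(3));
        end
    end
    % Each sliced output is written once, unconditionally.
    cost(k)           = bestCost;
    xOptimal(k, :)    = bestX;
    transfer(k, :, :) = bestTransfer;
end
tocBytes(pool)                                   % prints bytes sent to and from each worker
```

With arrays of this size the byte counts would be expected to be small compared with iterations lasting hundreds of seconds, in which case communication is unlikely to explain idle workers.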
