Only 1 core per node will run in a parallel job on clusters

I'm running a parallel job (a genetic algorithm) on a supercomputer cluster with 12 cores per node. I use the "batch" function to submit the job with a pool of 127 workers (128 workers in total, counting the one that runs the script). Before the job starts, it displays the information below, showing 128 pending tasks, and I can see that 11 nodes are assigned to the job.
>> myJob=batch('Run_GA_Code_SI_dens', 'matlabpool', 127)
additionalSubmitArgs =
--time=480
myJob =
Job
Properties:
ID: 14
Type: pool
Username: zd6
State: running
SubmitTime: Thu Aug 13 14:51:14 CDT 2015
StartTime:
Running Duration: 0 days 0h 0m 0s
NumWorkersRange: [128 128]
AutoAttachFiles: true
Auto Attached Files: /dascratch/zd6/Density_cal_basedon_v202_v30_v4.2_with_Sr_Pitzer_and_Holmes/BPhiterm.m
/dascratch/zd6/Density_cal_basedon_v202_v30_v4.2_with_Sr_Pitzer_and_Holmes/fPZ6.m
AttachedFiles: {}
AdditionalPaths: {}
Associated Tasks:
Number Pending: 128
Number Running: 0
Number Finished: 0
Task ID of Errors: []
============
However, once the job is actually running, only 11 associated tasks are running, the same as the number of nodes assigned. When an engineer helped investigate, they found that only 1 core (worker) per node is actually doing work, although it spawns a lot of threads. As a result, the code runs very slowly.
>> job14
job14 =
Job
Properties:
ID: 14
Type: pool
Username: zd6
State: running
SubmitTime: Thu Aug 13 14:51:14 CDT 2015
StartTime: Thu Aug 13 14:51:36 CDT 2015
Running Duration: 0 days 0h 1m 45s
NumWorkersRange: [128 128]
AutoAttachFiles: true
Auto Attached Files: /dascratch/zd6/Density_cal_basedon_v202_v30_v4.2_with_Sr_Pitzer_and_Holmes/BPhiterm.m
/dascratch/zd6/Density_cal_basedon_v202_v30_v4.2_with_Sr_Pitzer_and_Holmes/Bpterm.m
/dascratch/zd6/Density_cal_basedon_v202_v30_v4.2_with_Sr_Pitzer_and_Holmes/fPZ6.m
AttachedFiles: {}
AdditionalPaths: {}
Associated Tasks:
Number Pending: 0
Number Running: 11
Number Finished: 0
Task ID of Errors: []
=============
  2 Comments
Edric Ellis on 18 Aug 2015
What cluster type are you using? I'm a bit confused about this - you're requesting that MATLAB launches 128 worker processes simultaneously to run your job - on a cluster that has only 12 cores available? The fact that you see only 11 running tasks might indicate that the job is still starting up...
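One way to check whether the remaining tasks are still starting up is to inspect the task states from the client session. The following is a minimal sketch, assuming the job object `myJob` returned by `batch` in the question is still in the workspace and that each started task exposes a `Worker` object with a `Host` property:

```matlab
% Assumes myJob is the job object returned by batch in the question.
tasks  = myJob.Tasks;        % all tasks attached to the job
states = {tasks.State};      % e.g. 'pending', 'running', 'finished'

fprintf('running: %d of %d tasks\n', ...
        nnz(strcmp(states, 'running')), numel(tasks));

% Print the host of every task that has already been given a worker.
for k = 1:numel(tasks)
    w = tasks(k).Worker;
    if ~isempty(w)
        fprintf('task %d on host %s\n', tasks(k).ID, w.Host);
    end
end
```

If the count of running tasks stays at 11 well after the start time, the job is probably not just slow to start.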
nash on 5 Jul 2017
I'm experiencing the same problem. Did either of you find a solution to this? Thanks in advance.


Answers (1)

Walter Roberson on 5 Jul 2017
Before R2017a, each worker was assigned only a single core. Starting with R2017a, you can edit your cluster profile to assign more than one core to each worker.
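As a rough illustration of the profile change, the sketch below sets the number of computational threads per worker programmatically rather than through the profile manager. It assumes a release whose cluster object exposes the `NumThreads` property. The profile name `myClusterProfile`, the thread count of 2, and the pool size of 63 are hypothetical values, not taken from the original post.

```matlab
% Sketch only: 'myClusterProfile' is a hypothetical profile name.
c = parcluster('myClusterProfile');

% Computational threads each worker may use (assumed value).
c.NumThreads = 2;

% Optional: write the change back to the saved profile.
saveProfile(c);

% A pool of 63 means 64 workers in total (one runs the script itself);
% with 2 threads each, that corresponds to 128 cores.
job = batch(c, 'Run_GA_Code_SI_dens', 'Pool', 63);
```

Whether the scheduler actually reserves that many cores per worker depends on how the cluster integration is configured, so it is worth confirming with the cluster administrators.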
