The number of workers in PARPOOL is limited to 6 on Linux Cluster

BenC on 4 Nov 2019

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/489218-the-number-of-workers-in-parpool-is-limited-to-6-on-linux-cluster

Answered: zawye aung on 19 Jun 2020

I'm currently working through some stereoscopic video processing on a Linux cluster with 11 physical processors and 126 GiB RAM, running R2019a. Each physical processor has 8 cores (Opteron 6300 series). For some reason, if I try to create a PARPOOL with more than 6 workers, it fails at the verification step. I'm currently running an analysis, but I will post the specific error message once it's complete. I was originally restricted to fewer than 3 workers, but I increased my Java heap memory to 8 GB (using java.opts in the /bin/glnxa64/ directory). Memory usage at 6 workers does not come near system limits. How can I open this up to take advantage of the other physical processors on this machine? Should I increase the Java heap memory again?

Accepted Answer

Jason Ross on 5 Nov 2019

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/489218-the-number-of-workers-in-parpool-is-limited-to-6-on-linux-cluster#answer_399949

It would be very useful to see the error message.

It's also not clear what scheduler you are using -- is this a local scheduler, MJS, etc? I'm also assuming that when you say "11 physical processors" you mean 11 nodes in the cluster with 126 GiB each?

My initial hunch is that you are hitting some limit set in the user environment -- something like file handles, RAM, vmem size, etc. In addition to the actual error message it might be useful to see the output of the shell command "ulimit -a" or "limit", depending on what your system uses.
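As a rough sketch of what that check could look like in a bash-like shell (csh/tcsh users would run "limit" instead), the following prints all limits and then the soft and hard "open files" values. The variable names are only illustrative.

```shell
# Print every per-process resource limit for the current shell.
ulimit -a

# Soft and hard limits on open file descriptors ("open files").
soft_nofile=$(ulimit -Sn)
hard_nofile=$(ulimit -Hn)
echo "open files: soft=${soft_nofile} hard=${hard_nofile}"
```

The soft value is the one processes actually run under; the hard value is the ceiling a non-root user can raise the soft value to.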

In my experience, the usual cause is that the "descriptors" limit is set too low and needs to be increased.

You could also be running out of communications ports, or hitting a communications error if you have firewalls set up (and are using Parallel Server).

3 Comments
Jason Ross on 5 Nov 2019

Edited: Jason Ross on 5 Nov 2019

I suggest upping the "open files" setting. I suspect that you are hitting file handle limits. FWIW mine is set to 4096 on my workstation and I up it to 65535 for some servers.

The exact procedure is slightly different for each OS, but in general you edit limits.conf and might need to set something in your shell initialization files.
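As a hedged illustration, on many Linux systems the file lives under /etc/security/ and is read by pam_limits; that location is an assumption about your distribution, and "username" below is a placeholder. This sketch lists any existing lines that mention "nofile" and shows, in comments, what new entries might look like.

```shell
# List lines mentioning "nofile" (open files) in the usual pam_limits
# locations; -s hides errors if a file is missing on this system.
entries=$(grep -s nofile /etc/security/limits.conf /etc/security/limits.d/*.conf || true)
echo "${entries:-no nofile lines found}"

# Example entries an administrator might add (username is a placeholder):
#   username  soft  nofile  4096
#   username  hard  nofile  65535
```

Changes to limits.conf normally take effect at the next login session, not in shells that are already open.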

If you want to try a one-off experiment, you can set the limit higher in one shell (something like "ulimit -n 4096" in bash, or "limit descriptors 4096" in csh/tcsh) and launch MATLAB from that shell. The spawned worker processes should inherit the changed limit.
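A minimal sketch of that experiment for bash, assuming a target of 4096 and that the "matlab" command is on your PATH (both are assumptions; adjust to your setup). It only raises the soft limit, and never asks for more than the hard limit allows.

```shell
target=4096                 # example value; pick what suits your system
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)

# Cap the request at the hard limit, which an unprivileged user cannot exceed.
if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$target" ]; then
    target=$hard
fi

# Only raise the soft limit; leave it alone if it is already high enough.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
    ulimit -Sn "$target"
fi
echo "open files (soft): $(ulimit -Sn)"

# Launch MATLAB from this same shell so the workers inherit the limit
# (uncomment to run; assumes matlab is on PATH).
# matlab -nodesktop
```

The change lasts only for this shell and its child processes, so it is a safe way to test before editing any system files.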

BenC on 5 Nov 2019

That was the ticket. Thanks so much. Adjusting the "open files" limit fixed the issue.

More Answers (2)

zawye aung on 19 Jun 2020

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/489218-the-number-of-workers-in-parpool-is-limited-to-6-on-linux-cluster#answer_453745

Any suggestions? I have the same problem. Please help.

zawye aung on 19 Jun 2020

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/489218-the-number-of-workers-in-parpool-is-limited-to-6-on-linux-cluster#answer_453751

I am trying to use MDCS with an MJS cluster profile. The cluster has already passed validation in Admin Center, but when I validate the MJS cluster profile from MATLAB, it does not pass. The problem is shown in the figure: the pool job test fails. Any suggestions would be appreciated. Please help. Thank you!

Release

R2019a
