MATLAB R2024b parallel pool not working above 32 cores
Hi everyone,
I have a P8 ThinkStation with an AMD Ryzen Threadripper 7985WX running Windows 11, with MATLAB R2024b installed. The processor has 64 physical cores and 128 logical cores. When I validate the local cluster profile for parallel processing with the number of workers set above 32, the validation fails at the "SPMD job test" stage and returns the following error:
Error Report: Job errored or did not reach the state 'finished'. MATLAB worker shut down unexpectedly with status -4 during task execution.
The exit status is not always the same: it varies among -1, -2 and -4.
Any suggestions on how to fix this issue? I didn't have this problem with R2023b...
Best regards,
Filippo
Answers (2)
sidik
on 7 Nov 2024
0 votes
Hello @Filippo Ambrosino
Try the following steps:
Step 1: Reduce the Number of Cores Used
- Open MATLAB.
- Go to Home > Parallel > Manage Cluster Profiles.
- In the Cluster Profile Manager window, select the process-based profile from the list of cluster profiles (named Processes in R2022b and later, local in earlier releases).
- Click on Edit at the bottom right.
- In the NumWorkers section, set the number to 32 (or a lower number if you want to test gradually).
- Click Done to save the changes.
- Close the Cluster Profile Manager window.
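The same change can also be made from the command line. This is only a minimal sketch: it assumes the process-based profile is named Processes (the R2022b and later name), and 32 is just the starting value suggested above.

```matlab
% Load the process-based cluster profile (assumed name: 'Processes').
c = parcluster('Processes');

% Show the current worker limit, then lower it.
fprintf('NumWorkers was %d\n', c.NumWorkers);
c.NumWorkers = 32;

% Persist the change so that later parpool calls use the new limit.
saveProfile(c);
```

Calling parcluster('Processes') again afterwards should report the new NumWorkers value.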
Step 2: Test the Cluster Profile
- Go back to Parallel and click on Validate.
- Let MATLAB validate the cluster profile. If the test still fails, decrease NumWorkers (to 16 or 8) and validate again to see whether a lower number of workers resolves the issue.
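Stepping down through worker counts can be scripted as well. The sketch below is not the Validate button; it is a rough stand-in that opens a pool of each size and runs a trivial spmd block. The profile name and the list of candidate sizes are assumptions, and spmdPlus requires R2022b or later.

```matlab
c = parcluster('Processes');       % assumed profile name
candidates = [64 48 32 16 8];      % worker counts to try, largest first

for n = candidates
    delete(gcp('nocreate'));       % close any pool left over from a failed attempt
    try
        c.NumWorkers = n;          % in-memory change only; the saved profile is untouched
        pool = parpool(c, n);
        spmd
            total = spmdPlus(1);   % every worker contributes 1, so the sum should be n
        end
        ok = (total{1} == n);
        fprintf('%d workers: spmd sum = %d (ok = %d)\n', n, total{1}, ok);
        delete(pool);
        if ok
            break;                 % largest working size found
        end
    catch err
        fprintf('%d workers failed: %s\n', n, err.message);
    end
end
delete(gcp('nocreate'));           % make sure no pool is left open
```

The first size that prints ok = 1 is the largest one from the list that survived a basic spmd round trip on that machine.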
Step 3: Create a Custom Cluster Profile (if needed)
- If validation continues to fail, go back to Manage Cluster Profiles and click on New Profile.
- Name the new profile (e.g., CustomProfile).
- In NumWorkers, try a reasonable number (such as 16 or 24).
- Save by clicking Done.
- Set this new profile as the active profile by checking the box next to its name.
- Validate the profile by clicking on Validate.
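For reference, Step 3 could be sketched programmatically as follows. The profile name CustomProfile and the value 24 come from the steps above; the source profile name Processes is an assumption, and parallel.listProfiles and parallel.defaultProfile require R2022b or later.

```matlab
newName = 'CustomProfile';            % name from the step above

if ~ismember(newName, parallel.listProfiles)
    c = parcluster('Processes');      % start from the built-in profile (assumed name)
    c.NumWorkers = 24;
    saveAsProfile(c, newName);        % store the settings under the new name
end

parallel.defaultProfile(newName);     % make the new profile the default
c = parcluster(newName);
fprintf('Profile %s uses up to %d workers\n', newName, c.NumWorkers);
```

The new profile still has to be validated separately, for example from the Cluster Profile Manager.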
If all of the above steps fail, I suggest opening a ticket with technical support.
Don't hesitate to ask if you're still stuck.
Filippo Ambrosino
on 7 Nov 2024
0 votes
8 Comments
sidik
on 7 Nov 2024
Hello @Filippo Ambrosino
Have you tried setting a system environment variable such as OMP_NUM_THREADS=64?
Open a command prompt (with administrator privileges) and run set OMP_NUM_THREADS=64.
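Note that set in a Command Prompt only affects that window and the programs started from it, so MATLAB would have to be launched from the same prompt. Whether this variable has any effect on the crash is not established in this thread. A minimal sketch for checking what the client and the workers actually see, assuming a small pool of 4 workers on a profile named Processes:

```matlab
% Value seen by the MATLAB client; empty if the variable is not set.
clientValue = getenv('OMP_NUM_THREADS');
fprintf('Client OMP_NUM_THREADS = "%s"\n', clientValue);

% Reuse the current pool, or open a small one (4 workers is an arbitrary choice).
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool('Processes', 4);
end

% Ask every worker for its value; wrapping it in a cell keeps empty results aligned.
f = parfevalOnAll(pool, @() {getenv('OMP_NUM_THREADS')}, 1);
workerValues = fetchOutputs(f);   % one cell per worker
disp(workerValues);
```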
Best regards,
Sidik
Alison Eele
on 7 Nov 2024
I would recommend opening a support ticket with a complete copy of your "Processes" profile cluster validation report failure for the 64 workers. They will be able to take a deeper look and help you use the number of workers you need.
Filippo Ambrosino
on 7 Nov 2024
Filippo Ambrosino
on 7 Nov 2024
sidik
on 7 Nov 2024
@Filippo Ambrosino Yes, it's best to contact technical support.
Filippo Ambrosino
on 7 Nov 2024
sidik
on 7 Nov 2024
@Filippo Ambrosino You're welcome.
Same here:
Running a simulation with more than 60 workers crashes with R2024b on several machines.
The same simulation runs fine with R2024a using 700 cores/MATLAB workers.
No idea why R2024b crashes; it also crashes when running the SPMD validation test.
In the job log there is only a "Matlab crashed on worker XXX" message, with no other useful information.
Raffael-
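If the job log is too terse, the debug log of a small communicating job may contain more detail to attach to a support ticket. This is a hedged sketch, not a confirmed diagnostic procedure: the profile name Processes and the worker count 8 are assumptions (raise the count to reproduce the crash), and it assumes getDebugLog is supported for the process-based cluster.

```matlab
c = parcluster('Processes');                 % assumed profile name
nWorkers = 8;                                % hypothetical size; increase to reproduce the crash

% Communicating (spmd-style) job in which each worker returns its own index.
job = createCommunicatingJob(c, 'Type', 'spmd');
job.NumWorkersRange = [nWorkers nWorkers];
createTask(job, @spmdIndex, 1, {});

submit(job);
wait(job);
fprintf('Job finished in state: %s\n', job.State);

% Text written to stdout/stderr by the workers while the job ran.
logText = getDebugLog(c, job);
disp(logText);

delete(job);                                 % remove the job data from the cluster storage
```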