Thread Subject:
Matlabpool performance/deadlock? problems

Subject: Matlabpool performance/deadlock? problems

From: John

Date: 21 Oct, 2012 10:51:07

Message: 1 of 9

Hi,

we are running a GNU/Linux (openSuSE) server with 4x8-core Opterons and 256 GiB of RAM for several MATLAB users (MATLAB version doesn't matter).
Lately we have been encountering strange problems when, for example, two users simultaneously use MATLAB pools from the Distributed Computing Toolbox. We only have the local scheduler, so each MATLAB instance/user gets up to 12 tasks.
One symptom is that one user's matlabpool session opens but then does nothing except hang in some wait call (visible when pressing CTRL+C) while the CPU load is minimal. We have waited up to a few hours without any change. This always happens at the start of the parallel computation; once it's running, things get somewhat better.
In addition, the MATLAB GUI becomes horribly slow from time to time, up to the point where it's unusable. As soon as one of the two users closes their pool, the other user's session goes from sluggish to smooth and works normally.

The worker tasks don't do anything special: they are invoked either by parfor or by spmd, and each has its share of I/O and computation.

This also seems related to Linux or maybe Java: we had these problems before with openSuSE 12.1 and MATLAB 2011b. At some point I had the idea of switching from MATLAB's built-in Java to the system Java (via export MATLAB_JAVA), and from then on it worked.
However, since the update to openSuSE 12.2, no change of Java version remedies the problems and we are more or less lost... Is there a general problem when two users operate the same MATLAB with parallel sessions?

Has anyone encountered similar problems before? It might not really be a MATLAB problem, but since MATLAB is the only application where we are noticing it, it seems logical to post it here.

Subject: Matlabpool performance/deadlock? problems

From: Edric M Ellis

Date: 22 Oct, 2012 08:48:26

Message: 2 of 9

"John " <lkjds.fdskj@mailinator.com> writes:

> we are running a GNU/Linux (openSuSE) server with 4x8-core Opterons
> and 256 GiB of RAM for several MATLAB users (MATLAB version doesn't
> matter). As of late we encounter strange problems when (for example)
> two users simultaneously use MATLAB pools from the Distributed
> Computing Toolbox. We only have the local scheduler so up to 12 Tasks
> per MATLAB instance/user. Now, one symptom is that the matlabpool
> session of one user can be opened but won't do anything but hang in
> some wait call (seen when pressing CTRL+C) while the CPU load is at
> minimum. We have tried waiting up to a few hours but no change, it
> simply doesn't do anything. This always happens when beginning the
> parallel computation, as soon as it's running it gets somewhat better.
> In addition the MATLAB Gui becomes horribly slow from time to time up
> to the point where it's non-usable. As soon as one of the two users
> closes their pool the other user's session gets from sluggish to
> smooth and works normally.
>
> The worker tasks don't do anything specific - they are either invoked
> by parfor or by spmd and have each their share of I/O and computation.
>
> This seems also related to Linux or maybe Java: We had these problems
> before with openSuSE 12.1 and MATLAB 2011b. At some point I had the
> idea of changing the MATLAB built-in Java version to the system Java
> (via export MATLAB_JAVA) and from then on it worked. However, since
> the update to openSuSE 12.2 no change of Java version remedies the
> problems and we are more or less lost... Is there a general problem
> when two users operate the same MATLAB with parallel sessions?
>
> Has someone encountered similar problems before? It might not really
> be a MATLAB problem but since MATLAB is the only application we are
> noticing it seems logical to post it here.

A couple of questions:

1. Is MATLAB installed on the local disk of the machine? (If not, I
would definitely recommend at least trying this)

2. Do you notice this slow behaviour if you run 24 copies of MATLAB
'manually' rather than via matlabpool?

Cheers,

Edric.

Subject: Matlabpool performance/deadlock? problems

From: John

Date: 22 Oct, 2012 10:37:07

Message: 3 of 9

> 1. Is MATLAB installed on the local disk of the machine? (If not, I
> would definitely recommend at least trying this)

Yes, it's all installed locally (only the data is stored elsewhere).
 
> 2. Do you notice this slow behaviour if you run 24 copies of MATLAB
> 'manually' rather than via matlabpool?

How do I do this? I'm only aware of the matlabpool parallelization method.

I changed the kernel version again (to 12.2 stock Linux 3.4.6-2.10-default #1 SMP).
It ran more or less fine for two days after the restart, but now it is starting to get sluggish again. It's really annoying :/ and I am out of ideas for what else to try.

Just now I aborted a "calculation" that had hung in

blockExecutor.initiateComputation()
(Operation terminated by user during spmdlang.RemoteSpmdExecutor/initiateComputation
(line 96))

for 30 minutes...

Subject: Matlabpool performance/deadlock? problems

From: Edric M Ellis

Date: 22 Oct, 2012 14:36:34

Message: 4 of 9

"John " <lkjds.fdskj@mailinator.com> writes:

>> 1. Is MATLAB installed on the local disk of the machine? (If not, I
>> would definitely recommend at least trying this)
>
> yes, it's all locally installed. (only the data is stored elsewhere)

Ok, that's a good start.

>> 2. Do you notice this slow behaviour if you run 24 copies of MATLAB
>> 'manually' rather than via matlabpool?
>
> How do I do this? I'm only aware of the matlabpool parallelization
> method.

Literally start 24 instances of MATLAB from the command line, i.e.

$ for x in $(seq 1 24); do
    matlab &
  done

or similar.

> I changed the kernel version again (to 12.2 stock Linux 3.4.6-2.10-default #1 SMP).
> It went more or less fine for two days after restart but now it starts to get sluggish again. It's really annoying :/ but I am out of ideas what else to try.
>
> Just now I aborted a "calculation" that hang in
>
> blockExecutor.initiateComputation()
> (Operation terminated by user during spmdlang.RemoteSpmdExecutor/initiateComputation
> (line 96))
>
> for 30 minutes...

That definitely doesn't sound right; it means that the client hasn't
finished sending the SPMD blocks to the workers.

When this is happening, what's the memory usage on the machine like? Is
it swapping? How much RAM does the machine have?
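
For reference, a minimal sketch of one way to collect those numbers, assuming a Linux system that exposes /proc/meminfo. The function name mem_summary is made up for illustration; it only reads the file and prints the total, free, cached and swap figures in GiB.

```shell
#!/bin/sh
# mem_summary: print selected /proc/meminfo fields in GiB.
# $1 (optional): path to a meminfo-format file; defaults to /proc/meminfo.
# The function name is hypothetical, chosen for this example only.
mem_summary() {
    awk '
        $1 == "MemTotal:" || $1 == "MemFree:" || $1 == "Cached:" ||
        $1 == "SwapTotal:" || $1 == "SwapFree:" {
            # /proc/meminfo reports values in KiB; 1 GiB = 1048576 KiB
            printf "%-10s %8.1f GiB\n", $1, $2 / 1048576
        }
    ' "${1:-/proc/meminfo}"
}

# Print a summary for the running system when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

If SwapFree stays close to SwapTotal, the machine is not using much swap. Running "vmstat 5" and watching the si/so columns shows whether pages are actively being swapped in or out while the hang is in progress.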

Cheers,

Edric.

Subject: Matlabpool performance/deadlock? problems

From: John

Date: 22 Oct, 2012 16:03:08

Message: 5 of 9

> That definitely doesn't sound right, that means that the client hasn't
> finished sending the SPMD blocks to the workers.
>
> When this is happening, what's the memory usage on the machine like? Is
> it swapping? How much RAM does the machine have?

No swapping is in progress; we have 256 GiB of RAM. Currently not much of it is truly free, but most of it is cached (Linux disk cache), which should be available to MATLAB instantaneously.
So the problem began to occur again today after two days of hassle-free operation. No different usage pattern, just two users doing matlabpool calculations. I tried another kernel (3.6.2-3-desktop #1 SMP PREEMPT) but it didn't help a bit :/

My users are beginning to get annoyed about the frequent restarts and the delays. Is it possible/advisable to escalate this to MATLAB support, or do they charge extra? We really need a working system.

Subject: Matlabpool performance/deadlock? problems

From: John

Date: 22 Oct, 2012 16:12:08

Message: 6 of 9

OK, another strange thing (probably related?):

My code consists of two phases:
The first one loads data from disk into memory and the second one operates on this data.
Both parts are parallelized separately via spmd in two different files.

The first part always seems to work perfectly and loads the data properly.
As soon as the second part accesses the data, a long or infinite delay *may* occur. Sometimes the delay first occurs when the second spmd distributes the data. Another indicator of my problems is that load balancing between the workers doesn't work correctly: when all workers should be processing data in parallel, sometimes only one of them runs its work from 0 to 100%. Then, after some delay, maybe a second one follows, and with any luck the whole process eventually finishes.

Subject: Matlabpool performance/deadlock? problems

From: Edric M Ellis

Date: 23 Oct, 2012 06:44:55

Message: 7 of 9

"John " <lkjds.fdskj@mailinator.com> writes:

>> That definitely doesn't sound right, that means that the client
>> hasn't finished sending the SPMD blocks to the workers.
>>
>> When this is happening, what's the memory usage on the machine like?
>> Is it swapping? How much RAM does the machine have?
>
> no swapping in progress, we have 256 GiB of RAM. Currently there is
> not really much truly free but most of it cached (Linux disk cache)
> which should be available to MATLAB instantaneously. So the problem
> began to occur again today after two days of hassle-free operation. No
> different usage pattern, just two users doing matlabpool
> calculations. I tried another kernel (3.6.2-3-desktop #1 SMP PREEMPT)
> but it didn't help a bit :/

When you look at 'top' on the system when the problem is occurring, do
you notice anything different in terms of load, memory in use etc.?

> My users are beginning to get annoyed about the frequent restarts and
> the delays. Is it possible/advisable to escalate this to the MATLAB
> support or do they charge extra? We really need a working system.

I believe if your licences are in maintenance that should be fine.

Cheers,

Edric.

Subject: Matlabpool performance/deadlock? problems

From: John

Date: 28 Oct, 2012 11:53:07

Message: 8 of 9

Hi,

New findings: regardless of Java version, Linux kernel version, etc., the problem came back. It now appears that the problem was indeed related to RAM. As I wrote previously, I had no truly free RAM; most of it was used by the Linux file system cache. Usually this is released immediately when an application makes a real memory request.
However, this somehow didn't work in conjunction with MATLAB. Whenever there was little 'free' RAM and, say, 220 GiB cached, the matlabpool wouldn't allocate anything or even start at all.
Since we started purging the file system cache once in a while using
echo 1 > /proc/sys/vm/drop_caches
our problem seems to have vanished.
I have only three days of experience with this supposed 'fix', but so far it looks promising and my users are happy :)
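
A minimal sketch of how such a periodic purge could be scripted, assuming root privileges and a kernel that provides /proc/sys/vm/drop_caches. The function names and the 200 GiB threshold are hypothetical and not part of the setup described above. sync is run first because drop_caches only releases clean pages; writing 1 frees the page cache only.

```shell
#!/bin/sh
# All function names below are hypothetical, chosen for this example only.

# cached_kib: print the "Cached:" value (in KiB) from a meminfo-format file.
# $1 (optional): path to the file; defaults to /proc/meminfo.
cached_kib() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# should_drop: succeed if the cached amount exceeds a threshold.
# $1: threshold in KiB, $2 (optional): meminfo path.
should_drop() {
    [ "$(cached_kib "$2")" -gt "$1" ]
}

# drop_page_cache: flush dirty pages to disk, then ask the kernel to
# release clean page cache. Must be run as root.
drop_page_cache() {
    sync
    echo 1 > /proc/sys/vm/drop_caches
}

# Example (as root, e.g. from a cron job): purge only when more than
# 200 GiB is cached. The threshold is an assumption for illustration.
# should_drop $((200 * 1024 * 1024)) && drop_page_cache
```

Checking the cached amount first avoids throwing away a useful cache when memory is not actually tight; whether that matters in practice depends on the workload.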

Subject: Matlabpool performance/deadlock? problems

From: stephan

Date: 1 Aug, 2013 12:05:08

Message: 9 of 9

Hi,

We have the same problem (poor MATLAB performance, slow GUI, etc.) when the cache is full and there is no 'free' RAM. It has happened on SuSE Enterprise 10 and 11, all service packs, with different Java versions and MATLAB 2009 through 2011, and it does not seem to be limited to use of the Parallel Computing Toolbox.

Server daemons (backup jobs, etc.) regularly fill the system cache via file access, and we are looking for a better solution than the ugly hack of running drop_caches every 5-10 min. Has anyone found a way to solve this problem? Is it really specific to SuSE or parfor? (We use the parallel toolbox for jobs, but not that regularly...) Thanks,

stephan

"John" wrote in message <k6j6b3$8t5$1@newscl01ah.mathworks.com>...
> Hi,
>
> new findings: Regardless of Java version, Linux kernel version etc. the problem came back. It now appears that indeed the problem was related to RAM. As I wrote previously I had no real free RAM but most of it was used by the Linux file system cache. Usually this is immediately released upon a real memory request by an application.
> However this somehow didn't work in conjunction with MATLAB. Whenever there was little 'free' RAM and say 220 GiB cached the matlabpool wouldn't allocate anything or even start at all.
> Since purging the file system cache once in a while using
> echo 1 > /proc/sys/vm/drop_caches
> our problem seems to have vanished.
> I still have only three days of experience with this supposed 'fix' but until now it looks promising and my users are happy :)
