Thread Subject: Parfor and clusters

Subject: Parfor and clusters

From: Scott

Date: 14 Nov, 2008 22:11:02

Message: 1 of 4

I'm trying to run a parfor loop on a cluster and am having a rough time. There are two computers set up in my area, a dual-core and a single-core machine. One machine has a job manager running and two workers; the other machine has one worker. Both have the Parallel Computing Toolbox and distcomp installed, and both run R2008. All are on a LAN on the same subnet. The remote machine refuses to run anything. I think I'm missing a critical bit of information, and neither the documentation nor the forums have helped me.

My setup:
- nodestatus shows both the remote machine and local machine connected to the job manager.
- nodestatus shows the two workers on the main machine and the single worker on the remote machine are started.
- I can connect to the job manager via findResource on both machines.
- Both machines validate successfully in the configuration manager.

What I've tried:
- Running matlabpool on both machines with a jobmanager profile (files and paths point to the same shared folder that contains the project code). Both machines successfully download the folder when matlabpool starts.
- Running matlabpool on both machines with a local profile (works great, locally. Didn't expect otherwise).
- When I run matlabpool <myjobmanager> a job is instantly sent to the job manager and starts running, according to findResource. But it appears no resources (memory, CPU) are consumed, and I am returned to the command prompt.

Hypothesis:
My understanding is that the beauty of parfor loops is they should automatically run across a cluster without needing to submit a job (given that matlabpool is running). What am I missing? Am I misconfiguring matlabpool?

I've reached capitulation after two full days, and any help on the matter would be delightful. Thanks in advance.

Subject: Parfor and clusters

From: Steven Lord

Date: 17 Nov, 2008 05:08:23

Message: 2 of 4

If I understand your third "What I've tried" statement correctly, I think
you may have misunderstood what MATLABPOOL actually does. MATLABPOOL
doesn't run your PARFOR code; it just opens the pool. Once the pool is
open, you need to run your code that includes PARFOR on the machine where
you executed the "matlabpool <myjobmanager>" command; when MATLAB reaches
the PARFOR it will automatically make use of the pool that was opened by
MATLABPOOL. MATLABPOOL just makes the pool available; PARFOR actually makes
your code "dive into" the pool.
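
To make that sequence concrete, here is a minimal sketch. It assumes a configuration named myjobmanager (a placeholder for your own configuration name) and a release in which matlabpool('size') is available. It opens the pool, reports how many workers joined, and records the host name each iteration ran on, so you can see whether the remote machine takes part. The loop body is purely illustrative.

```matlab
% Open the pool using the job manager configuration.
% 'myjobmanager' is a placeholder, not a name confirmed by this thread.
matlabpool open myjobmanager

% Number of workers in the open pool (0 means no pool is open).
fprintf('Pool size: %d\n', matlabpool('size'));

n = 8;
hostNames = cell(1, n);
parfor k = 1:n
    % Each iteration asks the operating system which host it runs on.
    [status, name] = system('hostname');
    hostNames{k} = strtrim(name);
end

% List the distinct machines that executed at least one iteration.
disp(unique(hostNames));

matlabpool close
```

If only one host name is listed, the loop probably never reached the remote worker, which narrows the problem to the pool itself rather than to your PARFOR code.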

--
Steve Lord
slord@mathworks.com

Subject: Parfor and clusters

From: Scott

Date: 17 Nov, 2008 05:31:03

Message: 3 of 4

I think I wasn't the most articulate earlier. I understand that I need to open matlabpool and that when I run my code the parfor loop should automatically make use of all open matlabpools.

The parfor loop works beautifully with the matlabpool local configuration (utilizing up to four cores on the local machine). However, it has trouble with any matlabpool jobmanager config: it appears to use only a single core of the local machine and no resources on the remote machine, despite matlabpools being open on both of them.

I hope that lends some clarification.

Subject: Parfor and clusters

From: Raymond Norris

Date: 18 Nov, 2008 04:39:01

Message: 4 of 4

Scott,

If I understand you correctly, you're running an interactive PARFOR locally (which works) and on the grid (where you're seeing problems). If so, a best practice is not to run PARFOR interactively against the cluster, but only with the local scheduler. If you want to run a PARFOR job on the cluster, create a job script with createMatlabPoolJob.
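
For reference, a batch submission along these lines might look like the sketch below. It assumes a release that provides createMatlabPoolJob. The job manager name, the lookup host, the worker counts, and myParforFunction (a function of your own that contains the PARFOR loop and returns one output) are all hypothetical placeholders.

```matlab
% Locate the job manager. 'myjobmanager' and 'headnode' are placeholders.
jm = findResource('scheduler', 'type', 'jobmanager', ...
                  'Name', 'myjobmanager', 'LookupURL', 'headnode');

% Create a MATLAB pool job and request all three workers in the cluster.
job = createMatlabPoolJob(jm);
set(job, 'MinimumNumberOfWorkers', 3, 'MaximumNumberOfWorkers', 3);

% A MATLAB pool job takes a single task: the function that holds the PARFOR.
% myParforFunction is hypothetical; it takes no inputs and returns one output.
createTask(job, @myParforFunction, 1, {});

% Submit, block until the job finishes, then collect the result.
submit(job);
waitForState(job, 'finished');
out = getAllOutputArguments(job);

% Remove the job and its data from the job manager once you have the output.
destroy(job);
```

In a MATLAB pool job, one worker runs the task function and the remaining workers serve as the pool for its PARFOR loops, so with three workers in total the loop itself would run on two.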

What size are the variables that you are using in your PARFOR loop? I've seen cases where the variables passed to the workers are deemed too large for PARFOR. When this happens, the default behavior is to run the code locally.
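
One quick way to gather those sizes is whos, sketched below. A and B are stand-ins for whatever arrays your PARFOR body reads; substitute your own variable names.

```matlab
% Stand-in data; replace A and B with the variables your PARFOR loop uses.
A = rand(2000);
B = rand(2000, 1);

% whos returns one struct per variable, including its size and byte count.
info = whos('A', 'B');
for k = 1:numel(info)
    fprintf('%-10s %-14s %10.1f MB\n', info(k).name, ...
            mat2str(info(k).size), info(k).bytes / 2^20);
end
```

Posting this output alongside the loop makes it easier to judge whether data transfer to the workers is the limiting factor.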

That said, you could still run into the same problem (and most likely will), regardless of where you submit the job. Could you post your PARFOR code along with the dimensions of the matrices?

Raymond
