Thread Subject: Distributed Computing with local Sched is SLOW

Subject: Distributed Computing with local Sched is SLOW

From: Abel Brown

Date: 5 Oct, 2007 01:25:58

Message: 1 of 5

So I have 60 files that I would like to read in using one of
my custom functions, read_LC(file).

It takes about 0.5 seconds to read one file.

Now I would like to use the local scheduler with some
workers to do this much faster than 30 seconds.

But when I create a job with the local scheduler and add 60
tasks, it's slooooooooooow! The total time is almost 300
seconds! I've read everything I can, and I'm about to delete
the toolbox unless someone can give me a heads-up.

If it matters, I'm on OS X, iMac Intel Core Duo 2 GHz, 2 GB RAM,
using MATLAB 2007a.

Here's the code:

function HT = get_HT_dist(dir_list)


num_files=length(dir_list);


sched = findResource('scheduler', 'type', 'local');
job = createJob(sched);

for i = 1:num_files
    createTask(job, @read_LC_ht_only, 1,{dir_list(i).name});
end

submit(job);
waitForState(job, 'finished');

y = getAllOutputArguments(job);
destroy(job);

HT = cat(2, y{:})';


function ht = read_LC_ht_only(file)

% read the file, keeping only the sixth column
ht = textread(file, ['%*u %*u %*f %*f %*f %f %*f %*f %*f %*f ' ...
    '%*f %*f %*f %*f %*f %*u %*u %*s']);

Subject: Distributed Computing with local Sched is SLOW

From: Narfi

Date: 5 Oct, 2007 03:24:11

Message: 2 of 5

Abel,

Instead of creating 60 tasks, each of which reads 1 file,
try creating 2 tasks, each of which reads 30 files.

When you execute 60 tasks using the local scheduler, you
incur the cost of 60 MATLAB worker startups.
Splitting this into just 2 tasks means that you only pay the
cost of 2 MATLAB worker startups, hopefully creating a good
workload for your dualcore machine.
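
As a rough check on the numbers in this thread, the sketch below backs out the per-task overhead implied by the reported timings and estimates the two-task case. It rests on a simple model that is not from the original posts: two workers run concurrently, every task pays the same fixed startup cost, and read time scales linearly with the number of files. All variable names are illustrative.

```matlab
% Timings reported in the first message
numFiles    = 60;    % files to read
readTime    = 0.5;   % seconds per file
totalTime60 = 300;   % seconds observed with 60 one-file tasks

% Assumption: 2 workers (dual-core machine) running tasks concurrently
numWorkers = 2;

% With 60 one-file tasks, each worker handles 30 tasks back to back
tasksPerWorker  = numFiles / numWorkers;
overheadPerTask = totalTime60 / tasksPerWorker - readTime;

% With 2 tasks of 30 files each, each worker pays the overhead once
filesPerTask   = numFiles / numWorkers;
estimatedTime2 = overheadPerTask + filesPerTask * readTime;

fprintf('Implied overhead per task: %.1f s\n', overheadPerTask);
fprintf('Estimated time with 2 tasks: %.1f s\n', estimatedTime2);
```

Under this model the implied overhead is 9.5 s per task and the two-task job takes about 24.5 s, most of it spent reading files. The real figure depends on how the local scheduler actually launches workers.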

Best,

Narfi

Subject: Distributed Computing with local Sched is SLOW

From: Abel Brown

Date: 5 Oct, 2007 05:20:53

Message: 3 of 5

Hi Narfi

I read the doc "Dividing MATLAB Computations Into Tasks",
where the best-practices section recommends that if
the function can be evaluated quickly, you create a task that
operates on many function values.

How do I do this with the dir_list that I have?

-abel

Subject: Distributed Computing with local Sched is SLOW

From: Narfi

Date: 5 Oct, 2007 17:15:19

Message: 4 of 5

Abel,

I recognize this is not as easy as it should be in R2007a. It
is much easier and faster in R2007b with the new
parfor (see Loren Shure's blog,
http://blogs.mathworks.com/loren/2007/10/03/parfor-the-course),
but since you're running R2007a, we'll try to march ahead using
R2007a features:

1. Create a vectorized wrapper around read_LC_ht_only(file)
that allows you to read multiple files in a single function
call:

function hts = readWrapper(files)
    hts = cell(size(files));
    for i = 1:length(files)
        hts{i} = read_LC_ht_only(files{i});
    end

You can then verify that readWrapper works correctly by
calling it in your regular matlab session:

names1 = {dir_list(1:30).name};
hts = readWrapper(names1);

You can then create 2 tasks that call readWrapper:
names1 = {dir_list(1:30).name};
names2 = {dir_list(31:end).name};

createTask(job, @readWrapper, 1, {names1});
createTask(job, @readWrapper, 1, {names2});
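
The two-way split above can be generalized to any number of tasks. The following is a minimal sketch, not part of the original reply: it divides a list of file names into numChunks roughly equal groups using plain MATLAB indexing. The file names and the variables names, numChunks, edges, and chunks are placeholders for illustration.

```matlab
% Placeholder file names standing in for {dir_list.name}
names = arrayfun(@(k) sprintf('file%02d.txt', k), 1:60, ...
                 'UniformOutput', false);

numChunks = 2;   % one chunk per task; 2 matches the dual-core example

% Chunk boundaries: chunk k covers names(edges(k)+1 : edges(k+1))
edges = round(linspace(0, numel(names), numChunks + 1));

chunks = cell(1, numChunks);
for k = 1:numChunks
    chunks{k} = names(edges(k) + 1 : edges(k + 1));
end

% Each chunk would then become one task, for example:
%   createTask(job, @readWrapper, 1, {{chunks{k}}});
% The double braces follow the last message in this thread, which
% reports that a single pair was not enough.
```

Choosing numChunks equal to the number of workers keeps the number of worker startups to a minimum, which is the point of the advice above.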

I hope this helps.

Best,

Narfi
P.S. In R2007b, the function get_HT_dist wouldn't need to create
any tasks. It could look something like the following:

function HT = get_HT_dist(dir_list)

num_files = length(dir_list);
y = cell(1, num_files);  % preallocate the output cell array

parfor (i = 1:num_files)
    y{i} = read_LC_ht_only(dir_list(i).name);
end

HT = cat(2, y{:})';

Subject: Distributed Computing with local Sched is SLOW

From: Abel Brown

Date: 8 Oct, 2007 13:34:43

Message: 5 of 5

Hi Narfi,

I implemented it exactly as you described. In the end I had to
give the createTask function {{names1}} instead of {names1}.
I think

createTask(job, @read_LC, 1, {names})

tries to pass every element of names as a separate argument in one
call to read_LC, because I was getting the error "Too many Arguments"
when I submitted the job.

Thanks again for all your help!
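
The difference between {names} and {{names}} can be reproduced without the toolbox. The sketch below is an illustration only. It assumes, consistent with the error reported above, that createTask reads the outer cell array as a list of per-task argument lists and expands each list with {:} when calling the task function. The function handle countArgs and the file names are hypothetical.

```matlab
% Hypothetical stand-in for a task function: reports how many inputs it got
countArgs = @(varargin) numel(varargin);

names = {'a.txt', 'b.txt', 'c.txt'};   % placeholder file names

% Single wrapping, {names}: the argument list is names itself,
% so every file name becomes its own input argument
argsSingle = names;
nSingle = countArgs(argsSingle{:});

% Double wrapping, {{names}}: the argument list has one element,
% which is the whole cell array of names
argsDouble = {names};
nDouble = countArgs(argsDouble{:});

fprintf('single wrap: %d inputs, double wrap: %d input\n', nSingle, nDouble);
```

A function that accepts one input, such as readWrapper, would fail with too many arguments in the first case and receive the full list in the second.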

-abel
