From: Edric M Ellis <eellis@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Parallel configuration validation in SGE env
Date: Thu, 05 Nov 2009 13:06:46 +0000
Organization: The Mathworks, Ltd.
Lines: 41
Message-ID: <ytwiqdpf7dl.fsf@uk-eellis-deb5-64.mathworks.co.uk>
References: <hcrjih$ep5$1@fred.mathworks.com> <ytwvdhqfude.fsf@uk-eellis-deb5-64.mathworks.co.uk> <hcrsl6$8d9$1@fred.mathworks.com> <ytwr5sefgln.fsf@uk-eellis-deb5-64.mathworks.co.uk> <hcuglu$iq4$1@fred.mathworks.com>


"Rafael " <rafael.fritz@physik.uni-marburg.de> writes:

> Edric M Ellis <eellis@mathworks.com> wrote in message <ytwr5sefgln.fsf@uk-eellis-deb5-64.mathworks.co.uk>...
>
>> Sorry, my mistake - I forgot that the generic scheduler doesn't have a
>> getDebugLog method - all it would do is print the contents of the output files,
>> like these:
>> 
>> /home/fritzra/matlab/hello_test_files/Job16_Task1.out
>> 
>> Is there anything interesting in there?
>> 
>> Cheers,
>> 
>> Edric.
>
> Let's see - in Job16_Task1.out one finds just the following:
>
> "Executing: /local/matlab/bin/worker "
>
> That's always the output after starting distributed jobs, but there is never
> any other output, unlike parallel jobs, where I get something like
> Job15.mpiexec.out with content like "starting smpd on hosts ..." and so on.
> The worker being executed is just the unchanged worker script supplied by
> MathWorks. So, not really interesting content in this output... ?!?

That's really strange. I would expect to see at least the MATLAB startup banner
text and so on, even if there was something else going wrong. I assume that
"/local/matlab/bin/worker" is the right location on the cluster (otherwise
presumably the parallel stuff wouldn't work). 

Is there any chance you could work out which node on the cluster your
distributed job is being scheduled onto and try to run
"/local/matlab/bin/worker" there? It won't do anything terribly useful, but would
at least confirm that MATLAB can start up there. (You could add a "hostname"
command on the line before the "exec" in sgeWrapper.sh to find out where the job
is running.)
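Once you know the node, the manual check can be scripted so that it reports the
host name together with whatever the worker prints when started. This is a
minimal sketch in Python 3.7+, run by hand on that node (for example over
ssh); it is not part of the MATLAB or SGE tooling. The function name
check_worker, the 120-second timeout and the default path (taken from
Job16_Task1.out) are assumptions to adjust for your cluster.

```python
#!/usr/bin/env python3
"""Hypothetical diagnostic: run by hand on the SGE node that the distributed
job was scheduled onto, to see whether the MATLAB worker starts there.

ASSUMPTION: the worker is /local/matlab/bin/worker (the path printed in
Job16_Task1.out); pass a different path as the first argument if needed.
"""
import os
import socket
import subprocess
import sys

WORKER = "/local/matlab/bin/worker"


def check_worker(path=WORKER, timeout=120):
    """Try to start `path`; return (hostname, exit_status, output).

    exit_status is None if `path` is missing or not executable, or if it was
    still running after `timeout` seconds (it is then killed).
    """
    host = socket.gethostname()
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        return host, None, f"{path} is missing or not executable on {host}"
    try:
        # Merge stderr into stdout so the MATLAB startup banner and any
        # error messages come back together; give the worker no stdin.
        proc = subprocess.run(
            [path],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # Partial output may arrive as bytes even with text=True.
        partial = exc.output or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return host, None, partial + f"\n[killed after {timeout}s]"
    return host, proc.returncode, proc.stdout


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else WORKER
    host, status, output = check_worker(target)
    print(f"host: {host}")
    print(f"exit status: {status}")
    print(output)
```

Started by hand, without the environment the scheduler normally provides, the
worker will probably exit with an error; the point is only to see whether any
MATLAB startup text appears at all on that node.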

Cheers,

Edric.