|
I am running my code on a remote cluster. My submit script looks like this:
clusterHost = '****';
remoteDataLocation = '/home/me/Matlab';
sched = findResource('scheduler', 'type', 'generic');
set(sched, 'DataLocation', '/home/me/Matlab/jobData');
set(sched, 'ClusterMatlabRoot', '/share/apps/mtlb');
set(sched, 'HasSharedFilesystem', false);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'GetJobStateFcn', @sgeGetJobState);
set(sched, 'DestroyJobFcn', @sgeDestroyJob);
set(sched, 'ParallelSubmitFcn', {@sgeNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});
pjob = createMatlabPoolJob(sched);
set(pjob, 'FileDependencies', {'mycode.m'});
set(pjob, 'MaximumNumberOfWorkers', 12);
set(pjob, 'MinimumNumberOfWorkers', 12);
t = createTask(pjob,@mycode,1);
submit(pjob);
After I submit the job, qstat on the cluster shows it as running, and get(pjob) on the client also reports it as running.
After a long time, when I checked on the job again, qstat returned nothing: the running job had disappeared from the queue. But get(pjob) on the client still said the job was running. Why did this happen?
If I shrink the problem (the only change is a smaller parfor loop), everything works and I get the results. The smaller version needs only several minutes to run.
Thank you so much.
|