Thank you us and Edric,
I haven't had much luck with tech support since I have a student version.
------------------------------------------------
> Did you submit a non-interactive job, or did you open MATLABPOOL? Were you using
> remote machines, or only local workers?
------------------------------------------------
I don't know what a 'non-interactive job' is... but I did open a MATLABPOOL, and opened 4 local workers.
------------------------------------------------
> Were you using SPMD or PARFOR when doing this?
------------------------------------------------
I was using PARFOR
------------------------------------------------
> I appreciate that the error message is a little cryptic, this basically simply
> means that the connection from the desktop MATLAB to the workers was
> unexpectedly severed. Under certain circumstances, this error gets reported
> asynchronously - where possible, we do try to report the error in such a way
> that the execution gets aborted in the expected way. Without knowing more about
> your computation, it's hard to tell what went wrong.
------------------------------------------------
This same problem has occurred dozens of times during the last week.
The program continues running indefinitely, but after some amount of time I notice in my task manager that matlab.exe has gone from using 100% CPU to 0% CPU.
So since matlab is no longer doing anything, I go to the command window and press CTRL+C.
The computation stops, and then about 10 seconds later I get the message:
??? A read error occurred while reading from lab 2. This is causing:
java.net.SocketException: Connection reset
This happens so consistently that I can actually predict what the error message will be when I notice that the CPU usage has gone to 0% and yet the program 'appears to be' still running.
What's strange is that when I rerun the EXACT same program with no modifications, it sometimes proceeds to completion and sometimes does not; in the latter case it gives the above error message 10 seconds after I press CTRL+C. (Sometimes the error is from reading lab 3 or lab 4.)
So the problem is not deterministic, which makes me believe either ML is doing something non-deterministically, or something on my computer is interfering with it [ although this has also occurred at night with nothing else running ]
The way I got around this was writing a script that says "if CPU usage goes to 0% AND matlabpool is still open, exit matlab and rerun the exact same code"
And after about 5 iterations (sometimes less), the code will run to completion.
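For illustration only, the watchdog logic described above might look roughly like the Python sketch below. It is not the actual script: the names `run_with_retries`, `start_job`, `poll`, and `kill` are hypothetical, and how to launch MATLAB, sample its CPU usage, and terminate it is left to caller-supplied functions because that part is platform specific. The "matlabpool is still open" condition is approximated here by "the job has not reported completion".

```python
import time


def run_with_retries(start_job, poll, kill,
                     max_attempts=5, interval=10.0,
                     threshold=1.0, needed=3):
    """Rerun a job until it finishes, restarting it whenever it looks stalled.

    start_job() -> handle          (hypothetical: launches MATLAB on the script)
    poll(handle) -> (finished, cpu_percent)
    kill(handle)                   (hypothetical: terminates that MATLAB process)

    A run counts as stalled after `needed` consecutive polls with CPU usage
    below `threshold` percent while the job is still unfinished.
    Returns the attempt number that completed, or None if every attempt stalled.
    """
    for attempt in range(1, max_attempts + 1):
        handle = start_job()
        idle_polls = 0
        while True:
            finished, cpu = poll(handle)
            if finished:
                return attempt
            # Count consecutive near-idle polls; any activity resets the count.
            idle_polls = idle_polls + 1 if cpu < threshold else 0
            if idle_polls >= needed:
                kill(handle)   # give up on this run and start over
                break
            time.sleep(interval)
    return None
```

Requiring several consecutive idle polls, rather than a single one, is meant to avoid restarting a run that is only briefly waiting. A restart loop like this only masks the symptom; it does not explain why the connection drops.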
But it would still be nice to understand WHY this connection between the desktop ML and its labs gets severed, and how to avoid it in the future. I'm quite sure it's not depletion of RAM.