Why does MATLAB stop "executing" code after a long time?

I am using MATLAB on a Windows 10 machine with 8 GB of RAM.
The system I am simulating has a degree of randomness in it, which is why I need to run the same code several times in a loop and average over all the runs. An outline of my code:
% create and initialise variables 'mat' and others
for run = 1:20
    currMatForRun = includeRandomness(mat);  % apply this run's random changes
    % processing with currMatForRun
end
One run takes around 2 hours.
Memory use seems to be constant at around 70% (I checked with Resource Monitor), and I don't think there is a memory leak. I can comfortably use my computer for browsing etc. even while the code is running.
But after running for about 14-15 hours, execution seems to freeze: nothing more is printed to the console (normally I get an update roughly every two minutes). I tried waiting 8 more hours, but still nothing.
The memory and CPU usage in Resource Monitor drop almost to zero, and I get no error in the MATLAB console.
I tried the entire process 5 times, and the freeze happened at a similar time on each attempt.
I am using a standalone version of MATLAB that runs without an internet connection, so I don't think it is a license-polling problem, though I am not completely sure.
This is my first post, so not sure if I have given enough information, please tell me if you need anything else.
Edit: I did another run and it stopped at a similar time. This time I monitored resource consumption; see the attached image. The legend is as follows:
Blue is processor use, yellow is working set, red is private working set, green is page file bytes (virtual memory), and pink/magenta is IO data in bytes per second.
I waited for a while before stopping the recording, and the point where the process stops can be seen clearly from the dip in the processor curve. If you think monitoring any other parameter might be useful, please tell me.
Things seem to be as I expected them (at least on the surface), so I am still not sure what's wrong. Ideas/thoughts appreciated!
Edit 2: I did two more things to check.
  • I took out the random part of the code and ran only the basic part, meaning I was running the same code over and over with no changes. Everything was completely deterministic. The program still got stuck after a similar amount of time.
  • I ran a 10000-variable linear solve (as suggested by John in the comments) 5400 times, which takes ~15 hours, and this ran without a hitch.
Overall the confusion has deepened, but hopefully the night is darkest before dawn.

15 Comments

We don't have your code, and have no clue what you are doing inside there. The problem could be multiple things. I might guess at virtual memory usage, so disk thrashing, or some sort of memory leak. Maybe it might be graphics related. Does your code involve user written Mex files?
When your code runs, check a monitor. Is it using all 8 GB of RAM? How much free disk space do you have?
When this happens, you might check to see if there is actual activity, in the form of disk thrashing, even though it seems nothing is happening. But really, you would need to provide some useful information, and even if you did provide code to test, someone might need to spend a day or so testing your code.
Thanks for your reply John.
From looking around online, I figured it might be virtual memory too, so I set the maximum virtual memory to double the size of the RAM, that is, 16 GB. But that had absolutely no effect on the program.
My code does not use any MEX files; I've written everything as .m files. And the memory consumption is pretty much constant at around 2.4 GB, which is also why I don't think there is any memory leak.
The reason I am somewhat confounded is between my different runs, there is absolutely no difference apart from the randomness introduced, which should be well, random. But the program always seems to stop after running for around 15 hours with fairly constant memory use.
As for disk thrashing, I think that should make the entire system slow, which is not something I've seen, adding to the confusion.
Also, since I don't have any plotting or figures in my code, I have run it from the command line this time. I am monitoring the memory, CPU usage and drive usage. I'll update the question with those details once the run is done.
Also, if you think there is something else I could do/monitor which would be useful, please do tell me.
Thanks.
Is there an internal iteration that could be triggering an infinite loop with just the right combination of values, perhaps...
What about, for debugging purposes:
  1. instrument the routines more thoroughly to see where execution is in the code, and
  2. use a consistent RNG seed so you can reproduce runs when trying to track down what's causing the (apparent) hangup.
Nothing stands out in my mind. Beyond watching the state of the system using a monitor, I'd want to add enough writes to the command window that you can identify where it is hanging. Not so much output that it slows things to a crawl, but enough that you can see exactly what it is doing when it decides to freeze.
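The kind of instrumentation described above could look roughly like the following. This is only a sketch: the log file name, the loop bounds and the placeholder "processing" are assumptions, not the poster's code. Writing to a file as well as the console means the last line survives even if the console stops updating.

```matlab
% Hypothetical heartbeat log; the path and loop sizes are placeholders.
logFile = fullfile(tempdir, 'sim_heartbeat.log');
if exist(logFile, 'file'), delete(logFile); end   % start with a fresh log

nRuns  = 3;   % stand-in for the 20 runs in the question
nSteps = 4;   % stand-in for the processing steps inside one run
for r = 1:nRuns
    for s = 1:nSteps
        % ... one unit of the real processing would go here ...
        stamp = datestr(now, 'yyyy-mm-dd HH:MM:SS');
        fid = fopen(logFile, 'a');                % append mode
        assert(fid ~= -1, 'Could not open log file');
        fprintf(fid, '%s run=%d step=%d\n', stamp, r, s);
        fclose(fid);                              % close at once: line is on disk, no handle left open
        fprintf('%s run=%d step=%d\n', stamp, r, s);   % same message to the console
    end
end
```

After a freeze, the last line of the log shows which run and step had completed, which narrows down where to look.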
The consistent time to failure suggests to me that you are running out of some resource. But what is running dry is not obvious from what you have said.
Since there is randomness in each iteration, suppose you FIXED the random seed Before each iteration? So that it is doing exactly the same computations in each iteration? If it still fails at the same point, then you know for sure there is some resource that has run dry.
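Fixing the seed before each iteration might be sketched as below. The seed value and the `rand(4)` call are placeholders for whatever the real code draws; only `rng` itself is the standard MATLAB function.

```matlab
seed  = 12345;               % arbitrary fixed seed (an assumption, any value works)
nRuns = 3;                   % stand-in for the 20 runs in the question
firstDraw = zeros(nRuns, 1);
for r = 1:nRuns
    rng(seed);               % reset the global generator before each run
    noise = rand(4);         % stand-in for the random changes made in one run
    firstDraw(r) = noise(1); % remember the first value drawn in this run
end
% With the seed reset every time, each run sees identical "random" numbers.
sameEveryRun = all(firstDraw == firstDraw(1));
disp(sameEveryRun)
```

If the runs should differ but still be reproducible, `rng(seed + r)` inside the loop gives each run its own fixed stream.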
It really sucks that this takes 15 hours just to see it go dead. Perhaps my best suggestion is to give up after 14 hours. Save everything. Then restart MATLAB, and restart the computation, getting another 14 hours. That is the give up option of course. :)
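The "save everything, restart, continue" approach could be sketched as a simple checkpoint file. The file name, the variable names and the `r^2` stand-in computation are all hypothetical; the idea is only that each finished run is written to disk, so a restarted MATLAB session picks up where the last one stopped.

```matlab
ckptFile = fullfile(tempdir, 'sim_checkpoint.mat');   % hypothetical checkpoint path
nRuns = 5;                                            % 20 in the question

if exist(ckptFile, 'file')
    S = load(ckptFile, 'results', 'lastDone');        % resume from an earlier session
    results  = S.results;
    lastDone = S.lastDone;
else
    results  = nan(nRuns, 1);                         % fresh start
    lastDone = 0;
end

for r = lastDone + 1 : nRuns
    results(r) = r^2;                                 % stand-in for one 2-hour run
    lastDone = r;
    save(ckptFile, 'results', 'lastDone');            % overwrite checkpoint after every run
end

avgResult = mean(results);                            % average over all completed runs
```

Delete the checkpoint file before starting a genuinely new experiment, otherwise the old results are loaded and the loop does nothing.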
Thanks dpb, John.
I monitored the run this time and have edited the question with the details. Please take a look. Resource consumption seems to be pretty much constant.
Also as both of you suggested, I am doing the next try with the same seed for every run, let's see what happens then.
In the worst case, since I only need to make random changes, I am writing the output of every run, so it is not as though the complete run goes down the drain. :)
Where in the trace did the apparent freeze show up?
I'm thinking more in terms of some internal-to-Matlab issue like are you using higher-level data constructions like structures, table, even cell arrays that could be causing nesting issues in iterative loops or somesuch.
Is there any iteration of any sort in the computations or an ode solver or the like? If it were all "just" straight matrix algebra/computation it doesn't seem possible, but with the abstract data structures there's a lot of behind the scenes activity besides just algebra.
I'd've put my money on graphics handles or the like but you say not using graphics--by any chance using one of the routines that has optional graphics that has differing behavior with/without return values? Mayhaps one of those is generating some internal graphics handles even tho no visible plot owing to coding error? Grasping at anything can think of here, obviously...
The instrumented run to try to track down just where it is when it "dies" will probably be the trick...
I don't suppose if you try to profile it that it will run fast enough to be practical at all...
Is there any way you can factor the app and call sub-pieces of the calculation to check on their performance with high numbers of calls before putting the whole thing together so can eliminate pieces?
Or, conversely, can you comment out calls to lower levels and see if can find one that will make the symptom disappear? The latter would seem to necessitate having a second test machine as that time would be lost for any useful output whereas the complete case at least does generate something useful each test.
This is becoming frustrating. For you too I suppose. Is it possible that after 15 hours or so, if the sound is turned on, you hear the voice of either Cleve Moler or Bill Gates saying, "I'm getting bored. Let's go do something else." :)
You have not mentioned any user written MEX code, or something that uses graphics heavily. Those are the things I'd suspect first.
It feels like something hardware related. I recall a utility that would allow you to monitor the temperature of all system components. We used it on a Mac laptop that was getting a bit overheated with heavy use. Can you find something like that for Windows? Is it possible that your machine is getting overheated with heavy use? If that were so, then you would have problems browsing when MATLAB is frozen.
I tried this, watching the memory required on my machine. It goes to about 2.3GB, and stays there until the solve is done.
clear
A = rand(10000);
b = rand(10000,1);
tic,c = A\b;toc
Elapsed time is 18.660448 seconds.
15*60*60/18
ans =
3000
For example, I tried the above on my machine. MATLAB itself uses 0.8 GB for me, with a clear workspace. Solving a 10K x 10K linear system requires about 18 seconds for me, with 4 processors on the job. 3000 such solves should take about 15 hours. (And it would have my system fan running hard for that time.)
So you might run a test like this:
for i = 1:3000,c = A\b;end
Does a similar problem happen? If so, then it suggests a hardware problem.
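A variant of the same stress test that also records the time of each solve might look like this, so a gradual slowdown before the freeze would show up in the numbers. The sizes are deliberately small here so the sketch finishes quickly; `n = 10000` and `nSolves = 3000` would match the test above.

```matlab
n       = 500;                % 10000 in the test above; reduced for a quick check
nSolves = 20;                 % 3000 in the test above
A = rand(n);
b = rand(n, 1);
elapsed = zeros(nSolves, 1);
for i = 1:nSolves
    t = tic;
    c = A \ b;                % the linear solve being stress-tested
    elapsed(i) = toc(t);      % per-solve wall time
    if mod(i, 5) == 0
        fprintf('solve %d of %d: %.4f s\n', i, nSolves, elapsed(i));
    end
end
relResidual = norm(A*c - b) / norm(b);   % sanity check that the solve is still correct
```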
Or, suppose you run your code on a different machine? Does it crap out in the same way?
Thanks for your comments guys.
@dpb I am using containers.Map() in my code, but only as a precursor. Basically, I use the adjacency map to build an adjacency matrix, so the part where the program gets stuck involves pretty much only matrices.
The only non-trivial part is when I have to exponentiate the matrix, for which I am using the Padé approximation code from Expokit. But I am not sure the problem lies there, since I tried a run after removing all randomness and it still got stuck. I was pretty much running the same code in a loop, and the fact that it still got stuck means, I think, that the problem is not there. (At least I think so; I am not 100% confident about this.)
@John I tried your piece of code and ran it for ~15 hours, and it ran quite smoothly. So I am not quite sure right now.
And I did run it on a Linux mini-cluster some time ago, and I remember something similar happening there, though it was not as consistent. Sometimes it would run to completion and sometimes not. It had 96 GB of RAM and I shared it with other users, meaning we had concurrent programs running.
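One way to rule the exponentiation step in or out is to check the matrix before the call and time the call itself. The sketch below uses MATLAB's built-in `expm` rather than the Expokit routine, and a tiny 2x2 matrix as a stand-in, so it only illustrates the guard, not the poster's actual computation.

```matlab
A = [0 1; -1 0];                          % small stand-in for the real matrix

% Guard: a NaN or Inf entry is worth catching before exponentiating.
assert(all(isfinite(A(:))), 'Matrix contains NaN or Inf before exponentiation');

t = tic;
E = expm(A);                              % built-in matrix exponential (not Expokit)
fprintf('expm took %.4f s, norm(A,1) = %g\n', toc(t), norm(A, 1));

% For this particular A the exact result is a rotation by 1 radian.
expected = [cos(1) sin(1); -sin(1) cos(1)];
maxErr = max(abs(E(:) - expected(:)));
```

Logging the norm and the elapsed time per call would show whether the last call before a freeze had an unusual input or never returned.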
Why do I feel like I'm in a game of "20 questions from hell", where the rules are not only don't we know what question to ask, but nobody knows the answer? :)
Lol, I know right! :)
I did find another question from 2013 where the poster had almost exactly the same problem (at least the symptoms match). Dunno if he ever figured out what was wrong.
Ok, so it looks like the problem is not your CPU. If the problem happens on another machine, then I'd bet the issue is in some code, perhaps containers.Map, or something that is written as MEX. My conjecture is that the issue is a bug in some C code, where something is not getting done right. If you can, your best bet might be to send it in to technical support, since this feels to me like a bug in their code.
(Oh, the worst part of 20 questions from hell, is the moderator is Calvin from the cartoon strip Calvin & Hobbes, who reserves the right to change the rules at a whim.)
I'm having very similar issues with my script ... did you end up finding any solutions?
@Tristan Graham: The description of the problem was not exact enough to locate its source. So how can you know that you have similar issues? Please open a new thread and post more details. Use e.g. some output to the command window to narrow down where the problem occurs. Store intermediate states of the variables, so that you can start debugging without having to wait 15 hours.
I too am facing the same issue. I have a deadline to meet, and the variables I have calculated are in MATLAB, but it is not responding. I have been waiting for an hour after pressing the pause button, which I did not expect. Is there a place where MATLAB saves all the variables, which we could access and save before forcefully shutting MATLAB down? I am using macOS Big Sur.
Well, MATLAB has memory allocated for everything, yes, but it is not user-accessible from the outside by anything except a system monitor and all that will be is a memory dump...which won't be useful for your purpose.
As the OP of this thread noted, in his case he could write the results of each iteration to an output file during execution(*) and so retrieve what results were available; I'd suggest that would be your option as well -- save what you are computing and need/want periodically to disk.
(*) Which raises a question I don't recall being asked: did he monitor that file? Could he have neglected to close the file handle on it, and that was the resource limit or somesuch?
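Whether file handles are piling up can be checked from inside MATLAB. The sketch below assumes `fopen('all')`, which returns the identifiers of all currently open files (newer releases also offer `openedFiles`); the temporary file name is a placeholder.

```matlab
fname  = [tempname '.txt'];                 % hypothetical per-run output file
before = fopen('all');                      % identifiers of files already open

fid = fopen(fname, 'w');
assert(fid ~= -1, 'Could not open output file');
fprintf(fid, 'result of run %d\n', 1);
duringCount = numel(fopen('all')) - numel(before);   % file still open here

fclose(fid);                                % forgetting this in a loop leaks one handle per write
afterCount = numel(fopen('all')) - numel(before);    % back to the starting count

delete(fname);                              % clean up the temporary file
fprintf('extra open files: during = %d, after = %d\n', duringCount, afterCount);
```

Printing `numel(fopen('all'))` once per run in the real code would show at a glance whether the count grows over time.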

Answers (0)

Asked:

on 10 Jun 2017

Commented:

dpb
on 18 May 2022