MATLAB Answers


Processor usage

Asked by Juan P. Viera on 23 Mar 2012
So, I wrote a quite complicated model of an air bearing. I've been developing this model for three years; it started out very simple and has been getting more complex. At the beginning I just used variables in the workspace and passed them to the functions one after the other. Now the model and the variable list are HUGE. Yesterday the code was running just fine, using both cores at 100%. Sometimes I ran it on another computer I have, which has 4 cores, and it would run all 4 cores at 100%.
So yesterday I had the brilliant idea of grouping these variables into structures, so my code would be a lot more elegant and it would be easier to pass data from one function to another. I made the BIG mistake of not saving a copy of the running code, because I never thought this would happen. Today the code runs fine, but WON'T use the processors at 100%. I have tried reading on the web about this multi-core stuff, but I can't really understand what could have changed in the code to make it run that much slower. So I ran the profiler to see whether, by rewriting a lot of the code and how functions handle data, I had introduced a bottleneck I did not expect, and NO: the profiler identifies as the heaviest parts lines that were just like that before I rewrote the code. I like the new code a lot more, since it is WAY cleaner than the huge-variable-list functions I used to have. So I would like to know what could have changed that would not allow the processor to run at full capacity.
To summarize, I had code with a big variable list in the workspace, and I passed the variables to functions separately. I then grouped these variables into different structures, so that I only pass 3 or 4 structures to each of the functions I call. That is basically all I did.
PS: Just to test whether something was wrong with my MATLAB or my PC, I ran a previous model I had and both processors went to 100% right from the beginning, so it is definitely something with the new structured code.
Thanks in advance.

  1 Comment

Stephen on 23 Mar 2012
To my knowledge, extra processors will only kick in if the MATLAB algorithm forecasts a level of speed improvement above a certain threshold, because partitioning calculations into multiple threads adds computation time. So if your computer has dormant cores, it could be because MATLAB doesn't want to steal the processor speed from internet browsing while you are waiting for results.
Without the original code it is hard to perform a benchmark test on your code, but the first step in solving this question is to check to see if the code is running faster or slower isn't it? If your code was inelegant and required all your processors to execute, and now your code is elegant and does not utilize all processing available, that could mean you changed something for the worse, OR it could mean you made your code much more efficient and the extra power isn't necessary.


9 Answers

Answer by Juan P. Viera on 24 Mar 2012

Mmmmm. The thing is, the task manager shows that all of the cores are being used. There is no dormant core; it is just that none of the cores will reach 100%, for some reason.
I understand that without the code it is difficult for you or anyone to help me, but the code is kind of extensive, and fundamentally I have not changed too much. I just changed the way the code is structured.
The MATLAB profiler shows that the bottlenecks are lines that were written before all this code changing, and the time those lines (extensive math calculations) consume is way, WAY higher than the time the new lines consume, so I don't really understand why the processors can't reach 100% and calculate things quickly.
Thanks again
JP



Answer by Juan P. Viera on 26 Mar 2012

And to clarify even more, all these "times" I talk about were obtained with the MATLAB profiler. For example, the new lines (where I pack stuff into the structures, or unpack them) take like 0.01 secs, while the old lines (that were just like that before I changed to the structured code) take like 14 secs. The actual time does not matter, but the ratio between them is about that.
I don't know much about this, since it is the first time I have had these performance issues, but I THINK the problem is not related to the time MATLAB spends on each line, but to something I did that will not allow the processor to go 100% and calculate things quick. Maybe something related to me, adding functions to the code that are not multithreaded or something like that.
When writing the new structured code, I did add the ASSIGNIN function to unpack structures (because I did not want to rewrite functions, not out of laziness but because the mathematical expressions would get really ugly in there with all the Struct.variable syntax). But again, the profiler says that the time spent in the function that unpacks the structures (which calls ASSIGNIN) is very close to zero: even though it is called about 156,000 times, the total time spent in it won't reach 2 secs. Still, as I said before, I think the issue is not a bottleneck problem but something slowing down ALL of the processing.
Could it be memory related? I have 4GB and all the time I have more than 2 available so I don't know.
PLEASE, some light here!



Answer by Titus Edelhofer on 29 Mar 2012

Hi Juan,
this is indeed most likely a multithreading issue: unless you are using the Parallel Computing Toolbox ("explicit parallelism"), MATLAB will use more than 25% on a quad core (13% with hyperthreading) only if it calls linear algebra routines that are multithreaded (search the doc for "multithreaded" and you will find some explanations and the functions that have been changed to support multithreading). But in any case the profiler should tell you which lines got slower (or faster). Maybe it's worth running the profiler with builtin functions included:
profile on -detail builtin
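A minimal sketch of how that profiling session might look, for readers who want to try it. The workload (`rand`, `qr`) is only a hypothetical stand-in for the real model, and the `-detail builtin` option is used as suggested in this answer, so its availability in other releases is an assumption.

```matlab
% Profile a small stand-in workload with built-in functions included.
profile on -detail builtin      % option taken from the answer above
A = rand(500);                  % hypothetical workload, not the bearing model
[Q, R] = qr(A);                 % qr is one of the multithreaded built-ins
profile off

stats = profile('info');        % results as a struct
names = {stats.FunctionTable.FunctionName};   % functions seen by the profiler
disp(numel(names));             % how many entries were recorded
% profile viewer                % uncomment to open the graphical report
```

Comparing the `FunctionTable` entries of the old and new code side by side is one way to see which calls changed, even when the heaviest lines look the same.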
Titus



Answer by Juan P. Viera on 29 Mar 2012

Thanks for your reply, really.
I also think it is a multithreading issue. But what doesn't make sense is that I could push the 4 cores to 100% before changing stuff in the code. Does this mean that:
1. ASSIGNIN is not multithreaded? It is the only weird function I added when I changed the code. It is important to remember that before the changes the code ran faster (I think because the 4 cores were running at the top (100%), which is not the case now). I mean, there is no new bottleneck or critical line that I added; the profiler shows that the critical lines are lines that I did not touch and that were there way before all of this. It is just A LOT slower, and now the processors won't reach 100% load. So obviously it is NOT more efficient code that just uses what it needs from the processors.
2. If the problem is with ASSIGNIN, shouldn't the profiler detect that it is this function which is slowing down everything else? Why does the profiler say that the function that calls ASSIGNIN is super fast? Of course, if you run the whole code with just 1 core, then what the profiler shows is logical, but in my opinion it should also show bottlenecks caused by something in the code not letting your program keep the processors at their limit, or something like that.
I already read the docs on multithreading, but they lack a lot of info about these cases, in my opinion. For example, if I have a script that can run everything multithreaded and then I add a function that is not multithreaded in the middle, will it run everything with just one core? Or will it switch from multithreading to just one core and back again to multi?
I am really confused. What I really think is happening is that ASSIGNIN is not multithreaded, but MATLAB runs everything else multithreaded, switches to 1 core to run this function, and then switches back to multiple cores to run the rest (is this even possible?). If this is the case, it means that this alternating between multiple cores and one core REALLY slows down the processing, and this is not shown by the profiler or anything else. In this case the profiler should say something like "this line is the only line that you cannot process with N cores, this will slow your code like you wouldn't believe", or something like that haha
I don't have much time to try to write the code without ASSIGNIN to see if that is the problem, but when I do I will post what I find. In the meantime, more opinions are very welcome.

  1 Comment

Titus Edelhofer on 29 Mar 2012
Hi,
one more comment on multithreading: generally speaking, MATLAB is single threaded. It's only that some of the builtin functions like eig, multiplication, qr etc. are multithreaded. So it is not the case that a whole function or script is multithreaded.
Another thing that comes to my mind: assignin could have another effect, in that it changes workspaces in a way the accelerator (JIT) in MATLAB doesn't like. It could be that the JIT handles cases with explicit variable transfer better. But again, without taking a closer look it's probably just guesswork ...
Titus



Answer by Titus Edelhofer on 29 Mar 2012

Hi Juan,
in the Task Manager (are you using Windows?) you can assign a process (i.e. MATLAB) to run on one (or more) cores. Doing this, you could compare the two versions of the code without multithreading differences ...
Assignin is definitely not multithreaded, since it "just" moves variables around. Although it is not a preferred way of passing variables, it should not really be responsible on its own ...
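As a rough alternative to setting the affinity in the Task Manager, the single-core comparison can also be sketched from inside MATLAB with `maxNumCompThreads`. This is only an illustration: the matrix size and the multiplication are hypothetical stand-ins for the real model, and the timings will differ from machine to machine.

```matlab
% Time the same multithreaded built-in with one thread and with the default.
A = rand(1500);                       % hypothetical workload

nDefault = maxNumCompThreads(1);      % limit to 1 thread, remember old setting
tic; B1 = A * A; tSingle = toc;

maxNumCompThreads(nDefault);          % restore the previous thread count
tic; B2 = A * A; tMulti = toc;

fprintf('1 thread: %.3f s, %d threads: %.3f s\n', tSingle, nDefault, tMulti);
```

If the old and new versions of a model take about the same time with one thread but differ with several, the difference lies in how much time is spent inside multithreaded built-ins; if they already differ with one thread, multithreading is not the cause.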
Sorry for not having much more advice ...
Titus



Answer by Juan P. Viera on 29 Mar 2012

That is what I thought about ASSIGNIN when I first considered using it, but then I can't understand why the program wouldn't push all the cores to their limit, if the only change I made (fundamentally) was to pack variables into structures and to use ASSIGNIN to unpack these structures inside one function.
I know what results I will get if I select just one core for MATLAB in the task manager: it will be slower. I will try it out anyway, but I am sure it will be slower because, from what I see, I am using both cores, just not to their limit like I did before.
Let's see if I have time tonight to do some tests.



Answer by Juan P. Viera on 12 Jul 2012

I thought I had posted here the solution, but it seems I did not. Anyway, like two days after my last post here I had a bit of free time that I could put into the code, removed the f*cking ASSIGNIN function and problem solved, both cores now are pushed to 100% all the time.
The real reason why the program slowed down that much with ASSIGNIN in the middle: I don't really understand it. MY OPINION is that the program is running multithreaded (it is a numerical model; basically what it does is manipulate vectors, matrices, and numerical data in general) and then gets to ASSIGNIN, has to switch to 1 core, and then switches back to 2. This alternating between 1 and 2 cores REALLY slows down the complete processing. That may be impossible and I may be saying things that don't make sense, I don't really know, but that is what happened and what the Windows task manager showed (both processors being used, but oscillating between 50% and 70% usage).
Now, when I hit F5 like a boss, both of my processors go up to 100% and stay there. The program runs MUCH faster.
Either way, whether or not the real reason is what I think it is, I really think there should be more info on this kind of stuff. All you can find on performance is "run the profiler", but this problem clearly was not flagged by the profiler.



Answer by Jan on 12 Jul 2012

Multi-threading and 100% processor usage are no guarantee of short run-times. Variables created by the evil ASSIGNIN, EVAL or EVALIN create entries in the dynamic lookup table of variables, while the hard-coded transfer of variables as inputs and outputs allows the JIT acceleration to use direct addressing. After the dynamic lookup table has been populated, Matlab has to check it even for calls of built-in functions: Imagine that you create a variable called "sin" by ASSIGNIN and use "sin(2)" in the main function. Deciding whether "sin(2)" means the function or the variable wastes time. Btw. this is even handled differently in debug and non-debug mode - cruel!
It is very easy to create a multi-threaded function with 100% load on 8 cores which needs exactly the same run-time as a single-threaded version: False sharing of cache lines can move the bottleneck from the calculations to the memory access. A standard example is a large matrix of type single which is processed row-wise in different threads. Then each write access to the matrix causes a time-consuming update of the cache lines of the other cores. Therefore "multi-threading" does not only mean letting more cores process the same data; an adjusted data representation is also required for efficient processing.

  2 Comments

Juan P. Viera on 12 Jul 2012
Well, you clearly know a lot more about this than I do, sincerely. But, just for the sake of the documentation of this problem I had (so future users find this post and solve their similar problems), could you explain why you think the program was so much slower?
I am going to summarize everything that happened really quickly here. Basically, very simplified, the code flow was like this:
1. Main script to define variables and stuff.
2. Call a big function (let's call it A), passing it a huge list of variables.
3. Inside A, call the function (B) that does the hardest processing. In this function there are a lot of variables coming from the arguments, and the mathematical expressions are really ugly.
4. Finish with B, go back to A with the results of B, check some things, call other less troublesome functions that just do math, and then call B again for the next step.
5. Repeat 3-4 a lot of times and then get the final results.
The problem was that, as the bearing model was getting more complex each day, the argument lists for functions A and B were getting HUGE, so I decided to pack all these variables into structures. One for geometrical data, one for the numerical method data, one for the operating conditions, and so on. So in the end I have like 5 structures, and they are not too big. Then the idea was to pass just the 5 structures to the functions, instead of the huge list, making the code a lot cleaner, more user friendly, and easier to improve or extend with future variables and stuff.
BUT, because function B used a lot of these variables in its expressions, and because the mathematical expressions there get very long and ugly, I didn't want to rewrite the functions in the new Structurename.variablename syntax, so I decided to write myself a function that would "unpack" the structure into variables in the workspace that have the same names as the fields of the structure. And this is where I introduced ASSIGNIN.
So before I did all this, the 2 cores would reach 100% and the program would run fast (relatively; it can still take 15 mins or so to do 1500 steps in A). Then I added the structure stuff and ASSIGNIN, and the program ran MUCH slower; I think it would take an hour or so to do the same 1500 steps. Then I finally had time to fix the code, removed ASSIGNIN, and the program runs just like before.
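For future readers, here is a minimal sketch of the two unpacking styles described above. It is not the original code: the structure `geom`, its fields `rJournal` and `cGap`, and all function names are hypothetical, and the ASSIGNIN-based unpacker is only a guess at what such a helper might look like. Saved as `unpack_demo.m`, it returns the same value from both variants.

```matlab
function [wExplicit, wPoofed] = unpack_demo()
% Hypothetical stand-in for one of the parameter structures.
geom = struct('rJournal', 0.05, 'cGap', 20e-6);
wExplicit = ratioExplicit(geom);
wPoofed   = ratioPoofed(geom);
end

function w = ratioExplicit(geom)
% Plain assignments: the parser can see that both names are variables.
rJournal = geom.rJournal;
cGap     = geom.cGap;
w = rJournal / cGap;          % long expressions stay free of "geom."
end

function w = ratioPoofed(geom)
% Variables are injected at run time, so the names below are unknown
% when the file is parsed and have to be resolved dynamically.
unpackIntoCaller(geom);
w = rJournal / cGap;
end

function unpackIntoCaller(s)
% Guess at an ASSIGNIN-based unpacker: one variable per structure field.
names = fieldnames(s);
for k = 1:numel(names)
    assignin('caller', names{k}, s.(names{k}));
end
end
```

The explicit version costs a few extra lines at the top of each function, but it keeps the expressions short without creating variables behind the parser's back.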
That summarizes everything that happened.
Could you explain (for normal people and users) the reason behind everything that happened?
Thanks again.
Jan on 12 Jul 2012
For "normal" people or for Matlab users? :-)
Without seeing the code and a detailed analysis, explanations are more or less pure speculations only. So the following is not an educated guess:
ASSIGNIN creates a variable in the caller's (or base) workspace. There this variable appears magically, and the parser of the M-file cannot know where this variable is coming from or which type it has. Therefore the variable (better: a pointer to it) has to be stored in a dynamically created list, a lookup table. When this variable appears the next time in the Matlab code, this table has to be checked first to find the pointer to the actual data. But Matlab has to decide in addition whether the name is a local function, a function in the current folder, a user-defined function, a builtin function or a Java class.
In contrast, hard-coded variables and variables provided as outputs of other functions can be accessed more efficiently, because it is already known that they are variables.
A similar problem appears when the type of a variable is changed inside a function. Then the JIT acceleration cannot use the pointer to the variable directly; apparently a lookup-table method is applied instead.



Answer by per isakson on 12 Jul 2012
Edited by per isakson on 12 Jul 2012

In discussions of Matlab performance, the version of Matlab, the OS and the use of the Parallel Computing Toolbox are all important.
According to a previous answer by Titus Edelhofer (The Mathworks)
... MATLAB is single threaded. It's only that some of builtin
functions like eig, multiplication, qr etc. are multithreaded.
Thus, your code is dominated by builtin functions, which are multi-threaded. That's why you see 100% cpu usage.
You ask
Could it be memory related? I have 4GB and all the time I have
more than 2 available so I don't know.
I have fooled myself lately by focusing on Available memory. I run R2012a 64bit on Windows 7 with 8GB.
Does Free Memory ever decrease to low values (or zero)? If it does, that is part of the problem. I find the behavior of Windows' System Cache difficult to understand, and the task manager's way of showing memory usage a bit misleading. You don't read large files(?)
Your "unpack" function creates a bunch of new variables. That doesn't require much memory until the values of the new variables are changed (lazy copy). Does that happen to large arrays?
It is possible to get information on the memory usage from profile. See: Undocumented profiler options
Matlab has this piece of magic code, which they call "Accelerator", part of which is a just in time compiler, JIT. (Or Accelerator is JIT?) The Mathworks develops the accelerator actively and they argue that we, the ordinary users, should not craft our code to fit the current state of the accelerator. The accelerator is not documented.
My guess: The major reason for the performance issues you see is related to the Accelerator. As Jan describes, variables "popping up in the workspace" (my words) certainly cause problems for the JIT. assignin, eval, load('...mat'), etc. do that.
The Julia Language indicates that the accelerator can be further improved. However, backward compatibility makes things difficult - I guess. There was an informative video on Julia, but I cannot find it now.
