Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
MATLAB calling Java methods (performance)

Subject: MATLAB calling Java methods (performance)

From: Ed Yu

Date: 10 Mar, 2009 22:23:01

Message: 1 of 6

Recently, I've written a Java component which loads a large amount of data from a database (55000 rows which is about 30Mbyte). After the load I had to do an insertion sort. I had the java call in a MATLAB tight loop essentially looping through the number of records brought back from the database:

% waitnow = 0;
% waitmax = numBuild;
% i = 0;
% while i<numBuild,
% myJavaComponent.build(i);
% i = i + 1;
% waitnow = waitnow + i;
% waitbar(waitnow/waitmax,wbHandle);
% end;

While this works fine, the result was that the JVM garbage collector constantly performing minor garbage collections (indicated by logging gc activities in java.opts) and chew up 100% of the CPU. After using a profiler (integrated to Eclipse 3.2) to profile my code, I found the problems and have them all taken care of. But the situation did not improve a whole lot when running inside MATLAB. The JVM is still performing a large number of minor garbage collections and the process still runs for about 1 minute. So I have to download another external profiler and attach to the JVM launched by MATLAB. Then it became apparent... MATLAB invokes com.mathworks.jmi.OpaqueJavaInterface.doesMethodExist() whenever a java method is invoked which generates a large amount of temporary objects of class java.lang.Appendable (200Mbyte+ in my case). So then I changed the above call to minimize the number of
calls to the java object (from 55000 to less than 50), the entire process takes less than 10 seconds.

So the lesson learned... If you have a large number of java method calls, find a way to minimize them because MATLAB invoke java methods is very expensive in terms of temporary objects creation.

Subject: MATLAB calling Java methods (performance)

From: Yair Altman

Date: 10 Mar, 2009 23:19:02

Message: 2 of 6

> While this works fine, the result was that the JVM garbage collector constantly performing minor garbage collections (indicated by logging gc activities in java.opts) and chew up 100% of the CPU. After using a profiler (integrated to Eclipse 3.2) to profile my code, I found the problems and have them all taken care of. But the situation did not improve a whole lot when running inside MATLAB. The JVM is still performing a large number of minor garbage collections and the process still runs for about 1 minute. So I have to download another external profiler and attach to the JVM launched by MATLAB. Then it became apparent... MATLAB invokes com.mathworks.jmi.OpaqueJavaInterface.doesMethodExist() whenever a java method is invoked which generates a large amount of temporary objects of class java.lang.Appendable (200Mbyte+ in my case)...

Ed - thanks for this post. For the record, please detail how you used Eclipse's profiler to solve your initial problem, and then how you connected an external profiler to Matlab's JVM. This sort of information has never been posted before. You touched on a similar subject, connecting a debugger in another post (http://www.mathworks.com/matlabcentral/newsreader/view_thread/155261 ), but I have never seen anything posted on Java profiling. I believe it would be of much value to Matlab-Java programmers, so the more detailed you could be the better.

Thanks,
Yair Altman

Subject: MATLAB calling Java methods (performance)

From: Ed Yu

Date: 11 Mar, 2009 16:10:20

Message: 3 of 6

"Yair Altman" <altmanyDEL@gmailDEL.comDEL> wrote in message <gp6sh6$975$1@fred.mathworks.com>...

> Ed - thanks for this post. For the record, please detail how you used Eclipse's profiler to solve your initial problem, and then how you connected an external profiler to Matlab's JVM. This sort of information has never been posted before. You touched on a similar subject, connecting a debugger in another post (http://www.mathworks.com/matlabcentral/newsreader/view_thread/155261 ), but I have never seen anything posted on Java profiling. I believe it would be of much value to Matlab-Java programmers, so the more detailed you could be the better.
>
> Thanks,
> Yair Altman

Yair, thanks for your interest in debugging and profiling. Many years ago I don't really believe in debuggers and profilers... As time goes, software written nowadays are much more complicated than before (data in, data out) because we are much more likely to use third party libraries now (especially with Java). So, as our software stack grows, debugger and profiler quickly becomes a critical part of our daily routine. You are right, there are plenty of online references as to how to profile java applications, but there are not a lot of real life stories about how to use profiler results to change implementation strategies to solve performance problems. And since we are working within a pretty unique environment having to work with Java through MATLAB. There truely aren't much resources we can go by with. So I'll try to outline my journey and hopefully it will help someone else...

Subject: MATLAB calling Java methods (performance)

From: Ed Yu

Date: 11 Mar, 2009 19:12:01

Message: 4 of 6

I will not actually show how to profile Java in this thread but I'll do it in
a separate thread.

Performance problems are truely very complicated to tackle. About the only way to even begin addressing what is going on is to use a profiler to look at your code execution in the 'exact' environment that it is running under to see the execution behavior and then formulate a new strategy for the next round of profiling. And hopefully after a couple of rounds, you'll have the right strategy which solves your performance problem. It is like pealing a large onion one layer at a time to get to the actual problem. The following is my journey to solve a real life Java, MATLAB performance bottleneck in my application.

For your information, I'm not disclosing the actual details of my work because I don't want to get permissions, so these are just analogy of the actual work I had performed for one of my clients.

Work Description

I had to load 2 database tables containing about 44000 and 11000 records respectively. Think of these as dictionaries of people names and company names which I've to maintain multiple sort orders, primarily the alphabetical order (people name) and their numeric key order (an id field that is unique within each table). For the bigger table (people names), I had to maintain an extra sort order of the 'name' field where I had to break it into 2 parts based on some rules (think of it as breaking first name and last name apart from the name field).

Original environment, approach and problem and action:

At the beginning, I had created a Java component NameManager.java to manage these name entries. At the time, I only have a couple of thousands (3000+) entries from a test data set. So the original approach is very simplistic, load the data from the database, keep a Vector of these objects for each sort order I need to maintain using java.util.Collections.sort().

Problem begins early on. I've realized that the "order by" results coming from the database is different than what Java normally would sort using java.util.Collections.sort(Vector, java.text.Collator.getInstance()).

The solution is then to sort all of these Vector in Java even when the database results come back are already in a sorted (but different) order!

New environment, problem and action:

Now legacy data migration has been completed by another team. The full scale of the problem begins to emerge. I had 44000 entries in the people names table and 11000 entries in the company names table. The original approach is to perform 5 sorts (id, name, and last name sorts on the people names table, company id, company name sorts on the company names table). This process took minutes (5-7 without truely measuring it...)

The action is to minimize the number of sorts. The obvious is to use HashMap to just store the id for people names and company names. This cuts down sorting to only 3.

New environment, problem and action:

After cutting down the number of sorts, things improved marginally (not by a whole lot, strange...) So I changed the default Java mergesort to quicksort I found on the net that uses no external storage for the 3 sorts but without improvement (and actually had degraded performance). So where does this leads? Rewrite the code not using Collections.sort() on existing Vector but use insertion sort to build new Vector containing object handles in the desired sort manner. Guess what! The situation did not improve as I expected... Since the entire system locks up during the sorts. This prompts me to mess around java.opts to log gc activities. Wow, lots of minor garbage collections one after another which obviously locks up the system (CPU usage 100%).

The action is to pull out a profiler and see what's going on during execution and especially looking at objects (garbage) creation. So I used the profiler within my IDE (JBuilder 2007, essentially Eclipse 3.2 with an additional profiler OptimizeIt). I'm fully aware that profiling my component is not within MATLAB at this point and I think the performance problem is within Java, not MATLAB so I started coding a mock execution in Java matching the MATLAB calls to this NameManager component.

New environment, problem and action:

During the profiling session, the memory allocation of an class caught my eyes. It actually comes from the database object relation mapper that I wrote and use for the last 7 years of my career. This object is a small class containing a HashMap to check data column types coming back from the database and to determine if the OR mapper can or cannot handle the data type. It turns out that this object (a new instance) is created whenever a database value is retrieved and stored within the java object wrapper. Think of it as whenever putting a value into a 2D matrix, a new object is created. A quick calculation results in 500,000 of these objects (temporal objects created within a method call and garbage collected) are created when loading 55000 records from the database, ouch!

So, time to fix the database object relation mapper and continue profiling.

New environment, problem and action:

Now with creating less half of a million objects, I ran the profiler again... Things looks better but again not by much (which surprise me)! So I profile the application again (outside MATLAB). Nothing caught my eyes other than int[] and char[] (large amount of them created) which comes from the java.util.Collator.compareTo() method. This obviously points to the fact that sorting (when using insertion sort and Collator) creates large number of temporary garbage... So I decided again to reduce the sorts from 3 sorts to a single sort. How? The problem is that the database results come back already sorted but since Java does not sort these entries in the same ordering (so I can use binary search), I had to sort them again in Java. BTW, I still have a service ticket opened with the database vendor (for 5 days without progress) to have the database order come back matching the Java sort
order... So I decided to look at how to create a custom java.text.RuleBasedCollator (which is actually the same object returned by calling the class method java.text.Collator.getInstance()) that would match the database sort. With a little bit of work, I inserted all the printable ANSI characters into a database table and select them and order by these characters. I found out that the database is actually using the ANSI character ordering (! is 33, " is 35, # is 35, $ is 36 and each subsequent character is "larger" than the previous one) to sort records.

So I created a customize RuleBasedCollator that Java can use to perform binary search of these Vector of objects that comes back from the database without sorts.

New environment, problem and action:

Now I have a Java customize RuleBasedCollator object that matches the order of records coming from the database (using the order by clause), I cut down the number of sorts from 3 to a single sort (the name field breaking out first and last name). Things has improved tremendously, now the process takes about 45 seconds. Good, but still doesn't make sense because anyone can write a sort and sort 55000 object in a heart beat! Something else must be going on... I'm kind of at the end of the road here because the app runs within my IDE and profiling looks ok (but seems to still generate garbage collections because we are still sorting once). A funny thing is that the application actually runs in about 10 seconds within the IDE mock but runs in 40-45 seconds in MATLAB (same behavior on my machine and on a co-worker machine which is dual core with 3G of RAM which is a lot beefier than mine
laptop 1.8Ghz single core with 2G RAM).

During profiling, there are logs of char[] coming back from getString() calls that traces back to String.substring(), which is where I break the name field into first and last name. So how about not calling String.substring()! Well, turns out that the name field actually comes from 2 tables that are joined into a single view at the database. So I decided to change the database view to return the actual first and last name as independent columns, hopefully creating less temporary objects by avoiding calling String.substring()!

New environment, problem and action:

With the database view changed, I profile the application again! Surprisingly, it did not do much! Definitely less temporal objects are created and less garbage is collected, but the actual time ran outside and inside MATLAB did not change! So there must be something else going on... The problem is that I'm still not profiling with MATLAB, just within my IDE with a mock class calling the component.

Profiling is not very well documented and lots of time they bundled arrays (int[] and char[]) into a big bucket... As of right now, I'm completely out of ideas... But, I've used a commercial profiler many years ago which actually helped (JProfiler). So I go and fill out the evaluation form and try to hook it up and profile the MATLAB launched JVM for the first time.

New environment, problem and action:

The debugging session was slow (to the point that I had to terminate the session and reboot my computer), but it had revealed something that I had never seen before...

After the data loaded from the database, I had to do an insertion sort (the only remaining sort). I had the Java call in a tight MATLAB loop essentially looping through the number of records brought back from the database:

% waitnow = 0;
% waitmax = numBuild;
% i = 0;
% while i<numBuild,
% myJavaComponent.build(i); % calling Java insertion sort
% i = i + 1;
% waitnow = waitnow + i;
% waitbar(waitnow/waitmax,wbHandle);
% end;

While this works fine, the result was that the JVM garbage collector constantly performing minor garbage collections (indicated by logging gc activities in java.opts and profiling sessions) and chew up 100% of the CPU. The profiler attached to the JVM launched by MATLAB has shown that MATLAB invokes method com.mathworks.jmi.OpaqueJavaInterface.doesMethodExist() whenever a Java method is invoked which generates a large amount of temporary objects of class java.lang.Appendable (200Mbyte+ in my case). So then I changed the above call to minimize the number of calls to the Java object (from 55000 to less than 50), the entire process takes about 10 seconds (which matches the run time profiled within just the IDE outside MATLAB).

At this point, I can relax... I finally cracked this.

Lessons learned

If you have a large number of Java method calls, find a way to minimize them because MATLAB invoke Java methods is very expensive in terms of temporary objects creation.

Subject: MATLAB calling Java methods (performance)

From: Ed Yu

Date: 13 Mar, 2009 19:02:01

Message: 5 of 6

To profile your Java application with Eclipse, here is the link:

http://www.eclipse.org/tptp/home/documents/tutorials/profilingtool/profilingexample_32.html

Here is the link to my tutorial on profiling Java code in JBuilder with Optimizeit:

http://www.mathworks.com/matlabcentral/fileexchange/23272

Here is the link to my tutorial on profiling java code within the JVM launched from MATLAB (2007a) with an external Java Profiler (JProfiler 5.2.1).

http://www.mathworks.com/matlabcentral/fileexchange/23275

Have fun!

Subject: MATLAB calling Java methods (performance)

From: Ed Yu

Date: 1 Apr, 2009 18:15:05

Message: 6 of 6

Here is the link to my tutorial on profiling java code within the JVM launched from MATLAB (2007a) with an external Java profiler YourKit Java Profiler (YourKit Java Profiler 8.0.5)

http://www.mathworks.com/matlabcentral/fileexchange/23513

Have fun!

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us