Speed issues with Matlab code compiled using SDK compiler for Python

Hi there
I wonder if anyone has a view on this. Some code that I need to deploy in Python (v3.7) was created in Matlab (R2019a) and compiled with the Compiler SDK to generate the required CTF files. It comprises about 15 independent functions that are executed sequentially. Tested in Matlab, the native .m files, which are very efficiently coded, take about 7 to 8 seconds to complete a toy example. Deployed through the MCR in Python, the same toy example takes about 50 seconds.
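For reference, the Python side follows the standard Compiler SDK call pattern. A minimal sketch, with the generated package passed in explicitly and the function names (preprocess, solve) standing in for our actual routines:

```python
# Hedged sketch of how we invoke the compiled package from Python.
# The package object and the function names are placeholders, not our
# real identifiers; in deployment, pkg is the module generated by the
# Compiler SDK and installed via the setup.py it produces.

def run_pipeline(pkg, data):
    """Initialize the MATLAB Runtime, run the compiled functions
    sequentially, then shut the runtime down."""
    rt = pkg.initialize()            # starts the MATLAB Runtime (the slow step)
    try:
        step1 = rt.preprocess(data)  # each call maps to a compiled .m function
        result = rt.solve(step1)
    finally:
        rt.terminate()               # release the runtime
    return result
```

In our case there are about 15 such functions rather than two, but the lifecycle is the same.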
Not only that: the problem scales roughly linearly with the size of the input variable, so an input twice the size takes 15-18 seconds in Matlab but slightly under a minute in Python. A Profiler run in Matlab showed that just two of the routines accounted for roughly 90% of the computation time, which was expected. Deployed in Python, those same routines accounted for only 32% of the time. That led us to think some overhead was attached to the MCR (impacting every routine, as they were processed sequentially) but, upon inspection, that did not seem to be the issue.
As per Matlab's documentation, programs compiled using the Compiler SDK should run "at the same speed as Matlab", but nothing of the sort is happening.
Perhaps more interestingly, as we increased the dimension of the data input, the computation time in Matlab rose roughly as predicted by theory, yet in Python it barely moved. That said, Python performance is consistently sub-par compared to Matlab, and we've run out of ideas as to what on earth might be going on.
BTW, in both instances we ran things on separate PCs of equivalent configuration, so hardware does not seem to be the cause.
Any ideas?
Thanks

Answers (2)

The performance in R2022a is orders of magnitude better than in previous releases, as described in the release notes. See the item "Performance/Python Package Integration: Improved performance with large multidimensional arrays in Python".

1 Comment

I'm seeing a 2x runtime increase when running in Python, using R2022a. I'm stuck in a situation similar to the OP's; it is quite disappointing.


Perhaps your deployed environment differs from your Matlab R&D one. For example, perhaps you can access a GPU in the R&D environment but not in the deployment one; or the physical CPU/memory setup is different, or something similar. Try running your deployed Python program on the same R&D computer that runs Matlab, and see what happens. At the very least, it would tell you whether the problem lies in the compilation process or in the run-time machine.

4 Comments

Thanks Yair and apologies for the delayed reply.
TBH I don't think it's a difference between the development and production environments, as we conducted the tests on the same infrastructure backbone. We did, however, implement a simple test: we created a stopwatch routine in Matlab that takes exactly 60 seconds to run, with millisecond precision. We tested it over 10k simulation runs within Matlab to ensure consistent results (deviations were a few milliseconds either side). We then created the corresponding Python CTF binaries and deployed them on the same infrastructure. The result?
We found, again using simulations, a 3.8-second delay on average: what took 60 seconds natively in Matlab, as planned, took just under 64 seconds on average in its compiled-for-Python counterpart. That is clearly the overhead of starting the MCR when the "initialize" request is fired from Python.
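For anyone wanting to reproduce the measurement, the per-call wall time can be captured on the Python side with the standard library alone; a minimal sketch (the function being timed here is arbitrary):

```python
import time

def measure(fn, *args):
    """Return (result, elapsed_seconds) for a single call, using a
    monotonic high-resolution clock."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0
```

Wrapping each compiled call like this is how we separated the fixed startup cost from the actual compute time.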
So: if you have a sequence of individual functions within a logical pipeline (functions that might not be called as part of the same process), each "initialize" adds 3.8 seconds to the run. Do that for 10-12 separate functions and you have a penalty of roughly 38-46 seconds, which is pretty much what we have been observing in practice.
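If the pipeline can live in one Python process, one workaround is to pay the initialization cost once and reuse the handle for every function. A minimal sketch, assuming the Compiler SDK's initialize()/terminate() lifecycle, with the generated package injected rather than imported by name:

```python
class RuntimeHandle:
    """Lazily initialize a Compiler SDK package once and reuse the
    handle for every subsequent call, amortizing the startup cost.
    The package object is an assumption standing in for the module
    the SDK actually generates."""

    def __init__(self, pkg):
        self._pkg = pkg
        self._rt = None

    def __getattr__(self, name):
        # Called only for attributes not set in __init__, i.e. the
        # compiled MATLAB functions; initialize on first use only.
        if self._rt is None:
            self._rt = self._pkg.initialize()
        return getattr(self._rt, name)

    def close(self):
        if self._rt is not None:
            self._rt.terminate()
            self._rt = None
```

Used as `rt = RuntimeHandle(mypkg); rt.step1(...); rt.step2(...)`, the runtime starts once regardless of how many of the 10-12 functions are called. It does not help us, for the reasons below, but it may help others hitting the same wall.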
Nesting some of the functions does not work in our implementation, as they involve callbacks between two simulated nodes (each processing a specific set of instructions in a specific sequence). The nodes are "simulated", hence there is no overhead from message parsing between them. In the real-life implementation that will not be the case, so there will be extra overhead on top.
One possible solution is to port the Matlab code to Python. That would take about a year, I reckon (including auditing and other ancillary work): we're looking at 7k lines of code plus auxiliary routines, so over 10k lines in total. There is no guarantee it would perform better, either; a lot of work has already gone into writing pretty efficient Matlab code. Generating C code is another route, but it would require adapting some bits of code that are not well suited to C code generation by Matlab's compiler. Perhaps less time, but still no guaranteed result.
Perhaps someone knows a trick in the way the MCR is implemented, hence my OP. But unless someone has found a solution through a practical implementation, it seems we've hit a wall.
Cheers.
I suggest you contact support@mathworks.com. It seems a serious-enough issue that they should address.
If you ever come across a workaround or solution, please post it here for the benefit of others.
Certainly. I would have expected some of the MathWorks folks to monitor this forum; that would make exchanges all the more productive for everyone.


Release: R2019a

Asked: 18 Jan 2021

Commented: 19 Jul 2022
