An open exchange for the MATLAB and Simulink user community |
Hosted by The MathWorks |
Contest Analysis
Using the Profiler and JIT to Accelerate a Fractal CalculationStarting with some sample code found on the web, we make successive changes suggested by the MATLAB Profiler to accelerate the performance. The resulting speed-up may be surprising to people who are used to versions of MATLAB that preceded that Accelerator. Overview
Here is some code we found displaying the Mandelbrot set in MATLAB. This file can be found at http://www.students.tut.fi/~warp/MandScripts/matlab.html This is the code exactly as it appeared on that web site 1 xmax=2; 2 steps=200; 3 maxiter=32; 4 5 Z=0; 6 X=[-xmax:2*xmax/(steps-1):xmax]; 7 Y=[-xmax:2*xmax/(steps-1):xmax]; 8 9 for m=1:steps 10 c=i*(-xmax+2*xmax*m/steps); 11 for n=1:steps 12 c=-xmax+2*xmax*n/steps-.5 + i*imag(c); 13 z=c; 14 for r=0:maxiter 15 z=z*z+c; 16 if abs(z)>2 break 17 end 18 end 19 Z(m,n)=sqrt(r/maxiter); 20 end 21 end 22 23 mesh(X,Y,Z) And this is the plot that it makes ![]() Let's make some small changes to the plotting at the very end of this script so it plots an image instead of a mesh plot. So we're replacing 23 mesh(X,Y,Z) with 23 imagesc(Z) 24 axis off 25 axis equal in order to get a better-looking picture. How long does this script take to run? One way to find out is to use the commands tic and toc to determine the overall elapsed time. Initial version: elapsed time 2.323 sec ![]() A more helpful way to time your code is to use the MATLAB Profiler. Notice the file actually takes longer to run. Why? It takes longer because it's heavily instrumented in order to give us important information about the program. For instance, what line is taking the most time?
If you look at the display for the Profiler, it actually displays the busiest line with the darkest background color. File mandelbrot2, time = 3.285 sec ![]() Line 12 is taking the most time. 12 c=-xmax+2*xmax*n/steps-.5 + i*imag(c); Line 12 is taking 2.105 sec, or 64.1% of the total function time. What can be done to speed up that line? One thing we can do is pay attention to the messages provided by the MATLAB accelerator. Here's how. If you look at the file listing at the bottom of the Profiler, you see four columns: time, calls, acc, and line. The last of these is simply the line number. The first is the amount of time spent on a given line, and the second is the number of calls made to that line. What about the third column, acc? This refers to acceleration, and it is an indication of whether or not the line in question has been successfully accelerated by the MATLAB just-in-time compiler. A dot (".") in the column means the line did accelerate, whereas an x means that it did not. Clicking on the x will bring up a message as to why that line did not accelerate. For instance, line 12 did not accelerate. Why not? If we click on it, we get the following message. Reference to 'i' in a script cannot be accelerated unless 'i' is a defined variable. To apply the builtin 'i', use the notation '1i'. The problem is that we haven't defined i explicitly. MATLAB is able to figure out that we mean sqrt(-1), but it costs time to do that. We can save MATLAB the trouble by typing "1i" instead of "i", or simply by adding a new line where we explicitly say i = sqrt(-1). Let's do the second of these options. Our new version has the line. 8 i=sqrt(-1); How fast does it run now? File mandelbrot3, time = 1.362 sec ![]() That one change makes a big difference. What else can we try? Let's look once more to see where the most time is being spent. 17 if abs(z)>2 break Line 17 is taking 0.431 sec, or 31.6% of the total function time. Line 17 is taking the most time now. But we notice that there are still x marks in the Accelerator column. Since they're in the loop body, we know they have a big impact on the overall speed of the code. What is the x on line 12 complaining about? Variable 'r', used as a FOR loop control variable, was referred to on line 20 outside of the body of the FOR loop. Confine the loop variable to the loop So r, which is used as the loop control variable in the innermost loop (line 15), but then it is also used in a calculation outside the loop (line 20). We can fix that by introducing a new variable, rmax. The speed-up comes by removing all references to r outside the loop. File mandelbrot4, time = 1.612 sec ![]() Now instead of 15 for r=0:maxiter 16 z=z*z+c; 17 if abs(z)>2 break 18 end 19 end 20 Z(m,n)=sqrt(r/maxiter); we have 15 for r=0:maxiter 16 z=z*z+c; 17 rmax=r; 18 if abs(z)>2 break 19 end 20 end 21 Z(m,n)=sqrt(rmax/maxiter); Even this isn't good enough, because the Profiler now says that rmax is undefined. Why? Because rmax is defined for the first time in the body of a loop. But loop bodies might never be evaluated, as for instance, in the expression >> for n=2:1, disp('Never runs this'); endTherefore we need to define rmax up above the loop if we intend to refer to it below the loop. These are subtle considerations. Keep in mind that the code runs and gets the correct answer whether you make this optimization or not. It's just a matter of how badly you want your code to run fast. Let's add the rmax definition and see where we get. File mandelbrot4a, time = 1.091 sec ![]() 15 rmax=maxiter; 16 for r=0:maxiter 17 z=z*z+c; 18 rmax=r; 19 if abs(z)>2 break 20 end 21 end Because we've defined rmax outside the innermost loop, we're now free to avoid defining rmax ahead of the conditional. This saves a little time File mandelbrot4b, time = 0.941 sec ![]() 15 rmax=maxiter; 16 for r=0:maxiter 17 z=z*z+c; 18 if abs(z)>2 19 rmax=r; 20 break 21 end 22 end Where else can we save time? One thing we notice is that we're doing the square root and division calculations at every single time step. What happens if we move those out of the loop? Do we gain any efficiency? File mandelbrot5, time = 0.921 sec ![]() 16 for r=0:maxiter 17 z=z*z+c; 18 if abs(z)>2 19 rmax=r; 20 break 21 end 22 end 23 Z(m,n)=rmax; 24 end 25 end 26 27 Z=sqrt(Z/maxiter); The speed-up is modest, but nothing great. What are some other changes we might make to speed things up? Pre-allocating the space alloted to large matrices is always a good thing, because then you save the time required to "grow" a matrix that has been extended inside a loop. Let's pre-allocate the matrix Z, since we know up front exactly how big it's going to be. So replace 5 Z=0; with 5 Z=zeros(steps); Does this make much of a difference? File mandelbrot5b, time = 0.681 sec ![]() Yes it does. This is a significant improvement in performance for a very small change. Finally here are some minor improvements. Upon close inspection, we can see that the variables X and Y are defined but never used (because we changed the plot), so we can safely remove them. Additionally, we can simplify the initialization of the constant c and remove a call to the imag command. These are small tweaks that make the code more readable, but we don't expect them to change the speed of the code very much. File mandelbrot5c, time = 0.691 sec ![]() 8 for m=1:steps 9 for n=1:steps 10 c=-xmax+2*xmax*n/steps-.5 + i*(-xmax+2*xmax*m/steps); 11 z=c; One last change awaits us. We can take all the complex math calculations that MATLAB is managing for us and turn them into non-complex calculations. That is, we can manage the imaginary and real components as floating point doubles rather than letting MATLAB take care of it. This is somewhat inconvenient, but it pays big dividends in this case because non-complex math runs much faster in MATLAB version 6.5. File mandelbrot6, time = 0.311 sec ![]() Here is the entire final program. 1 xmax=2; 2 steps=200; 3 maxiter=32; 4 5 Z=zeros(steps); 6 7 for m=1:steps 8 ci=(-xmax+2*xmax*m/steps); 9 for n=1:steps 10 cr=-xmax+2*xmax*n/steps-.5; 11 zr=cr; 12 zi=ci; 13 rmax=maxiter; 14 for r=0:maxiter 15 zrn=zr*zr-zi*zi+cr; 16 zin=2*zi*zr+ci; 17 zi=zin; 18 zr=zrn; 19 if (zr*zr+zi*zi)>4, 20 rmax=r; 21 break 22 end 23 end 24 Z(m,n)=rmax; 25 end 26 end 27 28 Z=sqrt(Z/maxiter); 29 imagesc(Z) 30 axis off 31 axis equal Let's time all the programs that we've written. The numbers may vary a little bit from what we've seen above, but the basic results are the same. Original file Time = 3.435 sec Change display Time = 3.465 sec Define i Time = 1.372 sec Confine loop variable Time = 1.562 sec Define rmax above loop Time = 1.042 sec Move rmax definition behind conditional Time = 0.921 sec Consolidate sqrt calculation Time = 0.941 sec Pre-allocate memory Time = 0.681 sec Remove unused code Time = 0.702 sec Unroll complex math Time = 0.320 sec ![]() Run the timing loop with the JIT off Original file Time = 14.983 sec Change display Time = 15.012 sec Define i Time = 14.512 sec Confine loop variable Time = 17.195 sec Define rmax above loop Time = 17.786 sec Move rmax definition behind conditional Time = 15.753 sec Consolidate sqrt calculation Time = 15.192 sec Pre-allocate memory Time = 14.942 sec Remove unused code Time = 14.942 sec Unroll complex math Time = 26.089 sec ![]() Notice some of these trends move in complete opposite directions. ![]() A little bit easier to compare. ![]() Bar chart percent change comparison ![]() The end result of all our optimization was to knock around 90% off the original time required for the JIT. These same changes would have actually added around 70% to the unaccelerated version. Percent Change from inital to most optimized Accelerated: -90.68 Unaccelerated: 74.12 The moral of the story is...
Without proper instrumentation, you can never really tell what's slowing you down. As a final note, remember to optimize your code in line with your needs. Maximum speed might be more important than readability, but if it isn't, then there's no need to keep tweaking and tuning if your code is already well-structured and fast enough. |
|
|||||||
| Related Topics |
| New Products | Support | Documentation | Training | Webinars | Careers | Newsletters | RSS |
| Problems? Suggestions? Contact us at contest@mathworks.com | © 1994-${current_year} The MathWorks, Inc. Trademarks Privacy Policy |