Skip to Main Content Skip to Search
File Exchange
MATLAB Newsgroup
Link Exchange
  Blogs  
 Contest 
MathWorks.com
 

Contest Analysis

Using the Profiler and JIT to Accelerate a Fractal Calculation

Starting with some sample code found on the web, we make successive changes suggested by the MATLAB Profiler to accelerate the performance. The resulting speed-up may be surprising to people who are used to versions of MATLAB that preceded that Accelerator.

Overview

The initial Mandelbrot script

Here is some code we found displaying the Mandelbrot set in MATLAB. This file can be found at http://www.students.tut.fi/~warp/MandScripts/matlab.html

This is the code exactly as it appeared on that web site

1     xmax=2;
2     steps=200;
3     maxiter=32;
4     
5     Z=0;
6     X=[-xmax:2*xmax/(steps-1):xmax];
7     Y=[-xmax:2*xmax/(steps-1):xmax];
8     
9     for m=1:steps
10        c=i*(-xmax+2*xmax*m/steps);
11        for n=1:steps
12            c=-xmax+2*xmax*n/steps-.5 + i*imag(c);
13            z=c;
14            for r=0:maxiter
15                z=z*z+c;
16                if abs(z)>2 break
17                end
18            end
19            Z(m,n)=sqrt(r/maxiter);
20        end
21    end
22    
23    mesh(X,Y,Z)

And this is the plot that it makes

Change the picture

Let's make some small changes to the plotting at the very end of this script so it plots an image instead of a mesh plot. So we're replacing

23    mesh(X,Y,Z)

with

23    imagesc(Z)
24    axis off
25    axis equal

in order to get a better-looking picture.

Profile the Script

How long does this script take to run? One way to find out is to use the commands tic and toc to determine the overall elapsed time.

Initial version: elapsed time 2.323 sec

A more helpful way to time your code is to use the MATLAB Profiler. Notice the file actually takes longer to run. Why? It takes longer because it's heavily instrumented in order to give us important information about the program. For instance, what line is taking the most time?

If you look at the display for the Profiler, it actually displays the busiest line with the darkest background color.

File mandelbrot2, time = 3.285 sec

Line 12 is taking the most time.

12            c=-xmax+2*xmax*n/steps-.5 + i*imag(c);

Line 12 is taking 2.105 sec, or 64.1% of the total function time.

What can be done to speed up that line? One thing we can do is pay attention to the messages provided by the MATLAB accelerator. Here's how.

If you look at the file listing at the bottom of the Profiler, you see four columns: time, calls, acc, and line. The last of these is simply the line number. The first is the amount of time spent on a given line, and the second is the number of calls made to that line. What about the third column, acc? This refers to acceleration, and it is an indication of whether or not the line in question has been successfully accelerated by the MATLAB just-in-time compiler. A dot (".") in the column means the line did accelerate, whereas an x means that it did not. Clicking on the x will bring up a message as to why that line did not accelerate.

For instance, line 12 did not accelerate. Why not? If we click on it, we get the following message.

Reference to 'i' in a script cannot be accelerated unless 'i' is a 
defined variable.  To apply the builtin 'i', use the notation '1i'.

Define i

The problem is that we haven't defined i explicitly. MATLAB is able to figure out that we mean sqrt(-1), but it costs time to do that. We can save MATLAB the trouble by typing "1i" instead of "i", or simply by adding a new line where we explicitly say i = sqrt(-1). Let's do the second of these options. Our new version has the line.

8     i=sqrt(-1);

How fast does it run now?

File mandelbrot3, time = 1.362 sec

That one change makes a big difference. What else can we try? Let's look once more to see where the most time is being spent.

17                if abs(z)>2 break

Line 17 is taking 0.431 sec, or 31.6% of the total function time.

Line 17 is taking the most time now. But we notice that there are still x marks in the Accelerator column. Since they're in the loop body, we know they have a big impact on the overall speed of the code. What is the x on line 12 complaining about?

Variable 'r', used as a FOR loop control variable, was referred to on 
line 20 outside of the body of the FOR loop.

Confine the loop variable to the loop

So r, which is used as the loop control variable in the innermost loop (line 15), but then it is also used in a calculation outside the loop (line 20). We can fix that by introducing a new variable, rmax. The speed-up comes by removing all references to r outside the loop.

File mandelbrot4, time = 1.612 sec

Now instead of

15            for r=0:maxiter
16                z=z*z+c;
17                if abs(z)>2 break
18                end
19            end
20            Z(m,n)=sqrt(r/maxiter);

we have

15            for r=0:maxiter
16                z=z*z+c;
17                rmax=r;
18                if abs(z)>2 break
19                end
20            end
21            Z(m,n)=sqrt(rmax/maxiter);

Even this isn't good enough, because the Profiler now says that rmax is undefined. Why? Because rmax is defined for the first time in the body of a loop. But loop bodies might never be evaluated, as for instance, in the expression

>> for n=2:1, disp('Never runs this'); end

Therefore we need to define rmax up above the loop if we intend to refer to it below the loop. These are subtle considerations. Keep in mind that the code runs and gets the correct answer whether you make this optimization or not. It's just a matter of how badly you want your code to run fast.

Let's add the rmax definition and see where we get.

File mandelbrot4a, time = 1.091 sec

15            rmax=maxiter;
16            for r=0:maxiter
17                z=z*z+c;
18                rmax=r;
19                if abs(z)>2 break
20                end
21            end

Because we've defined rmax outside the innermost loop, we're now free to avoid defining rmax ahead of the conditional. This saves a little time

File mandelbrot4b, time = 0.941 sec

15            rmax=maxiter;
16            for r=0:maxiter
17                z=z*z+c;
18                if abs(z)>2
19                    rmax=r;
20                    break
21                end
22            end

Perform square root once

Where else can we save time? One thing we notice is that we're doing the square root and division calculations at every single time step. What happens if we move those out of the loop? Do we gain any efficiency?

File mandelbrot5, time = 0.921 sec

16            for r=0:maxiter
17                z=z*z+c;
18                if abs(z)>2
19                    rmax=r;
20                    break
21                end
22            end
23            Z(m,n)=rmax;
24        end
25    end
26    
27    Z=sqrt(Z/maxiter);

Pre-allocate matrices

The speed-up is modest, but nothing great. What are some other changes we might make to speed things up? Pre-allocating the space alloted to large matrices is always a good thing, because then you save the time required to "grow" a matrix that has been extended inside a loop. Let's pre-allocate the matrix Z, since we know up front exactly how big it's going to be. So replace

5     Z=0;

with

5     Z=zeros(steps);

Does this make much of a difference?

File mandelbrot5b, time = 0.681 sec

Yes it does. This is a significant improvement in performance for a very small change.

Minor clean-up

Finally here are some minor improvements. Upon close inspection, we can see that the variables X and Y are defined but never used (because we changed the plot), so we can safely remove them. Additionally, we can simplify the initialization of the constant c and remove a call to the imag command. These are small tweaks that make the code more readable, but we don't expect them to change the speed of the code very much.

File mandelbrot5c, time = 0.691 sec

8     for m=1:steps
9         for n=1:steps
10            c=-xmax+2*xmax*n/steps-.5 + i*(-xmax+2*xmax*m/steps);
11            z=c;

Unroll the Complex Arithmetic

One last change awaits us. We can take all the complex math calculations that MATLAB is managing for us and turn them into non-complex calculations. That is, we can manage the imaginary and real components as floating point doubles rather than letting MATLAB take care of it. This is somewhat inconvenient, but it pays big dividends in this case because non-complex math runs much faster in MATLAB version 6.5.

File mandelbrot6, time = 0.311 sec

The final Mandelbrot script

Here is the entire final program.

1     xmax=2;
2     steps=200;
3     maxiter=32;
4     
5     Z=zeros(steps);
6     
7     for m=1:steps
8         ci=(-xmax+2*xmax*m/steps);
9         for n=1:steps
10            cr=-xmax+2*xmax*n/steps-.5;
11            zr=cr;
12            zi=ci;
13            rmax=maxiter;
14            for r=0:maxiter
15                zrn=zr*zr-zi*zi+cr;
16                zin=2*zi*zr+ci;
17                zi=zin;
18                zr=zrn;
19                if (zr*zr+zi*zi)>4,
20                    rmax=r;
21                    break
22                end
23            end
24            Z(m,n)=rmax;
25        end
26    end
27    
28    Z=sqrt(Z/maxiter);
29    imagesc(Z)
30    axis off
31    axis equal

Run the timing loop

Let's time all the programs that we've written. The numbers may vary a little bit from what we've seen above, but the basic results are the same.

Original file
  Time = 3.435 sec

Change display
  Time = 3.465 sec

Define i
  Time = 1.372 sec

Confine loop variable
  Time = 1.562 sec

Define rmax above loop
  Time = 1.042 sec

Move rmax definition behind conditional
  Time = 0.921 sec

Consolidate sqrt calculation
  Time = 0.941 sec

Pre-allocate memory
  Time = 0.681 sec

Remove unused code
  Time = 0.702 sec

Unroll complex math
  Time = 0.320 sec

Run the timing loop with the JIT off

Original file
  Time = 14.983 sec

Change display
  Time = 15.012 sec

Define i
  Time = 14.512 sec

Confine loop variable
  Time = 17.195 sec

Define rmax above loop
  Time = 17.786 sec

Move rmax definition behind conditional
  Time = 15.753 sec

Consolidate sqrt calculation
  Time = 15.192 sec

Pre-allocate memory
  Time = 14.942 sec

Remove unused code
  Time = 14.942 sec

Unroll complex math
  Time = 26.089 sec

Comparison

Notice some of these trends move in complete opposite directions.

Normalized comparison

A little bit easier to compare.

Bar chart percent change comparison

The end result of all our optimization was to knock around 90% off the original time required for the JIT. These same changes would have actually added around 70% to the unaccelerated version.

Percent Change from inital to most optimized
Accelerated: -90.68
Unaccelerated: 74.12

Conclusion

The moral of the story is...

  • Use the profiler
  • Pay attention to your accelerator messages
  • Instincts about what is slow in MATLAB may no longer be true

Without proper instrumentation, you can never really tell what's slowing you down. As a final note, remember to optimize your code in line with your needs. Maximum speed might be more important than readability, but if it isn't, then there's no need to keep tweaking and tuning if your code is already well-structured and fast enough.

 protein
 
Home

About the contest

Rules

Analysis

Newsgroup thread

Queue & Top 20

Complete rankings

Search for an entry

FAQ

Hall of fame

Send us feedback
 

<#assign current_year = request.date?string("#")?date("yyyyMMdd")?string("yyyy")>
Related Topics