Using the Parallel Profiler

Introduction

The parallel profiler extends the profile command and the profile viewer for parallel jobs. It shows how much time each lab spends evaluating each function, and how much time each lab spends communicating or waiting for communications with the other labs. Before using the parallel profiler, familiarize yourself with the standard profiler and its views, as described in Profiling for Improving Performance.

Collecting Parallel Profile Data

For parallel profiling, you use the mpiprofile command within your parallel job (often within pmode) in a similar way to how you use profile.

To turn on the parallel profiler and start collecting data, enter the following line in your parallel job task M-file, or type it at the pmode prompt in the Parallel Command Window:

mpiprofile on

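Now the profiler is collecting information about the execution of code on each lab and the communications between the labs. Such information includes the execution time of each function on each lab, the amount of data transferred between labs, and the time each lab spends waiting for communications.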

With the parallel profiler on, you can proceed to execute your code while the profiler collects the data.

In the pmode Parallel Command Window, to find out if the profiler is on, type:

P>> mpiprofile status

For a complete list of options regarding profiler data details, clearing data, etc., see the mpiprofile reference page.
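As a rough sketch of how some of those options fit together in a pmode session, the sequence below uses the reset, on (with a -detail option), off, resume, and clear forms of mpiprofile. The option names follow the mpiprofile reference page as generally documented; check the reference page for your release before relying on them, since availability may differ by version.

```matlab
P>> mpiprofile reset               % turn the parallel profiler off and return to the standard profiler state
P>> mpiprofile on -detail builtin  % start collecting; also record built-in functions
P>> mpiprofile status              % confirm the profiler state on each lab
P>> mpiprofile off                 % stop collecting; recorded data is kept
P>> mpiprofile resume              % continue collecting without clearing earlier data
P>> mpiprofile off
P>> mpiprofile clear               % discard the recorded profile data
```

Note that mpiprofile on clears previously recorded data, whereas mpiprofile resume keeps it.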

Viewing Parallel Profile Data

To open the parallel profile viewer from pmode, type in the Parallel Command Window:

P>> mpiprofile viewer

The remainder of this section is an example that illustrates some of the features of the parallel profile viewer. This example executes in a pmode session running on four local labs. Initiate pmode by typing in the MATLAB® Command Window:

pmode start local 4

When the Parallel Command Window (pmode) starts, type the following code at the pmode prompt:

P>> R1 = rand(16, distributor)
P>> R2 = rand(16, distributor)
P>> mpiprofile on
P>> P = R1*R2
P>> mpiprofile off
P>> mpiprofile viewer

The last command opens the Profiler window, first showing the Parallel Profile Summary (or function summary report) for lab 1.
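If you want to inspect the collected data at the pmode prompt instead of in the viewer, one possible approach is to retrieve the profile structure with the info form of mpiprofile. This is a minimal sketch: it assumes that the returned structure has a FunctionTable field with FunctionName entries, as the standard profile info structure does, and the variable names info and names are illustrative only.

```matlab
P>> info = mpiprofile('info');                   % profile data structure for each lab
P>> names = {info.FunctionTable.FunctionName};   % names of the functions profiled on each lab
P>> numel(names)                                 % number of profiled functions on each lab
```

Because the commands run on every lab, each lab reports the values for its own profile data.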

The function summary report displays the data for each function executed on a lab in sortable columns with the following headers:

Column Header             Description
Calls                     How many times the function was called on this lab
Total Time                The total amount of time this lab spent executing this function
Self Time                 The time this lab spent inside this function, not within children or subfunctions
Total Comm Time           The total time this lab spent transferring data with other labs, including waiting time to receive data
Self Comm Waiting Time    The time this lab spent during this function waiting to receive data from other labs
Total Interlab Data       The amount of data transferred to and from this lab for this function
Computation Time Ratio    The ratio of time spent in computation for this function vs. total time (which includes communication time) for this function
Total Time Plot           Bar graph showing relative size of Self Time, Self Comm Waiting Time, and Total Time for this function on this lab
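To make the Computation Time Ratio column concrete, the following sketch works through the arithmetic with made-up numbers. It assumes that computation time is the total time minus the total communication time; the timing values and variable names are hypothetical and are not profiler output.

```matlab
% Hypothetical timings (in seconds) for one function on one lab.
totalTime     = 2.0;   % value from the Total Time column
totalCommTime = 0.5;   % value from the Total Comm Time column

% Assumption: time not spent communicating is time spent computing.
computationTime = totalTime - totalCommTime;

% Ratio of computation time to total time for this function.
computationTimeRatio = computationTime / totalTime;

fprintf('Computation Time Ratio: %.2f\n', computationTimeRatio);
```

With these values the ratio is 0.75, meaning that a quarter of the time in this function went to communication. A ratio close to 1 suggests the function is dominated by computation.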

Click the name of any function in the list for more details about the execution of that function. The function detail report for distributed.mtimes includes this listing:

The code that is displayed in the report is taken from the client. If the code has changed on the client since the parallel job ran on the labs, or if the labs are running a different version of the functions, the display might not accurately reflect what actually executed.

You can display information for each lab, or use the comparison controls to display information for several labs simultaneously. Two buttons provide Automatic Comparison Selection, allowing you to compare the data from the labs that took the most versus the least amount of time to execute the code, or data from the labs that spent the most versus the least amount of time in performing interlab communication. Manual Comparison Selection allows you to compare data from specific labs or labs that meet certain criteria.

The following listing from the summary report shows the result of using the Automatic Comparison Selection of Compare (max vs. min TotalTime). The comparison shows data from lab 4 compared to lab 1 because these are the labs that spent the most versus the least amount of time executing the code.

The following figure shows a summary of all the functions executed during the profile collection time. The Manual Comparison Selection of max Time Aggregate means that data is considered from all the labs for all functions to determine which lab spent the maximum time on each function. Next to each function's name is the lab that took the longest time to execute that function. The other columns list the data from that lab.

The next figure shows a summary report for the labs that spend the most versus least time for each function. A Manual Comparison Selection of max Time Aggregate against min Time >0 Aggregate generated this summary. Both aggregate settings indicate that the profiler should consider data from all labs for all functions, for both maximum and minimum. This report lists the data for distributed.mtimes from labs 4 and 1 because they spent the maximum and minimum times on this function. Similarly, other functions are listed.

Click on a function name in the summary listing of a comparison to get a detailed comparison. The detailed comparison for distributed.mtimes looks like this, displaying line-by-line data from both labs:

To see plots of communication data, select Plot All PerLab Communication in the Show Figures menu. The top portion of the plot view report plots how much data each lab receives from each other lab for all functions.

To see only a plot of interlab communication times, select Plot CommTimePerLab in the Show Figures menu.

Plots like those in the previous two figures can help you determine the best way to balance work among your labs, perhaps by altering the partition scheme of your distributed arrays.

  


© 1984-2008 The MathWorks, Inc.