The parallel profiler extends the
profile command and the profile viewer
specifically for communicating jobs, enabling you to see how much
time each worker spends evaluating each function and how much time
it spends communicating or waiting for communications with the other workers.
Before using the parallel profiler, familiarize yourself with the
standard profiler and its views, as described in Profile to Improve Performance (MATLAB).
The parallel profiler works on communicating jobs, including
inside pmode. It does not work on parfor-loops.
For parallel profiling, you use the mpiprofile command
within your communicating job (often within pmode) in a similar way
to how you use profile.
To turn on the parallel profiler and start collecting data, enter the following line in your communicating job task code file, or type it at the pmode prompt in the Parallel Command Window:
P>> mpiprofile on
Now the profiler is collecting information about the execution of code on each worker and the communications between the workers. Such information includes:
Execution time of each function on each worker
Execution time of each line of code in each function
Amount of data transferred between each pair of workers
Amount of time each worker spends waiting for communications
With the parallel profiler on, you can proceed to execute your code while the profiler collects the data.
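For example, to limit data collection to a particular computation, you can bracket just that region with mpiprofile commands. The following sketch assumes an arbitrary codistributed array; any code that runs on the workers can appear between the on and off commands:
P>> mpiprofile on
P>> D = rand(64, codistributor())
P>> s = sum(D(:))
P>> mpiprofile off
Data collected up to the off command is retained and can be examined later in the parallel profile viewer.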
In the pmode Parallel Command Window, to find out if the profiler is on, type:
P>> mpiprofile status
For a complete list of options regarding profiler data details,
clearing data, etc., see the mpiprofile reference page.
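For quick reference, mpiprofile accepts options similar to those of profile; the following forms are among those available (see the mpiprofile reference page for the authoritative list):
P>> mpiprofile on        % start collecting parallel profiling data
P>> mpiprofile off       % stop collecting
P>> mpiprofile resume    % restart collection without clearing existing data
P>> mpiprofile clear     % clear the collected data
P>> mpiprofile status    % report whether the profiler is on
P>> mpiprofile viewer    % stop collection and open the profile viewer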
To open the parallel profile viewer from pmode, type in the Parallel Command Window:
P>> mpiprofile viewer
The remainder of this section is an example that illustrates some of the features of the parallel profile viewer. This example executes in a pmode session running on four local workers. Initiate pmode by typing in the MATLAB® Command Window:
pmode start local 4
When the Parallel Command Window (pmode) starts, type the following code at the pmode prompt:
P>> R1 = rand(16, codistributor())
P>> R2 = rand(16, codistributor())
P>> mpiprofile on
P>> P = R1*R2
P>> mpiprofile off
P>> mpiprofile viewer
The last command opens the Profiler window, first showing the Parallel Profile Summary (or function summary report) for worker (lab) 1.
The function summary report displays the data for each function executed on a worker in sortable columns with the following headers:
| Column | Description |
| --- | --- |
| Calls | How many times the function was called on this worker |
| Total Time | The total amount of time this worker spent executing this function |
| Self Time | The time this worker spent inside this function, not within children or local functions |
| Total Comm Time | The total time this worker spent transferring data with other workers, including waiting time to receive data |
| Self Comm Waiting Time | The time this worker spent during this function waiting to receive data from other workers |
| Total Interlab Data | The amount of data transferred to and from this worker for this function |
| Computation Time Ratio | The ratio of time spent in computation for this function vs. total time (which includes communication time) for this function |
| Total Time Plot | Bar graph showing relative size of Self Time, Self Comm Waiting Time, and Total Time for this function on this worker |
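Based on the column definitions above, the Computation Time Ratio can be understood as follows (a sketch derived from the definitions, not the viewer's internal code; the variable names are illustrative):
P>> compRatio = (totalTime - totalCommTime) / totalTime
where totalTime and totalCommTime stand for the Total Time and Total Comm Time values for the function on that worker.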
Select the name of any function in the list
for more details about the execution of that function, shown in a
function detail report.
The code that is displayed in the report is taken from the client. If the code has changed on the client since the communicating job ran on the workers, or if the workers are running a different version of the functions, the display might not accurately reflect what actually executed.
You can display information for each worker, or use the comparison controls to display information for several workers simultaneously. Two buttons provide Automatic Comparison Selection, allowing you to compare the data from the workers that took the most versus the least amount of time to execute the code, or data from the workers that spent the most versus the least amount of time in performing interworker communication. Manual Comparison Selection allows you to compare data from specific workers or workers that meet certain criteria.
The following listing from the summary report shows the result of using the Automatic Comparison Selection of Compare (max vs. min TotalTime). The comparison shows data from worker (lab) 3 compared to worker (lab) 1, because these are the workers that spent the most versus the least amount of time executing the code.
The following figure shows a summary of all the functions executed during the profile collection time. The Manual Comparison Selection of max Time Aggregate means that data is considered from all the workers for all functions to determine which worker spent the maximum time on each function. Next to each function's name is the worker that took the longest time to execute that function. The other columns list the data from that worker.
The next figure shows a summary
report for the workers that spent the most versus the least time on each
function. A Manual Comparison Selection of max
Time Aggregate against min Time >0 Aggregate generated
this summary. Both aggregate settings indicate that the profiler should
consider data from all workers for all functions, for both the maximum
and the minimum. This report lists the data for
workers 3 and 1, because they spent the maximum and minimum times
on this function. Similarly, other functions are listed.
Select a function name in
the summary listing of a comparison to get a detailed comparison.
The detailed comparison for a function displays line-by-line data
from both workers.
To see plots of communication data, select Plot All PerLab Communication in the Show Figures menu. The top portion of the plot view report plots how much data each worker receives from each other worker for all functions.
To see only a plot of interworker communication times, select Plot CommTimePerLab in the Show Figures menu.
Plots like those in the previous two figures can help you determine the best way to balance work among your workers, perhaps by altering the partition scheme of your codistributed arrays.
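As the plots suggest, one way to rebalance is to change how a codistributed array is partitioned across workers. The following is a hedged sketch using codistributor1d; the partition vector here is an arbitrary illustration for four workers and a 16-by-16 array, not a recommended setting:
P>> part = [2 5 5 4]                                  % columns assigned to each of the 4 workers
P>> R1 = rand(16, codistributor1d(2, part, [16 16]))  % distribute along dimension 2
Workers that previously sat waiting on communication can be assigned a larger share of the array, and heavily loaded workers a smaller one.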