Use Experiment Manager to Train Networks in Parallel
By default, Experiment Manager runs one trial of your experiment at a time on a single CPU. If you have Parallel Computing Toolbox™, you can configure your experiment to run multiple trials at the same time or to run a single trial at a time on multiple GPUs, on a cluster, or in the cloud.
Training Scenario | Recommendation
---|---
Run multiple trials at the same time using one parallel worker for each trial. | Set up your parallel environment, set Mode to `Simultaneous`, and click Run. Alternatively, to offload the experiment as a batch job, set Mode to `Batch Simultaneous`, specify your cluster and pool size, and click Run. Experiment Manager does not support `Simultaneous` or `Batch Simultaneous` execution when you set the training option `ExecutionEnvironment` to `"multi-gpu"` or `"parallel"`, or when you enable the training option `DispatchInBackground`.
Run a single trial at a time on multiple parallel workers. | Built-In Training Experiments: In the experiment setup function, set the training option `ExecutionEnvironment` to `"multi-gpu"` or `"parallel"`. If you are using a partitionable datastore, enable background dispatching by setting the training option `DispatchInBackground` to `true`. Set up your parallel environment, set Mode to `Sequential`, and click Run. Alternatively, to offload the experiment as a batch job, set Mode to `Batch Sequential`, specify your cluster and pool size, and click Run. Custom Training Experiments: In the experiment training function, set up your parallel environment and use an `spmd` block to distribute the training computation across the workers. Set Mode to `Sequential` and click Run. Alternatively, to offload the experiment as a batch job, set Mode to `Batch Sequential`, specify your cluster and pool size, and click Run.
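For example, in a built-in training experiment that runs a single trial on multiple parallel workers, the setup function can return training options like the following. This is a sketch: the solver choice and epoch count are illustrative values, not part of the scenarios above.

```matlab
% Sketch of training options for running one trial across multiple
% parallel workers. The solver ("sgdm") and MaxEpochs are illustrative.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="parallel", ...  % or "multi-gpu"
    DispatchInBackground=true, ...        % requires a partitionable datastore
    MaxEpochs=10);
```

With these options, set Mode to `Sequential` so that each trial in turn uses all the workers in the pool.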
In built-in training experiments, the results table displays whether each trial runs on a
single CPU, a single GPU, multiple CPUs, or multiple GPUs. To show this information, click the
Show or hide columns button located above the results table and select
Execution Environment.
Tip
Load training and validation data from a location that is accessible to all your workers. For example, store your data outside the project and access the data by using an absolute path. Alternatively, create a datastore object that can access the data on another machine by setting up the AlternateFileSystemRoots property of the datastore. For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
To run an experiment in parallel using MATLAB® Online™, you must have access to a Cloud Center cluster. For more information, see Use Parallel Computing Toolbox with Cloud Center Cluster in MATLAB Online (Parallel Computing Toolbox).
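As a sketch, the datastore setup described in the tip might look like this. The local and cluster paths are hypothetical placeholders.

```matlab
% Hypothetical paths: the first root resolves on the local machine,
% the second on the cluster workers. The datastore tries each root
% until it finds one that exists on the current machine.
rootPairs = ["C:\Data\MyExperiment","/shared/data/MyExperiment"];
imds = imageDatastore("C:\Data\MyExperiment", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames", ...
    AlternateFileSystemRoots=rootPairs);
```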
Set Up Parallel Environment
Train on Multiple GPUs
If you have multiple GPUs, parallel execution typically increases the speed of your experiment. Using a GPU for deep learning requires Parallel Computing Toolbox and a supported GPU device. For more information, see GPU Computing Requirements (Parallel Computing Toolbox).
For built-in training experiments, GPU support is automatic. By default, these experiments use a GPU if one is available.
For custom training experiments, computations occur on a CPU by default. To train on a GPU, convert your data to gpuArray objects. To determine whether a usable GPU is available, call the canUseGPU function.
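A minimal sketch of this check inside a custom training function, where the mini-batch contents are placeholders:

```matlab
% Convert a mini-batch to a gpuArray only when a supported GPU is
% available; otherwise training continues on the CPU.
X = rand(28,28,1,64,"single");  % placeholder mini-batch of images
if canUseGPU
    X = gpuArray(X);
end
```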
For best results, before you run your experiment, create a parallel pool with as many
workers as GPUs. You can check the number of available GPUs by using the gpuDeviceCount
(Parallel Computing Toolbox) function.
numGPUs = gpuDeviceCount("available");
parpool(numGPUs)
Note
If you create a parallel pool on a single GPU, all workers share that GPU, so you do not get the training speed-up and you increase the chances of the GPU running out of memory.
Train on Cluster or in Cloud
If your experiments take a long time to run on your local machine, you can accelerate training by using a computer cluster on your onsite network or by renting high-performance GPUs in the cloud. After you complete the initial setup, you can run your experiments with minimal changes to your code. Working on a cluster or in the cloud requires MATLAB Parallel Server™. For more information, see Deep Learning in the Cloud.
See Also
Apps
Functions
trainingOptions | canUseGPU | gpuDeviceCount (Parallel Computing Toolbox) | parpool (Parallel Computing Toolbox) | spmd (Parallel Computing Toolbox)
Objects
gpuArray (Parallel Computing Toolbox)