
Deep Learning with Big Data on GPUs and in Parallel

Neural networks are inherently parallel algorithms. You can take advantage of this parallelism by using Parallel Computing Toolbox™ to distribute training across multicore CPUs, graphical processing units (GPUs), and clusters of computers with multiple CPUs and GPUs.

Training deep networks is extremely computationally intensive, and you can usually accelerate training by using a high-performance GPU. If you do not have a suitable GPU, you can train on one or more CPU cores instead. You can train a convolutional neural network on a single GPU or CPU, on multiple GPUs or CPU cores, or in parallel on a cluster. Using GPU or parallel options requires Parallel Computing Toolbox.

You do not need multiple computers to solve problems using data sets too big to fit in memory. You can use the imageDatastore function to work with batches of data without needing a cluster of machines. For an example, see Train a Convolutional Neural Network Using Data in ImageDatastore. However, if you have a cluster available, it can be helpful to take your code to the data repository rather than moving large amounts of data around.
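As a minimal sketch of this datastore workflow (the folder path, and the layers and options variables, are assumptions for illustration):

```matlab
% Sketch: create a datastore over a labeled image folder, assuming the
% (hypothetical) path 'data/flowers' contains one subfolder per class.
imds = imageDatastore('data/flowers', ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% The datastore reads images in batches during training, so the full
% collection never needs to fit in memory at once.
net = trainNetwork(imds, layers, options);
```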

Deep Learning Hardware and Memory Considerations

  • Data too large to fit in memory: To import data from image collections that are too large to fit in memory, use the imageDatastore function. This function is designed to read batches of images for faster processing in machine learning and computer vision applications. Required products: Neural Network Toolbox™.

  • CPU: If you do not have a suitable GPU, you can train on a CPU instead. By default, the trainNetwork function uses the CPU if no GPU is available. Required products: Neural Network Toolbox.

  • GPU: By default, the trainNetwork function uses a GPU if one is available. Requires a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher. Check your GPU using gpuDevice. Specify the execution environment using the trainingOptions function. Required products: Neural Network Toolbox, Parallel Computing Toolbox.

  • Parallel on your local machine using multiple GPUs or CPU cores: Take advantage of multiple workers by specifying the execution environment with the trainingOptions function. If you have more than one GPU on your machine, specify 'multi-gpu'. Otherwise, specify 'parallel'. Required products: Neural Network Toolbox, Parallel Computing Toolbox.

  • Parallel on a cluster or in the cloud: Scale up to use workers on clusters or in the cloud to accelerate your deep learning computations. Use trainingOptions and specify 'parallel' to use a compute cluster. For more information, see Deep Learning in the Cloud. Required products: Neural Network Toolbox, Parallel Computing Toolbox, MATLAB Distributed Computing Server™.
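To check whether your GPU meets the requirements above and to select an execution environment explicitly, a short sketch (the 'sgdm' solver is an illustrative choice):

```matlab
% Sketch: confirm a suitable GPU is present, then request GPU training.
gpu = gpuDevice;                       % query the currently selected GPU
fprintf('%s, compute capability %s\n', gpu.Name, gpu.ComputeCapability);

% Request GPU training explicitly (trainNetwork uses a GPU by default
% when one is available).
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'gpu');
```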

Training with Multiple GPUs

Cutting-edge neural networks rely on increasingly large training datasets and network structures. In turn, training them requires more time and memory. To support training such networks, MATLAB provides support for training a single network using multiple GPUs in parallel. If you have more than one GPU on your local machine, enable multiple-GPU training by setting the 'ExecutionEnvironment' option to 'multi-gpu' with the trainingOptions function. If you have access to a cluster or cloud, specify 'parallel'. To improve convergence and performance when using multiple GPUs, try increasing the MiniBatchSize and learning rate.
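A sketch of the multi-GPU setup described above (the mini-batch size and learning rate are illustrative starting points, not recommended values, and imds and layers are assumed to be defined):

```matlab
% Sketch: train one network across all GPUs on the local machine.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'multi-gpu', ...
    'MiniBatchSize', 256, ...          % illustrative value; tune for your GPU memory
    'InitialLearnRate', 0.1);          % illustrative value; tune for convergence
net = trainNetwork(imds, layers, options);
```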

For optimum performance, you need to experiment with the MiniBatchSize option that you specify with the trainingOptions function. Convolutional neural networks are typically trained iteratively using batches of images, because the whole dataset is far too big to fit into GPU memory. The optimal batch size depends on your exact network, dataset, and GPU hardware, so you need to experiment. Too large a batch size can lead to slow convergence, while too small a batch size can lead to no convergence at all. Often the batch size is dictated by the GPU memory available. For larger networks, the memory requirements per image increase and the maximum batch size is reduced.

When training with multiple GPUs, each image batch is distributed between the GPUs. This effectively increases the total GPU memory available, allowing larger batch sizes. Depending on your application, a larger batch size can provide better convergence or classification accuracy.

Using multiple GPUs can provide a significant improvement in performance. To decide if you expect multi-GPU training to deliver a performance gain, consider the following factors:

  • How long is the iteration on each GPU? If each GPU iteration is short, the added overhead of communication between GPUs can dominate. Try increasing the computation per iteration by using a larger batch size.

  • Are all the GPUs on a single machine? Communication between GPUs on different machines introduces a significant communication delay.
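One heuristic that follows from the points above is to scale the mini-batch size (and, correspondingly, the learning rate) with the number of GPUs, so each GPU keeps the same per-device workload. A sketch, with base values that are assumptions for illustration:

```matlab
% Sketch: scale the mini-batch size with the number of available GPUs
% so the per-GPU batch stays constant. Base values are illustrative.
nGPUs = gpuDeviceCount;
baseBatch = 64;                        % per-GPU mini-batch (assumption)
baseLearnRate = 0.01;                  % single-GPU learning rate (assumption)

options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'multi-gpu', ...
    'MiniBatchSize', baseBatch * nGPUs, ...
    'InitialLearnRate', baseLearnRate * nGPUs);
```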

Deep Learning in the Cloud

Try your deep learning applications with multiple high-performance GPUs on Amazon® Elastic Compute Cloud (Amazon EC2®). You can use MATLAB to perform deep learning in the cloud using Amazon EC2 with P2 instances and data stored in the cloud. If you do not have a suitable GPU available for faster training of a convolutional neural network, you can use this feature instead. You can try different numbers of GPUs per machine to accelerate training and use parallel computing to train multiple models at once on the same data, or train a single network using multiple GPUs. You can compare and explore the performance of multiple deep neural network configurations to look for the best tradeoff between accuracy and memory use.
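Training multiple models at once on the same data, as mentioned above, can be sketched with a parfor loop. This assumes a parallel pool whose workers each have access to a GPU, and a hypothetical helper makeLayers that builds a network for a given depth; imds is assumed to be defined:

```matlab
% Sketch: train several candidate networks in parallel, one per worker,
% to compare configurations on the same datastore.
depths = [2 4 8];                      % illustrative configurations to compare
nets = cell(size(depths));
parfor i = 1:numel(depths)
    layers = makeLayers(depths(i));    % hypothetical helper, not a toolbox function
    opts = trainingOptions('sgdm', 'ExecutionEnvironment', 'gpu');
    nets{i} = trainNetwork(imds, layers, opts);
end
```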

To help you get started, see this white paper that outlines a complete workflow: Deep Learning in the Cloud with MATLAB White Paper.
