Training a network and making predictions on new data require that images match the input
size of the network. To resize images to match the input size of the network, you can use
imresize or augmentedImageDatastore.
In addition to resizing images, you can perform additional preprocessing to augment training, validation, test, and prediction data sets. Augmenting training images helps to prevent the network from overfitting and memorizing the exact details of the training images.
To find the image input size of the network, get the first two elements of the
InputSize property of the
imageInputLayer of the network. For example, to get the image input size
for the AlexNet pretrained network:
net = alexnet;
inputSize = net.Layers(1).InputSize(1:2)
inputSize =
   227   227
The method to resize images depends on the image data type.
To rescale a 3-D array representing a single color image, a single
multispectral image, or a stack of grayscale images, use
imresize. For example, to resize images in the 3-D array im3d:
im = imresize(im3d,inputSize);
To rescale a 4-D array representing a stack of images, you can use
imresize. For example, to rescale images in the 4-D array im4d:
im = imresize(im4d,inputSize);
Alternatively, you can rescale or crop images in the 4-D array to the
desired size by using
augmentedImageDatastore. By default,
augmentedImageDatastore rescales images to the
desired size. If instead you want to crop images from the center or from
random positions in the image, you can use the
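'OutputSizeMode' name-value pair argument. For
example, to crop images in the 4-D array
im4d from the
center of each image:
auimds = augmentedImageDatastore(inputSize,im4d,'OutputSizeMode','centercrop');
To rescale images stored in an ImageDatastore, such as imds, pass the datastore in place of the array:
auimds = augmentedImageDatastore(inputSize,imds);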
You can use an augmented image datastore or a resized 4-D array for training, prediction, and classification. You can use a resized 3-D array for prediction and classification only.
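As a rough sketch of how an augmented image datastore serves resized data, the following snippet crops random 227-by-227 regions from a stack of images and reads one mini-batch. The array im4d holds random placeholder data and the batch size of 4 is an arbitrary choice for illustration; neither comes from the text above. The name of the table variable returned by read is stated as an assumption.

```matlab
% Placeholder data (assumption): 8 random RGB images, 300-by-400 pixels each.
im4d = rand(300,400,3,8);
inputSize = [227 227];   % image input size of AlexNet

% Crop from random positions instead of rescaling.
auimds = augmentedImageDatastore(inputSize,im4d,'OutputSizeMode','randcrop');
auimds.MiniBatchSize = 4;   % arbitrary batch size for this sketch

% read returns a table with one row per image. The preprocessed images are
% assumed to be stored in the table variable named "input".
batch = read(auimds);
firstImage = batch.input{1};
size(firstImage)
```

To use the datastore for prediction, you would pass auimds to a function such as classify together with a trained network, in place of the raw array.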
In addition to resizing images, an
augmentedImageDatastore enables you to augment images with a combination
of rotation, reflection, shear, and translation transformations. The diagram shows how
trainNetwork uses an augmented image
datastore to transform training data for each epoch.
1. Define your training images. You can store the images as an ImageDatastore, a 4-D numeric array, or a table. An ImageDatastore enables you to import data from image collections that are too large to fit in memory. This datastore is designed to read batches of images for faster processing in machine learning and computer vision applications.
2. Configure image transformation options, such as the range of rotation angles and whether to apply reflection at random, by creating an imageDataAugmenter. To preview the transformations applied to sample images, use the augment function.
3. Create an augmentedImageDatastore, specifying the training images, the size of output images, and the imageDataAugmenter. The size of output images must be compatible with the size of the imageInputLayer of the network.
4. Train the network, specifying the augmented image datastore as the data source for trainNetwork. For each iteration of training, the augmented image datastore applies a random combination of transformations to the mini-batch of training data.
When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.
For an example of the workflow, see Train Network with Augmented Images.
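The workflow described above can be sketched as follows. This is a minimal illustration, not a tuned training setup: the digit data set, the transformation ranges, the small layer array, and the single training epoch are all assumptions chosen to keep the example short.

```matlab
% Define training images. digitTrain4DArrayData is a sample data set
% shipped with the toolbox (28-by-28-by-1 grayscale digit images).
[XTrain,YTrain] = digitTrain4DArrayData;

% Configure transformation options (ranges are illustrative assumptions).
augmenter = imageDataAugmenter( ...
    'RandRotation',[-20 20], ...
    'RandXReflection',true, ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3]);

% Create the augmented image datastore. The output size must be
% compatible with the imageInputLayer defined below.
imageSize = [28 28 1];
auimds = augmentedImageDatastore(imageSize,XTrain,YTrain, ...
    'DataAugmentation',augmenter);

% Train a small illustrative network on the augmented data.
layers = [
    imageInputLayer(imageSize)
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm','MaxEpochs',1,'Verbose',false);
net = trainNetwork(auimds,layers,options);
```

Because the transformations are applied as each mini-batch is read, the number of training images stays the same and no transformed copies are kept in memory.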
If you want to perform image preprocessing beyond the transformations offered by
augmentedImageDatastore, then you can use a mini-batch
datastore to perform data augmentation. A mini-batch datastore refers to
any built-in or custom datastore that offers support for reading data in batches. You
can use a mini-batch datastore as a source of training, validation, and test data sets
for deep learning applications that use Deep Learning Toolbox™.
These built-in mini-batch datastores perform specific image preprocessing operations when they read a batch of data:
|Type of Mini-Batch Datastore|Description|
|augmentedImageDatastore|Apply random affine geometric transformations, including resizing, rotation, reflection, shear, and translation, for training deep neural networks. For an example, see Transfer Learning Using AlexNet.|
|pixelLabelImageDatastore|Apply identical affine geometric transformations to images and corresponding ground truth labels for training semantic segmentation networks (requires Computer Vision System Toolbox™). For an example, see Semantic Segmentation Using Deep Learning.|
|randomPatchExtractionDatastore|Extract pairs of random patches from images or pixel label images (requires Image Processing Toolbox™). You optionally can apply identical random affine geometric transformations to the pairs of patches. For an example, see Single Image Super-Resolution Using Deep Learning.|
|denoisingImageDatastore|Apply randomly generated Gaussian noise for training denoising networks (requires Image Processing Toolbox).|
To preprocess images using your own image processing pipeline, you can implement a custom mini-batch datastore. For more information, see Develop Custom Mini-Batch Datastore. For an example, see Define Custom Mini-Batch Datastore For Super-Resolution Networks.
When you define how your custom mini-batch datastore reads data, you can
augment data with random affine geometric transformations. Specify
transformation options by using an
imageDataAugmenter object, then transform data by using the
augment function. The
augment function can
apply identical transformations to input and response image pairs.
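A minimal sketch of applying identical transformations to an image pair with augment is shown below. The placeholder images and the transformation ranges are assumptions for illustration. Passing the images together in a cell array is what requests a shared transformation; to make that visible, the sketch uses the same image as both input and response, so the two outputs should match.

```matlab
% Transformation options (ranges are illustrative assumptions).
augmenter = imageDataAugmenter( ...
    'RandRotation',[-30 30], ...
    'RandXReflection',true);

% Placeholder input and response images (assumption). The response is a
% copy of the input so that the shared transformation is easy to check.
inputImage = rand(64,64,3);
responseImage = inputImage;

% Augment both images in one call by passing them in a cell array.
out = augment(augmenter,{inputImage,responseImage});
augInput = out{1};
augResponse = out{2};

% The same random transformation should have been applied to both.
sameTransform = isequal(augInput,augResponse)
```

In a custom mini-batch datastore, a call like this would typically sit inside the method that reads a batch, so that each input and its response stay aligned after augmentation.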