To train a network and make predictions on new data, your images must match the input size of the network. If you need to adjust the size of your images to match the network, then you can rescale or crop your data to the required size.
You can effectively increase the amount of training data by applying randomized
augmentation to your data. Augmentation also enables you to train
networks to be invariant to distortions in image data. For example, you can add randomized
rotations to input images so that a network is invariant to the presence of rotation in
input images. An
augmentedImageDatastore provides a convenient way to apply a limited set of
augmentations to 2-D images for classification problems.
For more advanced preprocessing operations, to preprocess images for regression problems,
or to preprocess 3-D volumetric images, you can start with a built-in datastore. You can
also preprocess images according to your own pipeline by using the transform and combine functions.
You can store image data as a numeric array, an
ImageDatastore object, or a table. An ImageDatastore
enables you to import data in batches from image collections that are too large to fit
in memory. You can use an augmented image datastore or a resized 4-D array for training,
prediction, and classification. You can use a resized 3-D array for prediction and classification only.
There are two ways to resize image data to match the input size of a network.
Rescaling multiplies the height and width of the image by a scaling factor. If the scaling factor is not identical in the vertical and horizontal directions, then rescaling changes the spatial extents of the pixels and the aspect ratio.
Cropping extracts a subregion of the image and preserves the spatial extent of each pixel. You can crop images from the center or from random positions in the image.
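|Resizing Option|Data Format|Resizing Function|Sample Code|
|Rescaling|3-D array (one color or multispectral image, or a stack of grayscale images) or 4-D array (stack of images)|imresize|im = imresize(I,outputSize);|
|Rescaling|4-D array (stack of images), ImageDatastore, or table|augmentedImageDatastore|auimds = augmentedImageDatastore(outputSize,I);|
|Cropping|3-D array (one color or multispectral image)|imcrop|im = imcrop(I,rect);|
|Cropping|3-D array (stack of grayscale images) or 4-D array (stack of color or multispectral images)|imcrop3|im = imcrop3(I,cuboid);|
|Cropping|4-D array (stack of images), ImageDatastore, or table|augmentedImageDatastore|auimds = augmentedImageDatastore(outputSize,I,'OutputSizeMode',m);|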
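The difference between rescaling and cropping can be sketched on synthetic data. This is a minimal example, not a reference implementation: the 224-by-224 target size and the random images are assumptions chosen for illustration, and 'centercrop' is used as one possible value for the OutputSizeMode argument.

```matlab
% Synthetic 300-by-400 truecolor image (stand-in for real data).
I = rand(300,400,3);

% Hypothetical network input size, as [height width].
outputSize = [224 224];

% Rescaling: the aspect ratio changes because the vertical and
% horizontal scale factors differ (224/300 versus 224/400).
imRescaled = imresize(I,outputSize);

% Cropping: extract a centered 224-by-224 window.
% imcrop takes rect = [xmin ymin width height] and returns
% height+1 rows and width+1 columns, so subtract 1 from each extent.
xmin = floor((size(I,2) - outputSize(2))/2) + 1;
ymin = floor((size(I,1) - outputSize(1))/2) + 1;
rect = [xmin ymin outputSize(2)-1 outputSize(1)-1];
imCropped = imcrop(I,rect);

% For a 4-D stack of images, the datastore can crop each image as it is read.
X = rand(300,400,3,8);
auimds = augmentedImageDatastore(outputSize,X,'OutputSizeMode','centercrop');
```

Both results have the same size, but the rescaled image covers the whole scene with distorted proportions, while the cropped image keeps the original pixel geometry and discards the borders.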
For image classification problems, you can use an
augmentedImageDatastore to augment images with a random combination of
resizing, rotation, reflection, shear, and translation transformations.
The diagram shows how
trainNetwork uses an augmented image
datastore to transform training data for each epoch. When you use data augmentation, one
randomly augmented version of each image is used during each epoch of training. For an
example of the workflow, see Train Network with Augmented Images.
Specify training images.
Configure image transformation options, such as the range of rotation angles and whether to apply reflection at random, by creating an imageDataAugmenter. To preview the transformations applied to sample images, use the augment function.
Create an augmentedImageDatastore. Specify the training images, the size of output images, and the imageDataAugmenter. The size of output images must be compatible with the size of the imageInputLayer of the network.
Train the network, specifying the augmented image datastore as the data source for trainNetwork. For each iteration of training, the augmented image datastore applies a random combination of transformations to images in the mini-batch of training data.
When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.
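The augmentation workflow can be sketched as follows. The data is synthetic, and the specific ranges for rotation and translation are assumptions picked for illustration; the training call is left as a comment because layers and options are hypothetical variables that this sketch does not define.

```matlab
% Synthetic stand-in data: 16 grayscale 28-by-28 images in 2 classes.
XTrain = rand(28,28,1,16);
YTrain = categorical(randi(2,16,1));

% Configure random rotation, horizontal reflection, and small translations.
augmenter = imageDataAugmenter( ...
    'RandRotation',[-20 20], ...
    'RandXReflection',true, ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3]);

% Preview the effect of the augmenter on one sample image.
previewImage = augment(augmenter,XTrain(:,:,:,1));

% Create the augmented image datastore. The output size must be
% compatible with the image input layer of the network.
imageSize = [28 28 1];
augimds = augmentedImageDatastore(imageSize,XTrain,YTrain, ...
    'DataAugmentation',augmenter);

% Read one mini-batch to inspect what training would receive.
batch = read(augimds);

% Training would then use the datastore as the data source, for example:
% net = trainNetwork(augimds,layers,options);
```

Reading the datastore again after a reset generally yields different pixel values for the same images, because the transformations are drawn at random on each read.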
Some datastores perform specific and limited image preprocessing operations when they
read a batch of data. These application-specific datastores are listed in the table. You
can use these datastores as a source of training, validation, and test data sets for
deep learning applications that use Deep Learning Toolbox™. All of these datastores return data in a format supported by trainNetwork.
|Datastore|Description|
|augmentedImageDatastore|Apply random affine geometric transformations, including resizing, rotation, reflection, shear, and translation, for training deep neural networks. For an example, see Transfer Learning Using Pretrained Network.|
|pixelLabelImageDatastore|Apply identical affine geometric transformations to images and corresponding ground truth labels for training semantic segmentation networks (requires Computer Vision Toolbox™). For an example, see Semantic Segmentation Using Deep Learning.|
|randomPatchExtractionDatastore|Extract multiple pairs of random patches from images or pixel label images (requires Image Processing Toolbox™). You optionally can apply identical random affine geometric transformations to the pairs of patches. For an example, see Single Image Super-Resolution Using Deep Learning.|
|denoisingImageDatastore|Apply randomly generated Gaussian noise for training denoising networks (requires Image Processing Toolbox).|
To perform more general and complex image preprocessing operations than offered by the
application-specific datastores, you can use the transform and combine
functions. For more information, see Datastores for Deep Learning.
The transform function creates an altered form of an existing datastore, called the
underlying datastore, by transforming the data read by
the underlying datastore according to a transformation function that you define.
The custom transformation function must accept data in the format returned by the
read function of the underlying datastore. For image data in an
ImageDatastore, the format depends on the
ReadSize property.
When ReadSize is 1, the transformation function
must accept an integer array. The size of the array is consistent with
the type of images in the ImageDatastore. For
example, a grayscale image has dimensions
m-by-n, a truecolor image has
dimensions m-by-n-by-3, and a
multispectral image with c channels has dimensions m-by-n-by-c.
When ReadSize is greater than 1, the transformation
function must accept a cell array of image data. Each element
corresponds to an image in the batch.
The transform function must return data that matches the input
size of the network. The
transform function does not support
one-to-many observation mappings.
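A transformation function for batched reads might look like the sketch below. The temporary folder, the synthetic PNG files, and the 32-by-32 input size are all assumptions made so that the example is self-contained. ReadSize is set to 2, so the anonymous function receives a cell array and resizes each element.

```matlab
% Write a few synthetic PNG files to a temporary folder (stand-in data).
folder = tempname;
mkdir(folder);
for k = 1:4
    imwrite(uint8(255*rand(60,80,3)),fullfile(folder,sprintf('img%d.png',k)));
end

% With ReadSize greater than 1, read returns a cell array of images.
imds = imageDatastore(folder);
imds.ReadSize = 2;

% Hypothetical network input size, as [height width].
inputSize = [32 32];

% Resize every image in the batch; one output image per input image,
% so the mapping stays one-to-one.
resizeBatch = @(data) cellfun(@(im) imresize(im,inputSize),data, ...
    'UniformOutput',false);
tds = transform(imds,resizeBatch);

% Each read of the transformed datastore returns the resized batch.
batch = read(tds);
```

If ReadSize were left at its default of 1, the same function would receive a plain numeric array, so a function intended for both cases would need to check iscell(data) first.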
The transform function supports prefetching when the underlying
ImageDatastore reads a batch of JPG or PNG
image files. For these image types, do not use the
readFcn argument of imageDatastore
to apply image preprocessing, as this option is usually significantly
slower. If you use a custom read function, then
ImageDatastore does not prefetch.
The combine function concatenates the data read from multiple datastores
and maintains parity between the datastores.
Concatenate data into a two-column table or two-column cell array for training networks with a single input, such as image-to-image regression networks.
Concatenate data to a (numInputs+1)-column cell
array for training networks with multiple inputs.
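For the single-input case, pairing inputs with responses can be sketched as follows. The example assumes a release that provides arrayDatastore, and it uses synthetic arrays in place of real images; the squared image serves only as a placeholder response.

```matlab
% Synthetic inputs and responses for an image-to-image regression task.
X = rand(32,32,1,10);
Y = X.^2;   % placeholder response, for illustration only

% Iterate over the fourth dimension so that each read returns one image.
inputs  = arrayDatastore(X,'IterationDimension',4);
targets = arrayDatastore(Y,'IterationDimension',4);

% combine reads from both datastores in lockstep and concatenates
% the results horizontally: column 1 is the input, column 2 the response.
cds = combine(inputs,targets);
pair = read(cds);
```

Because both datastores advance together, the kth input always stays aligned with the kth response, which is the parity that combine maintains.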