Train Mask R-CNN network to perform instance segmentation
trainMaskRCNN trains a Mask R-CNN network. A trained Mask R-CNN network object can perform instance
segmentation to detect and segment multiple object classes. This syntax supports transfer
learning on a pretrained Mask R-CNN network and training an uninitialized Mask R-CNN network.
trainedDetector = trainMaskRCNN(
This function requires that you have Deep Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
trainingData — Labeled ground truth
Labeled ground truth training data, specified as a datastore. Your data must be set
up so that calling the read and readall functions on the datastore returns a cell array
with four columns. The format of each column is as follows.
Column 1: RGB image that serves as a network input, specified as an H-by-W-by-3 numeric array.
Column 2: Bounding boxes, specified as an M-by-4 matrix, where M is the number of objects within the image. Each bounding box has the format [x y width height], where [x, y] is the top-left corner of the bounding box.
Column 3: Object class names, specified as an M-by-1 categorical vector. All categorical data returned by the datastore must contain the same categories.
Column 4: Binary masks, specified as a logical array of size H-by-W-by-M. Each mask is the segmentation of one instance in the image.
You can create a datastore that returns data in the required format using these steps:
Create an imageDatastore that returns RGB image data.
Create a boxLabelDatastore that returns bounding box data and instance labels as a two-element cell array.
Create an imageDatastore and specify a custom read function that returns mask data as a binary matrix.
Combine the three datastores using the combine function.
For more information, see Getting Started with Mask R-CNN for Instance Segmentation.
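The steps above can be sketched as follows. This is a minimal sketch, not the documented workflow: it writes one synthetic image and mask to a temporary folder so that it runs without external data, and it assumes the masks are stored in MAT-files under a variable named masks. The folder layout, the file names, the class name "object", and the helper readMasks are all hypothetical.

```matlab
% Build a tiny synthetic data set in a temporary folder (hypothetical layout).
workDir  = tempname;
imageDir = fullfile(workDir,"images");
maskDir  = fullfile(workDir,"masks");
mkdir(imageDir);
mkdir(maskDir);

% One 64-by-64 RGB image containing a single white rectangle.
I = zeros(64,64,3,"uint8");
I(17:32,9:28,:) = 255;
imwrite(I,fullfile(imageDir,"img1.png"));

% Matching H-by-W-by-M logical mask stack (M = 1 instance).
masks = false(64,64,1);
masks(17:32,9:28,1) = true;
save(fullfile(maskDir,"img1.mat"),"masks");

% Boxes use [x y width height]; labels are M-by-1 categorical vectors.
boxes    = {[9 17 20 16]};
labels   = {categorical("object")};
boxTable = table(boxes,labels);

% Step 1: images. Step 2: boxes and labels. Step 3: masks via a custom reader.
imds = imageDatastore(imageDir);
blds = boxLabelDatastore(boxTable);
mds  = imageDatastore(maskDir,FileExtensions=".mat",ReadFcn=@readMasks);

% Step 4: combine so that each read returns {image, boxes, labels, masks}.
trainingData = combine(imds,blds,mds);
sample = read(trainingData);

function masks = readMasks(filename)
    % Hypothetical reader: assumes each MAT-file stores a variable "masks".
    s = load(filename);
    masks = logical(s.masks);
end
```

Inspecting sample before training is a cheap way to confirm that the four columns appear in the expected order and that the mask stack has one page per bounding box.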
network — Mask R-CNN network to train
Mask R-CNN network to train, specified as a maskrcnn object.
options — Training options
TrainingOptionsSGDM object | TrainingOptionsRMSProp object | TrainingOptionsADAM object
Training options, specified as a TrainingOptionsSGDM, TrainingOptionsRMSProp, or TrainingOptionsADAM
object returned by the
trainingOptions (Deep Learning Toolbox) function. To specify the
solver name and other options for network training, use the
trainingOptions function. You must set the
BatchNormalizationStatistics property of the object as
"moving" and the
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: trainMaskRCNN(trainingData,network,options,NumRegionsToSample=64) samples 64
region proposals from each training image.
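A call that combines the three required arguments with name-value arguments might look like the following sketch. It assumes trainingData is a datastore in the four-column format described earlier and net is a Mask R-CNN network object you have already created. The wrapper name trainSketch and all hyperparameter values are illustrative only. The options description names a second required setting that is cut off in this text, so the sketch sets only BatchNormalizationStatistics.

```matlab
function [trainedDetector,info] = trainSketch(trainingData,net)
    % Hyperparameter values are placeholders, not recommendations.
    options = trainingOptions("sgdm", ...
        InitialLearnRate=0.001, ...
        MaxEpochs=10, ...
        MiniBatchSize=2, ...
        BatchNormalizationStatistics="moving");

    % Name-value arguments follow the three required arguments, in any order.
    [trainedDetector,info] = trainMaskRCNN(trainingData,net,options, ...
        NumRegionsToSample=64, ...
        FreezeSubNetwork="backbone");
end
```

Keeping the options in one place makes it easier to compare runs that differ only in the name-value arguments passed to trainMaskRCNN.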
PositiveOverlapRange — Bounding box overlap ratios for positive training samples
[0.5 1] (default) | two-element numeric vector
Bounding box overlap ratios for positive training samples, specified as a two-element numeric vector with values in the range [0, 1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.
The overlap ratio for bounding boxes A and B is: area(A ∩ B) / area(A ∪ B)
NegativeOverlapRange — Bounding box overlap ratios for negative training samples
[0.1 0.5] (default) | two-element numeric vector
Bounding box overlap ratios for negative training samples, specified as a two-element numeric vector with values in the range [0, 1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.
The overlap ratio for bounding boxes A and B is: area(A ∩ B) / area(A ∪ B)
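To make the overlap ratio concrete, the following sketch computes intersection over union for two boxes in [x y width height] format and checks the result against the default positive and negative ranges. The box values are made up, and treating both range endpoints as inclusive is an assumption of this sketch, not something the text above states. Computer Vision Toolbox also provides the bboxOverlapRatio function for this computation; the sketch uses base MATLAB only so that each step is visible.

```matlab
% Two boxes in [x y width height] format (illustrative values).
A = [1 1 4 4];   % ground truth box
B = [3 3 4 4];   % region proposal

% Width and height of the intersection rectangle (zero if the boxes are disjoint).
interW = max(0, min(A(1)+A(3),B(1)+B(3)) - max(A(1),B(1)));
interH = max(0, min(A(2)+A(4),B(2)+B(4)) - max(A(2),B(2)));

% Intersection over union.
interArea    = interW*interH;
unionArea    = A(3)*A(4) + B(3)*B(4) - interArea;
overlapRatio = interArea/unionArea;    % 4/28, about 0.14

% Classify the proposal against the default ranges (inclusive bounds assumed).
positiveRange = [0.5 1];
negativeRange = [0.1 0.5];
isPositive = overlapRatio >= positiveRange(1) && overlapRatio <= positiveRange(2);
isNegative = overlapRatio >= negativeRange(1) && overlapRatio <= negativeRange(2);
```

With these values the proposal falls in the negative range, so it would be a candidate negative training sample. A proposal with an overlap ratio below 0.1 would fall in neither default range.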
NumStrongestRegions — Maximum number of strongest region proposals
1000 (default) | positive integer | Inf
Maximum number of strongest region proposals to use for generating training
samples, specified as a positive integer. Reduce this value to reduce processing
time at the cost of training accuracy. To use all region proposals, set this value to Inf.
NumRegionsToSample — Number of region proposals
128 (default) | positive integer
Number of region proposals to randomly sample from each training image, specified as a positive integer. Reduce the number of regions to sample to reduce memory usage and speed up training. Reducing the value can also decrease training accuracy.
FreezeSubNetwork — Subnetworks to freeze
"none" (default) | "backbone" | "rpn" | ["backbone" "rpn"]
Subnetworks to freeze during training, specified as one of these values:
"none" — Do not freeze subnetworks
"backbone" — Freeze the feature extraction subnetwork, including the layers following the ROI align layer
"rpn" — Freeze the region proposal subnetwork
["backbone" "rpn"] — Freeze both the feature extraction and the region proposal subnetworks
The weights of layers in frozen subnetworks do not change during training.
ExperimentManager — Training experiment monitor
"none" (default) | experiments.Monitor object
Training experiment monitor, specified as an
experiments.Monitor (Deep Learning Toolbox) object for
use with the Experiment Manager (Deep Learning Toolbox) app. You can use this object to
track the progress of training, update information fields in the training results table,
record values of the metrics used by the training, and produce training plots.
Information monitored during training:
Training loss at each iteration
Training accuracy at each iteration
Training root mean square error (RMSE) for the box regression layer
Training loss for the mask segmentation branch
Learning rate at each iteration
Validation information when the training options
contain validation data:
Validation loss at each iteration
Validation accuracy at each iteration
Validation RMSE at each iteration
Validation loss for the mask segmentation branch
trainedDetector — Trained Mask R-CNN network
Trained Mask R-CNN network, returned as a maskrcnn object.
info — Training progress information
Training progress information, returned as a structure. Each field corresponds to a stage of training.
TrainingLoss — Training loss at each iteration. The loss is the combination of the region proposal network (RPN), classification, regression and mask loss used to train the Mask R-CNN network.
TrainingRPNLoss — Total RPN loss at the end of each iteration.
TrainingRMSE — Training root mean squared error (RMSE) for the box regression layer at the end of each iteration.
TrainingMaskLoss — Training cross-entropy loss for the mask segmentation branch at the end of each iteration.
LearnRate — Learning rate at each iteration.
ValidationLoss — Validation loss at each iteration.
ValidationRPNLoss — Validation RPN loss at each iteration.
ValidationRMSE — Validation RMSE at each iteration.
ValidationMaskLoss — Validation cross-entropy loss for the mask segmentation branch at each iteration.
Each field is a numeric vector with one element per training iteration. Values that
are not calculated at a specific iteration are assigned as NaN. The
structure contains the ValidationLoss, ValidationRPNLoss, ValidationRMSE, and
ValidationMaskLoss fields only when options
specifies validation data.
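Because each field holds one element per iteration, validation values need to be filtered before they are summarized or plotted. The sketch below uses a hand-built structure with made-up numbers in place of a real info output, and it assumes that iterations without a validation pass are stored as NaN.

```matlab
% Made-up values standing in for the info output of trainMaskRCNN.
info = struct( ...
    "TrainingLoss",  [2.10 1.70 1.40 1.20 1.10 1.00], ...
    "ValidationLoss",[NaN  NaN  1.50 NaN  NaN  1.15]);

iterations = 1:numel(info.TrainingLoss);

% Keep only the iterations at which validation actually ran.
hasValidation        = ~isnan(info.ValidationLoss);
validationIterations = iterations(hasValidation);            % [3 6]
validationLoss       = info.ValidationLoss(hasValidation);   % [1.50 1.15]

% Report the last recorded values as a quick summary.
fprintf("Final training loss: %.2f\n",info.TrainingLoss(end));
fprintf("Last validation loss: %.2f (iteration %d)\n", ...
    validationLoss(end),validationIterations(end));
```

The same mask can be reused for the other validation fields, since they are recorded at the same iterations in this made-up example.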
The trainMaskRCNN function has a high GPU memory requirement. It is recommended to train a Mask R-CNN network with at least 12 GB of available GPU memory.
When you want to perform transfer learning on a data set with similar content to the COCO data set, freezing the feature extraction and region proposal subnetworks can help the network training converge faster.