Train an R-CNN deep learning object detector


detector = trainRCNNObjectDetector(trainingData,network,options)
detector = trainRCNNObjectDetector(___,Name,Value)
detector = trainRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn)



detector = trainRCNNObjectDetector(trainingData,network,options) returns an R-CNN (regions with convolutional neural networks) based object detector. The function uses deep learning to train the detector for multiclass object detection.

This function requires that you have Neural Network Toolbox™ and Statistics and Machine Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.

This function also supports parallel computing using multiple MATLAB® workers. Enable parallel computing using the Computer Vision System Toolbox Preferences dialog box. To open Computer Vision System Toolbox™ preferences, on the Home tab, in the Environment section, click Preferences. Select Computer Vision System Toolbox.

detector = trainRCNNObjectDetector(___,Name,Value) returns a detector object with optional input properties specified by one or more Name,Value pair arguments.

detector = trainRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn) optionally trains an R-CNN detector using a custom region proposal function.



Load training data and network layers.

load('rcnnStopSigns.mat', 'stopSigns', 'layers')

Add the image directory to the MATLAB path.

imDir = fullfile(matlabroot, 'toolbox', 'vision', 'visiondata',...

Set the network training options to use a mini-batch size of 32 to reduce GPU memory usage. Lower the InitialLearnRate to reduce the rate at which network parameters are changed. This is beneficial when fine-tuning a pretrained network because it prevents the network from changing too rapidly.

options = trainingOptions('sgdm', ...
  'MiniBatchSize', 32, ...
  'InitialLearnRate', 1e-6, ...
  'MaxEpochs', 10);

Train the R-CNN detector. Training can take a few minutes to complete.

rcnn = trainRCNNObjectDetector(stopSigns, layers, options, 'NegativeOverlapRange', [0 0.3]);
Training an R-CNN Object Detector for the following object classes:

* stopSign

Step 1 of 3: Extracting region proposals from 27 training images...done.

Step 2 of 3: Training a neural network to classify objects in training data...

|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|            3 |           50 |         9.27 |       0.2895 |       96.88% |     0.000001 |
|            5 |          100 |        14.77 |       0.2443 |       93.75% |     0.000001 |
|            8 |          150 |        20.29 |       0.0013 |      100.00% |     0.000001 |
|           10 |          200 |        25.94 |       0.1524 |       96.88% |     0.000001 |

Network training complete.

Step 3 of 3: Training bounding box regression models for each object class...100.00%...done.

R-CNN training complete.

Test the R-CNN detector on a test image.

img = imread('stopSignTest.jpg');

[bbox, score, label] = detect(rcnn, img, 'MiniBatchSize', 32);

Display strongest detection result.

[score, idx] = max(score);

bbox = bbox(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

detectedImg = insertObjectAnnotation(img, 'rectangle', bbox, annotation);


Remove the image directory from the path.

rmpath(imDir);


Resume training an R-CNN object detector using additional data. To illustrate this procedure, only a subset of the ground truth data is used to initially train the detector. Training is then resumed using all the data.

Load training data and initialize training options.

load('rcnnStopSigns.mat', 'stopSigns', 'layers')

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10, ...
    'Verbose', false);

Train the R-CNN detector with a portion of the ground truth.

rcnn = trainRCNNObjectDetector(stopSigns(1:10,:), layers, options, 'NegativeOverlapRange', [0 0.3]);

Get the trained network layers from the detector. When you pass in an array of network layers to trainRCNNObjectDetector, they are used as-is to continue training.

network = rcnn.Network;
layers = network.Layers;

Resume training using all the training data.

rcnnFinal = trainRCNNObjectDetector(stopSigns, layers, options);

Create an R-CNN object detector for two object classes: dogs and cats.

objectClasses = {'dogs','cats'};

The network must be able to classify dogs, cats, and a "background" class in order to be trained using trainRCNNObjectDetector. In this example, one is added to the number of object classes to account for the background.

numClassesPlusBackground = numel(objectClasses) + 1;

The final fully connected layer of a network defines the number of classes that the network can classify. Set the final fully connected layer to have an output size equal to the number of classes plus a background class.

layers = [ ...
    imageInputLayer([28 28 1])

These network layers can now be used to train an R-CNN two-class object detector.
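As a rough sketch of the rule above (one output per object class plus one for the background), a small layer array might look like the following. The layer sizes are illustrative assumptions, not a recommended architecture.

```matlab
% Sketch: a small network whose final fully connected layer has one output
% per object class plus one for the background. Layer sizes are illustrative.
objectClasses = {'dogs','cats'};
numClassesPlusBackground = numel(objectClasses) + 1;

layers = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5, 20)
    reluLayer()
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(numClassesPlusBackground)   % 2 classes + background
    softmaxLayer()
    classificationLayer()];
```

Adding a class to objectClasses changes the output size of the fully connected layer automatically, because the size is derived from numel(objectClasses).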

Create an R-CNN object detector and set it up to use a saved network checkpoint. A network checkpoint is saved every epoch during network training when the trainingOptions 'CheckpointPath' parameter is set. Network checkpoints are useful in case your training session terminates unexpectedly.

Load the stop sign training data.

load('rcnnStopSigns.mat','stopSigns','layers')


Add full path to image files.

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...

Set the 'CheckpointPath' using the trainingOptions function.

checkpointLocation = tempdir;
options = trainingOptions('sgdm','Verbose',false, ...

Train the R-CNN object detector with a few images.

rcnn = trainRCNNObjectDetector(stopSigns(1:3,:),layers,options);

Load a saved network checkpoint.

wildcardFilePath = fullfile(checkpointLocation,'convnet_checkpoint__*.mat');
contents = dir(wildcardFilePath);

Load one of the checkpoint networks.

filepath = fullfile(contents(1).folder,contents(1).name);
checkpoint = load(filepath);
ans = 

  SeriesNetwork with properties:

    Layers: [15×1 nnet.cnn.layer.Layer]
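The steps above load the first file that dir returns. To resume from the most recently written checkpoint instead, one option is to compare the file timestamps, as in this sketch. It assumes the same checkpoint folder and file name pattern as the example, and it does nothing if no checkpoint files exist.

```matlab
% Sketch: choose the most recently written checkpoint file.
checkpointLocation = tempdir;   % same location as in the example above
contents = dir(fullfile(checkpointLocation,'convnet_checkpoint__*.mat'));

if isempty(contents)
    disp('No checkpoint files found.');
else
    % datenum holds each file's modification time as a serial date number.
    [~, newestIdx] = max([contents.datenum]);
    newestFile = fullfile(contents(newestIdx).folder, contents(newestIdx).name);
    checkpoint = load(newestFile);
end
```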

Create a new R-CNN object detector and set it up to use the saved network.

rcnnCheckPoint = rcnnObjectDetector();
rcnnCheckPoint.RegionProposalFcn = @rcnnObjectDetector.proposeRegions;

Set the Network to the saved network checkpoint.

rcnnCheckPoint.Network =
rcnnCheckPoint = 

  rcnnObjectDetector with properties:

              Network: [1×1 SeriesNetwork]
           ClassNames: {'stopSign'  'Background'}
    RegionProposalFcn: @rcnnObjectDetector.proposeRegions

Input Arguments


Labeled ground truth images (the trainingData argument), specified as a table with two or more columns. The first column must contain the paths and file names of images that are either grayscale or truecolor (RGB). The remaining columns must contain the bounding boxes for the corresponding image. Each of these columns represents a single object class, such as a car, dog, flower, or stop sign.

Each bounding box must be in the format [x,y,width,height]. The format specifies the upper-left corner location and size of the object in the corresponding image. The table variable name defines the object class name. To create the ground truth table, use the Image Labeler app. Boxes smaller than 32-by-32 are not used for training.
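The table layout described above can be sketched by hand for a tiny data set. The file names and boxes below are made up for illustration. The size check assumes that "smaller than 32-by-32" means either side is shorter than 32 pixels, which may not be exactly how the function applies the rule.

```matlab
% Sketch: build a ground truth table by hand (file names and boxes are made up).
imageFilename = {'images/stop1.jpg'; 'images/stop2.jpg'};

% One cell per image; each cell holds an M-by-4 matrix of [x,y,width,height] boxes.
stopSign = {[120 85 64 64]; [40 30 20 20; 200 150 48 52]};

% The variable name 'stopSign' becomes the object class name.
trainingData = table(imageFilename, stopSign);

% Report boxes with a side shorter than 32 pixels (assumed interpretation
% of the minimum size rule).
minSize = 32;
for k = 1:height(trainingData)
    boxes = trainingData.stopSign{k};
    tooSmall = boxes(:,3) < minSize | boxes(:,4) < minSize;
    fprintf('%s: %d of %d boxes smaller than %d-by-%d\n', ...
        trainingData.imageFilename{k}, nnz(tooSmall), size(boxes,1), minSize, minSize);
end
```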

Pretrained network (the network argument), specified as a SeriesNetwork object or an array of Layer objects. For example,

layers = [imageInputLayer([28 28 3])
        convolution2dLayer([5 5],10)

The network is trained to classify object classes defined in the groundTruth table.

When the network is a SeriesNetwork, the network layers are automatically adjusted to support the number of object classes defined within the groundTruth training data. The background is added as an additional class.

When the network is an array of Layer objects, the network must have a classification layer that supports the number of object classes, plus a background class. Use this input type to customize the learning rates of each layer. You can also use this input type to resume training from a previous session. Resuming training is useful when the network requires additional rounds of fine-tuning, or when you want to train with additional training data.
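As one way to customize per-layer learning rates with a Layer array, the sketch below swaps the final layers of a small placeholder network for a new fully connected layer with larger learn rate factors. The baseLayers array and the factor values are assumptions chosen for illustration only.

```matlab
% Sketch: give the new final layers a higher learning rate than the rest.
% baseLayers is a placeholder for layers taken from an existing network.
baseLayers = [ ...
    imageInputLayer([32 32 3])
    convolution2dLayer(5, 16)
    reluLayer()
    fullyConnectedLayer(10)
    softmaxLayer()
    classificationLayer()];

numClassesPlusBackground = 2;   % one object class plus background

% Keep everything except the last three layers, then append new ones.
% The learn rate factors multiply the global rate set in trainingOptions.
layers = [ ...
    baseLayers(1:end-3)
    fullyConnectedLayer(numClassesPlusBackground, ...
        'WeightLearnRateFactor', 10, ...
        'BiasLearnRateFactor', 20)
    softmaxLayer()
    classificationLayer()];
```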

Training options (the options argument), specified as an object returned by the trainingOptions function from the Neural Network Toolbox. The training options define the training parameters of the neural network.

To fine-tune a pretrained network for detection, lower the initial learning rate to avoid changing the model parameters too rapidly. You can use the following syntax to adjust the learning rate:

options = trainingOptions('sgdm','InitialLearnRate',1e-6);
rcnn = trainRCNNObjectDetector(groundTruth,network,options);

Because network training can take a few hours, use the 'CheckpointPath' property of trainingOptions to save your progress periodically.
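A minimal sketch that combines a low initial learning rate with checkpoint saving is shown here. The use of tempdir as the checkpoint folder and the value of 'MaxEpochs' are illustrative assumptions; any writable folder works.

```matlab
% Sketch: low learning rate for fine-tuning plus checkpoints during training.
checkpointLocation = tempdir;   % any writable folder; tempdir is just an example
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10, ...
    'CheckpointPath', checkpointLocation);
```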

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'PositiveOverlapRange',[0.5 1].


Positive training sample ratios for range of bounding box overlap, specified as the comma-separated pair consisting of 'PositiveOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio used for both the PositiveOverlapRange and NegativeOverlapRange is defined as:

$$\text{overlap ratio} = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}$$

A and B are bounding boxes.
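The overlap ratio of two boxes is the area of their intersection divided by the area of their union. A minimal sketch using only base MATLAB follows; the two boxes are made-up values. The Computer Vision System Toolbox function bboxOverlapRatio computes the same quantity for whole sets of boxes.

```matlab
% Sketch: overlap ratio (intersection over union) of two [x,y,width,height] boxes.
A = [0 0 10 10];
B = [5 5 10 10];

intersectionArea = rectint(A, B);                       % area shared by A and B
unionArea = A(3)*A(4) + B(3)*B(4) - intersectionArea;   % area covered by A or B
overlapRatio = intersectionArea / unionArea;

fprintf('Overlap ratio: %.4f\n', overlapRatio);         % 25/175
```

Here the boxes share a 5-by-5 region, so the ratio is 25/175, which is well below a positive range such as [0.5 1].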

Negative training sample ratios for range of bounding box overlap, specified as the comma-separated pair consisting of 'NegativeOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.
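To illustrate how the two ranges partition region proposals, the sketch below labels a few made-up overlap values. Treating both ends of each range as inclusive is an assumption; the function's exact boundary handling is not stated here.

```matlab
% Sketch: label region proposals by their overlap with a ground truth box.
% Overlap values are made up; inclusive range boundaries are an assumption.
overlap = [0.05 0.25 0.45 0.60 0.92];   % one value per region proposal

positiveOverlapRange = [0.5 1];
negativeOverlapRange = [0 0.3];

isPositive = overlap >= positiveOverlapRange(1) & overlap <= positiveOverlapRange(2);
isNegative = overlap >= negativeOverlapRange(1) & overlap <= negativeOverlapRange(2);
isIgnored  = ~isPositive & ~isNegative;   % falls in neither range
```

With these ranges, the proposal with overlap 0.45 is used as neither a positive nor a negative sample.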

Maximum number of strongest region proposals to use for generating training samples, specified as the comma-separated pair consisting of 'NumStrongestRegions' and an integer. Reducing this value speeds up processing but can decrease training accuracy. To use all region proposals, set this value to Inf.
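Selecting the strongest regions amounts to keeping the proposals with the highest scores. The sketch below shows that idea with made-up boxes and scores; it is not the function's internal implementation.

```matlab
% Sketch: keep only the N highest-scoring region proposals.
bboxes = [10 10 40 40; 30 20 50 60; 5 5 100 80; 60 60 35 35];   % made-up proposals
scores = [0.20; 0.90; 0.55; 0.70];
numStrongestRegions = 2;

[~, order] = sort(scores, 'descend');
keep = order(1:min(numStrongestRegions, numel(order)));   % Inf keeps everything
strongestBoxes  = bboxes(keep, :);
strongestScores = scores(keep);
```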

Custom region proposal function, specified as the comma-separated pair consisting of 'RegionProposalFcn' and a function handle. If you do not specify a custom region proposal function, the default variant of the Edge Boxes algorithm [3], set in rcnnObjectDetector, is used. A custom proposalFcn must have the following functional form:

[bboxes,scores] = proposalFcn(I)

The input, I, is an image defined in the groundTruth table. The function must return rectangular bounding boxes in an M-by-4 array. Each row of bboxes contains a four-element vector, [x,y,width,height], that specifies the upper-left corner and size of a bounding box in pixels. The function must also return a score for each bounding box in an M-by-1 vector. Higher scores indicate that the bounding box is more likely to contain an object. The scores are used to select the strongest regions, the number of which you can specify in NumStrongestRegions.
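A deliberately simple function with the required form is sketched below. The name gridProposalFcn, the window size, the stride, and the contrast-based score are all hypothetical choices made for illustration; a practical proposal function would use a stronger objectness measure.

```matlab
function [bboxes, scores] = gridProposalFcn(I)
% gridProposalFcn  Hypothetical region proposal function (illustration only).
% Proposes fixed-size windows on a regular grid and scores each window by
% the standard deviation of its pixel intensities.

winSize = 64;   % window width and height in pixels (assumed value)
stride  = 32;   % step between neighboring windows (assumed value)

I = double(I);
if size(I, 3) > 1
    I = mean(I, 3);   % collapse color channels to a single intensity plane
end
[rows, cols] = size(I);

% Upper-left corners of all windows that fit inside the image.
[x, y] = meshgrid(1:stride:max(cols - winSize + 1, 1), ...
                  1:stride:max(rows - winSize + 1, 1));
x = x(:);
y = y(:);

% Clip the window size for images smaller than winSize.
w = min(winSize, cols);
h = min(winSize, rows);

numBoxes = numel(x);
bboxes = [x, y, repmat(w, numBoxes, 1), repmat(h, numBoxes, 1)];   % M-by-4

scores = zeros(numBoxes, 1);                                       % M-by-1
for k = 1:numBoxes
    window = I(y(k):y(k)+h-1, x(k):x(k)+w-1);
    scores(k) = std(window(:));   % higher contrast gives a higher score
end
end
```

If this function is saved as gridProposalFcn.m on the MATLAB path, its handle, @gridProposalFcn, can be passed as the value of 'RegionProposalFcn'.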

Output Arguments


Trained R-CNN based object detector, returned as an rcnnObjectDetector object. You can train an R-CNN detector to detect multiple object classes.


  • This implementation of R-CNN does not train an SVM classifier for each object class.



[1] Girshick, R., J. Donahue, T. Darrell, and J. Malik. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014, pp. 580–587.

[2] Girshick, R. “Fast R-CNN.” Proceedings of the IEEE International Conference on Computer Vision. 2015, pp. 1440–1448.

[3] Zitnick, C. Lawrence, and P. Dollar. “Edge Boxes: Locating Object Proposals from Edges.” Computer Vision-ECCV, Springer International Publishing. 2014, pp. 391–405.

Introduced in R2016b