trainFastRCNNObjectDetector

Train a Fast R-CNN deep learning object detector

Syntax

trainedDetector = trainFastRCNNObjectDetector(trainingData,network,options)
trainedDetector = trainFastRCNNObjectDetector(trainingData,checkpoint,options)
trainedDetector = trainFastRCNNObjectDetector(trainingData,detector,options)
trainedDetector = trainFastRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn)
trainedDetector = trainFastRCNNObjectDetector(___,Name,Value)

Description

trainedDetector = trainFastRCNNObjectDetector(trainingData,network,options) trains a Fast R-CNN (regions with convolutional neural networks) object detector using deep learning. You can train a Fast R-CNN detector to detect multiple object classes. Specify your ground truth training data, your pretrained network, and training options.

This function requires that you have Neural Network Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.

trainedDetector = trainFastRCNNObjectDetector(trainingData,checkpoint,options) resumes training from a detector checkpoint.

trainedDetector = trainFastRCNNObjectDetector(trainingData,detector,options) continues training a detector with additional training data or performs more training iterations to improve detector accuracy.

trainedDetector = trainFastRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn) optionally trains using a custom region proposal function, proposalFcn, with any of the previous inputs. If you do not specify a proposal function, the function uses a variation of the Edge Boxes algorithm [2].

trainedDetector = trainFastRCNNObjectDetector(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments.

Examples

Load training data.

data = load('rcnnStopSigns.mat', 'stopSigns', 'fastRCNNLayers');
stopSigns = data.stopSigns;
fastRCNNLayers = data.fastRCNNLayers;

Add the full path to the image files.

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...
    stopSigns.imageFilename);

Set network training options:

  • Lower the InitialLearnRate to reduce the rate at which network parameters are changed.

  • Set the CheckpointPath to save detector checkpoints to a temporary directory. Change this to another location if required.

options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10, ...
    'CheckpointPath', tempdir);

Train the Fast R-CNN detector. Training can take a few minutes to complete.

frcnn = trainFastRCNNObjectDetector(stopSigns, fastRCNNLayers, options, ...
    'NegativeOverlapRange', [0 0.1], ...
    'PositiveOverlapRange', [0.7 1], ...
    'SmallestImageDimension', 600);
*******************************************************************
Training a Fast R-CNN Object Detector for the following object classes:

* stopSign

--> Extracting region proposals from 27 training images...done.

|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            1 |            1 |         0.19 |       0.0618 |      100.00% |       0.0000 |
|            3 |           50 |         9.72 |       0.0122 |      100.00% |       0.0000 |
|            5 |          100 |        19.41 |       0.0174 |      100.00% |       0.0000 |
|            8 |          150 |        29.57 |       0.0124 |      100.00% |       0.0000 |
|           10 |          200 |        40.43 |       0.0273 |      100.00% |       0.0000 |
|           10 |          210 |        42.59 |       0.0300 |      100.00% |       0.0000 |
|=========================================================================================|

Test the Fast R-CNN detector on a test image.

img = imread('stopSignTest.jpg');

Run the detector.

[bbox, score, label] = detect(frcnn, img);

Display detection results.

detectedImg = insertShape(img, 'Rectangle', bbox);
figure
imshow(detectedImg)

Input Arguments

Labeled ground truth images, specified as a table with two or more columns. The first column must contain paths and file names to grayscale or truecolor (RGB) images. The remaining columns must contain bounding boxes related to the corresponding image. Each column represents a single object class, such as a car, dog, flower, or stop sign.

Each bounding box must be in the format [x y width height]. The format specifies the upper-left corner location and size of the object in the corresponding image. The table variable name defines the object class name. To create the ground truth table, use the Image Labeler app. Boxes smaller than 32-by-32 are not used for training.
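As a rough illustration of the table layout described above, the following sketch builds a small ground truth table by hand. The file names, box coordinates, and the vehicle class are made-up placeholders, not data shipped with the toolbox. In practice, the Image Labeler app produces this table for you.

```matlab
% Hypothetical image paths (placeholders; replace with real files).
imageFilename = {'images/frame001.jpg'; 'images/frame002.jpg'};

% One column per object class. Each cell holds an M-by-4 matrix of
% [x y width height] boxes for the image in the same row.
stopSign = {[120 80 64 64]; [40 50 48 48; 300 210 96 96]};
vehicle  = {[200 150 180 90]; [60 220 150 80]};

% The table variable names (stopSign, vehicle) become the class names.
trainingData = table(imageFilename, stopSign, vehicle);

% Sanity check: every box is at least 32-by-32, because smaller boxes
% are not used for training.
allBoxes = vertcat(trainingData.stopSign{:}, trainingData.vehicle{:});
assert(all(allBoxes(:,3) >= 32 & allBoxes(:,4) >= 32));
```

The first column holds the image paths and each remaining column holds the boxes for one class, which matches the layout the function expects.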

Pretrained network, specified as a SeriesNetwork object or as an array of Layer objects. For example:

layers = [imageInputLayer([28 28 3])
        convolution2dLayer([5 5],10)
        reluLayer()
        fullyConnectedLayer(10)
        softmaxLayer()
        classificationLayer()];

The network is trained to classify the object classes defined in the trainingData table.

When the network is a SeriesNetwork object, the function adjusts the network layers to support the number of object classes defined within the specified trainingData. The background is added as an additional class.

When the network is an array of Layer objects, the network must have a classification layer that supports the number of object classes, plus a background class. Use this input type to customize the learning rates of each layer.

The function replaces the last averagePooling2dLayer or maxPooling2dLayer with an ROI pooling layer.
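The sketch below shows one way a Layer array might be set up for a single object class, with per-layer learning rate factors. The layer sizes and factor values are illustrative assumptions, not recommended settings. It relies only on the standard layer constructors and their WeightLearnRateFactor and BiasLearnRateFactor options.

```matlab
numClasses = 1;   % for example, stopSign only (assumed for illustration)

layers = [imageInputLayer([32 32 3])
          % Keep the early filters close to their initial values.
          convolution2dLayer([5 5], 16, 'WeightLearnRateFactor', 1)
          reluLayer()
          % The last pooling layer is the one replaced by ROI pooling.
          maxPooling2dLayer(2, 'Stride', 2)
          % Output size is the number of object classes plus background.
          % Larger factors let this new layer learn faster (assumed values).
          fullyConnectedLayer(numClasses + 1, ...
              'WeightLearnRateFactor', 10, ...
              'BiasLearnRateFactor', 10)
          softmaxLayer()
          classificationLayer()];
```

Each factor multiplies the global learning rate set in trainingOptions, so a factor of 10 on the final layer trains it ten times faster than a layer with a factor of 1.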

Training parameters of the neural network, specified using the trainingOptions function.

To fine-tune a pre-trained network for detection, lower the initial learning rate to avoid changing the model parameters too rapidly. For example:

options = trainingOptions('sgdm', ...
                          'InitialLearnRate',1e-6, ...
                          'CheckpointPath',tempdir);
 
detector = trainFastRCNNObjectDetector(trainingData,network,options);

To save the detector after every epoch, set the 'CheckpointPath' property when using the trainingOptions function. Saving a checkpoint after every epoch is recommended because network training can take a few hours.

Saved detector checkpoint, specified as a fastRCNNObjectDetector object. To save the detector after every epoch, set the 'CheckpointPath' property when using the trainingOptions function. Saving a checkpoint after every epoch is recommended because network training can take a few hours.

To load a checkpoint for a previously trained detector, load the MAT-file from the checkpoint path. For example, if the 'CheckpointPath' property of options is '/tmp', load a checkpoint MAT-file using:

data = load('/tmp/faster_rcnn_checkpoint__105__2016_11_18__14_25_08.mat');

The name of the MAT-file includes the iteration number and the timestamp of when the detector checkpoint was saved. The detector is saved in the detector variable of the file. Pass this detector back into the trainFastRCNNObjectDetector function:

frcnn = trainFastRCNNObjectDetector(stopSigns,...
                           data.detector,options);
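Because the checkpoint file name changes with every save, it can be convenient to look up the newest file rather than type the name. The helper below is a hypothetical convenience function, not part of the toolbox. The '*checkpoint*.mat' search pattern is an assumption based on the example file name above.

```matlab
function checkpointFile = findNewestCheckpoint(folder)
% Return the full path of the most recently modified checkpoint MAT-file
% in FOLDER, or '' if no matching file is found.
% Assumption: checkpoint file names contain the word "checkpoint".
files = dir(fullfile(folder, '*checkpoint*.mat'));
if isempty(files)
    checkpointFile = '';
    return
end
% datenum holds the modification time of each file as a serial date.
[~, newest] = max([files.datenum]);
checkpointFile = fullfile(folder, files(newest).name);
end
```

With this helper, data = load(findNewestCheckpoint(tempdir)) would load the latest checkpoint, and data.detector could then be passed to trainFastRCNNObjectDetector as shown above.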

Previously trained Fast R-CNN object detector, specified as a fastRCNNObjectDetector object.

Region proposal method, specified as a function handle. The function must have the form:

[bboxes,scores] = proposalFcn(I)

The input, I, is an image defined in the trainingData table. The function must return rectangular bounding boxes, bboxes, in an m-by-4 array. Each row of bboxes contains a four-element vector, [x y width height]. This vector specifies the upper-left corner and size of a bounding box in pixels. The function must also return a score for each bounding box in an m-by-1 vector. Higher score values indicate that the bounding box is more likely to contain an object. The scores are used to select the strongest n regions, where n is defined by the value of NumStrongestRegions.

If you do not specify a custom proposal function, the function uses a variation of the Edge Boxes algorithm.
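To make the required interface concrete, here is a minimal sketch of a custom proposal function. It is a hypothetical example: it slides a fixed-size window over the image and scores each window by its intensity variance, a crude stand-in for a real objectness measure. The function name, window size, and stride are all assumptions chosen for illustration.

```matlab
function [bboxes, scores] = slidingWindowProposals(I)
% Hypothetical region proposal function: fixed-size sliding windows
% scored by intensity variance.
boxSize = 64;   % window side length in pixels (assumed value)
stride  = 32;   % step between windows in pixels (assumed value)

[h, w, ~] = size(I);
boxW = min(boxSize, w);
boxH = min(boxSize, h);

% Upper-left corners of all windows that fit inside the image.
[X, Y] = meshgrid(1:stride:(w - boxW + 1), 1:stride:(h - boxH + 1));
n = numel(X);

% m-by-4 array with one [x y width height] row per window.
bboxes = [X(:), Y(:), repmat([boxW boxH], n, 1)];

% m-by-1 scores: higher variance is treated as more likely an object.
gray = mean(double(I), 3);
scores = zeros(n, 1);
for k = 1:n
    rows = Y(k):(Y(k) + boxH - 1);
    cols = X(k):(X(k) + boxW - 1);
    patch = gray(rows, cols);
    scores(k) = var(patch(:));
end
end
```

A function like this would be passed as a handle, for example 'RegionProposalFcn',@slidingWindowProposals. A sliding window is far less selective than Edge Boxes, so treat this only as a template for the input and output shapes.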

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'PositiveOverlapRange',[0.75 1]

Bounding box overlap ratios for positive training samples, specified as the comma-separated pair consisting of 'PositiveOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio used for both the PositiveOverlapRange and NegativeOverlapRange is defined as:

$$\frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}$$

A and B are bounding boxes.
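The overlap ratio is the area of the intersection of the two boxes divided by the area of their union. The sketch below computes it by hand for two illustrative boxes and checks the result against example overlap ranges. The box values are made up, and treating the range endpoints as inclusive is an assumption. The toolbox function bboxOverlapRatio computes the same kind of ratio for arrays of boxes.

```matlab
% Boxes in [x y width height] format (illustrative values).
A = [10 10 100 100];
B = [60 60 100 100];

% Width and height of the intersection rectangle (zero if disjoint).
interW = max(0, min(A(1) + A(3), B(1) + B(3)) - max(A(1), B(1)));
interH = max(0, min(A(2) + A(4), B(2) + B(4)) - max(A(2), B(2)));
interArea = interW * interH;

% Union area = sum of both areas minus the doubly counted intersection.
unionArea = A(3)*A(4) + B(3)*B(4) - interArea;
overlapRatio = interArea / unionArea;   % 2500 / 17500, about 0.143

% With PositiveOverlapRange [0.7 1] and NegativeOverlapRange [0 0.1],
% this proposal falls in neither range.
isPositive = overlapRatio >= 0.7 && overlapRatio <= 1;
isNegative = overlapRatio >= 0   && overlapRatio <= 0.1;
```

A proposal whose ratio lies between the two ranges, as here, is used as neither a positive nor a negative training sample.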

Bounding box overlap ratios for negative training samples, specified as the comma-separated pair consisting of 'NegativeOverlapRange' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.

The overlap ratio used for both the PositiveOverlapRange and NegativeOverlapRange is defined as:

$$\frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}$$

A and B are bounding boxes.

Maximum number of strongest region proposals to use for generating training samples, specified as the comma-separated pair consisting of 'NumStrongestRegions' and a positive integer. Reduce this value to speed up processing time at the cost of training accuracy. To use all region proposals, set this value to Inf.
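The following sketch illustrates what selecting the strongest regions means: sort the proposals by score and keep the top n. It is an illustration with made-up boxes and scores, not the function's internal implementation.

```matlab
% Illustrative proposals and scores (made-up values).
bboxes = [ 10 10 50 50
           30 40 60 60
          100 80 40 40
            5  5 90 90];
scores = [0.20; 0.90; 0.50; 0.70];

numStrongestRegions = 2;   % set to Inf to keep every proposal

% Rank proposals from highest to lowest score.
[~, order] = sort(scores, 'descend');

% min() guards against asking for more regions than exist, which also
% makes Inf behave as "use all proposals".
numKeep = min(numStrongestRegions, numel(order));
strongestBoxes  = bboxes(order(1:numKeep), :);
strongestScores = scores(order(1:numKeep));
```

Here the two retained proposals are the ones scored 0.90 and 0.70. Lowering the limit discards lower-scored proposals before training samples are generated, which is where the speedup comes from.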

Length of the smallest image dimension, either width or height, specified as the comma-separated pair consisting of 'SmallestImageDimension' and a positive integer. Training images are resized such that the length of the shortest dimension is equal to the specified integer. By default, training images are not resized. Resizing training images helps reduce the computational cost and memory used when training images are large. Typical values range from 400 to 600 pixels.
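The arithmetic behind this resizing can be sketched as follows. The image size and box are illustrative, and the rounding and box-scaling steps are assumptions about how such a resize could be carried out, not a description of the function's internals.

```matlab
imageSize = [1080 1920];        % [height width] of a training image
smallestImageDimension = 600;

% Scale factor that maps the shorter side to the requested length.
scale = smallestImageDimension / min(imageSize);   % 600/1080 = 5/9

% Resized image size, preserving the aspect ratio.
resizedSize = round(imageSize * scale);            % [600 1067]

% A ground truth box must be scaled by the same factor to stay aligned
% with the resized image.
bbox = [400 300 200 150];                          % [x y width height]
resizedBbox = bbox * scale;
```

In this example a 150-pixel-tall object shrinks to roughly 83 pixels, so choosing a very small value can push small objects toward the 32-by-32 minimum box size.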

Output Arguments

Trained Fast R-CNN object detector, returned as a fastRCNNObjectDetector object.

References

[1] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.

[2] Zitnick, C. Lawrence, and Piotr Dollar. "Edge Boxes: Locating Object Proposals From Edges." Computer Vision-ECCV 2014. Springer International Publishing, 2014, pp. 391–405.

Introduced in R2017a