# Documentation


# trainRCNNObjectDetector

Train an R-CNN deep learning object detector

## Syntax

``detector = trainRCNNObjectDetector(trainingData,network,options)``
``detector = trainRCNNObjectDetector(___,Name,Value)``
``detector = trainRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn)``

## Description


`detector = trainRCNNObjectDetector(trainingData,network,options)` returns an R-CNN (regions with convolutional neural networks) based object detector. The function uses deep learning to train the detector for multiclass object detection.

This function requires Neural Network Toolbox™ and Statistics and Machine Learning Toolbox™. Parallel Computing Toolbox™ is also recommended, for use with a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.

This function also supports parallel computing using multiple MATLAB® workers. Enable parallel computing using the Computer Vision System Toolbox™ Preferences dialog box. To open the preferences, on the Home tab, in the Environment section, click Preferences, and then select Computer Vision System Toolbox.

`detector = trainRCNNObjectDetector(___,Name,Value)` returns a `detector` object with optional input properties specified by one or more `Name,Value` pair arguments.

`detector = trainRCNNObjectDetector(___,'RegionProposalFcn',proposalFcn)` optionally trains an R-CNN detector using a custom region proposal function.

## Examples


Load training data and network layers.

```
load('rcnnStopSigns.mat', 'stopSigns', 'layers')
```

Add the image directory to the MATLAB path.

```
imDir = fullfile(matlabroot, 'toolbox', 'vision', 'visiondata', ...
    'stopSignImages');
addpath(imDir);
```

Set the network training options to use a mini-batch size of 32 to reduce GPU memory usage. Lower the `'InitialLearnRate'` to reduce the rate at which network parameters change. Lowering this rate is beneficial when fine-tuning a pretrained network, because it prevents the network from changing too rapidly.

```
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10);
```

Train the R-CNN detector. Training can take a few minutes to complete.

```
rcnn = trainRCNNObjectDetector(stopSigns, layers, options, ...
    'NegativeOverlapRange', [0 0.3]);
```
```
*******************************************************************
Training an R-CNN Object Detector for the following object classes:

* stopSign

Step 1 of 3: Extracting region proposals from 27 training images...done.

Step 2 of 3: Training a neural network to classify objects in training data...

|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            3 |           50 |         9.27 |       0.2895 |       96.88% |     0.000001 |
|            5 |          100 |        14.77 |       0.2443 |       93.75% |     0.000001 |
|            8 |          150 |        20.29 |       0.0013 |      100.00% |     0.000001 |
|           10 |          200 |        25.94 |       0.1524 |       96.88% |     0.000001 |
|=========================================================================================|

Network training complete.

Step 3 of 3: Training bounding box regression models for each object class...100.00%...done.

R-CNN training complete.
*******************************************************************
```

Test the R-CNN detector on a test image.

```
img = imread('stopSignTest.jpg');

[bbox, score, label] = detect(rcnn, img, 'MiniBatchSize', 32);
```

Display the strongest detection result.

```
[score, idx] = max(score);

bbox = bbox(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

detectedImg = insertObjectAnnotation(img, 'rectangle', bbox, annotation);

figure
imshow(detectedImg)
```

Remove the image directory from the path.

```
rmpath(imDir);
```

Resume training an R-CNN object detector using additional data. To illustrate this procedure, the detector is first trained with only a portion of the ground truth data. Then, training is resumed using all of the data.

Load training data and initialize training options.

```
load('rcnnStopSigns.mat', 'stopSigns', 'layers')

stopSigns.imageFilename = fullfile(toolboxdir('vision'), 'visiondata', ...
    stopSigns.imageFilename);

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10, ...
    'Verbose', false);
```

Train the R-CNN detector with a portion of the ground truth.

```
rcnn = trainRCNNObjectDetector(stopSigns(1:10,:), layers, options, ...
    'NegativeOverlapRange', [0 0.3]);
```

Get the trained network layers from the detector. When you pass in an array of network layers to `trainRCNNObjectDetector`, they are used as-is to continue training.

```
network = rcnn.Network;
layers = network.Layers;
```

Resume training using all the training data.

```
rcnnFinal = trainRCNNObjectDetector(stopSigns, layers, options);
```

Create an R-CNN object detector for two object classes: dogs and cats.

```
objectClasses = {'dogs','cats'};
```

To be trained using `trainRCNNObjectDetector`, the network must be able to classify dogs, cats, and a "background" class. In this example, one is added to the number of object classes to include the background class.

```
numClassesPlusBackground = numel(objectClasses) + 1;
```

The final fully connected layer of a network defines the number of classes that the network can classify. Set the final fully connected layer to have an output size equal to the number of classes plus a background class.

```
layers = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    fullyConnectedLayer(numClassesPlusBackground)
    softmaxLayer()
    classificationLayer()];
```

These network layers can now be used to train an R-CNN two-class object detector.

Create an R-CNN object detector and set it up to use a saved network checkpoint. A network checkpoint is saved every epoch during network training when the `trainingOptions` 'CheckpointPath' parameter is set. Network checkpoints are useful in case your training session terminates unexpectedly.

Load the stop sign training data.

```
load('rcnnStopSigns.mat', 'stopSigns', 'layers')
```

Add the full path to the image files.

```
stopSigns.imageFilename = fullfile(toolboxdir('vision'), 'visiondata', ...
    stopSigns.imageFilename);
```

Set the 'CheckpointPath' using the `trainingOptions` function.

```
checkpointLocation = tempdir;

options = trainingOptions('sgdm', 'Verbose', false, ...
    'CheckpointPath', checkpointLocation);
```

Train the R-CNN object detector with a few images.

```
rcnn = trainRCNNObjectDetector(stopSigns(1:3,:), layers, options);
```

Find the saved network checkpoints.

```
wildcardFilePath = fullfile(checkpointLocation, 'convnet_checkpoint__*.mat');
contents = dir(wildcardFilePath);
```

Load one of the checkpoint networks.

```
filepath = fullfile(contents(1).folder, contents(1).name);
checkpoint = load(filepath);

checkpoint.net
```
```
ans =

  SeriesNetwork with properties:

    Layers: [15×1 nnet.cnn.layer.Layer]
```

Create a new R-CNN object detector and set it up to use the saved network.

```
rcnnCheckPoint = rcnnObjectDetector();
rcnnCheckPoint.RegionProposalFcn = @rcnnObjectDetector.proposeRegions;
```

Set the `Network` property to the saved network checkpoint.

```
rcnnCheckPoint.Network = checkpoint.net
```
```
rcnnCheckPoint =

  rcnnObjectDetector with properties:

              Network: [1×1 SeriesNetwork]
           ClassNames: {'stopSign'  'Background'}
    RegionProposalFcn: @rcnnObjectDetector.proposeRegions
```

## Input Arguments


### `trainingData`

Labeled ground truth images, specified as a table with two or more columns. The first column must contain path and file names to images that are either grayscale or true color (RGB). The remaining columns must contain bounding boxes related to the corresponding image. Each column represents a single object class, such as a car, dog, flower, or stop sign.

Each bounding box must be in the format [x,y,width,height]. The format specifies the upper-left corner location and size of the object in the corresponding image. The table variable name defines the object class name. To create the ground truth table, use the Image Labeler app. Boxes smaller than 32-by-32 are not used for training.
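As a sketch of the expected table layout (the file names and box coordinates below are hypothetical, chosen only to illustrate the format):

```matlab
% Sketch of a ground truth table with two object classes. Each box is
% [x, y, width, height] in pixels, measured from the upper-left corner.
imageFilename = {'image1.jpg'; 'image2.jpg'};
stopSign = {[100 50 64 64]; [200 120 48 48]};  % one stop sign box per image
carRear  = {[10 10 96 60]; zeros(0,4)};        % second image contains no cars
trainingData = table(imageFilename, stopSign, carRear);
```

The table variable names, `stopSign` and `carRear` here, become the object class names during training.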

### `network`

Pretrained network, specified as a `SeriesNetwork` object or an array of `Layer` objects. For example,

```
layers = [imageInputLayer([28 28 3])
    convolution2dLayer([5 5],10)
    reluLayer()
    fullyConnectedLayer(10)
    softmaxLayer()
    classificationLayer()];
```

The network is trained to classify object classes defined in the `groundTruth` table.

When the network is a `SeriesNetwork`, the network layers are automatically adjusted to support the number of object classes defined within the `groundTruth` training data. The background is added as an additional class.

When the network is an array of `Layer` objects, the network must have a classification layer that supports the number of object classes, plus a background class. Use this input type to customize the learning rates of each layer. You can also use this input type to resume training from a previous session. Resuming the training is useful when the network requires additional rounds of fine-tuning, and when you want to train with additional training data.

### `options`

Training options, specified as an object returned by the `trainingOptions` function from the Neural Network Toolbox. The training options define the training parameters of the neural network.

To fine-tune a pretrained network for detection, lower the initial learning rate to avoid changing the model parameters too rapidly. You can use the following syntax to adjust the learning rate:

```
options = trainingOptions('sgdm', 'InitialLearnRate', 1e-6);

rcnn = trainRCNNObjectDetector(groundTruth, network, options);
```
Because network training can take a few hours, use the `'CheckpointPath'` property of `trainingOptions` to save your progress periodically.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'PositiveOverlapRange',[0.5 1]`


Positive training sample ratios for range of bounding box overlap, specified as the comma-separated pair consisting of '`PositiveOverlapRange`' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio used for both the `PositiveOverlapRange` and `NegativeOverlapRange` is defined as:

`$\frac{area\left(A\cap B\right)}{area\left(A\cup B\right)}$`

A and B are bounding boxes.
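As a concrete check of this ratio, the intersection area of two [x,y,width,height] rectangles can be computed with base MATLAB's `rectint`, and the union follows from the two box areas:

```matlab
% Overlap ratio (intersection over union) of two [x y width height] boxes.
A = [0 0 2 2];   % 2-by-2 box at the origin
B = [1 1 2 2];   % 2-by-2 box shifted by (1,1)

intersectArea = rectint(A, B);                       % area(A ∩ B) = 1
unionArea = A(3)*A(4) + B(3)*B(4) - intersectArea;   % area(A ∪ B) = 7
overlapRatio = intersectArea / unionArea;            % 1/7, about 0.143
```

Computer Vision System Toolbox also provides `bboxOverlapRatio`, which computes this same ratio for arrays of bounding boxes.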

Negative training sample ratios for range of bounding box overlap, specified as the comma-separated pair consisting of '`NegativeOverlapRange`' and a two-element vector. The vector contains values in the range [0,1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.

Maximum number of strongest region proposals to use for generating training samples, specified as the comma-separated pair consisting of '`NumStrongestRegions`' and an integer. Reduce this value to speed up processing time, although doing so decreases training accuracy. To use all region proposals, set this value to `inf`.

Custom region proposal function handle, specified as the comma-separated pair consisting of '`RegionProposalFcn`' and the function name. If you do not specify a custom region proposal function, the default variant of the Edge Boxes algorithm [3], set in `rcnnObjectDetector`, is used. A custom `proposalFcn` must have the following functional form:

` [bboxes,scores] = proposalFcn(I)`

The input, `I`, is an image defined in the `groundTruth` table. The function must return rectangular bounding boxes in an M-by-4 array. Each row of `bboxes` contains a four-element vector, [x,y,width,height], that specifies the upper-left corner and size of a bounding box in pixels. The function must also return a score for each bounding box in an M-by-1 vector. Higher scores indicate that the bounding box is more likely to contain an object. The scores are used to select the strongest regions, the number of which you can specify using `NumStrongestRegions`.
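As a minimal sketch of this functional form (a hypothetical function name, not the Edge Boxes algorithm), a proposal function could tile the image with fixed-size windows and score them uniformly:

```matlab
function [bboxes, scores] = gridProposalFcn(I)
% gridProposalFcn  Hypothetical region proposal function that proposes a
% coarse grid of fixed-size windows. A practical proposal function should
% instead return boxes likely to contain objects, with higher scores for
% more promising regions.
[h, w, ~] = size(I);
boxSize = 64;   % proposal window size in pixels
step    = 32;   % stride between windows

bboxes = zeros(0, 4);
for y = 1:step:max(h - boxSize + 1, 1)
    for x = 1:step:max(w - boxSize + 1, 1)
        bboxes(end+1, :) = [x, y, boxSize, boxSize]; %#ok<AGROW>
    end
end

scores = ones(size(bboxes, 1), 1);  % uniform scores, M-by-1
end
```

The handle would then be passed as `'RegionProposalFcn', @gridProposalFcn` when calling `trainRCNNObjectDetector`.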

## Output Arguments


### `detector`

Trained R-CNN based object detector, returned as an `rcnnObjectDetector` object. You can train an R-CNN detector to detect multiple object classes.

## Limitations

• This implementation of R-CNN does not train an SVM classifier for each object class.

## References

[1] Girshick, R., J. Donahue, T. Darrell, and J. Malik. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014, pp. 580–587.

[2] Girshick, R. “Fast R-CNN.” Proceedings of the IEEE International Conference on Computer Vision. 2015, pp. 1440–1448.

[3] Zitnick, C. Lawrence, and P. Dollár. “Edge Boxes: Locating Object Proposals from Edges.” Computer Vision – ECCV 2014, Springer International Publishing. 2014, pp. 391–405.