This example shows how to modify a pretrained MobileNet v2 network to create a YOLO v2 object detection network. This approach offers additional flexibility compared to the
yolov2Layers function, which returns a canonical YOLO v2 object detector.
The procedure to convert a pretrained network into a YOLO v2 network is similar to the transfer learning procedure for image classification:
1. Load the pretrained network.
2. Select a layer from the pretrained network to use for feature extraction.
3. Remove all the layers after the feature extraction layer.
4. Add new layers to support the object detection task.
You can also implement this procedure using the
Load a pretrained MobileNet v2 network using
mobilenetv2. This requires the Deep Learning Toolbox Model for MobileNet v2 Network™.
% Load a pretrained network.
net = mobilenetv2();

% Convert network into a layer graph object
% in order to manipulate the layers.
lgraph = layerGraph(net);
Change the image size of the network based on the training data requirements. To illustrate this step, assume the required image size is [300 300 3] for RGB images.
% Input size for detector.
imageInputSize = [300 300 3];

% Create new image input layer. Set the new layer name
% to the original layer name.
imgLayer = imageInputLayer(imageInputSize,"Name","input_1")
imgLayer = 
  ImageInputLayer with properties:

                      Name: 'input_1'
                 InputSize: [300 300 3]

   Hyperparameters
          DataAugmentation: 'none'
             Normalization: 'zerocenter'
    NormalizationDimension: 'auto'
                      Mean:
% Replace old image input layer.
lgraph = replaceLayer(lgraph,"input_1",imgLayer);
A good feature extraction layer for YOLO v2 is one where the output feature width and height are between 8 and 16 times smaller than those of the input image. This amount of downsampling is a trade-off between spatial resolution and the quality of the output features. You can use the analyzeNetwork function or the deepNetworkDesigner app to determine the output sizes of layers within a network. Note that selecting an optimal feature extraction layer requires empirical evaluation.
Set the feature extraction layer to "block_12_add" from MobileNet v2. Because the required input size was previously set to [300 300], the output feature size is [19 19]. This results in a downsampling factor of about 16.
featureExtractionLayer = "block_12_add";
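As a quick check of the figure quoted above, the downsampling factor is the ratio of the input size to the feature map size. The following is a minimal sketch in base MATLAB. The sizes come from the text, and the variable names inputSize, featureSize, and downsamplingFactor are illustrative only.

```matlab
% Sizes taken from the text above; variable names are illustrative.
inputSize   = [300 300];   % height and width of the input image
featureSize = [19 19];     % output size of "block_12_add" for this input

% Ratio of input size to feature map size along each dimension.
downsamplingFactor = inputSize ./ featureSize;
fprintf("Downsampling factor: %.2f x %.2f\n", downsamplingFactor);
```

The ratio is 300/19, roughly 15.8 in each dimension, which is why the text describes it as about 16.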
To easily remove layers from a deep network, such as MobileNet v2, use the
deepNetworkDesigner app. Import the network into the app to manually remove the layers after
"block_12_add". Export the modified network to your workspace. This example uses a pre-saved version of MobileNet v2 which was exported from the app.
% Load a network modified using Deep Network Designer.
modified = load("mobilenetv2Block12Add.mat");
lgraph = modified.mobilenetv2Block12Add;
Alternatively, if you have a list of layers to remove, you can use the
removeLayers function to remove them manually.
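As a rough sketch of that alternative, the following code collects the names of all layers listed after the feature extraction layer and passes them to removeLayers. It assumes that lgraph holds the full, unmodified MobileNet v2 layer graph and that lgraph.Layers lists layers in network order, so that every entry after "block_12_add" is downstream of it. Check this with analyzeNetwork before relying on it. The variable names layerNames, idx, and layersToRemove are illustrative.

```matlab
% Assumption: start from the full MobileNet v2 layer graph, not the
% pre-saved version loaded from the MAT-file.
net = mobilenetv2();
lgraph = layerGraph(net);
featureExtractionLayer = "block_12_add";

% Collect layer names in the order stored in lgraph.Layers.
layerNames = arrayfun(@(l) l.Name, lgraph.Layers, "UniformOutput", false);
idx = find(strcmp(layerNames, featureExtractionLayer), 1);

% Assumption: every layer listed after the feature extraction layer
% is downstream of it and can be removed.
layersToRemove = layerNames(idx+1:end);
lgraph = removeLayers(lgraph, layersToRemove);
```

The layer graph is reloaded here only to keep the sketch self-contained. In the workflow of this example, you would apply the last four statements to the lgraph whose image input layer has already been replaced.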
The detection subnetwork consists of groups of serially connected convolution, batch normalization, and ReLU layers. These layers are followed by a yolov2TransformLayer and a yolov2OutputLayer.
Create the convolution, batch normalization, and ReLU portion of the detection subnetwork.
% Set the convolution layer filter size to [3 3].
% This size is common in CNN architectures.
filterSize = [3 3];

% Set the number of filters in the convolution layers
% to match the number of channels in the
% feature extraction layer output.
numFilters = 96;

% Create the detection subnetwork.
% * The convolution layer uses "same" padding
%   to preserve the input size.
detectionLayers = [
    % group 1
    convolution2dLayer(filterSize,numFilters,"Name","yolov2Conv1",...
    "Padding", "same", "WeightsInitializer",@(sz)randn(sz)*0.01)
    batchNormalizationLayer("Name","yolov2Batch1");
    reluLayer("Name","yolov2Relu1");
    % group 2
    convolution2dLayer(filterSize,numFilters,"Name","yolov2Conv2",...
    "Padding", "same", "WeightsInitializer",@(sz)randn(sz)*0.01)
    batchNormalizationLayer("Name","yolov2Batch2");
    reluLayer("Name","yolov2Relu2");
    ]
detectionLayers = 
  6x1 Layer array with layers:

     1   'yolov2Conv1'    Convolution           96 3x3 convolutions with stride [1 1] and padding 'same'
     2   'yolov2Batch1'   Batch Normalization   Batch normalization
     3   'yolov2Relu1'    ReLU                  ReLU
     4   'yolov2Conv2'    Convolution           96 3x3 convolutions with stride [1 1] and padding 'same'
     5   'yolov2Batch2'   Batch Normalization   Batch normalization
     6   'yolov2Relu2'    ReLU                  ReLU
The remaining layers are configured based on application-specific details such as the number of object classes and the anchor boxes.
% Define the number of classes to detect.
numClasses = 5;

% Define the anchor boxes.
anchorBoxes = [
    16 16
    32 16
    ];

% Number of anchor boxes.
numAnchors = size(anchorBoxes,1);

% There are five predictions per anchor box:
% * Predict the x, y, width, and height offset
%   for each anchor.
% * Predict the intersection-over-union with ground
%   truth boxes.
numPredictionsPerAnchor = 5;

% Number of filters in last convolution layer.
outputSize = numAnchors*(numClasses+numPredictionsPerAnchor);
% Final layers in detection sub-network.
finalLayers = [
    convolution2dLayer(1,outputSize,"Name","yolov2ClassConv",...
    "WeightsInitializer", @(sz)randn(sz)*0.01)
    yolov2TransformLayer(numAnchors,"Name","yolov2Transform")
    yolov2OutputLayer(anchorBoxes,"Name","yolov2OutputLayer")
    ];
Add the last layers to the network.
% Add the last layers to network.
detectionLayers = [
    detectionLayers
    finalLayers
    ]
detectionLayers = 
  9x1 Layer array with layers:

     1   'yolov2Conv1'         Convolution               96 3x3 convolutions with stride [1 1] and padding 'same'
     2   'yolov2Batch1'        Batch Normalization       Batch normalization
     3   'yolov2Relu1'         ReLU                      ReLU
     4   'yolov2Conv2'         Convolution               96 3x3 convolutions with stride [1 1] and padding 'same'
     5   'yolov2Batch2'        Batch Normalization       Batch normalization
     6   'yolov2Relu2'         ReLU                      ReLU
     7   'yolov2ClassConv'     Convolution               20 1x1 convolutions with stride [1 1] and padding [0 0 0 0]
     8   'yolov2Transform'     YOLO v2 Transform Layer   YOLO v2 Transform Layer with 2 anchors
     9   'yolov2OutputLayer'   YOLO v2 Output            YOLO v2 Output with 2 anchors
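To see how the 20 filters in the last convolution layer relate to the detector output, the arithmetic can be sketched in base MATLAB. This sketch assumes the [19 19] feature map size stated earlier is preserved through the detection subnetwork (the 3x3 convolutions use "same" padding and the last convolution is 1x1) and that the network predicts one box per anchor per grid cell, as in the standard YOLO v2 formulation. The names gridSize, outputTensorSize, and numCandidateBoxes are illustrative.

```matlab
% Values taken from the example above.
gridSize = [19 19];            % feature map size for a [300 300] input
numAnchors = 2;
numClasses = 5;
numPredictionsPerAnchor = 5;   % x, y, width, height, and IoU

% Channels produced by the last convolution layer.
outputSize = numAnchors*(numClasses + numPredictionsPerAnchor);

% Assumed size of the activation entering the transform layer.
outputTensorSize = [gridSize outputSize];

% Assumed number of candidate boxes per image: one per anchor per grid cell.
numCandidateBoxes = prod(gridSize)*numAnchors;

fprintf("Output channels: %d\n", outputSize);
fprintf("Candidate boxes per image: %d\n", numCandidateBoxes);
```

Adding anchor boxes or classes increases outputSize, so the last convolution layer must be recreated whenever either value changes.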
Attach the detection subnetwork to the feature extraction network.
% Add the detection subnetwork to the feature extraction network.
lgraph = addLayers(lgraph,detectionLayers);

% Connect the detection subnetwork to the feature extraction layer.
lgraph = connectLayers(lgraph,featureExtractionLayer,"yolov2Conv1");
Use analyzeNetwork(lgraph) to check the network and then train a YOLO v2 object detector using the