Main Content

Estimate Anchor Boxes From Training Data

Anchor boxes are important parameters of deep learning object detectors such as Faster R-CNN and YOLO v2. The shape, scale, and number of anchor boxes impact the efficiency and accuracy of the detectors.

For more information, see Anchor Boxes for Object Detection.

Load Training Data

Load the vehicle dataset, which contains 295 images and associated box labels.

data = load('vehicleTrainingData.mat');
vehicleDataset = data.vehicleTrainingData;

Add the full path to the local vehicle data folder.

dataDir = fullfile(toolboxdir('vision'),'visiondata');
vehicleDataset.imageFilename = fullfile(dataDir,vehicleDataset.imageFilename);

Display the data set summary.


    imageFilename: 295×1 cell array of character vectors

    vehicle: 295×1 cell

Visualize Ground Truth Box Distribution

Visualize the labeled boxes to better understand the range of object sizes present in the data set.

Combine all the ground truth boxes into one array.

allBoxes = vertcat(vehicleDataset.vehicle{:});

Plot the box area versus the box aspect ratio.

aspectRatio = allBoxes(:,3) ./ allBoxes(:,4);
area = prod(allBoxes(:,3:4),2);

xlabel("Box Area")
ylabel("Aspect Ratio (width/height)");
title("Box Area vs. Aspect Ratio")

The plot shows a few groups of objects that are of similar size and shape, However, because the groups are spread out, manually choosing anchor boxes is difficult. A better way to estimate anchor boxes is to use a clustering algorithm that can group similar boxes together using a meaningful metric.

Estimate Anchor Boxes

Estimate anchor boxes from training data using the estimateAnchorBoxes function, which uses the intersection-over-union (IoU) distance metric.

A distance metric based on IoU is invariant to the size of boxes, unlike the Euclidean distance metric, which produces larger errors as the box sizes increase [1]. In addition, using an IoU distance metric leads to boxes of similar aspect ratios and sizes being clustered together, which results in anchor box estimates that fit the data.

Create a boxLabelDatastore using the ground truth boxes in the vehicle data set. If the preprocessing step for training an object detector involves resizing of the images, use transform and bboxresize to resize the bounding boxes in the boxLabelDatastore before estimating the anchor boxes.

trainingData = boxLabelDatastore(vehicleDataset(:,2:end));

Select the number of anchors and estimate the anchor boxes using estimateAnchorBoxes function.

numAnchors = 5;
[anchorBoxes,meanIoU] = estimateAnchorBoxes(trainingData,numAnchors);
anchorBoxes = 5×2

    21    27
    87   116
    67    92
    43    61
    86   105

Choosing the number of anchors is another training hyperparameter that requires careful selection using empirical analysis. One quality measure for judging the estimated anchor boxes is the mean IoU of the boxes in each cluster. The estimateAnchorBoxes function uses a k-means clustering algorithm with the IoU distance metric to calculate the overlap using the equation, 1 - bboxOverlapRatio(allBoxes,boxInCluster).

meanIoU = 0.8411

The mean IoU value greater than 0.5 ensures that the anchor boxes overlap well with the boxes in the training data. Increasing the number of anchors can improve the mean IoU measure. However, using more anchor boxes in an object detector can also increase the computation cost and lead to overfitting, which results in poor detector performance.

Sweep over a range of values and plot the mean IoU versus number of anchor boxes to measure the trade-off between number of anchors and mean IoU.

maxNumAnchors = 15;
meanIoU = zeros([maxNumAnchors,1]);
anchorBoxes = cell(maxNumAnchors, 1);
for k = 1:maxNumAnchors
    % Estimate anchors and mean IoU.
    [anchorBoxes{k},meanIoU(k)] = estimateAnchorBoxes(trainingData,k);    

ylabel("Mean IoU")
xlabel("Number of Anchors")
title("Number of Anchors vs. Mean IoU")

Using two anchor boxes results in a mean IoU value greater than 0.65, and using more than 7 anchor boxes yields only marginal improvement in mean IoU value. Given these results, the next step is to train and evaluate multiple object detectors using values between 2 and 6. This empirical analysis helps determine the number of anchor boxes required to satisfy application performance requirements, such as detection speed, or accuracy.


[1] Redmon, Joseph, and Ali Farhadi. “YOLO9000: Better, Faster, Stronger.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517–25. Honolulu, HI: IEEE, 2017.