The vision.CascadeObjectDetector System object comes with several pretrained classifiers for detecting frontal faces, profile faces, noses, upper bodies, and eyes. However, these classifiers may not be sufficient for a particular application. Computer Vision System Toolbox™ provides the trainCascadeObjectDetector function to train a custom classifier.
The Computer Vision System Toolbox cascade object detector can detect object categories whose aspect ratio does not vary significantly. Objects whose aspect ratio remains approximately fixed include faces, stop signs, or cars viewed from one side.
The vision.CascadeObjectDetector System object detects objects in images by sliding a window over the image. The detector then uses a cascade classifier to decide whether the window contains the object of interest. The size of the window varies to detect objects at different scales, but its aspect ratio remains fixed. The detector is very sensitive to out-of-plane rotation, because the aspect ratio changes for most 3-D objects. Thus, you need to train a detector for each orientation of the object. Training a single detector to handle all orientations will not work.
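The multiscale sliding-window idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the toolbox implementation: the function names, the `scale_factor` growth rule, and the `step` stride are assumptions made for the example.

```python
def window_sizes(base_w, base_h, image_w, image_h, scale_factor=1.1):
    """Return window sizes that grow by scale_factor while keeping the
    base aspect ratio (up to rounding), until the window no longer fits."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    scale = 1.0
    while True:
        w, h = round(base_w * scale), round(base_h * scale)
        if w > image_w or h > image_h:
            break
        sizes.append((w, h))
        scale *= scale_factor
    return sizes


def window_positions(win_w, win_h, image_w, image_h, step=4):
    """Return the top-left (x, y) corners visited by a window of the
    given size when it slides over the image with a fixed stride."""
    return [
        (x, y)
        for y in range(0, image_h - win_h + 1, step)
        for x in range(0, image_w - win_w + 1, step)
    ]


# Hypothetical 24-by-24 training size on a 100-by-100 image.
for w, h in window_sizes(24, 24, 100, 100, scale_factor=2.0):
    count = len(window_positions(w, h, 100, 100))
    print(f"{w}x{h} window: {count} positions")
```

Only the window size changes from one scale to the next. The width-to-height ratio stays the same, which is why an object whose apparent aspect ratio changes with orientation needs a separate detector.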
The cascade classifier consists of stages, where each stage is an ensemble of weak learners. The weak learners are simple classifiers called decision stumps. Each stage is trained using a technique called boosting. Boosting provides the ability to train a highly accurate classifier by taking a weighted average of the decisions made by the weak learners.
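As a rough illustration of how one boosted stage can combine decision stumps, the following Python sketch thresholds a weighted sum of stump votes. The feature values, weights, thresholds, and the `stage_threshold` parameter are all made up for the example. The toolbox's actual training and scoring may differ in detail.

```python
def stump(feature_index, threshold, polarity=1):
    """Build a decision stump: a classifier that thresholds one feature.
    It votes +1 (object) or -1 (not object)."""
    def classify(features):
        passed = polarity * features[feature_index] >= polarity * threshold
        return 1 if passed else -1
    return classify


def stage_decision(features, weak_learners, stage_threshold=0.0):
    """Combine the weighted stump votes and compare the score with the
    stage threshold. Returns True when the stage labels the region positive."""
    score = sum(weight * learner(features) for weight, learner in weak_learners)
    return score >= stage_threshold


# Hypothetical stage with three weighted stumps.
stage = [
    (0.6, stump(0, 0.5)),
    (0.3, stump(1, 0.2)),
    (0.4, stump(2, 0.7, polarity=-1)),
]
print(stage_decision([0.9, 0.1, 0.3], stage))  # two of three stumps vote +1
print(stage_decision([0.1, 0.1, 0.9], stage))  # all stumps vote -1
```

Lowering `stage_threshold` in this sketch makes the stage accept more regions, which raises both its true positive rate and its false positive rate.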
Each stage of the classifier labels the region defined by the current location of the sliding window as either positive or negative. Positive indicates an object was found and negative indicates no object. If the label is negative, the classification of this region is complete, and the detector slides the window to the next location. If the label is positive, the classifier passes the region to the next stage. The detector reports an object found at the current window location when the final stage classifies the region as positive.
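The early-rejection logic described above can be sketched as follows. This is a toy Python example, not toolbox code: each stage is represented by a hypothetical function that returns True for a positive label, and the "region" is reduced to a single number.

```python
def run_cascade(region, stages):
    """Pass a region through the stages in order.

    Returns (is_object, stages_evaluated). Evaluation stops at the first
    stage that labels the region negative."""
    for index, stage in enumerate(stages, start=1):
        if not stage(region):
            return False, index
    return True, len(stages)


# Toy stages of increasing strictness (assumed thresholds).
stages = [
    lambda r: r > 0.2,
    lambda r: r > 0.5,
    lambda r: r > 0.8,
]

for region in (0.1, 0.6, 0.9):
    print(region, run_cascade(region, stages))
```

A region rejected by the first stage costs one stage evaluation, while only regions that survive every stage are reported as detections.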
The stages are designed to reject negative samples as fast as possible. The assumption is that the vast majority of windows do not contain the object of interest. Conversely, true positives are rare, and worth taking the time to verify. A true positive occurs when a positive sample is correctly classified. A false positive occurs when a negative sample is mistakenly classified as positive. A false negative occurs when a positive sample is mistakenly classified as negative. To work well, each stage in the cascade must have a low false negative rate. If a stage incorrectly labels an object as negative, the classification stops, and there is no way to correct the mistake. However, each stage may have a high false positive rate. Even if it incorrectly labels a nonobject as positive, the mistake can be corrected by subsequent stages.
The overall false positive rate of the cascade classifier is f^s, where f is the false positive rate per stage in the range (0, 1), and s is the number of stages. Similarly, the overall true positive rate is t^s, where t is the true positive rate per stage in the range (0, 1]. Thus, adding more stages reduces the overall false positive rate, but it also reduces the overall true positive rate.
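To make the effect of the number of stages concrete, this short Python sketch raises the per-stage rates to the power of the number of stages. The function name and the example rates are assumptions chosen for illustration.

```python
def overall_rates(f, t, s):
    """Return (overall false positive rate, overall true positive rate)
    for a cascade of s stages with per-stage rates f and t."""
    if not 0.0 < f < 1.0:
        raise ValueError("f must be in (0, 1)")
    if not 0.0 < t <= 1.0:
        raise ValueError("t must be in (0, 1]")
    if s < 1:
        raise ValueError("s must be at least 1")
    return f ** s, t ** s


# Assumed per-stage rates: f = 0.5, t = 0.995.
for s in (5, 10, 20):
    fpr, tpr = overall_rates(0.5, 0.995, s)
    print(f"stages={s:2d}  overall FPR={fpr:.2e}  overall TPR={tpr:.3f}")
```

With these assumed rates, 20 stages push the overall false positive rate below one in a million, while the overall true positive rate drops to roughly 0.90.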
Cascade classifier training requires a set of positive samples and a set of negative images. You must provide a set of positive images with regions of interest specified to be used as positive samples. You can use the trainingImageLabeler app to label objects of interest with bounding boxes. The app outputs an array of structs to use for positive samples. You also must provide a set of negative images from which the function generates negative samples automatically. Set the number of stages, feature type, and other function parameters to achieve acceptable detector accuracy.
You want to select the function parameters to optimize the number of stages, false positive rate, true positive rate, and the type of features to use for training. When you set the parameters, consider the tradeoffs described in the following table.
|If you...||Then...|
|Have a large training set (in the thousands).||You can increase the number of stages and set a higher false positive rate for each stage.|
|Have a small training set.||You may need to decrease the number of stages and set a lower false positive rate for each stage.|
|Want to reduce the probability of missing an object.||You should increase the true positive rate. However, a high true positive rate may prevent you from being able to achieve the desired false positive rate per stage. This means that the detector will be more likely to produce false detections.|
|Want to reduce the number of false detections.||You should increase the number of stages or decrease the false alarm rate per stage.|
Choose the feature type that suits the kind of object detection you need. The trainCascadeObjectDetector function supports three types of features: Haar, Local Binary Patterns (LBP), and Histograms of Oriented Gradients (HOG). Historically, Haar and LBP features have been used for detecting faces. They work well for representing fine-scale textures. The HOG features have been used for detecting objects such as people and cars. They are useful for capturing the overall shape of an object. For example, in a visualization of the HOG features computed from an image of a bicycle, you can see the outline of the bicycle.
You may need to run the trainCascadeObjectDetector function multiple times to tune the parameters. To save time, you can try using LBP or HOG features on a small subset of your data because training a detector using Haar features takes much longer. After that, you can try the Haar features to see if the accuracy can be improved.
You can specify positive samples in two ways. One way is to specify rectangular regions in a larger image. The regions contain the objects of interest. The other approach is to crop out the object of interest from the image and save it as a separate image. Then, you can specify the region to be the entire image. You can also generate more positive samples from existing ones by adding rotation or noise, or by varying brightness or contrast.
Negative samples are not specified explicitly. Instead, the trainCascadeObjectDetector function automatically generates negative samples from user-supplied negative images that do not contain objects of interest. Before training each new stage, the function runs the detector consisting of the stages already trained on the negative images. If any objects are detected from these, they must be false positives. These false positives are used as negative samples. In this way, each new stage of the cascade is trained to correct mistakes made by previous stages.
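The way negative samples are collected from false positives can be sketched as below. This is a toy Python illustration, not the function's actual implementation: windows are reduced to single numbers, stages are hypothetical predicates, and the name `mine_negatives` is an assumption.

```python
def mine_negatives(negative_windows, trained_stages, needed):
    """Run the stages trained so far on windows taken from negative images.

    Any window that passes every stage is a false positive, so it is kept
    as a negative sample for training the next stage."""
    false_positives = [
        window
        for window in negative_windows
        if all(stage(window) for stage in trained_stages)
    ]
    return false_positives[:needed]


windows = [0.1, 0.3, 0.6, 0.7, 0.9]      # windows from negative images
one_stage = [lambda w: w > 0.2]
two_stages = [lambda w: w > 0.2, lambda w: w > 0.65]

print(mine_negatives(windows, [], needed=10))          # no stages yet: all windows
print(mine_negatives(windows, one_stage, needed=10))
print(mine_negatives(windows, two_stages, needed=10))
```

In this sketch, the pool of usable negatives shrinks as stages are added, because fewer windows from the negative images still fool the partial detector.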
As more stages are added, the detector's overall false positive rate decreases, causing generation of negative samples to be more difficult. For this reason, it is helpful to supply as many negative images as possible. To improve training accuracy, supply negative images that contain backgrounds typically associated with the objects of interest. Also, include negative images that contain nonobjects similar in appearance to the objects of interest. For example, if you are training a stop-sign detector, the negative images should contain other road signs and shapes.
There is a trade-off between fewer stages with a lower false positive rate per stage and more stages with a higher false positive rate per stage. Stages with a lower false positive rate are more complex because they contain a greater number of weak learners. Stages with a higher false positive rate contain fewer weak learners. Generally, it is better to have a greater number of simple stages because at each stage the overall false positive rate decreases exponentially. For example, if the false positive rate at each stage is 50%, then the overall false positive rate of a cascade classifier with two stages is 25%. With three stages, it becomes 12.5%, and so on. However, the greater the number of stages, the greater the amount of training data the classifier requires. Also, increasing the number of stages increases the false negative rate. This results in a greater chance of rejecting a positive sample by mistake. You should set the false positive rate (FalseAlarmRate) and the number of stages (NumCascadeStages) to yield an acceptable overall false positive rate. Then, you can tune these two parameters experimentally.
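One way to pick a starting point for these two parameters is to work backward from a target overall false positive rate. The Python sketch below does that by repeated multiplication. The function name, the target of one in a million, and the per-stage true positive rate of 0.995 are assumptions for illustration only.

```python
def stages_needed(per_stage_fpr, target_overall_fpr):
    """Return the smallest number of stages whose combined false positive
    rate is at or below the target."""
    if not 0.0 < per_stage_fpr < 1.0:
        raise ValueError("per_stage_fpr must be in (0, 1)")
    if not 0.0 < target_overall_fpr < 1.0:
        raise ValueError("target_overall_fpr must be in (0, 1)")
    stages = 0
    overall = 1.0
    while overall > target_overall_fpr:
        overall *= per_stage_fpr
        stages += 1
    return stages


# Compare simple stages (f = 0.5) with more complex stages (f = 0.2).
for f in (0.5, 0.2):
    s = stages_needed(f, 1e-6)
    print(f"per-stage FPR={f}: {s} stages, overall TPR={0.995 ** s:.3f}")
```

The comparison shows the trade-off in numbers: the higher per-stage false positive rate needs more stages to reach the same target, and each extra stage costs a little overall true positive rate.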
In some cases, training may terminate early. For example, training may stop after seven stages, even though you set the number of stages parameter to 20. This can happen when the function cannot generate enough negative samples. Note that if you run the function again and set the number of stages to seven, you will not get the same result. This is because the number of positive and negative samples to use for each stage is recalculated for the new number of stages.
Training a good detector requires thousands of training samples. Processing time for a large amount of data varies. It is likely to take on the order of hours or even days. During training, the function displays the time it took to train each stage in the MATLAB® command window. Training time depends on the type of feature you specify. Using Haar features takes much longer than using LBP or HOG features.
The trainCascadeObjectDetector function automatically determines the number of positive samples to use to train each stage. The number is based on the total number of positive samples supplied by the user and the values of the TruePositiveRate and NumCascadeStages parameters.
The number of available positive samples used to train each stage depends on the true positive rate. The rate determines what percentage of positive samples the function may classify as negative. If a sample is classified as a negative by any stage, it never reaches subsequent stages. For example, suppose you set the TruePositiveRate to 0.9, and all of the available samples are used to train the first stage. In this case, 10% of the positive samples may be rejected as negatives, and only 90% of the total positive samples are available for training the second stage. If training continues, then each stage is trained with fewer and fewer samples. Each subsequent stage must solve an increasingly difficult classification problem with fewer positive samples. With each stage getting fewer samples, the later stages are likely to overfit the data.
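The shrinking pool of positive samples can be illustrated with a few lines of Python. The sketch assumes the worst case, in which every stage rejects the full share of positives that the true positive rate allows. The function name and the starting count of 1000 samples are made up for the example.

```python
def positives_remaining(total, true_positive_rate, num_stages):
    """Return the number of positive samples available to each stage when
    every stage rejects the maximum share allowed (worst case)."""
    remaining = [float(total)]
    for _ in range(num_stages - 1):
        remaining.append(remaining[-1] * true_positive_rate)
    return remaining


for stage, count in enumerate(positives_remaining(1000, 0.9, 5), start=1):
    print(f"stage {stage}: about {count:.0f} positive samples")
```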
Ideally, you want to use the same number of samples to train each stage. To do so, the number of positive samples used to train each stage must be less than the total number of available positive samples. The only exception is when the value of TruePositiveRate times the total number of positive samples is less than 1. In that case, no positive samples are rejected as negatives.
The function calculates the number of positive samples to use at each stage using the following formula:
number of positive samples = floor(totalPositiveSamples / (1 + (NumCascadeStages - 1) * (1 - TruePositiveRate)))
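The formula can be evaluated directly. The following Python sketch is a transcription of the formula above with example inputs. The function and argument names are chosen for readability and are not toolbox identifiers.

```python
import math


def positives_per_stage(total_positive_samples, num_cascade_stages, true_positive_rate):
    """Evaluate floor(total / (1 + (stages - 1) * (1 - TPR)))."""
    denominator = 1 + (num_cascade_stages - 1) * (1 - true_positive_rate)
    return math.floor(total_positive_samples / denominator)


# Example inputs (assumed): 1000 positives, 20 stages, TPR of 0.995.
print(positives_per_stage(1000, 20, 0.995))
# A lower true positive rate reserves more samples for later stages.
print(positives_per_stage(1000, 10, 0.9))
```

With a single stage the denominator is 1, so all positive samples are used. As the number of stages grows or the true positive rate drops, fewer samples are used per stage so that later stages are less likely to run out.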
Unfortunately, this calculation does not guarantee that the same number of positive samples is available for each stage. The reason is that it is impossible to predict with certainty how many positive samples will be rejected as negatives. The training continues as long as the number of positive samples available to train a stage is greater than 10% of the number of samples the function determined automatically using the preceding formula. If there are not enough positive samples, the training stops and the function issues a warning. It outputs a classifier consisting of the stages it has been able to train up to this point. If the training stops, you can add more positive samples. Alternatively, you can increase TruePositiveRate. Reducing the number of stages would also work, but such a reduction can also result in a higher overall false alarm rate.
The function calculates the number of negative samples used at each stage. This calculation is done by multiplying the number of positive samples used at each stage by the value of NegativeSamplesFactor.
Just as with positive samples, there is no guarantee that the calculated number of negative samples is always available for a particular stage. The trainCascadeObjectDetector function generates negative samples from the negative images. However, with each new stage, the overall false alarm rate of the cascade classifier decreases, making negative samples harder to find.
The training continues as long as the number of negative samples available to train a stage is greater than 10% of the calculated number of negative samples. If there are not enough negative samples, the training stops and the function issues a warning. It outputs a classifier consisting of the stages it has been able to train up to this point. When the training stops, the best approach is to add more negative images. Alternatively, you can try to reduce the number of stages or increase the false positive rate.
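The negative-sample count and the 10% stopping rule can be sketched as follows. This Python snippet only mirrors the description above. The function names and the example numbers (913 positives per stage and a factor of 2) are assumptions, and the function's internal bookkeeping may differ.

```python
def negatives_per_stage(positives_per_stage, negative_samples_factor):
    """Number of negative samples requested for each stage."""
    return positives_per_stage * negative_samples_factor


def can_train_stage(available, required, min_fraction=0.10):
    """Training continues only while the available samples exceed
    10% of the required number."""
    return available > min_fraction * required


required = negatives_per_stage(913, 2)
for available in (1826, 200, 150):
    status = "continue" if can_train_stage(available, required) else "stop with warning"
    print(f"required={required}, available={available}: {status}")
```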