Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When humans look at images or video, we can recognize and locate objects of interest within a matter of moments. The goal of object detection is to replicate this intelligence using a computer.
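Locating an object usually means predicting a bounding box around it. A common way to measure how closely a predicted box matches a hand-labeled one is intersection over union (IoU). The minimal Python sketch below assumes a hypothetical `Box` type using (x, y, width, height) pixel coordinates. That convention belongs to this example only, not to any particular toolbox.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in pixel coordinates: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    truth = Box(10, 10, 40, 40)   # hand-labeled box
    pred = Box(20, 20, 40, 40)    # detector output, shifted by 10 pixels
    print(round(iou(truth, pred), 3))  # 0.391
```

A detection counts as correct only when its IoU with a labeled box exceeds some chosen threshold. The threshold depends on the application.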
Object detection is a key technology behind advanced driver assistance systems (ADAS) that enable cars to detect driving lanes or perform pedestrian detection to improve road safety. Object detection is also useful in applications such as video surveillance or image retrieval systems.
You can use a variety of techniques to perform object detection. Popular deep learning–based approaches using convolutional neural networks (CNNs), such as R-CNN and YOLO v2, automatically learn to detect objects within images.
You can choose from two key approaches to get started with object detection using deep learning: create and train a custom object detector, or use a pretrained object detector.
Whether you create a custom object detector or use a pretrained one, you will need to decide what type of object detection network you want to use: a two-stage network or a single-stage network.
The initial stage of two-stage networks, such as R-CNN and its variants, identifies region proposals, or subsets of the image that might contain an object. The second stage classifies the objects within the region proposals. Two-stage networks can achieve very accurate object detection results; however, they are typically slower than single-stage networks.
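Real two-stage detectors learn both stages. The original R-CNN, for example, used selective search for proposals and a CNN for classification, and Faster R-CNN replaced selective search with a learned region proposal network. The toy Python sketch below keeps only the two-stage control flow. A sliding-window brightness test stands in for region proposal, and a hypothetical `toy_classifier` stands in for the CNN. All names and thresholds are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]            # grayscale image, rows of pixel values in [0, 1]
Region = Tuple[int, int, int, int]   # (row, col, height, width)


def propose_regions(img: Image, size: int, stride: int, min_mean: float) -> List[Region]:
    """Stage 1 (toy stand-in): keep sliding windows bright enough to possibly hold an object."""
    proposals = []
    for r in range(0, len(img) - size + 1, stride):
        for c in range(0, len(img[0]) - size + 1, stride):
            patch = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
            if sum(patch) / len(patch) >= min_mean:
                proposals.append((r, c, size, size))
    return proposals


def classify_regions(img: Image, proposals: List[Region],
                     classifier: Callable[[Image], Tuple[str, float]]) -> List[Tuple[Region, str, float]]:
    """Stage 2: classify each cropped proposal and keep the non-background ones."""
    detections = []
    for r, c, h, w in proposals:
        crop = [row[c:c + w] for row in img[r:r + h]]
        label, score = classifier(crop)
        if label != "background":
            detections.append(((r, c, h, w), label, score))
    return detections


def toy_classifier(crop: Image) -> Tuple[str, float]:
    """Hypothetical classifier: a crop is a 'bright_square' only if every pixel is bright."""
    pixels = [p for row in crop for p in row]
    bright = sum(1 for p in pixels if p > 0.5) / len(pixels)
    return ("bright_square", bright) if bright == 1.0 else ("background", 1.0 - bright)


if __name__ == "__main__":
    # 8x8 image with a bright 4x4 square at rows 2-5, columns 2-5.
    img = [[1.0 if 2 <= r <= 5 and 2 <= c <= 5 else 0.0 for c in range(8)] for r in range(8)]
    proposals = propose_regions(img, size=4, stride=2, min_mean=0.25)
    print(len(proposals))                                    # 9
    print(classify_regions(img, proposals, toy_classifier))  # [((2, 2, 4, 4), 'bright_square', 1.0)]
```

The expensive classifier runs once per proposal rather than once per image. That is one reason two-stage detectors tend to be slower.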
In single-stage networks, such as YOLO v2, the CNN produces network predictions for regions across the entire image using anchor boxes, and the predictions are decoded to generate the final bounding boxes for the objects. Single-stage networks can be much faster than two-stage networks, but they may not reach the same level of accuracy, especially for scenes containing small objects.
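One way to decode anchor-box predictions is the parameterization described in the YOLO v2 paper. The predicted center offsets pass through a sigmoid, so each center stays inside its grid cell. The predicted width and height scale the anchor's size by an exponential. The minimal Python sketch below assumes a nested-list layout `raw[row][col][anchor] = (tx, ty, tw, th, to)` with grid-cell units. It then applies greedy non-maximum suppression (NMS), a common post-processing step that removes duplicate boxes and is not described in the text above. The layout and thresholds are placeholder assumptions.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]         # (center_x, center_y, width, height), grid-cell units
Raw = Tuple[float, float, float, float, float]  # (tx, ty, tw, th, to) for one anchor in one cell


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode(raw: Sequence[Sequence[Sequence[Raw]]],
           anchors: Sequence[Tuple[float, float]],
           conf_thresh: float) -> List[Tuple[Box, float]]:
    """Turn raw[row][col][anchor] predictions into (box, confidence) pairs."""
    detections = []
    for cy, row in enumerate(raw):
        for cx, cell in enumerate(row):
            for (pw, ph), (tx, ty, tw, th, to) in zip(anchors, cell):
                conf = sigmoid(to)
                if conf < conf_thresh:
                    continue                       # drop low-confidence predictions early
                box = (cx + sigmoid(tx),           # center offset stays inside the cell
                       cy + sigmoid(ty),
                       pw * math.exp(tw),          # anchor size scaled by exp(t)
                       ph * math.exp(th))
                detections.append((box, conf))
    return detections


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two center-format boxes."""
    ix = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    iy = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(detections: List[Tuple[Box, float]], iou_thresh: float) -> List[Tuple[Box, float]]:
    """Greedy NMS: keep the most confident box, drop others that overlap a kept box too much."""
    kept: List[Tuple[Box, float]] = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, conf))
    return kept


if __name__ == "__main__":
    anchors = [(1.0, 1.0), (1.2, 1.2)]
    raw = [[
        [(0.0, 0.0, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0, 1.0)],   # cell (0,0): two overlapping boxes
        [(0.0, 0.0, 0.0, 0.0, -3.0), (0.0, 0.0, 0.0, 0.0, 1.5)],  # cell (0,1): one weak, one strong
    ]]
    kept = nms(decode(raw, anchors, conf_thresh=0.5), iou_thresh=0.5)
    print([box for box, _ in kept])  # [(0.5, 0.5, 1.0, 1.0), (1.5, 0.5, 1.2, 1.2)]
```

Every cell and anchor is decoded in a single pass over one network output, with no per-region second stage. This is where single-stage networks get their speed.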
Machine learning techniques are also commonly used for object detection, and they offer different approaches than deep learning. Common machine learning techniques include:
Similar to deep learning–based approaches, you can choose to start with a pretrained object detector or create a custom object detector to suit your application. With machine learning, however, you must manually select the features that identify an object, whereas a deep learning–based workflow learns those features automatically.
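To make "manually select the features" concrete, the Python sketch below computes one hand-designed feature, a simplified, HOG-like histogram of gradient orientations for a grayscale patch. A linear classifier then scores that histogram. This is only a sketch. Full HOG also divides the patch into cells and normalizes over blocks, and the weights shown are hypothetical placeholders for values that training (for example, a linear SVM) would produce.

```python
import math
from typing import List

Patch = List[List[float]]  # grayscale patch, rows of pixel values


def orientation_histogram(patch: Patch, bins: int = 9) -> List[float]:
    """Simplified HOG-like feature: unsigned gradient orientations (0-180 degrees)
    binned and weighted by gradient magnitude, then L1-normalized.
    Real HOG also uses cells and block normalization; omitted here."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, len(patch) - 1):              # interior pixels only
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # central differences
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def linear_score(features: List[float], weights: List[float], bias: float) -> float:
    """Score from a linear classifier such as a trained linear SVM; > 0 means 'object'."""
    return sum(f * w for f, w in zip(features, weights)) + bias


if __name__ == "__main__":
    vertical_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    features = orientation_histogram(vertical_edge)
    print(features)  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    weights = [1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical "trained" weights
    print(linear_score(features, weights, bias=0.0))           # 1.0
```

The choice of gradient orientations as the feature is the engineer's decision here. In a CNN, the equivalent filters would be learned from the labeled data.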
The best approach for object detection depends on your application and the problem you’re trying to solve. When choosing between machine learning and deep learning, the main considerations are whether you have a powerful GPU and whether you have lots of labeled training images. If the answer to either question is no, a machine learning approach might be the better choice. Deep learning techniques tend to work better when you have more images, and GPUs decrease the time needed to train the model.
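The rule of thumb above fits in a few lines of Python. The `lots_of_images` cutoff is a hypothetical placeholder. The text does not give a number, and the right value depends heavily on the task.

```python
def suggest_approach(has_powerful_gpu: bool, num_labeled_images: int,
                     lots_of_images: int = 10_000) -> str:
    """Deep learning only if both answers are yes; otherwise machine learning."""
    # `lots_of_images` is a hypothetical cutoff, not a recommendation.
    if has_powerful_gpu and num_labeled_images >= lots_of_images:
        return "deep learning"
    return "machine learning"


if __name__ == "__main__":
    print(suggest_approach(True, 50_000))   # deep learning
    print(suggest_approach(False, 50_000))  # machine learning
    print(suggest_approach(True, 200))      # machine learning
```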
In addition to deep learning– and machine learning–based object detection, there are several other common techniques that may be sufficient depending on your application, such as:
With just a few lines of MATLAB® code, you can build machine learning and deep learning models for object detection without having to be an expert.
MATLAB provides interactive apps both to prepare training data and to customize convolutional neural networks. Labeling training images for object detectors is tedious, and collecting enough labeled data to create a performant object detector can take a significant amount of time. The Image Labeler app lets you interactively label objects within a collection of images and provides built-in algorithms to automate labeling of your ground-truth data. For automated driving applications, you can use the Ground Truth Labeler app, and for video processing workflows, you can use the Video Labeler app.
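The apps' built-in automation algorithms are not described here. As a hedged illustration of how automation reduces manual effort, the Python sketch below shows one simple idea: linearly interpolating a bounding box between two hand-labeled video frames, so that only the keyframes need to be drawn by hand. The function name and box format are assumptions made for this example. This is not the apps' implementation.

```python
from typing import Dict, Tuple

BoxXYWH = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


def interpolate_boxes(frame_a: int, box_a: BoxXYWH,
                      frame_b: int, box_b: BoxXYWH) -> Dict[int, BoxXYWH]:
    """Linearly interpolate a box for every frame from frame_a to frame_b (inclusive)."""
    if frame_b <= frame_a:
        raise ValueError("frame_b must come after frame_a")
    labels: Dict[int, BoxXYWH] = {}
    for f in range(frame_a, frame_b + 1):
        t = (f - frame_a) / (frame_b - frame_a)  # 0.0 at frame_a, 1.0 at frame_b
        x, y, w, h = (a + t * (b - a) for a, b in zip(box_a, box_b))
        labels[f] = (x, y, w, h)
    return labels


if __name__ == "__main__":
    # Hand-labeled keyframes: the object moves right and down between frames 0 and 4.
    labels = interpolate_boxes(0, (0.0, 0.0, 10.0, 10.0), 4, (40.0, 20.0, 10.0, 10.0))
    print(len(labels))  # 5
    print(labels[2])    # (20.0, 10.0, 10.0, 10.0)
```

Linear interpolation works only when the object moves smoothly between keyframes. Interpolated labels still need a quick human review before training.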
Customizing an existing CNN or creating one from scratch can introduce architectural problems that waste valuable training time. The Deep Network Designer app enables you to interactively build, edit, and visualize deep learning networks, and its analysis tool checks for architectural issues before you train the network.
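Deep Network Designer's analysis tool is not reproduced here. To illustrate the kind of problem a pre-training check can catch, the Python sketch below propagates the spatial size of a square input through a hypothetical stack of convolution and pooling layers, using the standard output-size formula floor((n + 2p - k) / s) + 1. It flags any layer whose kernel no longer fits and any stride that leaves border pixels uncovered. The layer format is an assumption for this example.

```python
from typing import List, Tuple

# Hypothetical layer description: (name, kernel_size, stride, padding), square kernels only.
Layer = Tuple[str, int, int, int]


def check_architecture(input_size: int,
                       layers: List[Layer]) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Propagate a square input's spatial size through conv/pool layers and flag issues."""
    shapes: List[Tuple[str, int]] = []
    problems: List[str] = []
    size = input_size
    for name, kernel, stride, pad in layers:
        padded = size + 2 * pad
        if kernel > padded:
            problems.append(f"{name}: kernel {kernel} exceeds padded input {padded}")
            break  # later layers have no valid input size
        leftover = (padded - kernel) % stride
        if leftover != 0:
            problems.append(f"{name}: stride {stride} leaves {leftover} border pixel(s) uncovered")
        size = (padded - kernel) // stride + 1  # standard output-size formula
        shapes.append((name, size))
    return shapes, problems


if __name__ == "__main__":
    # Five 2x2, stride-2 pools reduce a 416x416 input to a 13x13 grid.
    good = [(f"pool{i}", 2, 2, 0) for i in range(1, 6)] + [("conv6", 3, 1, 1)]
    print(check_architecture(416, good)[1])  # []
    bad = [("conv1", 7, 2, 0), ("pool1", 2, 2, 0), ("conv2", 16, 1, 0)]
    shapes, problems = check_architecture(64, bad)
    print(shapes)         # [('conv1', 29), ('pool1', 14)]
    print(len(problems))  # 3
```

A check like this costs microseconds. The failure it prevents would otherwise surface only after the training job starts.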
With MATLAB, you can interoperate with networks and network architectures from frameworks like TensorFlow™-Keras, PyTorch and Caffe2 using ONNX™ (Open Neural Network Exchange) import and export capabilities.
After creating your algorithms with MATLAB, you can leverage automated workflows to generate TensorRT or CUDA® code with GPU Coder™ to perform hardware-in-the-loop testing. The generated code can be integrated with existing projects and can be used to verify object detection algorithms on desktop GPUs or embedded GPUs such as the NVIDIA® Jetson or NVIDIA Drive platform.