Object detection and recognition are used to locate, identify, and categorize objects in images and video. Computer Vision System Toolbox provides a comprehensive suite of algorithms and tools for object detection and recognition.
You can detect or recognize an object in an image by training an object classifier with pattern recognition algorithms, using training data drawn from different object classes. The classifier accepts image data and assigns the appropriate object or class label.
Computer Vision System Toolbox provides algorithms to train image classification and image retrieval systems using the bag-of-words model. The system toolbox also provides feature extraction techniques you can use to create custom classifiers using supervised classification algorithms from Statistics and Machine Learning Toolbox™.
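For example, a minimal bag-of-words classification sketch might look like the following; the folder and file names are placeholders:

% Build image sets from subfolders named after each category.
imgSets = imageSet('imageFolder', 'recursive');

% Create a visual vocabulary and train a category classifier.
bag = bagOfFeatures(imgSets);
classifier = trainImageCategoryClassifier(imgSets, bag);

% Classify a new image.
img = imread('testImage.jpg');
labelIdx = predict(classifier, img);
disp(classifier.Labels(labelIdx))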
Object detection is used to determine the location of an object in an image. The system toolbox provides several algorithms and techniques to detect objects, including pretrained detection models, as well as functions and an app to train object detectors.
Motion-based object detection algorithms use motion extraction and segmentation techniques such as optical flow and Gaussian mixture model (GMM) foreground detection to locate moving objects in a scene.
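A minimal sketch of this approach using the toolbox's GMM foreground detector follows; the video file name and parameter values are illustrative:

% Detect moving objects as foreground blobs in each video frame.
videoReader  = vision.VideoFileReader('visiontraffic.avi');
detector     = vision.ForegroundDetector('NumGaussians', 3, ...
    'NumTrainingFrames', 50);
blobAnalyzer = vision.BlobAnalysis('AreaOutputPort', false, ...
    'CentroidOutputPort', false, 'MinimumBlobArea', 100);

while ~isDone(videoReader)
    frame = step(videoReader);
    mask  = step(detector, frame);      % foreground mask from the GMM
    bbox  = step(blobAnalyzer, mask);   % bounding boxes of moving objects
    out   = insertShape(frame, 'Rectangle', bbox, 'Color', 'green');
    % ... display or record the annotated frame
end
release(videoReader);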
Feature points are used for object detection by detecting a set of features in a reference image, extracting feature descriptors, and matching features between the reference image and an input image. This method can locate reference objects despite changes in scale and orientation, and it is robust to partial occlusion.
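A sketch of this workflow using SURF features follows; the image file names are placeholders, and an affine transformation is assumed to model the object's pose:

% Detect and describe features in the reference and scene images.
refImg   = rgb2gray(imread('referenceObject.jpg'));
sceneImg = rgb2gray(imread('clutteredScene.jpg'));
refPts   = detectSURFFeatures(refImg);
scenePts = detectSURFFeatures(sceneImg);
[refFeat,   refPts]   = extractFeatures(refImg, refPts);
[sceneFeat, scenePts] = extractFeatures(sceneImg, scenePts);

% Match descriptors and estimate the object's pose; RANSAC discards outliers.
pairs = matchFeatures(refFeat, sceneFeat);
tform = estimateGeometricTransform(refPts(pairs(:, 1)), ...
    scenePts(pairs(:, 2)), 'affine');

% Map the reference image corners into the scene to outline the detection.
corners = [1 1; size(refImg, 2) 1; ...
    size(refImg, 2) size(refImg, 1); 1 size(refImg, 1)];
sceneCorners = transformPointsForward(tform, corners);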
Training is the process of creating an object detector or classifier to detect or recognize a specific object of interest. The training process uses:
With the Training Image Labeler app, you can select and assign regions of interest (ROIs) and label training images.
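With the labeled ROIs exported from the app, a cascade detector can be trained and applied along these lines; the detector file name, parameter values, and the positiveInstances and negativeFolder variables stand in for the app's output:

% Train a custom cascade object detector from labeled ROIs.
trainCascadeObjectDetector('myDetector.xml', positiveInstances, ...
    negativeFolder, 'FalseAlarmRate', 0.1, 'NumCascadeStages', 5);

% Use the trained detector on a new image.
detector = vision.CascadeObjectDetector('myDetector.xml');
img  = imread('testImage.jpg');
bbox = step(detector, img);
out  = insertObjectAnnotation(img, 'rectangle', bbox, 'object');
imshow(out)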
You can use machine learning techniques from Statistics and Machine Learning Toolbox with Computer Vision System Toolbox to create object recognition systems.
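As one illustrative combination, HOG features extracted with the system toolbox can feed a binary SVM from Statistics and Machine Learning Toolbox; the training images and labels below are assumed to exist and to be equally sized:

% Extract a HOG descriptor from each training image.
features = [];
for i = 1:numel(trainingImages)
    features = [features; extractHOGFeatures(trainingImages{i})]; %#ok<AGROW>
end

% Train a binary SVM classifier on the descriptors.
svmModel = fitcsvm(features, trainingLabels);

% Classify a new image with the same feature representation.
testHOG = extractHOGFeatures(imread('testImage.jpg'));
predictedLabel = predict(svmModel, testHOG);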
Camera calibration is the estimation of a camera’s intrinsic, extrinsic, and lens-distortion parameters. Typical uses of a calibrated camera include correcting optical distortion artifacts, estimating the distance of an object from the camera, measuring the size of objects in an image, and constructing 3-D views for augmented reality systems.
Computer Vision System Toolbox provides apps and functions to perform all essential tasks in the camera calibration workflow:
The Camera Calibrator app is used to select and filter calibration images, choose the number and type of radial distortion coefficients, view reprojection errors, visualize extrinsic parameters, and export camera calibration parameters.
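The same workflow can also be scripted; in this sketch the image file names and the checkerboard square size are placeholders:

% Detect checkerboard corners in the calibration images.
imageFileNames = {'calib1.jpg', 'calib2.jpg', 'calib3.jpg'};
[imagePoints, boardSize] = detectCheckerboardPoints(imageFileNames);

% Generate the corresponding world coordinates of the corners.
squareSize  = 25;  % square size in millimeters
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

% Estimate camera parameters, undistort an image, and inspect errors.
cameraParams = estimateCameraParameters(imagePoints, worldPoints);
undistorted  = undistortImage(imread(imageFileNames{1}), cameraParams);
showReprojectionErrors(cameraParams);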
Stereo vision is the process of extracting the 3-D structure of a scene from multiple 2-D views. Computer Vision System Toolbox provides functions and algorithms to complete the following steps in the stereo vision workflow:
Stereo calibration is the process of finding the intrinsic and extrinsic parameters of a pair of cameras, as well as the relative positions and orientations of the cameras. Stereo calibration is a precursor to calibrated stereo rectification and 3-D scene reconstruction. Computer Vision System Toolbox provides algorithms, functions, and an app to calibrate a pair of stereo cameras using a checkerboard calibration pattern.
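A minimal scripted version of stereo calibration might look like this; the image file lists and square size are placeholders:

% Detect checkerboard corners in both cameras' calibration images.
[imagePoints, boardSize] = detectCheckerboardPoints( ...
    leftImageFileNames, rightImageFileNames);
squareSize  = 25;  % square size in millimeters
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

% Returns a stereoParameters object with the intrinsics of each camera
% and the position and orientation of camera 2 relative to camera 1.
stereoParams = estimateCameraParameters(imagePoints, worldPoints);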
Stereo image rectification transforms a pair of stereo images so that corresponding points lie on the same row in both images. This process reduces the 2-D stereo correspondence problem to a 1-D search and simplifies computing the depth of each point in the scene. Computer Vision System Toolbox provides functionality for stereo rectification that includes:
The relative depths of points in a scene are represented in a stereo disparity map, which is calculated by matching corresponding points in a pair of rectified stereo images. The system toolbox provides algorithms for disparity calculation including:
You can reconstruct the 3-D structure of a scene by reprojecting its 2-D image points into three dimensions using the disparity map and information from stereo calibration.
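Putting these steps together, the reconstruction stage can be sketched as follows; I1 and I2 are a color stereo image pair, and stereoParams is assumed to come from stereo calibration:

% Rectify the pair, compute disparity, and reproject to 3-D.
[J1, J2]     = rectifyStereoImages(I1, I2, stereoParams);
disparityMap = disparity(rgb2gray(J1), rgb2gray(J2));
xyzPoints    = reconstructScene(disparityMap, stereoParams);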
3-D point clouds represent a set of data points in a 3-D coordinate system. Point clouds are usually obtained from sensors such as stereo cameras, LIDAR, 3-D scanners, or RGB-D sensors such as Microsoft® Kinect® for Windows®. 3-D point cloud processing algorithms are used for visual sensing and navigation in robots and autonomous vehicles.
Computer Vision System Toolbox provides a suite of functions and algorithms for I/O, manipulation, registration, filtering, and processing of 3-D point clouds.
Computer Vision System Toolbox provides functions to read and write points in the PLY format. You can also acquire 3-D point clouds live from Kinect for Windows using Image Acquisition Toolbox™.
Computer Vision System Toolbox provides functions to plot point clouds, visualize streaming 3-D point cloud data, and display the differences between two point clouds.
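A minimal I/O and visualization sketch follows; the file names are placeholders:

% Read, display, and write a point cloud in the PLY format.
ptCloud = pcread('object.ply');
pcshow(ptCloud);
pcwrite(ptCloud, 'objectCopy.ply');

% For streaming data, pcplayer displays point clouds as they arrive.
player = pcplayer(ptCloud.XLimits, ptCloud.YLimits, ptCloud.ZLimits);
view(player, ptCloud);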
Point cloud registration and stitching are used to reconstruct a 3-D view of a scene or object from a collection of point clouds. You can use these techniques to develop 3-D models of objects for inspection and visualization, and to generate 3-D world maps for simultaneous localization and mapping (SLAM) applications.
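The idea can be sketched as follows, assuming two overlapping point clouds are already loaded; the downsampling and merge grid steps are illustrative:

% Downsample for speed, register with ICP, then merge the clouds.
moving = pcdownsample(ptCloudMoving, 'gridAverage', 0.01);
fixed  = pcdownsample(ptCloudFixed,  'gridAverage', 0.01);

tform   = pcregrigid(moving, fixed);            % rigid ICP registration
aligned = pctransform(ptCloudMoving, tform);

merged = pcmerge(ptCloudFixed, aligned, 0.005); % merge on a 5 mm grid
pcshow(merged);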
Functions and algorithms required to complete the point cloud registration and stitching workflow include:
Computer Vision System Toolbox provides a suite of functions to fit geometric shapes to 3-D point clouds using RANSAC. These algorithms are used in robotics to locate objects in 3-D for grasping, ground plane segmentation, and navigation.
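For example, ground plane segmentation can be sketched as follows; the distance threshold and reference vector are illustrative, and ptCloud is assumed to exist:

% Fit a roughly horizontal plane with RANSAC and extract its inliers.
maxDistance     = 0.02;     % inlier distance threshold in meters
referenceVector = [0 0 1];  % expected plane normal
[model, inlierIdx] = pcfitplane(ptCloud, maxDistance, referenceVector);

groundPlane = select(ptCloud, inlierIdx);
pcshow(groundPlane);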
Computer vision often involves the tracking of moving objects in video. Computer Vision System Toolbox provides a comprehensive set of algorithms and functions for object tracking and motion estimation tasks.
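As a minimal illustration, a single object can be tracked with the toolbox's Kalman filter utilities; the initial location and per-frame detections are assumed to come from a separate detector, and getDetection is a hypothetical stand-in for it:

% Track one object with a constant-velocity Kalman filter.
kalman = configureKalmanFilter('ConstantVelocity', initialLocation, ...
    [200, 50], [100, 25], 100);

for k = 1:numFrames
    predictedLocation = predict(kalman);     % predict before measuring
    detectedLocation  = getDetection(k);     % hypothetical detector call
    trackedLocation   = correct(kalman, detectedLocation);
end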
The system toolbox provides video tracking algorithms including:
Computer Vision System Toolbox provides an extensible framework to track multiple objects in a video stream and includes the following to facilitate multiple object tracking:
Motion estimation is the process of determining the movement of blocks between adjacent video frames. The system toolbox provides a variety of motion estimation algorithms, such as optical flow, block matching, and template matching. These algorithms create motion vectors, which can relate to the whole image, blocks, arbitrary patches, or individual pixels. For block and template matching, the evaluation metrics for finding the best match include mean square error (MSE), mean absolute difference (MAD), maximum absolute difference (MaxAD), sum of absolute differences (SAD), and sum of squared differences (SSD).
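A sketch of dense optical flow estimation over a video follows; the file name and noise threshold are illustrative:

% Estimate per-pixel motion vectors between consecutive frames.
videoReader = vision.VideoFileReader('visiontraffic.avi');
flowModel   = opticalFlowLK('NoiseThreshold', 0.009);

while ~isDone(videoReader)
    frame = rgb2gray(step(videoReader));
    flow  = estimateFlow(flowModel, frame);
    % flow.Vx, flow.Vy, flow.Magnitude, and flow.Orientation hold the result
end
release(videoReader);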
Computer Vision System Toolbox provides a suite of feature detectors and descriptors. Additionally, the system toolbox offers functionality to match two sets of feature vectors and visualize the results.
When combined into a single workflow, feature detection, extraction, and matching can be used to solve many computer vision design challenges, such as image registration, stereo vision, object detection, and tracking.
A feature is an interesting part of an image, such as a corner, blob, edge, or line. Feature extraction enables you to derive a set of feature vectors, also called descriptors, from a set of detected features. Computer Vision System Toolbox offers capabilities for feature detection and extraction that include:
Feature matching is the comparison of two sets of feature descriptors obtained from different images to provide point correspondences between images. Computer Vision System Toolbox offers functionality for feature matching that includes:
Image registration is the transformation of images from different camera views into a unified coordinate system. Computer Vision System Toolbox supports an automatic, feature-based approach to image registration. Typical uses include video mosaicking, video stabilization, and image fusion.
Feature detection, extraction, and matching are the first steps in the feature-based automatic image registration workflow. You can remove outliers from the matched feature sets with RANSAC while computing the geometric transformation between the images, and then apply that transformation to align the two images.
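A sketch of that workflow, assuming the fixed and moving RGB images are already loaded and a projective transformation relates the views:

% Detect, describe, and match features in both images.
fixedGray  = rgb2gray(fixed);
movingGray = rgb2gray(moving);
ptsFixed   = detectSURFFeatures(fixedGray);
ptsMoving  = detectSURFFeatures(movingGray);
[fFixed,  ptsFixed]  = extractFeatures(fixedGray,  ptsFixed);
[fMoving, ptsMoving] = extractFeatures(movingGray, ptsMoving);
pairs = matchFeatures(fMoving, fFixed);

% RANSAC removes outlier matches while estimating the transformation.
tform = estimateGeometricTransform(ptsMoving(pairs(:, 1)), ...
    ptsFixed(pairs(:, 2)), 'projective');

% Warp the moving image into the fixed image's coordinate system.
registered = imwarp(moving, tform, 'OutputView', imref2d(size(fixedGray)));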
Most System objects™, functions, and blocks in Computer Vision System Toolbox support generation of ANSI/ISO C code using MATLAB Coder™, Simulink Coder™, or Embedded Coder®. You can select optimizations for specific processor architectures and integrate legacy C code with the generated code to leverage existing intellectual property. You can generate C code for both floating-point and fixed-point data types.
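As an illustration, generating C code for a hypothetical user function that calls a code-generation-capable toolbox function might look like this; the function name and input size are assumptions:

% detectCornersFcn.m -- a hypothetical user function
function pts = detectCornersFcn(I) %#codegen
points = detectHarrisFeatures(I);
pts = points.Location;
end

Generate C code for a 480-by-640 grayscale input with MATLAB Coder:

codegen detectCornersFcn -args {zeros(480, 640, 'uint8')}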
Many real-time systems use hardware that requires a fixed-point representation of your algorithm. Computer Vision System Toolbox supports fixed-point modeling in most blocks and System objects, with dialog boxes and object properties that help you with configuration.
System toolbox support for fixed point includes:
Computer Vision System Toolbox includes image processing primitives that support fixed-point data types and C-code generation. These System objects and Simulink blocks include: