Computer Vision System Toolbox

Key Features

  • Object detection and tracking, including the Viola-Jones, Kanade-Lucas-Tomasi (KLT), and Kalman filtering methods
  • Training of object detection, object recognition, and image retrieval systems, including cascade object detection and bag-of-features methods
  • Camera calibration for single and stereo cameras, including automatic checkerboard detection and an app for workflow automation
  • Stereo vision, including rectification, disparity calculation, and 3-D reconstruction
  • 3-D point cloud processing, including I/O, visualization, registration, denoising, and geometric shape fitting
  • Feature detection, extraction, and matching
  • Support for C-code generation and fixed-point arithmetic with code generation products

Object Detection and Recognition

Object detection and recognition are used to locate, identify, and categorize objects in images and video. Computer Vision System Toolbox provides a comprehensive suite of algorithms and tools for object detection and recognition.

Object Classification

You can detect or recognize an object in an image by training an object classifier. Pattern recognition algorithms create classifiers from training data that represents different object classes; the trained classifier then accepts image data and assigns the appropriate object or class label.

Computer Vision System Toolbox provides algorithms to train image classification and image retrieval systems using the bag-of-words model. The system toolbox also provides feature extraction techniques you can use to create custom classifiers using supervised classification algorithms from Statistics and Machine Learning Toolbox™.
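
For illustration, a minimal sketch of this workflow (the folder name and test image are placeholder assumptions):

    % Train an image category classifier with a bag of visual features.
    % Assumes 'trainingImages' holds one subfolder per object category.
    imgSets = imageSet('trainingImages', 'recursive');
    bag = bagOfFeatures(imgSets);                         % build the visual vocabulary
    classifier = trainImageCategoryClassifier(imgSets, bag);

    % Classify a new image (placeholder file name).
    img = imread('testImage.jpg');
    [labelIdx, score] = predict(classifier, img);
    disp(classifier.Labels(labelIdx));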

Image Classification
Image category classification using a bag of visual words.

Image Retrieval
Searching an image set for similar images.

Object Recognition and Tracking for Augmented Reality
Use object recognition and tracking to create an augmented reality application with a webcam in MATLAB®. Recognize an image in a scene, track its position, and augment the display by playing a video in the image’s place.

Text Detection and Optical Character Recognition (OCR) (Example)
Recognizing text in natural images.

Digit Classification (Example)
Classifying digits using support vector machines (SVM) and histogram of oriented gradients (HOG) feature extraction.

Object Detection

Object detection is used to determine the location of an object in an image. The system toolbox provides several algorithms and techniques to detect objects, including pretrained detection models, as well as functions and an app to train object detectors.

Face Detection Using Viola-Jones Algorithm
Using a cascade of classifiers to detect faces.
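
For example, a minimal face detection sketch (visionteam.jpg ships with the toolbox examples):

    % Detect faces with the pretrained Viola-Jones cascade detector.
    faceDetector = vision.CascadeObjectDetector;          % frontal faces by default
    I = imread('visionteam.jpg');
    bboxes = step(faceDetector, I);                       % M-by-4 bounding boxes
    IFaces = insertObjectAnnotation(I, 'rectangle', bboxes, 'Face');
    figure, imshow(IFaces), title('Detected faces');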

People Detector
Detecting people using a pretrained SVM with HOG features.
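
A minimal sketch of the pretrained people detector (the image name is a placeholder):

    % Detect upright people with the pretrained HOG+SVM detector.
    peopleDetector = vision.PeopleDetector;
    I = imread('streetScene.jpg');                        % placeholder image
    [bboxes, scores] = step(peopleDetector, I);
    I = insertObjectAnnotation(I, 'rectangle', bboxes, scores);
    figure, imshow(I), title('Detected people and detection scores');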

Motion-Based Object Detection

Motion-based object detection algorithms use motion extraction and segmentation techniques such as optical flow and Gaussian mixture model (GMM) foreground detection to locate moving objects in a scene.
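
For illustration, a minimal sketch of GMM-based foreground detection combined with blob analysis (visiontraffic.avi ships with the toolbox; the parameter values are assumptions):

    % Segment moving objects with a Gaussian mixture model foreground
    % detector, then locate them with blob analysis.
    videoReader = vision.VideoFileReader('visiontraffic.avi');
    foregroundDetector = vision.ForegroundDetector('NumGaussians', 3, ...
        'NumTrainingFrames', 50);
    blobAnalysis = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...
        'AreaOutputPort', false, 'CentroidOutputPort', false, ...
        'MinimumBlobArea', 150);

    while ~isDone(videoReader)
        frame = step(videoReader);
        foreground = step(foregroundDetector, frame);     % binary motion mask
        bboxes = step(blobAnalysis, foreground);          % boxes around blobs
        out = insertShape(frame, 'Rectangle', bboxes, 'Color', 'green');
        imshow(out); drawnow;
    end
    release(videoReader);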

Detect Moving Objects with Optical Flow
Detect and track cars using optical flow estimation.

Detect Cars Using Gaussian Mixture Models
Detect and count cars using a foreground detector.

Feature-Based Object Detection

Feature points are used for object detection by detecting a set of features in a reference image, extracting feature descriptors, and matching features between the reference image and an input. This method of object detection can detect reference objects despite scale and orientation changes and is robust to partial occlusions.
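
A minimal sketch of this workflow (the image names are placeholder assumptions):

    % Locate a reference object in a cluttered scene using SURF features.
    refImg   = rgb2gray(imread('referenceObject.jpg'));   % placeholder images
    sceneImg = rgb2gray(imread('clutteredScene.jpg'));

    refPts   = detectSURFFeatures(refImg);
    scenePts = detectSURFFeatures(sceneImg);
    [refFeat, refValidPts]     = extractFeatures(refImg, refPts);
    [sceneFeat, sceneValidPts] = extractFeatures(sceneImg, scenePts);

    pairs        = matchFeatures(refFeat, sceneFeat);
    matchedRef   = refValidPts(pairs(:, 1));
    matchedScene = sceneValidPts(pairs(:, 2));

    % RANSAC-based estimation rejects outliers and recovers the object pose.
    [tform, inlierRef, inlierScene] = estimateGeometricTransform( ...
        matchedRef, matchedScene, 'affine');
    showMatchedFeatures(refImg, sceneImg, inlierRef, inlierScene, 'montage');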

Reference image of object (left), input image (right); the yellow lines indicate the corresponding matched features between the two images.

Training Object Detectors and Classifiers

Training is the process of creating an object detector or classifier to detect or recognize a specific object of interest. The training process utilizes:

  • Positive images of the object of interest at different scales and orientations
  • Negative images of backgrounds typically associated with the object of interest
  • Non-objects similar in appearance to the object of interest

With the Training Image Labeler app, you can select regions of interest (ROIs) and label training images.

Training Image Labeler app to select regions of interest (ROIs) in positive training images.

The system toolbox provides functions to train a Viola-Jones object detector to locate any object of interest. An app to train a detector is available on File Exchange.
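
A minimal training sketch (the folder, file names, and parameter values are assumptions; positiveInstances is the struct array exported by the Training Image Labeler):

    % Train a custom cascade detector, then use it like a pretrained one.
    negativeFolder = 'negativeImages';                    % placeholder folder
    trainCascadeObjectDetector('myDetector.xml', positiveInstances, ...
        negativeFolder, 'FalseAlarmRate', 0.1, 'NumCascadeStages', 5);

    detector = vision.CascadeObjectDetector('myDetector.xml');
    bboxes = step(detector, imread('testScene.jpg'));     % placeholder image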

Process of training a cascade object detector.

You can use machine learning techniques from Statistics and Machine Learning Toolbox with Computer Vision System Toolbox to create object recognition systems.

Classification Learner app from Statistics and Machine Learning Toolbox, for performing common tasks such as interactively exploring data, selecting features, specifying cross-validation schemes, training models, and assessing results.

Camera Calibration

Camera calibration is the estimation of a camera’s intrinsic, extrinsic, and lens-distortion parameters. Typical uses of a calibrated camera include correcting optical distortion artifacts, estimating the distance of an object from the camera, measuring the size of objects in an image, and constructing 3-D views for augmented reality systems.
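
A minimal single-camera calibration sketch (calibImages and the square size are assumptions):

    % Calibrate from checkerboard images, then undistort an image.
    [imagePoints, boardSize] = detectCheckerboardPoints(calibImages);
    squareSize = 25;                                      % assumed size, in mm
    worldPoints = generateCheckerboardPoints(boardSize, squareSize);
    cameraParams = estimateCameraParameters(imagePoints, worldPoints);

    I = imread(calibImages{1});
    J = undistortImage(I, cameraParams);                  % correct lens distortion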

Camera Calibration with MATLAB
Explore camera calibration capabilities in MATLAB®. Calibrate a camera using the Camera Calibrator app, perform image undistortion, and measure the actual size of an object using a calibrated camera.

Stereo Camera Calibration
Using the Stereo Camera Calibrator app.

Measuring the Size of Planar Objects (Example)
Diameter of various objects measured using a single calibrated camera.

Structure from Motion
Use a single calibrated camera to generate a 3-D view of a scene.

Camera Calibrator App

Computer Vision System Toolbox provides apps and functions to perform all essential tasks in the camera calibration workflow:

  • Fully automatic detection and location of checkerboard calibration patterns, including corner detection with subpixel accuracy
  • Estimation of all intrinsic and extrinsic parameters including axis skew
  • Calculation of radial and tangential lens distortion coefficients
  • Correction of optical distortion
  • Support for single camera and stereo calibration

The Camera Calibrator app is used to select and filter calibration images, choose the number and type of radial distortion coefficients, view reprojection errors, visualize extrinsic parameters, and export camera calibration parameters.

Camera Calibrator app. You can add or remove calibration images (left), view detected corners and reprojected points (center), plot reprojection errors (top right), and visualize extrinsic parameters (bottom right).

Stereo Vision

Stereo vision is the process of extracting the 3-D structure of a scene from multiple 2-D views. Computer Vision System Toolbox provides functions and algorithms to complete the following steps in the stereo vision workflow:

  • Stereo calibration
  • Stereo image rectification
  • Disparity map computation
  • 3-D scene reconstruction

Stereo Calibration

Stereo calibration is the process of estimating the intrinsic and lens-distortion parameters of each camera in a stereo pair, along with the relative position and orientation of the two cameras. Stereo calibration is a precursor to calibrated stereo rectification and 3-D scene reconstruction. Computer Vision System Toolbox provides algorithms, functions, and an app to calibrate a pair of stereo cameras using a checkerboard calibration pattern.
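
A minimal stereo calibration sketch (leftImages and rightImages are assumed cell arrays of corresponding file names):

    % Calibrate a stereo pair from checkerboard image pairs.
    [imagePoints, boardSize] = detectCheckerboardPoints(leftImages, rightImages);
    squareSize = 25;                                      % assumed size, in mm
    worldPoints = generateCheckerboardPoints(boardSize, squareSize);
    stereoParams = estimateCameraParameters(imagePoints, worldPoints);
    showExtrinsics(stereoParams);                         % camera and pattern poses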

Stereo Camera Calibrator app. You can add or remove pairs of calibration images (left), view rectified images (center), plot reprojection errors (bottom left), and visualize extrinsic parameters (bottom right).

Stereo Image Rectification

Stereo image rectification transforms a pair of stereo images so that a point in one image can be found in the corresponding row of the other image. This process reduces the 2-D stereo correspondence problem to a 1-D problem and simplifies determining the depth of each point in the scene. Computer Vision System Toolbox provides functionality for stereo rectification that includes the following (a minimal calibrated-rectification sketch follows the list):

  • Uncalibrated stereo rectification using feature matching and RANSAC to estimate the projective transform between cameras
  • Calibrated stereo rectification using stereo calibration to compute the fundamental matrix
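
A minimal calibrated rectification sketch (image names are placeholders; stereoParams comes from stereo calibration):

    % Rectify a stereo pair so corresponding points share the same rows.
    I1 = imread('left.png');                              % placeholder images
    I2 = imread('right.png');
    [J1, J2] = rectifyStereoImages(I1, I2, stereoParams);
    imshow(stereoAnaglyph(J1, J2));                       % red-cyan composite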

Uncalibrated Stereo Image Rectification (Example)
Results from uncalibrated stereo image rectification. Non-overlapping areas are shown in red and cyan.

Calibrated Stereo Rectification (Example)
Rectify a pair of stereo images.

Disparity Computation and 3-D Scene Reconstruction

The relative depths of points in a scene are represented in a stereo disparity map, which is calculated by matching corresponding points in a pair of rectified stereo images. The system toolbox provides algorithms for disparity calculation that include the following (a minimal sketch follows the list):

  • Semi-global matching
  • Block matching
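
A minimal disparity sketch (J1 and J2 are assumed to be the rectified color images from the previous step):

    % Compute a disparity map from a rectified stereo pair.
    disparityMap = disparity(rgb2gray(J1), rgb2gray(J2));
    figure, imshow(disparityMap, [0 64]), colormap jet, colorbar;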

Stereo disparity map (right) representing the relative depths of points in a scene (left).

You can reconstruct the 3-D structure of a scene by projecting the 2-D contents of a scene to three dimensions using the disparity map and information from stereo calibration.
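
A minimal reconstruction sketch (disparityMap and stereoParams come from the previous steps; the unit conversion assumes the calibration used millimeters):

    % Reproject the disparity map to 3-D points in world units.
    xyzPoints = reconstructScene(disparityMap, stereoParams);
    ptCloud = pointCloud(xyzPoints ./ 1000);              % mm to meters (assumed)
    pcshow(ptCloud);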

3-D Scene Reconstruction from Stereo Video (Example)
Reconstruct a scene in 3-D using a pair of stereo images.

Measure Distance to Objects (Example)
Use triangulation to measure the distance to detected objects.

3-D Point Cloud Processing

A 3-D point cloud is a set of data points in a 3-D coordinate system. Point clouds are usually obtained from sensors such as stereo cameras, LIDAR, 3-D scanners, or RGB-D sensors such as Microsoft® Kinect® for Windows®. 3-D point cloud processing algorithms are used for visual sensing and navigation in robots and autonomous vehicles.

Computer Vision System Toolbox provides a suite of functions and algorithms for I/O, manipulation, registration, filtering, and processing of 3-D point clouds.

Point Cloud I/O and Visualization

Computer Vision System Toolbox provides functions to read and write points in the PLY format. You can also acquire 3-D point clouds live from Kinect for Windows using Image Acquisition Toolbox™.
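
For example (teapot.ply ships with the toolbox examples):

    % Read a point cloud from a PLY file and display it.
    ptCloud = pcread('teapot.ply');
    pcshow(ptCloud);
    xlabel('X'); ylabel('Y'); zlabel('Z');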

Live Acquisition of Point Clouds
Acquire a 3-D point cloud using Kinect for Windows.

Computer Vision System Toolbox provides functions to plot point clouds, visualize streaming 3-D point cloud data, and display the differences between two point clouds.

Visualize Streaming Point Cloud Data
Display a rotating 3-D point cloud.

Plotting 3-D Point Clouds
Plot a point cloud with texture mapping.

Visualize the Difference Between Point Clouds (Example)
Visualize the difference between two point clouds in a sequence.

3-D Point Cloud Registration and Stitching

Point cloud registration and stitching are used to reconstruct a 3-D view of a scene or object from a collection of point clouds. You can use these techniques to develop 3-D models of objects for inspection and visualization, and to generate 3-D world maps for simultaneous localization and mapping (SLAM) applications.

Functions and algorithms required to complete the point cloud registration and stitching workflow include the following (a minimal registration sketch follows the list):

  • Denoising and downsampling to improve the speed and robustness of registration
  • Rigid registration using the iterative closest point (ICP) algorithm
  • Point cloud transformation and merging to stitch registered clouds into a single scene
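
A minimal registration-and-stitching sketch (ptCloudFixed and ptCloudMoving are assumed pointCloud objects; the grid sizes are placeholder values):

    % Downsample, register with ICP, then transform and merge.
    moving = pcdownsample(ptCloudMoving, 'gridAverage', 0.01);
    fixed  = pcdownsample(ptCloudFixed,  'gridAverage', 0.01);
    tform  = pcregrigid(moving, fixed);                   % rigid ICP registration
    aligned = pctransform(ptCloudMoving, tform);
    scene   = pcmerge(ptCloudFixed, aligned, 0.005);      % merge on a 5 mm grid
    pcshow(scene);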

Register and Stitch a 3-D Point Cloud (Example)
Create a 3-D world model from a sequence of point clouds.

Fitting Geometrical Shapes to Point Clouds

Computer Vision System Toolbox provides a suite of functions to fit geometric shapes to 3-D point clouds using RANSAC. These algorithms are used in robotics for tasks such as locating objects in 3-D for grasping, segmenting the ground plane, and navigation.

The system toolbox provides functions to fit spheres, planes, and cylinders to 3-D point clouds.
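
A minimal plane-fitting sketch (ptCloud and the inlier threshold are assumptions):

    % Fit a plane to a point cloud with RANSAC and extract the inliers.
    maxDistance = 0.02;                                   % assumed threshold, in meters
    [model, inlierIdx] = pcfitplane(ptCloud, maxDistance);
    plane = select(ptCloud, inlierIdx);
    pcshow(plane);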

Geometric models of planes, spheres, and cylinders fit (left) to a 3-D point cloud of a scene (right).

Object Tracking and Motion Estimation

Computer vision often involves the tracking of moving objects in video. Computer Vision System Toolbox provides a comprehensive set of algorithms and functions for object tracking and motion estimation tasks.

Object Tracking

The system toolbox provides video tracking algorithms that include the following (a minimal KLT tracking sketch follows the list):

  • Continuously adaptive mean shift (CAMShift)
  • Kanade-Lucas-Tomasi (KLT)
  • Kalman filtering
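
A minimal KLT tracking sketch (the video name is a placeholder):

    % Track Shi-Tomasi corners through a video with the KLT point tracker.
    videoReader = vision.VideoFileReader('inputVideo.avi');  % placeholder video
    frame  = step(videoReader);
    points = detectMinEigenFeatures(rgb2gray(frame));
    tracker = vision.PointTracker('MaxBidirectionalError', 2);
    initialize(tracker, points.Location, frame);

    while ~isDone(videoReader)
        frame = step(videoReader);
        [points, validity] = step(tracker, frame);        % updated locations
        out = insertMarker(frame, points(validity, :), '+');
        imshow(out); drawnow;
    end
    release(videoReader);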

Face Detection and Tracking with Live Video Acquisition (Example)
Automatically detect and track a face in a live video stream, using the KLT algorithm.

Introduction to Kalman Filters for Object Tracking
Discover how to use configureKalmanFilter and vision.KalmanFilter to track a moving object in video. Learn how to handle the challenges of inaccurate or missing object detection while keeping track of its location in video.
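
A minimal Kalman filtering sketch (initialLocation, detectedLocation, and the noise parameters are assumptions):

    % Configure a constant-velocity Kalman filter from a first detection,
    % then alternate prediction and correction as detections arrive.
    kalman = configureKalmanFilter('ConstantVelocity', initialLocation, ...
        [1 1] * 1e5, [25, 10], 25);
    predictedLocation = predict(kalman);                   % where the object should be
    trackedLocation   = correct(kalman, detectedLocation); % fuse in the new detection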

Tracking Pedestrians from a Moving Car (Example)
How to track pedestrians using a camera mounted in a moving car.

Object Recognition and Tracking for Augmented Reality
Use object recognition and tracking to create an augmented reality application with a webcam in MATLAB®. Recognize an image in a scene, track its position, and augment the display by playing a video in the image’s place.

Detect and Track Multiple Faces (File Exchange)
A simple system for detecting and tracking multiple faces from live video.

Multiple Object Tracking Framework

Computer Vision System Toolbox provides an extensible framework to track multiple objects in a video stream. The framework includes the following to facilitate multiple object tracking (a minimal detection-to-track assignment sketch follows the list):

  • Kalman filtering to predict a physical object's future location, reduce noise in the detected location, and help associate multiple objects with their corresponding tracks
  • Hungarian algorithm to assign object detections to tracks
  • Moving object detection using blob analysis and foreground detection
  • Annotation capabilities to visualize object location and add object labels
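
A minimal detection-to-track assignment sketch (costMatrix and the gating cost are assumptions):

    % costMatrix(i, j) holds the cost (for example, Euclidean distance)
    % of assigning detection j to track i.
    costOfNonAssignment = 20;                             % assumed gating cost
    [assignments, unassignedTracks, unassignedDetections] = ...
        assignDetectionsToTracks(costMatrix, costOfNonAssignment);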

Multiple objects tracked using the system toolbox’s multiple object tracking framework. The trails indicate trajectories of tracked objects.

Motion Estimation

Motion estimation is the process of determining the movement of blocks between adjacent video frames. The system toolbox provides a variety of motion estimation algorithms, such as optical flow, block matching, and template matching. These algorithms create motion vectors, which can relate to the whole image, blocks, arbitrary patches, or individual pixels. For block and template matching, the evaluation metrics for finding the best match include mean squared error (MSE), mean absolute difference (MAD), maximum absolute difference (MaxAD), sum of absolute differences (SAD), and sum of squared differences (SSD).
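
A minimal optical flow sketch (visiontraffic.avi ships with the toolbox; the threshold and plot parameters are assumptions):

    % Estimate Lucas-Kanade optical flow frame by frame and overlay the field.
    opticFlow = opticalFlowLK('NoiseThreshold', 0.009);
    videoReader = vision.VideoFileReader('visiontraffic.avi');
    while ~isDone(videoReader)
        frame = rgb2gray(step(videoReader));
        flow = estimateFlow(opticFlow, frame);            % per-pixel velocities
        imshow(frame); hold on;
        plot(flow, 'DecimationFactor', [5 5], 'ScaleFactor', 10);
        hold off; drawnow;
    end
    release(videoReader);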

Detecting moving objects using a stationary camera. Optical flow is calculated, and detected motion is shown by overlaying the flow field on top of the current frame.

Feature Detection, Extraction, and Matching

Computer Vision System Toolbox provides a suite of feature detectors and descriptors. Additionally, the system toolbox offers functionality to match two sets of feature vectors and visualize the results.

When combined into a single workflow, feature detection, extraction, and matching can be used to solve many computer vision design challenges, such as image registration, stereo vision, object detection, and tracking.

Feature Detection and Extraction

A feature is an interesting part of an image, such as a corner, blob, edge, or line. Feature extraction enables you to derive a set of feature vectors, also called descriptors, from a set of detected features. Computer Vision System Toolbox offers capabilities for feature detection and extraction that include the following (a short detection and extraction sketch follows the list):

  • Corner detection, including Shi & Tomasi, Harris, and FAST methods
  • BRISK, MSER, and SURF detection for blobs and regions
  • Extraction of BRISK, FREAK, SURF, and simple pixel neighborhood descriptors
  • Histogram of oriented gradients (HOG) and local binary pattern (LBP) feature extraction
  • Visualization of feature location, scale, and orientation
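
A minimal detection and extraction sketch (cameraman.tif ships with Image Processing Toolbox):

    % Detect two feature types in one image and extract descriptors.
    I = imread('cameraman.tif');
    surfPoints   = detectSURFFeatures(I);
    harrisPoints = detectHarrisFeatures(I);
    [surfFeatures, surfValidPts] = extractFeatures(I, surfPoints);
    [hogFeatures, hogVis] = extractHOGFeatures(I);        % whole-image HOG vector
    imshow(I); hold on;
    plot(surfPoints.selectStrongest(20));                 % location and scale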

SURF (left), MSER (center), and corner detection (right) with Computer Vision System Toolbox. Using the same image, three different feature types are detected and results plotted over the original image.

Histogram of oriented gradients (HOG) feature extraction of an image (top). Feature vectors of different sizes are created to represent the image by varying the cell size.

Feature Matching

Feature matching is the comparison of two sets of feature descriptors obtained from different images to provide point correspondences between images. Computer Vision System Toolbox offers functionality for feature matching that includes the following (a minimal binary-feature matching sketch follows the list):

  • Configurable matching metrics, including SAD, SSD, and normalized cross-correlation
  • Hamming distance for binary features
  • Matching methods including nearest neighbor ratio, nearest neighbor, and threshold
  • Multicore support for faster execution on large feature sets
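
A minimal binary-feature matching sketch (I1 and I2 are assumed grayscale images; the match threshold is a placeholder):

    % Match binary BRISK descriptors; matchFeatures applies the Hamming
    % distance automatically for binaryFeatures inputs.
    points1 = detectBRISKFeatures(I1);
    points2 = detectBRISKFeatures(I2);
    [f1, vpts1] = extractFeatures(I1, points1);
    [f2, vpts2] = extractFeatures(I2, points2);
    indexPairs = matchFeatures(f1, f2, 'MatchThreshold', 40);
    showMatchedFeatures(I1, I2, vpts1(indexPairs(:, 1)), vpts2(indexPairs(:, 2)));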

Detected features indicated by red circles (left) and green crosses (right). The yellow lines indicate the corresponding matched features between the two images.

Statistically robust methods such as RANSAC can be used to filter outliers in matched feature sets while estimating the geometric transformation or fundamental matrix. This is useful when using feature matching for image registration, object detection, or stereo vision applications.

Feature-Based Image Registration

Image registration is the transformation of images from different camera views into a unified coordinate system. Computer Vision System Toolbox supports an automatic, feature-based approach to image registration. Typical uses include video mosaicking, video stabilization, and image fusion.

Feature detection, extraction, and matching are the first steps in the feature-based automatic image registration workflow. You can remove the outliers in the matched feature sets using RANSAC to compute the geometric transformation between images and then apply the geometric transformation to align the two images.
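
A minimal alignment sketch (matchedPts1, matchedPts2, movingImage, and fixedImage are assumed to come from the matching step described above):

    % Estimate the transformation with RANSAC, then warp one image onto the other.
    [tform, inlier1, inlier2] = estimateGeometricTransform( ...
        matchedPts1, matchedPts2, 'projective');
    registered = imwarp(movingImage, tform, 'OutputView', ...
        imref2d(size(fixedImage)));
    imshowpair(fixedImage, registered, 'blend');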

Feature-based registration, used for video stabilization. The system toolbox detects interest points in two sequential video frames using corner features (top); the putative matches are determined with numerous outliers (bottom left), and outliers are removed using the RANSAC method (bottom right).

Code Generation and Fixed Point

Computer Vision System Toolbox supports the creation of system-level test benches, fixed-point modeling, and code generation within MATLAB and Simulink. This support lets you integrate algorithm development with rapid prototyping, implementation, and verification workflows.

Code Generation Support

Most System objects™, functions, and blocks in Computer Vision System Toolbox can generate ANSI/ISO C code using MATLAB Coder™, Simulink Coder™, or Embedded Coder®. You can select optimizations for specific processor architectures and integrate legacy C code with the generated code to leverage existing intellectual property. You can generate C code for both floating-point and fixed-point data types.
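
For example, a minimal code generation sketch (detectAndDraw is a hypothetical user function written to be code-generation compatible):

    % Generate C code for a MATLAB function with MATLAB Coder, specifying
    % an example input so sizes and types can be inferred.
    codegen detectAndDraw -args {zeros(480, 640, 3, 'uint8')} -report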

Introduction to Code Generation with Feature Matching and Registration (Example)
How to use MATLAB Coder to generate C code for a MATLAB file.

Code Generation for Face Tracking with PackNGo (Example)
How to generate code with the packNGo function.

Code Generation for Depth Estimation from Stereo Video (Example)
How to use MATLAB Coder to generate C code for a MATLAB function.

Fixed-Point Modeling

Many real-time systems use hardware that requires a fixed-point representation of your algorithm. Computer Vision System Toolbox supports fixed-point modeling in most blocks and System objects, with dialog boxes and object properties that help you with configuration.

System toolbox support for fixed point includes the following (a brief configuration sketch follows the list):

  • Word sizes from 1 to 128 bits
  • Arbitrary binary-point placement
  • Overflow handling methods (wrap or saturation)
  • Rounding methods, including ceiling, convergent, floor, nearest, round, simplest, and zero
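
For illustration, a brief fixed-point configuration sketch using Fixed-Point Designer™ objects (the values are placeholders):

    % Define a signed 16-bit fixed-point value with 14 fraction bits,
    % nearest rounding, and saturation on overflow.
    F = fimath('RoundingMethod', 'Nearest', 'OverflowAction', 'Saturate');
    x = fi(0.784, 1, 16, 14, F);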

Simulink model designed to create code for a specific hardware target. This model generates C code for a video stabilization system and embeds the algorithm into a digital signal processor (DSP).

Image Processing Primitives

Computer Vision System Toolbox includes image processing primitives that support fixed-point data types and C-code generation. These System objects and Simulink blocks include:

  • 2-D spatial and frequency filtering
  • Image preprocessing and postprocessing algorithms
  • Morphological operators
  • Geometric transformations
  • Color space conversions
