Local Feature Detection and Extraction

Local features and their descriptors, which are compact vector representations of a local neighborhood, are the building blocks of many computer vision algorithms. Their applications include image registration, object detection and classification, tracking, and motion estimation. Using local features enables these algorithms to better handle scale changes, rotation, and occlusion. The Computer Vision Toolbox™ provides the FAST, Harris, ORB, and Shi & Tomasi methods for detecting corner features, and the SIFT, SURF, KAZE, and MSER methods for detecting blob features. The toolbox includes the SIFT, SURF, KAZE, FREAK, BRISK, ORB, and HOG descriptors. You can mix and match the detectors and the descriptors depending on the requirements of your application. For more details, see Point Feature Types.

What Are Local Features?

Local features refer to a pattern or distinct structure found in an image, such as a point, edge, or small image patch. They are usually associated with an image patch that differs from its immediate surroundings in texture, color, or intensity. What the feature actually represents does not matter, only that it is distinct from its surroundings. Examples of local features are blobs, corners, and edge pixels.

Example 1. Example of Corner Detection

I = imread("circuit.tif");
corners = detectFASTFeatures(I,MinContrast=0.1);
J = insertMarker(I,corners,"circle");
imshow(J)

Benefits and Applications of Local Features

Local features let you find image correspondences regardless of occlusion, changes in viewing conditions, or the presence of clutter. In addition, the properties of local features make them suitable for image classification, such as in Image Classification with Bag of Visual Words.

Local features are used in two fundamental ways:

To localize anchor points for use in image stitching or 3-D reconstruction.
To represent image contents compactly for detection or classification, without requiring image segmentation.

Application	MATLAB Examples
Image registration and stitching	Feature Based Panoramic Image Stitching
Object detection	Object Detection in a Cluttered Scene Using Point Feature Matching
Object recognition	Digit Classification Using HOG Features
Object tracking	Face Detection and Tracking Using the KLT Algorithm
Image category recognition	Image Category Classification Using Bag of Features
Finding geometry of a stereo system	Uncalibrated Stereo Image Rectification
3-D reconstruction	Structure from Motion from Two Views, Structure from Motion from Multiple Views
Image retrieval	Image Retrieval Using Customized Bag of Features

What Makes a Good Local Feature?

Detectors that rely on gradient-based and intensity variation approaches detect good local features. These features include edges, blobs, and regions. Good local features exhibit the following properties:

Repeatable detections:
When given two images of the same scene, most features that the detector finds in both images are the same. The features are robust to changes in viewing conditions and noise.
Distinctive:
The neighborhood around the feature center varies enough to allow for a reliable comparison between the features.
Localizable:
The feature has a unique location assigned to it. Changes in viewing conditions do not affect its location.
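
The repeatability property can be checked informally on your own data. The following is a minimal sketch, not a standard benchmark: it detects Harris corners in an image and in a noisy copy of it, then counts how many of the original corners reappear nearby. The noise variance and the 2-pixel tolerance are assumptions chosen only for illustration, and imnoise requires Image Processing Toolbox.

```matlab
% Informal repeatability check: do corners survive added noise?
I = imread("cameraman.tif");
noisy = imnoise(I,"gaussian",0,0.001);   % assumed noise level, illustration only

ptsClean = detectHarrisFeatures(I);
ptsNoisy = detectHarrisFeatures(noisy);

locClean = ptsClean.Location;            % M-by-2 matrix of [x y] coordinates
locNoisy = ptsNoisy.Location;            % N-by-2 matrix of [x y] coordinates

% Pairwise squared distances (M-by-N) computed with implicit expansion.
d2 = (locClean(:,1) - locNoisy(:,1).').^2 + ...
     (locClean(:,2) - locNoisy(:,2).').^2;

tolerance = 2;                           % pixels; hypothetical tolerance
isRepeated = any(d2 <= tolerance^2,2);   % true if a noisy corner lies nearby
repeatability = nnz(isRepeated)/max(numel(isRepeated),1);

fprintf("%.0f%% of the clean-image corners reappear within %d pixels.\n", ...
    100*repeatability,tolerance);
```

A higher fraction suggests that the detector is more repeatable for this kind of image and disturbance. The same check can be repeated with another detection function by replacing detectHarrisFeatures.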

Feature Detection and Feature Extraction

Feature detection selects regions of an image that have unique content, such as corners or blobs. Use feature detection to find points of interest that you can use for further processing. These points do not necessarily correspond to physical structures, such as the corners of a table. The key to feature detection is to find features that remain locally invariant so that you can detect them even in the presence of rotation or scale change.

Feature extraction involves computing a descriptor, which is typically done on regions centered around detected features. Descriptors rely on image processing to transform a local pixel neighborhood into a compact vector representation. This new representation permits comparison between neighborhoods regardless of changes in scale or orientation. Descriptors, such as SIFT or SURF, rely on local gradient computations. Binary descriptors, such as BRISK, ORB or FREAK, rely on pairs of local intensity differences, which are then encoded into a binary vector.
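
To make the distinction between detection and extraction concrete, the following sketch detects SURF points and then extracts SURF descriptors from them at two descriptor lengths. The choice of image, the limit of 100 points, and the variable names are assumptions made for illustration.

```matlab
% Detection answers "where are the features?"
I = imread("cameraman.tif");
points = detectSURFFeatures(I);
strongest = selectStrongest(points,100);   % keep at most the 100 strongest points

% Extraction answers "what does each neighborhood look like?"
% Each row of the output is the descriptor of one valid point.
[features64,valid64] = extractFeatures(I,strongest,Method="SURF");
[features128,valid128] = extractFeatures(I,strongest,Method="SURF", ...
    FeatureSize=128);

disp(size(features64))    % number of valid points by 64
disp(size(features128))   % number of valid points by 128
```

The descriptor length is fixed by the method and its options, not by the size of the detected blob, which is what allows rows from different images to be compared directly.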

Choose a Feature Detector and Descriptor

Select the best feature detector and descriptor by considering the criteria of your application and the nature of your data. The first table helps you understand the general criteria to drive your selection. The next two tables provide details on the detectors and descriptors available in Computer Vision Toolbox.

Considerations for Selecting a Detector and Descriptor

Criteria	Suggestion
Type of features in your image	Use a detector appropriate for your data. For example, if your image shows bacteria cells, use a blob detector rather than a corner detector. If your image is an aerial view of a city, you can use a corner detector to find man-made structures.
Context in which you are using the features (matching key points or classification)	The HOG, SURF, and KAZE descriptors are suitable for classification tasks. In contrast, binary descriptors, such as ORB, BRISK, and FREAK, are typically used for finding point correspondences between images, which are used for registration.
Type of distortion present in your image	Choose a detector and descriptor that address the distortion in your data. For example, if there is no scale change present, consider a corner detector that does not handle scale. If your data contains a higher level of distortion, such as scale and rotation, then use the SIFT, SURF, ORB, or KAZE feature detector and descriptor. The SURF and KAZE methods are computationally intensive.
Performance requirements (real-time performance, accuracy versus speed)	Binary descriptors are generally faster but less accurate than gradient-based descriptors. For greater accuracy, use several detectors and descriptors at the same time.
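
The speed trade-off in the last row depends on your hardware and data, so it is worth measuring rather than assuming. The following rough sketch times a gradient-based method (SURF) against a binary method (FREAK) on the same set of points. Using SURF points as the input to both methods is an assumption made so that the two timings cover the same locations.

```matlab
% Rough timing comparison of two extraction methods on the same points.
I = imread("cameraman.tif");
points = detectSURFFeatures(I);

% timeit calls each function handle several times and returns a typical time.
tSURF = timeit(@() extractFeatures(I,points,Method="SURF"));
tFREAK = timeit(@() extractFeatures(I,points,Method="FREAK"));

fprintf("SURF extraction:  %.4f s\n",tSURF);
fprintf("FREAK extraction: %.4f s\n",tFREAK);
```

The numbers cover extraction only. Matching cost also differs, because binary descriptors are compared with the Hamming distance.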

Choose a Detection Function Based on Feature Type

Detector	Feature Type	Function	Scale Independent
FAST [1]	Corner	`detectFASTFeatures`	No
Minimum eigenvalue algorithm [4]	Corner	`detectMinEigenFeatures`	No
Harris corner detector [3]	Corner	`detectHarrisFeatures`	No
SIFT [14]	Blob	`detectSIFTFeatures`	Yes
SURF [11]	Blob	`detectSURFFeatures`	Yes
KAZE [12]	Blob	`detectKAZEFeatures`	Yes
BRISK [6]	Corner	`detectBRISKFeatures`	Yes
MSER [8]	Region with uniform intensity	`detectMSERFeatures`	Yes
ORB [13]	Corner	`detectORBFeatures`	No

Note

Detection functions return objects that contain information about the features. The extractHOGFeatures and extractFeatures functions use these objects to create descriptors.
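
As a brief illustration of these objects, the sketch below inspects what two detection functions return. The image and the limit of 50 points are assumptions chosen for the example.

```matlab
I = imread("cameraman.tif");
corners = detectHarrisFeatures(I);   % returns a cornerPoints object
blobs = detectSURFFeatures(I);       % returns a SURFPoints object

disp(class(corners))
disp(class(blobs))

% Both objects store locations and a strength metric for each feature.
strongCorners = selectStrongest(corners,50);
xy = strongCorners.Location;         % one [x y] row per retained corner

% Only the scale-aware detector stores a scale for each feature.
disp(isprop(blobs,"Scale"))
disp(isprop(corners,"Scale"))

% Overlay the retained corners on the image.
figure
imshow(I)
hold on
plot(strongCorners)
hold off
```

Passing the object, rather than a plain matrix of coordinates, lets the extraction functions use the stored scale and orientation when they are available.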

Choose a Descriptor Method

Descriptor	Binary	Function and Method	Scale Invariant	Rotation Invariant	Use for Finding Point Correspondences	Use for Classification
HOG	No	`extractHOGFeatures`(`I`, ...)	No	No	No	Yes
LBP	No	`extractLBPFeatures`(`I`, ...)	No	Yes	No	Yes
SIFT	No	`extractFeatures`(`I`,`points`,`Method`=`"SIFT"`)	Yes	Yes	Yes	Yes
SURF	No	`extractFeatures`(`I`,`points`,`Method`=`"SURF"`)	Yes	Yes	Yes	Yes
KAZE	No	`extractFeatures`(`I`,`points`,`Method`=`"KAZE"`)	Yes	Yes	Yes	Yes
FREAK	Yes	`extractFeatures`(`I`,`points`,`Method`=`"FREAK"`)	Yes	Yes	Yes	No
BRISK	Yes	`extractFeatures`(`I`,`points`,`Method`=`"BRISK"`)	Yes	Yes	Yes	No
ORB	Yes	`extractFeatures`(`I`,`points`,`Method`=`"ORB"`)	No	Yes	Yes	No
Block (simple pixel neighborhood around a keypoint)	No	`extractFeatures`(`I`,`points`,`Method`=`"Block"`)	No	No	Yes	Yes

Note

The extractFeatures function provides different extraction methods to best match the requirements of your application. When you do not specify the Method argument for the extractFeatures function, the function automatically selects the method based on the class of the input points.
Binary descriptors are fast but less precise in terms of localization. They are not suitable for classification tasks. For binary descriptors, the extractFeatures function returns a binaryFeatures object. This object enables the Hamming-distance-based matching metric used in the matchFeatures function.
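
The following sketch shows the automatic method selection described in this note. It omits the Method argument and inspects the class of the output for two kinds of input points. The image and the MinContrast value are assumptions chosen for illustration.

```matlab
I = imread("cameraman.tif");

surfPoints = detectSURFFeatures(I);
briskPoints = detectBRISKFeatures(I,MinContrast=0.01);

% Method is omitted, so extractFeatures chooses it from the point class.
surfFeatures = extractFeatures(I,surfPoints);
briskFeatures = extractFeatures(I,briskPoints);

disp(class(surfFeatures))      % numeric matrix of gradient-based descriptors
disp(class(briskFeatures))     % binaryFeatures object

% A binaryFeatures object reports its size through properties.
disp(briskFeatures.NumFeatures)   % number of descriptors
disp(briskFeatures.NumBits)       % bits per descriptor
```

Because the output class differs, descriptors from the two calls cannot be mixed in a single matchFeatures call. Match each kind separately.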

Use Local Features

Registering two images is a simple way to understand local features. This example finds a geometric transformation between two images. It uses local features to find well-localized anchor points.

Display two images

The first image is the original image.

original = imread('cameraman.tif');
figure
imshow(original);

The second image is the original image rotated and scaled.

scale = 1.3;
J = imresize(original,scale);
theta = 31;
distorted = imrotate(J,theta);
figure
imshow(distorted)

Detect matching features between the original and distorted image

Detecting SURF features in both images is the first step in determining the transform needed to correct the distorted image.

ptsOriginal = detectSURFFeatures(original);
ptsDistorted = detectSURFFeatures(distorted);

Extract features and compare the detected blobs between the two images

The detection step found several roughly corresponding blob structures in both images. Compare the detected blob features. This process is facilitated by feature extraction, which determines a local patch descriptor.

[featuresOriginal,validPtsOriginal] = ...
            extractFeatures(original,ptsOriginal);
[featuresDistorted,validPtsDistorted] = ...
            extractFeatures(distorted,ptsDistorted);

It is possible that not all of the original points were used to extract descriptors. Points might have been rejected if they were too close to the image border. Therefore, the valid points are returned in addition to the feature descriptors.

The patch size used to compute the descriptors is determined during the feature extraction step. The patch size corresponds to the scale at which the feature is detected. Regardless of the patch size, the two feature vectors, featuresOriginal and featuresDistorted, are computed in such a way that they are of equal length. The descriptors enable you to compare detected features, regardless of their size and rotation.
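
The two points made above, that some detected points can be rejected and that descriptors have equal length regardless of patch size, can be checked directly. This sketch repeats the earlier steps so that it runs on its own. The variable numRejected is a name introduced here for illustration.

```matlab
% Recreate the image pair used in this example.
original = imread("cameraman.tif");
distorted = imrotate(imresize(original,1.3),31);

ptsOriginal = detectSURFFeatures(original);
ptsDistorted = detectSURFFeatures(distorted);
[featuresOriginal,validPtsOriginal] = extractFeatures(original,ptsOriginal);
[featuresDistorted,validPtsDistorted] = extractFeatures(distorted,ptsDistorted);

% Points for which no descriptor could be extracted.
numRejected = ptsOriginal.Count - validPtsOriginal.Count;
fprintf("%d of %d original points were rejected.\n", ...
    numRejected,ptsOriginal.Count);

% Descriptor length is the same in both images.
disp(size(featuresOriginal,2) == size(featuresDistorted,2))

% The detected scales, and therefore the patch sizes, vary widely.
fprintf("Scales in the original image range from %.1f to %.1f.\n", ...
    min(validPtsOriginal.Scale),max(validPtsOriginal.Scale));
```

Always index matches into the valid points rather than the detected points, because row k of the descriptor matrix corresponds to valid point k.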

Find candidate matches

Obtain candidate matches between the features by passing the descriptors to the matchFeatures function. The matches are only candidates because the results can contain some invalid matches. Two patches that match can indicate similar features but might not be a correct match. For example, a table corner can look like a chair corner, but the two features are obviously not a match.

indexPairs = matchFeatures(featuresOriginal,featuresDistorted);
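
If the default call returns too many doubtful candidates, matchFeatures can be made stricter and can also return the distance of each match. The sketch below is one possible variation, not part of the original example. The MaxRatio value and the name matchMetric are assumptions chosen for illustration.

```matlab
% Recreate the descriptors from the previous steps.
original = imread("cameraman.tif");
distorted = imrotate(imresize(original,1.3),31);
[featuresOriginal,validPtsOriginal] = ...
    extractFeatures(original,detectSURFFeatures(original));
[featuresDistorted,validPtsDistorted] = ...
    extractFeatures(distorted,detectSURFFeatures(distorted));

% Stricter ratio test, unique matches only, and the match distances.
[indexPairs,matchMetric] = matchFeatures(featuresOriginal,featuresDistorted, ...
    MaxRatio=0.5,Unique=true);

% Inspect the closest candidate matches first.
[~,order] = sort(matchMetric);
bestPairs = indexPairs(order(1:min(10,end)),:);
disp(bestPairs)
```

A smaller MaxRatio rejects ambiguous matches at the cost of returning fewer candidates, so loosen it again if the later estimation step does not have enough points.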

Find point locations from both images

Each row of the returned indexPairs contains two indices of candidate feature matches between the images. Use the indices to collect the actual point locations from both images.

matchedOriginal = validPtsOriginal(indexPairs(:,1));
matchedDistorted = validPtsDistorted(indexPairs(:,2));

Display the candidate matches

figure
showMatchedFeatures(original,distorted,matchedOriginal,matchedDistorted)
title('Candidate matched points (including outliers)')

Analyze the feature locations

If there are a sufficient number of valid matches, remove the false matches. An effective technique for this scenario is the RANSAC algorithm. The estgeotform2d function implements M-estimator sample consensus (MSAC), which is a variant of the RANSAC algorithm. MSAC finds a geometric transform and separates the inliers (correct matches) from the outliers (spurious matches).

[tform,inlierIdx] = estgeotform2d(matchedDistorted, ...
    matchedOriginal,'similarity');
inlierDistorted = matchedDistorted(inlierIdx,:);
inlierOriginal = matchedOriginal(inlierIdx,:);
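
Because the distortion in this example is known, the estimated transform can be sanity-checked against it. The sketch below assumes that estgeotform2d returns a similarity transformation object with Scale and RotationAngle properties, as it does in recent releases. The sign of the recovered angle depends on the rotation convention, so only its magnitude is compared here, and the result varies slightly between runs because MSAC is randomized.

```matlab
% Recreate the pipeline with a known scale and rotation.
scale = 1.3;
theta = 31;
original = imread("cameraman.tif");
distorted = imrotate(imresize(original,scale),theta);

[featuresOriginal,validPtsOriginal] = ...
    extractFeatures(original,detectSURFFeatures(original));
[featuresDistorted,validPtsDistorted] = ...
    extractFeatures(distorted,detectSURFFeatures(distorted));
indexPairs = matchFeatures(featuresOriginal,featuresDistorted);
matchedOriginal = validPtsOriginal(indexPairs(:,1));
matchedDistorted = validPtsDistorted(indexPairs(:,2));

[tform,inlierIdx] = estgeotform2d(matchedDistorted, ...
    matchedOriginal,'similarity');

% tform maps the distorted image back to the original, so invert the scale.
scaleRecovered = 1/tform.Scale;
angleRecovered = abs(tform.RotationAngle);

fprintf("Scale: applied %.2f, recovered %.2f\n",scale,scaleRecovered);
fprintf("Rotation: applied %d degrees, recovered %.1f degrees\n", ...
    theta,angleRecovered);
fprintf("%d of %d candidate matches are inliers.\n", ...
    nnz(inlierIdx),numel(inlierIdx));
```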

Display the matching points

figure
showMatchedFeatures(original,distorted,inlierOriginal,inlierDistorted)
title('Matching points (inliers only)')
legend('ptsOriginal','ptsDistorted')

Verify the computed geometric transform

Apply the computed geometric transform to the distorted image.

outputView = imref2d(size(original));
recovered = imwarp(distorted,tform,OutputView=outputView);

Display the recovered image and the original image.

figure
imshowpair(original,recovered,'montage')

Image Registration Using Multiple Features

This example builds on the results of the "Use Local Features" example. Using more than one detector and descriptor pair enables you to combine and reinforce your results. Multiple pairs are also useful for when you cannot obtain enough good matches (inliers) using a single feature detector.

Load the original image.

original = imread('cameraman.tif');
figure
imshow(original);
text(size(original,2),size(original,1)+15, ...
    'Image courtesy of Massachusetts Institute of Technology', ...
    FontSize=7,HorizontalAlignment='right');

Scale and rotate the original image to create the distorted image.

scale = 1.3;
J = imresize(original,scale);
 
theta = 31;
distorted = imrotate(J,theta);
figure
imshow(distorted)

Detect the features in both images. Use the BRISK detector first, followed by the SURF detector.

ptsOriginalBRISK = detectBRISKFeatures(original,MinContrast=0.01);
ptsDistortedBRISK = detectBRISKFeatures(distorted,MinContrast=0.01);

ptsOriginalSURF = detectSURFFeatures(original);
ptsDistortedSURF = detectSURFFeatures(distorted);

Extract descriptors from the original and distorted images. The BRISK features use the FREAK descriptor by default.

[featuresOriginalFREAK,validPtsOriginalBRISK] = ...
    extractFeatures(original,ptsOriginalBRISK);
[featuresDistortedFREAK,validPtsDistortedBRISK] = ...
    extractFeatures(distorted,ptsDistortedBRISK);

[featuresOriginalSURF,validPtsOriginalSURF] = ...
    extractFeatures(original,ptsOriginalSURF);
[featuresDistortedSURF,validPtsDistortedSURF] = ...
    extractFeatures(distorted,ptsDistortedSURF);

Determine candidate matches by matching FREAK descriptors first, and then SURF descriptors. To obtain as many feature matches as possible, start with detector and matching thresholds that are lower than the default values. Once you get a working solution, you can gradually increase the thresholds to reduce the computational load required to extract and match features.

indexPairsBRISK = matchFeatures(featuresOriginalFREAK, ...
    featuresDistortedFREAK,MatchThreshold=40,MaxRatio=0.8);

indexPairsSURF = matchFeatures(featuresOriginalSURF,featuresDistortedSURF);

Obtain candidate matched points for BRISK and SURF.

matchedOriginalBRISK = validPtsOriginalBRISK(indexPairsBRISK(:,1));
matchedDistortedBRISK = validPtsDistortedBRISK(indexPairsBRISK(:,2));

matchedOriginalSURF = validPtsOriginalSURF(indexPairsSURF(:,1));
matchedDistortedSURF = validPtsDistortedSURF(indexPairsSURF(:,2));

Visualize the BRISK putative matches.

figure
showMatchedFeatures(original,distorted,matchedOriginalBRISK,...
            matchedDistortedBRISK)
title('Putative matches using BRISK & FREAK')
legend('ptsOriginalBRISK','ptsDistortedBRISK')

Combine the candidate matched BRISK and SURF local features. Use the Location property to combine the point locations from BRISK and SURF features.

matchedOriginalXY  = ...
    [matchedOriginalSURF.Location; matchedOriginalBRISK.Location];
matchedDistortedXY = ...
    [matchedDistortedSURF.Location; matchedDistortedBRISK.Location];

Determine the inlier points and the geometric transform of the BRISK and SURF features.

[tformTotal,inlierIdx] = estgeotform2d(matchedDistortedXY,...
        matchedOriginalXY,'similarity');
inlierDistortedXY = matchedDistortedXY(inlierIdx, :);
inlierOriginalXY = matchedOriginalXY(inlierIdx, :);
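
To see how much each detector contributes to the combined result, the inlier index can be split back into its SURF and BRISK parts. The sketch below repeats the steps of this example in compact form so that it runs on its own. The names numSURF, inlierSURF, and inlierBRISK are introduced here for illustration, and the counts vary slightly between runs because MSAC is randomized.

```matlab
original = imread("cameraman.tif");
distorted = imrotate(imresize(original,1.3),31);

% BRISK points with FREAK descriptors (the default for BRISK points).
[fOrigFREAK,vOrigBRISK] = extractFeatures(original, ...
    detectBRISKFeatures(original,MinContrast=0.01));
[fDistFREAK,vDistBRISK] = extractFeatures(distorted, ...
    detectBRISKFeatures(distorted,MinContrast=0.01));
pairsBRISK = matchFeatures(fOrigFREAK,fDistFREAK, ...
    MatchThreshold=40,MaxRatio=0.8);

% SURF points with SURF descriptors.
[fOrigSURF,vOrigSURF] = extractFeatures(original,detectSURFFeatures(original));
[fDistSURF,vDistSURF] = extractFeatures(distorted,detectSURFFeatures(distorted));
pairsSURF = matchFeatures(fOrigSURF,fDistSURF);

% Stack the matched locations with the SURF rows first, then the BRISK rows.
matchedOriginalXY = [vOrigSURF(pairsSURF(:,1)).Location; ...
    vOrigBRISK(pairsBRISK(:,1)).Location];
matchedDistortedXY = [vDistSURF(pairsSURF(:,2)).Location; ...
    vDistBRISK(pairsBRISK(:,2)).Location];

[tformTotal,inlierIdx] = estgeotform2d(matchedDistortedXY, ...
    matchedOriginalXY,'similarity');

% The first numSURF rows came from SURF and the rest from BRISK.
numSURF = size(pairsSURF,1);
inlierSURF = nnz(inlierIdx(1:numSURF));
inlierBRISK = nnz(inlierIdx(numSURF+1:end));
fprintf("Inliers from SURF: %d, from BRISK: %d\n",inlierSURF,inlierBRISK);
```

If one detector contributes almost no inliers for your data, dropping it reduces computation without changing the estimate much.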

Display the results. The result provides several more matches than the example that used a single feature detector.

figure
showMatchedFeatures(original,distorted,inlierOriginalXY,inlierDistortedXY)
title('Matching points using SURF and BRISK (inliers only)')
legend('ptsOriginal','ptsDistorted')

Compare the original and recovered image.

outputView = imref2d(size(original));
recovered = imwarp(distorted,tformTotal,OutputView=outputView);
 
figure
imshowpair(original,recovered,'montage')

References

[1] Rosten, E., and T. Drummond. “Machine Learning for High-Speed Corner Detection.” 9th European Conference on Computer Vision. Vol. 1, 2006, pp. 430–443.

[2] Mikolajczyk, K., and C. Schmid. “A performance evaluation of local descriptors.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 27, Issue 10, 2005, pp. 1615–1630.

[3] Harris, C., and M. J. Stephens. “A Combined Corner and Edge Detector.” Proceedings of the 4th Alvey Vision Conference. August 1988, pp. 147–152.

[4] Shi, J., and C. Tomasi. “Good Features to Track.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. June 1994, pp. 593–600.

[5] Tuytelaars, T., and K. Mikolajczyk. “Local Invariant Feature Detectors: A Survey.” Foundations and Trends in Computer Graphics and Vision. Vol. 3, Issue 3, 2007, pp. 177–280.

[6] Leutenegger, S., M. Chli, and R. Siegwart. “BRISK: Binary Robust Invariant Scalable Keypoints.” Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2011.

[7] Nister, D., and H. Stewenius. "Linear Time Maximally Stable Extremal Regions." 10th European Conference on Computer Vision. Marseille, France: 2008, No. 5303, pp. 183–196.

[8] Matas, J., O. Chum, M. Urban, and T. Pajdla. "Robust wide-baseline stereo from maximally stable extremal regions." Proceedings of British Machine Vision Conference. 2002, pp. 384–396.

[9] Obdrzalek, D., S. Basovnik, L. Mach, and A. Mikulik. "Detecting Scene Elements Using Maximally Stable Colour Regions." Communications in Computer and Information Science. La Ferte-Bernard, France: 2009, Vol. 82 CCIS (2010 12 01), pp. 107–115.

[10] Mikolajczyk, K., T. Tuytelaars, C. Schmid, A. Zisserman, T. Kadir, and L. Van Gool. "A Comparison of Affine Region Detectors." International Journal of Computer Vision. Vol. 65, No. 1–2, November 2005, pp. 43–72.

[11] Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool. “SURF: Speeded Up Robust Features.” Computer Vision and Image Understanding (CVIU). Vol. 110, No. 3, 2008, pp. 346–359.

[12] Alcantarilla, P. F., A. Bartoli, and A. J. Davison. "KAZE Features." ECCV 2012, Part VI, LNCS 7577, pp. 214, 2012.

[13] Rublee, E., V. Rabaud, K. Konolige and G. Bradski. "ORB: An efficient alternative to SIFT or SURF." In Proceedings of the 2011 International Conference on Computer Vision, 2564–2571. Barcelona, Spain, 2011.

[14] Lowe, David G. "Distinctive Image Features from Scale-Invariant Keypoints." Int. J. Comput. Vision 60, no. 2 (2004): 91–110.