Computer Vision System Toolbox

Digit Classification Using HOG Features

This example shows how to classify digits using HOG features and an SVM classifier.

Object classification is an important task in many computer vision applications, including surveillance, automotive safety, and image retrieval. For example, in an automotive safety application, you may need to classify nearby objects as pedestrians or vehicles. Regardless of the type of object being classified, the basic procedure for creating an object classifier is:

  • Acquire a labeled data set with images of the desired object.

  • Partition the data set into a training set and a test set.

  • Train the classifier using features extracted from the training set.

  • Test the classifier using features extracted from the test set.
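
The partitioning step can be sketched as follows. This is a minimal illustration, assuming a hypothetical data set of 100 samples and an 80/20 split; neither number comes from this example, which uses separate synthetic and handwritten image sets instead.

% Hypothetical data set size and split ratio (assumptions for illustration)
numSamples    = 100;
trainFraction = 0.8;

% Shuffle the sample indices so the split is random but repeatable
rng(0);
shuffled = randperm(numSamples);

% The first 80% of the shuffled indices form the training set,
% and the remainder form the test set
numTrain = round(trainFraction * numSamples);
trainIdx = shuffled(1:numTrain);
testIdx  = shuffled(numTrain+1:end);

% The two sets are disjoint and together cover every sample
assert(isempty(intersect(trainIdx, testIdx)));
assert(numel(trainIdx) + numel(testIdx) == numSamples);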

To illustrate, this example shows how to classify numerical digits using HOG (Histogram of Oriented Gradients) features [1] and an SVM (Support Vector Machine) classifier. This type of classification is common in Optical Character Recognition (OCR) applications.

The example uses the svmtrain and svmclassify functions from the Statistics Toolbox™ and the extractHOGFeatures function from the Computer Vision System Toolbox™.
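
As a rough sketch of how the two SVM functions fit together, the following trains and applies a binary classifier on toy data. The two Gaussian clusters and the variable names are assumptions made for illustration and are unrelated to the digit data.

% Toy two-class data (an assumption for illustration): two Gaussian
% clusters in 2-D, one per class
rng(0);
X = [randn(20,2) - 2; randn(20,2) + 2];
y = [false(20,1); true(20,1)];

% Train a binary SVM; each row of X is one observation
model = svmtrain(X, y);

% Classify observations; the result has the same type as y
predicted = svmclassify(model, X);
fprintf('Training accuracy: %.2f\n', mean(predicted == y));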

function HOGDigitClassificationExample

Digit Data Set

For training, synthetic images are created using the insertText function from the Computer Vision System Toolbox™. Each training image contains a digit surrounded by other digits, which mimics how digits normally appear together. Synthetic images are convenient because they provide a variety of training samples without manual collection. For testing, scans of handwritten digits are used to validate how well the classifier performs on data that is different from the synthetic training data. Although this is not the most representative data set, there is enough data to train and test a classifier and to show the feasibility of the approach.
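
The exact procedure used to generate the training images is not shown in this example. As a hedged sketch of the idea, the following draws three digits side by side with insertText. The canvas size, positions, font size, and digit choices are all assumptions.

% Hypothetical white canvas; the size is an assumption
canvas = 255 * ones(24, 60, 'uint8');

% Top-left corners [x y] for three side-by-side digits (assumed layout)
position  = [2 2; 22 2; 42 2];
digitText = {'3', '7', '1'};

% Draw black digits with a fully transparent text box
synthetic = insertText(canvas, position, digitText, ...
    'FontSize', 14, 'TextColor', 'black', 'BoxOpacity', 0);

figure;
imshow(synthetic);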

% Load training and test data
load('digitDataSet.mat', 'trainingImages', 'trainingLabels', 'testImages');

% Update file name relative to matlabroot
dataSetDir     = fullfile(matlabroot,'toolbox','vision','visiondemos');
trainingImages = fullfile(dataSetDir, trainingImages);
testImages     = fullfile(dataSetDir, testImages);

trainingImages is a 200-by-10 cell array of training image file names; each column contains both the positive and negative training images for a digit. trainingLabels is a 200-by-10 matrix containing a label for each image in the trainingImages cell array. The labels are logical values indicating whether the image is a positive instance (true) or a negative instance (false) for a digit. testImages is a 12-by-10 cell array containing the file names of the handwritten digit images, with 12 examples per digit.

% Show training and test samples
figure;
subplot(2,3,1); imshow(trainingImages{3,2});
subplot(2,3,2); imshow(trainingImages{23,4});
subplot(2,3,3); imshow(trainingImages{4,9});

subplot(2,3,4); imshow(testImages{2,2});
subplot(2,3,5); imshow(testImages{5,4});
subplot(2,3,6); imshow(testImages{8,9});

Note that prior to training and testing a classifier, the following pre-processing step is applied to images from this data set:

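    function J = preProcess(I)
        % Compute a global threshold using Otsu's method
        lvl = graythresh(I);
        % Convert the image to binary using that threshold
        J   = im2bw(I,lvl);
    end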

This pre-processing step removes noise artifacts introduced while collecting the image samples and helps provide better feature vectors for training the classifier. For example, the output of this pre-processing step on a couple of training and test images is shown next:

exTestImage  = imread(testImages{5,4});
exTrainImage = imread(trainingImages{23,4});

figure;
subplot(2,2,1); imshow(exTrainImage);
subplot(2,2,2); imshow(preProcess(exTrainImage));
subplot(2,2,3); imshow(exTestImage);
subplot(2,2,4); imshow(preProcess(exTestImage));
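
To see what the thresholding in preProcess does in isolation, here is a minimal sketch on a synthetic two-tone image. The image and its gray levels are assumptions chosen so that the result is easy to check.

% Synthetic 8-by-16 grayscale image (an assumption for illustration):
% dark left half, bright right half
I = uint8([40 * ones(8,8), 200 * ones(8,8)]);

% graythresh returns a normalized threshold in the range [0, 1]
lvl = graythresh(I);

% im2bw sets pixels brighter than the threshold to 1 and the rest to 0
BW = im2bw(I, lvl);

% The dark half maps to 0 and the bright half maps to 1
assert(~any(any(BW(:,1:8))));
assert(all(all(BW(:,9:16))));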

Using HOG Features

The data used to train the SVM classifier are HOG feature vectors extracted from the training images. Therefore, it is important to make sure the HOG feature vector encodes the right amount of information about the object. The extractHOGFeatures function returns a visualization output that can help form some intuition about just what the "right amount of information" means. By varying the HOG cell size parameter and visualizing the result, you can see the effect the cell size parameter has on the amount of shape information encoded in the feature vector:

img = imread(trainingImages{4,3});

% Extract HOG features and HOG visualization
[hog_2x2, vis2x2] = extractHOGFeatures(img,'CellSize',[2 2]);
[hog_4x4, vis4x4] = extractHOGFeatures(img,'CellSize',[4 4]);
[hog_8x8, vis8x8] = extractHOGFeatures(img,'CellSize',[8 8]);

% Show the original image
figure;
subplot(2,3,1:3); imshow(img);

% Visualize the HOG features
subplot(2,3,4);
plot(vis2x2);
title({'CellSize = [2 2]'; ['Feature length = ' num2str(length(hog_2x2))]});

subplot(2,3,5);
plot(vis4x4);
title({'CellSize = [4 4]'; ['Feature length = ' num2str(length(hog_4x4))]});

subplot(2,3,6);
plot(vis8x8);
title({'CellSize = [8 8]'; ['Feature length = ' num2str(length(hog_8x8))]});

The visualization shows that a cell size of [8 8] does not encode much shape information, while a cell size of [2 2] encodes a lot of shape information but increases the dimensionality of the HOG feature vector significantly. A good compromise is a 4-by-4 cell size. This size setting encodes enough spatial information to visually identify a digit shape while limiting the number of dimensions in the HOG feature vector, which helps speed up training. In practice, the HOG parameters should be varied with repeated classifier training and testing to identify the optimal parameter settings.

cellSize = [4 4];
hogFeatureSize = length(hog_4x4);
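
The effect of the cell size on dimensionality can also be worked out by hand. The sketch below applies the feature-length formula from the extractHOGFeatures documentation with the default BlockSize, BlockOverlap, and NumBins values. The 16-by-16 image size is an assumption, since the size of the digit images is not stated here.

% Assumed image size; the actual size of the digit images is not stated here
imageSize = [16 16];

% Documented defaults of extractHOGFeatures
blockSize    = [2 2];
blockOverlap = ceil(blockSize/2);
numBins      = 9;

cellSizes = [2 2; 4 4; 8 8];
for k = 1:size(cellSizes,1)
    cs = cellSizes(k,:);
    blocksPerImage = floor((imageSize./cs - blockSize)./(blockSize - blockOverlap) + 1);
    featureLength  = prod([blocksPerImage, blockSize, numBins]);
    fprintf('CellSize = [%d %d]: feature length = %d\n', cs(1), cs(2), featureLength);
end

Under the assumed image size, the computed lengths are 1764, 324, and 36. Halving the cell size roughly quadruples the number of blocks, which is why the [2 2] setting is so much more expensive.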

Train the Classifier

Digit classification is a multi-class classification problem, where you have to classify an object into one of ten possible digit classes. The SVM algorithm in the Statistics Toolbox™, however, produces a binary classifier, which can only classify an object into one of two classes. To use binary SVMs for digit classification, 10 such classifiers are required, each one trained for a specific digit. This is a common technique for solving multi-class classification problems with binary classifiers and is known as "one-versus-all" or "one-versus-rest" classification.

% Train an SVM classifier for each digit
digits = char('0'):char('9');

for d = 1:numel(digits)

    % Pre-allocate trainingFeatures array
    numTrainingImages = size(trainingImages,1);
    trainingFeatures  = zeros(numTrainingImages,hogFeatureSize,'single');

    % Extract HOG features from each training image. trainingImages
    % contains both positive and negative image samples.
    for i = 1:numTrainingImages
        img = imread(trainingImages{i,d});

        img = preProcess(img);

        trainingFeatures(i,:) = extractHOGFeatures(img,'CellSize',cellSize);
    end

    % Train a classifier for a digit. Each row of trainingFeatures contains
    % the HOG features extracted for a single training image. The
    % trainingLabels indicate if the features are extracted from positive
    % (true) or negative (false) training images.
    svm(d) = svmtrain(trainingFeatures, trainingLabels(:,d));
end
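
In this example, trainingLabels already provides one logical label column per digit. When only a single vector of multi-class labels is available, the one-versus-all labels can be derived from it. The following is a minimal sketch with hypothetical labels.

% Hypothetical multi-class labels for six samples (assumption for illustration)
classLabels = [0 1 2 1 0 2]';
classes     = unique(classLabels);

% One logical column per class: true where the sample belongs to that class
binaryLabels = false(numel(classLabels), numel(classes));
for k = 1:numel(classes)
    binaryLabels(:,k) = (classLabels == classes(k));
end

% Every sample is positive for exactly one of the binary problems
assert(all(sum(binaryLabels, 2) == 1));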

Test the Classifier

Now the SVM classifiers can be tested using the handwritten digit images shown earlier.

% Run each SVM classifier on the test images
for d = 1:numel(digits)

    % Pre-allocate testFeatures array
    numImages    = size(testImages,1);
    testFeatures = zeros(numImages, hogFeatureSize, 'single');

    % Extract features from each test image
    for i = 1:numImages
        img = imread(testImages{i,d});

        img = preProcess(img);

        testFeatures(i,:) = extractHOGFeatures(img,'CellSize',cellSize);
    end

    % Run all the SVM classifiers
    for digit = 1:numel(svm)
        predictedLabels(:,digit,d) = svmclassify(svm(digit), testFeatures);
    end

end

Results

Tabulate the classification results for each SVM classifier.

displayTable(predictedLabels)

digit  | svm(0)   svm(1)   svm(2)   svm(3)   svm(4)   svm(5)   svm(6)   svm(7)   svm(8)   svm(9)   
---------------------------------------------------------------------------------------------------
0      | 6        0        0        0        0        0        6        0        2        0        
1      | 3        10       0        0        0        0        0        2        0        0        
2      | 0        2        8        0        0        0        1        1        0        0        
3      | 0        0        0        7        0        0        4        0        0        0        
4      | 0        0        0        0        9        0        0        0        0        1        
5      | 0        0        0        0        0        4        7        0        1        0        
6      | 0        0        0        0        2        0        6        0        3        0        
7      | 0        0        0        1        0        0        0        5        0        1        
8      | 0        0        0        1        0        0        0        1        5        2        
9      | 0        1        0        1        1        1        0        0        0        2        

The columns of the table contain the classification results for each SVM classifier, and each row corresponds to the test images of one digit. Ideally, the table would be a diagonal matrix, where each diagonal element equals the number of images per digit (12 in this example). Based on this data set, digits 1, 2, 3, and 4 are easier to recognize than digit 6, whose classifier produces many false positives. Using more representative data sets like MNIST [2] or SVHN [3], which contain many thousands of labeled digit images, is likely to produce a better classifier than the one created using this example data set.
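
The comparison against the ideal diagonal matrix can be made concrete with a short sketch. The counts are copied from the table above; the variable names are illustrative only.

% Classification counts copied from the table above
% (rows: test digit 0-9, columns: svm(0) through svm(9))
counts = [6  0 0 0 0 0 6 0 2 0
          3 10 0 0 0 0 0 2 0 0
          0  2 8 0 0 0 1 1 0 0
          0  0 0 7 0 0 4 0 0 0
          0  0 0 0 9 0 0 0 0 1
          0  0 0 0 0 4 7 0 1 0
          0  0 0 0 2 0 6 0 3 0
          0  0 0 1 0 0 0 5 0 1
          0  0 0 1 0 0 0 1 5 2
          0  1 0 1 1 1 0 0 0 2];

imagesPerDigit = 12;

% Diagonal entries: test images of a digit accepted by its own classifier
truePositives = diag(counts)';

% Off-diagonal column sums: images of other digits accepted by a classifier
falsePositives = sum(counts, 1) - truePositives;

% Fraction of each digit's test images that its own classifier accepted
detectionRate = truePositives / imagesPerDigit;

% Rows of the display: digit, true positives, false positives
disp([0:9; truePositives; falsePositives]);

For these counts, svm(6) has 21 false positives, the most of any classifier.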

Summary

This example illustrated the basic procedure for creating an object classifier using the extractHOGFeatures function from the Computer Vision System Toolbox™ and the svmtrain and svmclassify functions from the Statistics Toolbox™. Although HOG features and SVM classifiers were used here, other features and machine learning algorithms can be used in the same way. For instance, you can explore different feature types for training the classifier, or you can see the effect of using other machine learning algorithms available in the Statistics Toolbox™, such as k-nearest neighbors.

References

[1] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.

[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, vol. 86, pp. 2278-2324, 1998.

[3] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading Digits in Natural Images with Unsupervised Feature Learning", NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

Appendix - Helper functions

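    function displayTable(labels)
        % Column headings: one per SVM classifier, svm(0) through svm(9)
        colHeadings = arrayfun(@(x)sprintf('svm(%d)',x),0:9,'UniformOutput',false);
        % Use the name fmt to avoid shadowing the built-in format function
        fmt = repmat('%-9s',1,11);
        header = sprintf(fmt,'digit  |',colHeadings{:});
        fprintf('\n%s\n%s\n',header,repmat('-',size(header)));
        % Each row reports, for one test digit, how many of its test images
        % each classifier labeled as positive
        for idx = 1:numel(digits)
            fprintf('%-9s', [digits(idx) '      |']);
            fprintf('%-9d', sum(labels(:,:,idx)));
            fprintf('\n');
        end
    end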
end