Documentation

This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English version of the page.

Note: This page has been translated by MathWorks. Click here to see
To view all translated materials including this page, select Country from the country navigator on the bottom of this page.

Semantic Segmentation Examples

Analyze Training Data for Semantic Segmentation

To train a semantic segmentation network you need a collection of images and its corresponding collection of pixel labeled images. A pixel labeled image is an image where every pixel value represents the categorical label of that pixel.

The following code loads a small set of images and their corresponding pixel labeled images:

dataDir = fullfile(toolboxdir('vision'),'visiondata');
imDir = fullfile(dataDir,'building');
pxDir = fullfile(dataDir,'buildingPixelLabels');

Load the image data using an imageDatastore. An image datastore can efficiently represent a large collection of images because images are only read into memory when needed.

imds = imageDatastore(imDir);

Read and display the first image.

I = readimage(imds,1);
figure
imshow(I)

Load the pixel label images using a pixelLabelDatastore to define the mapping between label IDs and categorical names. In the dataset used here, the labels are "sky", "grass", "building", and "sidewalk". The label IDs for these classes are 1, 2, 3, 4, respectively.

Define the class names.

classNames = ["sky" "grass" "building" "sidewalk"];

Define the label ID for each class name.

pixelLabelID = [1 2 3 4];

Create a pixelLabelDatastore.

pxds = pixelLabelDatastore(pxDir,classNames,pixelLabelID);

Read the first pixel label image.

C = readimage(pxds,1);

The output C is a categorical matrix where C(i,j) is the categorical label of pixel I(i,j).

C(5,5)
ans = categorical
     sky 

Overlay the pixel labels on the image to see how different parts of the image are labeled.

B = labeloverlay(I,C);
figure
imshow(B)

The categorical output format simplifies tasks that require doing things by class names. For instance, you can create a binary mask of just the building:

buildingMask = C == 'building';

figure
imshowpair(I, buildingMask,'montage')

Create a Semantic Segmentation Network

Create a simple semantic segmentation network and learn about common layers found in many semantic segmentation networks. A common pattern in semantic segmentation networks requires the downsampling of an image between convolutional and ReLU layers, and then upsample the output to match the input size. This operation is analogous to the standard scale-space analysis using image pyramids. During this process however, a network performs the operations using non-linear filters optimized for a specific set of classes that you want to segment.

Create An Image Input Layer

A semantic segmentation network starts with an imageInputLayer, which defines the smallest image size the network can process. Most semantic segmentation networks are fully convolutional, which means they can process images that are larger than the specified input size. Here, an image size of [32 32 3] is used for the network to process 64x64 RGB images.

inputSize = [32 32 3];
imgLayer = imageInputLayer(inputSize)
imgLayer = 
  ImageInputLayer with properties:

                Name: ''
           InputSize: [32 32 3]

   Hyperparameters
    DataAugmentation: 'none'
       Normalization: 'zerocenter'

Create Downsampling Network

Start with the convolution and ReLU layers. The convolution layer padding is selected such that the output size of the convolution layer is the same as the input size. This makes it easier to construct a network because the input and output sizes between most layers remain the same as you progress through the network.

filterSize = 3;
numFilters = 32;
conv = convolution2dLayer(filterSize,numFilters,'Padding',1);
relu = reluLayer();

The downsampling is performed using a max pooling layer. Create a max pooling layer to downsample the input by a factor of 2 by setting the 'Stride' parameter to 2.

poolSize = 2;
maxPoolDownsample2x = maxPooling2dLayer(poolSize,'Stride',2);

Stack the convolution, ReLU, and max pooling layers to create a network that downsamples its input by a factor of 4.

downsamplingLayers = [
    conv
    relu
    maxPoolDownsample2x
    conv
    relu
    maxPoolDownsample2x
    ]
downsamplingLayers = 
  6x1 Layer array with layers:

     1   ''   Convolution   32 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     2   ''   ReLU          ReLU
     3   ''   Max Pooling   2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     4   ''   Convolution   32 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     5   ''   ReLU          ReLU
     6   ''   Max Pooling   2x2 max pooling with stride [2  2] and padding [0  0  0  0]

Create Upsampling Network

The upsampling is done using the tranposed convolution layer (also commonly referred to as "deconv" or "deconvolution" layer). When a transposed convolution is used for upsampling, it performs the upsampling and the filtering at the same time.

Create a transposed convolution layer to upsample by 2.

filterSize = 4;
transposedConvUpsample2x = transposedConv2dLayer(4,numFilters,'Stride',2,'Cropping',1);

The 'Cropping' parameter is set to 1 to make the output size equal twice the input size.

Stack the transposed convolution and relu layers. An input to this set of layers is upsampled by 4.

upsamplingLayers = [
    transposedConvUpsample2x
    relu
    transposedConvUpsample2x
    relu
    ]
upsamplingLayers = 
  4x1 Layer array with layers:

     1   ''   Transposed Convolution   32 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
     2   ''   ReLU                     ReLU
     3   ''   Transposed Convolution   32 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
     4   ''   ReLU                     ReLU

Create A Pixel Classification Layer

The final set of layers are responsible for making pixel classifications. These final layers process an input that has the same spatial dimensions (height and width) as the input image. However, the number of channels (third dimension) is larger and is equal to number of filters in the last transposed convolution layer. This third dimension needs to be squeezed down to the number of classes we wish to segment. This can be done using a 1-by-1 convolution layer whose number of filters equal the number of classes, e.g. 3.

Create a convolution layer to combine the third dimension of the input feature maps down to the number of classes.

numClasses = 3;
conv1x1 = convolution2dLayer(1,numClasses);

Following this 1-by-1 convolution layer are the softmax and pixel classification layers. These two layers combine to predict the categorical label for each image pixel.

finalLayers = [
    conv1x1
    softmaxLayer()
    pixelClassificationLayer()
    ]
finalLayers = 
  3x1 Layer array with layers:

     1   ''   Convolution                  3 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
     2   ''   Softmax                      softmax
     3   ''   Pixel Classification Layer   Cross-entropy loss 

Stack All Layers

Stack all the layers to complete the semantic segmentation network.

net = [
    imgLayer    
    downsamplingLayers
    upsamplingLayers
    finalLayers
    ]
net = 
  14x1 Layer array with layers:

     1   ''   Image Input                  32x32x3 images with 'zerocenter' normalization
     2   ''   Convolution                  32 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     3   ''   ReLU                         ReLU
     4   ''   Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   ''   Convolution                  32 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     6   ''   ReLU                         ReLU
     7   ''   Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     8   ''   Transposed Convolution       32 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
     9   ''   ReLU                         ReLU
    10   ''   Transposed Convolution       32 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
    11   ''   ReLU                         ReLU
    12   ''   Convolution                  3 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
    13   ''   Softmax                      softmax
    14   ''   Pixel Classification Layer   Cross-entropy loss 

This network is ready to be trained using trainNetwork from Neural Network Toolbox™.

Train A Semantic Segmentation Network

Load the training data.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an image datastore for the images.

imds = imageDatastore(imageDir);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle","background"];
labelIDs   = [255 0];
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Visualize training images and ground truth pixel labels.

I = read(imds);
C = read(pxds);

I = imresize(I,5);
L = imresize(uint8(C),5);
imshowpair(I,L,'montage')

Create a semantic segmentation network. This network uses a simple semantic segmentation network based on a downsampling and upsampling design.

numFilters = 64;
filterSize = 3;
numClasses = 2;
layers = [
    imageInputLayer([32 32 1])
    convolution2dLayer(filterSize,numFilters,'Padding',1)
    reluLayer()
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(filterSize,numFilters,'Padding',1)
    reluLayer()
    transposedConv2dLayer(4,numFilters,'Stride',2,'Cropping',1);
    convolution2dLayer(1,numClasses);
    softmaxLayer()
    pixelClassificationLayer()
    ]
layers = 
  10x1 Layer array with layers:

     1   ''   Image Input                  32x32x1 images with 'zerocenter' normalization
     2   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     3   ''   ReLU                         ReLU
     4   ''   Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     6   ''   ReLU                         ReLU
     7   ''   Transposed Convolution       64 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
     8   ''   Convolution                  2 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
     9   ''   Softmax                      softmax
    10   ''   Pixel Classification Layer   Cross-entropy loss 

Setup training options.

opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',100, ...
    'MiniBatchSize',64);

Create a pixel label image datastore that contains training data.

trainingData = pixelLabelImageDatastore(imds,pxds);

Train the network.

net = trainNetwork(trainingData,layers,opts);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       56.29% |       0.6931 |          0.0010 |
|      17 |          50 |       00:00:04 |       95.07% |       0.5546 |          0.0010 |
|      34 |         100 |       00:00:09 |       95.38% |       0.4408 |          0.0010 |
|      50 |         150 |       00:00:13 |       94.53% |       0.3785 |          0.0010 |
|      67 |         200 |       00:00:18 |       95.07% |       0.3278 |          0.0010 |
|      84 |         250 |       00:00:22 |       95.38% |       0.2923 |          0.0010 |
|     100 |         300 |       00:00:27 |       94.53% |       0.2820 |          0.0010 |
|========================================================================================|

Read and display a test image.

testImage = imread('triangleTest.jpg');
imshow(testImage)

Segment the test image and display the results.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)

Improve the results

The network failed to segment the triangles and classified every pixel as "background". The training appeared to be going well with training accuracies greater than 90%. However, the network only learned to classify the background class. To understand why this happened, you can count the occurrence of each pixel label across the dataset.

tbl = countEachLabel(trainingData)
tbl=2×3 table
        Name        PixelCount    ImagePixelCount
    ____________    __________    _______________

    'triangle'           10326       2.048e+05   
    'background'    1.9447e+05       2.048e+05   

The majority of pixel labels are for the background. The poor results are due to the class imbalance. Class imbalance biases the learning process in favor of the dominant class. That's why every pixel is classified as "background". To fix this, use class weighting to balance the classes. There are several methods for computing class weights. One common method is inverse frequency weighting where the class weights are the inverse of the class frequencies. This increases weight given to under-represented classes.

totalNumberOfPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / totalNumberOfPixels;
classWeights = 1./frequency
classWeights = 2×1

   19.8334
    1.0531

Class weights can be specified using the pixelClassificationLayer. Update the last layer to use a pixelClassificationLayer with inverse class weights.

layers(end) = pixelClassificationLayer('ClassNames',tbl.Name,'ClassWeights',classWeights);

Train network again.

net = trainNetwork(trainingData,layers,opts);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       57.17% |       0.6945 |          0.0010 |
|      17 |          50 |       00:00:04 |       78.32% |       0.6677 |          0.0010 |
|      34 |         100 |       00:00:09 |       80.11% |       0.4283 |          0.0010 |
|      50 |         150 |       00:00:14 |       85.69% |       0.3923 |          0.0010 |
|      67 |         200 |       00:00:19 |       87.64% |       0.3395 |          0.0010 |
|      84 |         250 |       00:00:24 |       89.53% |       0.3038 |          0.0010 |
|     100 |         300 |       00:00:28 |       90.40% |       0.2855 |          0.0010 |
|========================================================================================|

Try to segment the test image again.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)

Using class weighting to balance the classes produced a better segmentation result. Additional steps to improve the results include increasing the number of epochs used for training, adding more training data, or modifying the network.

Evaluate and inspect the results of semantic segmentation

Import a test data set, run a pre-trained semantic segmentation network, and evaluate and inspect semantic segmentation quality metrics for the predicted results.

Import a data set

The triangleImages data set has 100 test images with ground truth labels. Define the location of the data set.

dataSetDir = fullfile(toolboxdir('vision'), 'visiondata', 'triangleImages');

Define the location of the test images.

testImagesDir = fullfile(dataSetDir, 'testImages');

Create an imageDatastore object holding the test images.

imds = imageDatastore(testImagesDir);

Define the location of the ground truth labels.

testLabelsDir = fullfile(dataSetDir, 'testLabels');

Define the class names and their associated label IDs. The label IDs are the pixel values used in the image files to represent each class.

classNames = ["triangle" "background"];
labelIDs   = [255 0];

Create a pixelLabelDatastore object holding the ground truth pixel labels for the test images.

pxdsTruth = pixelLabelDatastore(testLabelsDir, classNames, labelIDs);

Run a semantic segmentation classifier

Load a semantic segmentation network that has been trained on the training images of triangleImages.

net = load('triangleSegmentationNetwork.mat');
net = net.net;

Run the network on the test images. Predicted labels are written to disk in a temporary directory and returned as a pixelLabelDatastore object.

pxdsResults = semanticseg(imds, net, "WriteLocation", tempdir);
Running semantic segmentation network
-------------------------------------
* Processing 100 images.
* Progress: 100.00%

Evaluate the quality of the prediction

The predicted labels are compared to the ground truth labels. While the semantic segmentation metrics are being computed, progress is printed to the Command Window.

metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTruth);
Evaluating semantic segmentation results
----------------------------------[==================================================] 100%
Elapsed time: 00:00:06
Estimated time remaining: 00:00:00
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

       0.90624          0.95085       0.61588      0.87529        0.40652  

Inspect class metrics

Display the classification accuracy, the intersection over union (IoU), and the boundary F-1 score for each class in the data set.

metrics.ClassMetrics
ans=2×3 table
                  Accuracy      IoU      MeanBFScore
                  ________    _______    ___________

    triangle            1     0.33005     0.028664  
    background     0.9017      0.9017      0.78438  

Display the confusion matrix

Display the confusion matrix in the Command Window.

metrics.ConfusionMatrix
ans=2×2 table
                  triangle    background
                  ________    __________

    triangle        4730            0   
    background      9601        88069   

Visualize the normalized confusion matrix as a heat map in a figure window.

normConfMatData = metrics.NormalizedConfusionMatrix.Variables;
figure
h = heatmap(classNames, classNames, 100 * normConfMatData);
h.XLabel = 'Predicted Class';
h.YLabel = 'True Class';
h.Title  = 'Normalized Confusion Matrix (%)';

Inspect an image metric

Visualize the histogram of the per-image intersection over union (IoU).

imageIoU = metrics.ImageMetrics.MeanIoU;
figure
histogram(imageIoU)
title('Image Mean IoU')

Find the test image with the lowest IoU.

[minIoU, worstImageIndex] = min(imageIoU);
minIoU = minIoU(1);
worstImageIndex = worstImageIndex(1);

Read the test image with the worst IoU, its ground truth labels, and its predicted labels for comparison.

worstTestImage = readimage(imds, worstImageIndex);
worstTrueLabels = readimage(pxdsTruth, worstImageIndex);
worstPredictedLabels = readimage(pxdsResults, worstImageIndex);

Convert the label images to images that can be displayed in a figure window.

worstTrueLabelImage = im2uint8(worstTrueLabels == classNames(1));
worstPredictedLabelImage = im2uint8(worstPredictedLabels == classNames(1));

Display the worst test image, the ground truth, and the prediction.

worstMontage = cat(4,worstTestImage, worstTrueLabelImage, worstPredictedLabelImage);
worstMontage = imresize(worstMontage, 4, "nearest");
figure
montage(worstMontage, 'Size', [1 3])
title(['Test image vs. Truth vs. Prediction. IoU = ' num2str(minIoU)])

Similarly, find the test image with the highest IoU.

[maxIoU, bestImageIndex] = max(imageIoU);
maxIoU = maxIoU(1);
bestImageIndex = bestImageIndex(1);

Repeat the previous steps to read, convert, and display the test image with the best IoU with its ground truth and predicted labels.

bestTestImage = readimage(imds, bestImageIndex);
bestTrueLabels = readimage(pxdsTruth, bestImageIndex);
bestPredictedLabels = readimage(pxdsResults, bestImageIndex);

bestTrueLabelImage = im2uint8(bestTrueLabels == classNames(1));
bestPredictedLabelImage = im2uint8(bestPredictedLabels == classNames(1));

bestMontage = cat(4,bestTestImage, bestTrueLabelImage, bestPredictedLabelImage);
bestMontage = imresize(bestMontage, 4, "nearest");
figure
montage(bestMontage, 'Size', [1 3])
title(['Test image vs. Truth vs. Prediction. IoU = ' num2str(maxIoU)])

Specify metrics to evaluate

Optionally, list the metric(s) you would like to evaluate using the 'Metrics' parameter.

Define the metrics to compute.

evaluationMetrics = ["accuracy" "iou"];

Compute these metrics for the triangleImages test data set.

metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTruth, "Metrics", evaluationMetrics);
Evaluating semantic segmentation result[==================================================] 100%
Elapsed time: 00:00:01
Estimated time remaining: 00:00:00
* Finalizing... Done.
* Data set metrics:

    MeanAccuracy    MeanIoU
    ____________    _______

      0.95085       0.61588

Display the chosen metrics for each class.

metrics.ClassMetrics
ans=2×2 table
                  Accuracy      IoU  
                  ________    _______

    triangle            1     0.33005
    background     0.9017      0.9017

Import Pixel Labeled Dataset For Semantic Segmentation

This example shows you how to import a pixel labeled dataset for semantic segmentation networks.

A pixel labeled dataset is a collection of images and a corresponding set of ground truth pixel labels used for training semantic segmentation networks. There are many public datasets that provide annotated images with per-pixel labels. To illustrate the steps for importing these types of datasets, the example uses the CamVid dataset from the University of Cambridge [1].

The CamVid dataset is a collection of images containing street level views obtained while driving. The dataset provides pixel-level labels for 32 semantic classes including car, pedestrian, and road. The steps shown to import CamVid can be used to import other pixel labeled datasets.

Download CamVid Dataset

Download the CamVid image data from the following URLs:

imageURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/files/701_StillsRaw_full.zip';
labelURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/data/LabeledApproved_full.zip';

outputFolder = fullfile(tempdir, 'CamVid');
imageDir = fullfile(outputFolder,'images');
labelDir = fullfile(outputFolder,'labels');

if ~exist(outputFolder, 'dir')
    disp('Downloading 557 MB CamVid data set...');
    
    unzip(imageURL, imageDir);
    unzip(labelURL, labelDir);
end

Note: Download time of the data depends on your internet connection. The commands used above will block MATLAB® until the download is complete. Alternatively, you can use your web browser to first download the dataset to your local disk. To use the file you downloaded from the web, change the outputFolder variable above to the location of the downloaded file.

CamVid Pixel Labels

The CamVid data set encodes the pixel labels as RGB images, where each class is represented by an RGB color. Here are the classes the dataset defines along with their RGB encodings.

classNames = [ ...
    "Animal", ...
    "Archway", ...
    "Bicyclist", ...
    "Bridge", ...
    "Building", ...
    "Car", ...
    "CartLuggagePram", ...
    "Child", ...
    "Column_Pole", ...
    "Fence", ...
    "LaneMkgsDriv", ...
    "LaneMkgsNonDriv", ...
    "Misc_Text", ...
    "MotorcycleScooter", ...
    "OtherMoving", ...
    "ParkingBlock", ...
    "Pedestrian", ...
    "Road", ...
    "RoadShoulder", ...
    "Sidewalk", ...
    "SignSymbol", ...
    "Sky", ...
    "SUVPickupTruck", ...
    "TrafficCone", ...
    "TrafficLight", ...
    "Train", ...
    "Tree", ...
    "Truck_Bus", ...
    "Tunnel", ...
    "VegetationMisc", ...
    "Wall"];

Define the mapping between label indices and class names such that classNames(k) corresponds to labelIDs(k,:).

labelIDs = [ ...
    064 128 064; ... % "Animal"
    192 000 128; ... % "Archway"
    000 128 192; ... % "Bicyclist"
    000 128 064; ... % "Bridge"
    128 000 000; ... % "Building"
    064 000 128; ... % "Car"
    064 000 192; ... % "CartLuggagePram"
    192 128 064; ... % "Child"
    192 192 128; ... % "Column_Pole"
    064 064 128; ... % "Fence"
    128 000 192; ... % "LaneMkgsDriv"
    192 000 064; ... % "LaneMkgsNonDriv"
    128 128 064; ... % "Misc_Text"
    192 000 192; ... % "MotorcycleScooter"
    128 064 064; ... % "OtherMoving"
    064 192 128; ... % "ParkingBlock"
    064 064 000; ... % "Pedestrian"
    128 064 128; ... % "Road"
    128 128 192; ... % "RoadShoulder"
    000 000 192; ... % "Sidewalk"
    192 128 128; ... % "SignSymbol"
    128 128 128; ... % "Sky"
    064 128 192; ... % "SUVPickupTruck"
    000 000 064; ... % "TrafficCone"
    000 064 064; ... % "TrafficLight"
    192 064 128; ... % "Train"
    128 128 000; ... % "Tree"
    192 128 192; ... % "Truck_Bus"
    064 000 064; ... % "Tunnel"
    192 192 000; ... % "VegetationMisc"
    064 192 000];    % "Wall"

Note that other datasets have different formats of encoding data. For example, the PASCAL VOC [2] dataset uses numeric label IDs between 0 and 21 to encode their class labels.

Visualize the pixel labels for one of the CamVid images.

labels = imread(fullfile(labelDir,'0001TP_006690_L.png'));
figure
imshow(labels)

% Add colorbar to show class to color mapping.
N = numel(classNames);
ticks = 1/(N*2):1/N:1;
colorbar('TickLabels',cellstr(classNames),'Ticks',ticks,'TickLength',0,'TickLabelInterpreter','none');
colormap(labelIDs./255)

Load CamVid Data

A pixel labeled dataset can be loaded using an imageDatastore and a pixelLabelDatastore.

Create an imageDatastore to load the CamVid images.

imds = imageDatastore(fullfile(imageDir,'701_StillsRaw_full'));

Create a pixelLabelDatastore to load the CamVid pixel labels.

pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Read the 10th image and corresponding pixel label image.

I = readimage(imds,10);
C = readimage(pxds,10);

The pixel label image is returned as a categorical array where C(i,j) is the categorical label assigned to pixel I(i,j). Display the pixel label image on top of the image.

B = labeloverlay(I,C,'Colormap',labelIDs./255);
figure
imshow(B)

% Add a colorbar 
N = numel(classNames);
ticks = 1/(N*2):1/N:1;
colorbar('TickLabels',cellstr(classNames),'Ticks',ticks,'TickLength',0,'TickLabelInterpreter','none');
colormap(labelIDs./255)

Undefined or Void Labels

It is common for pixel labeled datasets to include "undefined" or "void" labels. These are used to designate pixels that were not labeled. For example, in CamVid, the label ID [0 0 0] is used to designate the "void" class. Training algorithms and evaluation algorithms are not expected to include these labels in any computations.

The "void" class need not be explicitly named when using pixelLabelDatastore. Any label ID that is not mapped to a class name is automatically labeled "undefined" and is excluded from computations. To see the undefined pixels, use isundefined to create a mask and then display it on top of the image.

undefinedPixels = isundefined(C);
B = labeloverlay(I,undefinedPixels);
figure
imshow(B)
title('Undefined Pixel Labels')

Combine Classes

When working with public datasets, you may need to combine some of the classes to better suit your application. For example, you may want to train a semantic segmentation network that segments a scene into 4 classes: road, sky, vehicle, pedestrian, and background. To do this with the CamVid dataset, group the label IDs defined above to fit the new classes. First, define the new class names.

newClassNames = ["road","sky","vehicle","pedestrian","background"];

Next, group label IDs using a cell array of M-by-3 matrices.

groupedLabelIDs = {
    % road
    [
    128 064 128; ... % "Road"
    128 000 192; ... % "LaneMkgsDriv"
    192 000 064; ... % "LaneMkgsNonDriv"
    000 000 192; ... % "Sidewalk" 
    064 192 128; ... % "ParkingBlock"
    128 128 192; ... % "RoadShoulder"
    ]
   
    % "sky"
    [
    128 128 128; ... % "Sky"
    ]
    
    % "vehicle"
    [
    064 000 128; ... % "Car"
    064 128 192; ... % "SUVPickupTruck"
    192 128 192; ... % "Truck_Bus"
    192 064 128; ... % "Train"
    000 128 192; ... % "Bicyclist"
    192 000 192; ... % "MotorcycleScooter"
    128 064 064; ... % "OtherMoving"
    ]
     
    % "pedestrian"
    [
    064 064 000; ... % "Pedestrian"
    192 128 064; ... % "Child"
    064 000 192; ... % "CartLuggagePram"
    064 128 064; ... % "Animal"
    ]
    
    % "background"      
    [
    128 128 000; ... % "Tree"
    192 192 000; ... % "VegetationMisc"    
    192 128 128; ... % "SignSymbol"
    128 128 064; ... % "Misc_Text"
    000 064 064; ... % "TrafficLight"  
    064 064 128; ... % "Fence"
    192 192 128; ... % "Column_Pole"
    000 000 064; ... % "TrafficCone"
    000 128 064; ... % "Bridge"
    128 000 000; ... % "Building"
    064 192 000; ... % "Wall"
    064 000 064; ... % "Tunnel"
    192 000 128; ... % "Archway"
    ]
    };

Create a pixelLabelDatastore using the new class and label IDs.

pxds = pixelLabelDatastore(labelDir,newClassNames,groupedLabelIDs);

Read the 10th pixel label image and display it on top of the image.

C = readimage(pxds,10);
cmap = jet(numel(newClassNames));
B = labeloverlay(I,C,'Colormap',cmap);
figure
imshow(B)

% add colorbar
N = numel(newClassNames);
ticks = 1/(N*2):1/N:1;
colorbar('TickLabels',cellstr(newClassNames),'Ticks',ticks,'TickLength',0,'TickLabelInterpreter','none');
colormap(cmap)

The pixelLabelDatastore with the new class names can now be used to train a network for the 4 classes without having to modify the original CamVid pixel labels.

References

[1] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic object classes in video: A high-definition ground truth database." Pattern Recognition Letters 30.2 (2009): 88-97.

[2] Everingham, M., et al. "The PASCAL visual object classes challenge 2012 results." See http://www. pascal-network. org/challenges/VOC/voc2012/workshop/index. html. Vol. 5. 2012.