pixelLabelImageSource

Data source for semantic segmentation networks

Description

Use pixelLabelImageSource to create a data source for training a semantic segmentation network using deep learning.

Creation

Syntax

datasource = pixelLabelImageSource(gTruth)
datasource = pixelLabelImageSource(imds,pxds)
datasource = pixelLabelImageSource(___,Name,Value)

Description

datasource = pixelLabelImageSource(gTruth) returns a data source for training a semantic segmentation network based on the input array of groundTruth objects. Use the output pixelLabelImageSource object with the Neural Network Toolbox™ function trainNetwork to train convolutional neural networks for semantic segmentation.
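
For instance, a minimal sketch of this workflow, assuming a groundTruth object was previously exported from the Image Labeler and saved to a MAT-file (the file name myLabeledData.mat is illustrative):

% Load a groundTruth object exported from the Image Labeler.
data = load('myLabeledData.mat');
gTruth = data.gTruth;

% Create the training data source directly from the ground truth.
datasource = pixelLabelImageSource(gTruth);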

datasource = pixelLabelImageSource(imds,pxds) returns a data source based on the input image datastore and the pixel label datastore objects. imds is an imageDatastore object that represents the training input to the network. pxds is a PixelLabelDatastore object that represents the required network output.

datasource = pixelLabelImageSource(___,Name,Value) additionally sets properties using name-value pairs.

Input Arguments

gTruth — Ground truth data

Ground truth data, specified as a groundTruth object or an array of groundTruth objects. You can use the Image Labeler to create a groundTruth object for training a semantic segmentation network.

imds — Collection of images

Collection of images, specified as an ImageDatastore object.

pxds — Collection of pixel labeled images

Collection of pixel labeled images, specified as a PixelLabelDatastore object. The object contains the pixel labeled image for each image contained in the imds input object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'ColorPreprocessing','rgb2gray'

DataAugmentation — Image data augmentation

Image data augmentation, specified as 'none' or an imageDataAugmenter object. Training data can be augmented in real time during training.

ColorPreprocessing — Color channel preprocessing

Color channel preprocessing, specified as 'none', 'gray2rgb', or 'rgb2gray'. Use this property when the image data created by the data source must be only color or grayscale, but the training set includes both. Suppose you need to train a network that expects color images but some of your training images are grayscale. Set ColorPreprocessing to 'gray2rgb' to replicate the color channels of the grayscale images in the input image set. Using the 'gray2rgb' option creates M-by-N-by-3 output images.
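
For example, a sketch that forces any grayscale training images into three-channel form; imds and pxds are assumed to be an existing imageDatastore and PixelLabelDatastore:

% Replicate grayscale channels so the data source always produces RGB images.
datasource = pixelLabelImageSource(imds,pxds, ...
    'ColorPreprocessing','gray2rgb');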

OutputSize — Size of images produced by data source

Size of images produced by the data source, specified as a 2-element vector indicating the number of rows and columns. When you specify OutputSize, image sizes are adjusted as necessary. By default, this property is empty, which means that the images are not adjusted.

OutputSizeMode — Technique used to adjust image sizes

Technique used to adjust image sizes, specified as 'resize', 'centercrop', or 'randcrop'. This property applies only when you set OutputSize to a value other than [].

BackgroundExecution — Accelerate image augmentation

Accelerate image augmentation, specified as false or true. When you set BackgroundExecution to true, the object asynchronously reads, augments, and queues augmented images for use in training. This option requires Parallel Computing Toolbox™ software.
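
As an illustration, the following sketch combines these size and execution properties; the specific values are assumptions, not requirements:

% Resize every image to 32-by-32 and prepare batches in the background.
% BackgroundExecution requires Parallel Computing Toolbox software.
datasource = pixelLabelImageSource(imds,pxds, ...
    'OutputSize',[32 32], ...
    'OutputSizeMode','resize', ...
    'BackgroundExecution',true);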

Properties

Images — Image file names

Image file names used as the source for ground truth images, specified as a character vector or a cell array of character vectors.

PixelLabelData — Pixel label data file names

Pixel label data file names used as the source for ground truth label images, specified as a character vector or a cell array of character vectors.

ClassNames — Class names

Class names, specified as a cell array of character vectors.

Object Functions

countEachLabel — Count occurrence of pixel label for data source images

Examples

Load training data.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an image datastore for the images.

imds = imageDatastore(imageDir);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle","background"];
labelIDs   = [255 0];
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Visualize training images and ground truth pixel labels.

I = read(imds);
C = read(pxds);
figure
I = imresize(I,5);
L = imresize(uint8(C),5);
imshowpair(I,L,'montage')

Create a semantic segmentation network. This example uses a simple network based on a downsampling and upsampling design.

numFilters = 64;
filterSize = 3;
numClasses = 2;
layers = [
    imageInputLayer([32 32 1])
    convolution2dLayer(filterSize,numFilters,'Padding',1)
    reluLayer()
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(filterSize,numFilters,'Padding',1)
    reluLayer()
    transposedConv2dLayer(4,numFilters,'Stride',2,'Cropping',1);
    convolution2dLayer(1,numClasses);
    softmaxLayer()
    pixelClassificationLayer()
    ]
layers = 
  10x1 Layer array with layers:

     1   ''   Image Input                  32x32x1 images with 'zerocenter' normalization
     2   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     3   ''   ReLU                         ReLU
     4   ''   Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     6   ''   ReLU                         ReLU
     7   ''   Transposed Convolution       64 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
     8   ''   Convolution                  2 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
     9   ''   Softmax                      softmax
    10   ''   Pixel Classification Layer   Cross-entropy loss 

Set up the training options.

opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 100, ...
    'MiniBatchSize', 64);

Create a data source for training data.

trainingData = pixelLabelImageSource(imds,pxds);

Train the network.

net = trainNetwork(trainingData,layers,opts);
Training on single CPU.
Initializing image normalization.
|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            1 |            1 |         7.20 |       0.6934 |       27.67% |       0.0010 |
|           17 |           50 |       286.59 |       0.5547 |       95.06% |       0.0010 |
|           34 |          100 |       550.33 |       0.4425 |       95.13% |       0.0010 |
|           50 |          150 |       829.03 |       0.3780 |       94.57% |       0.0010 |
|           67 |          200 |      1140.88 |       0.3276 |       95.06% |       0.0010 |
|           84 |          250 |      1596.55 |       0.2952 |       95.13% |       0.0010 |
|          100 |          300 |      2078.93 |       0.2806 |       94.57% |       0.0010 |
|=========================================================================================|

Read and display a test image.

testImage = imread('triangleTest.jpg');

figure
imshow(testImage)

Segment the test image and display the results.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
figure
imshow(B)

Improve the Results

The network failed to properly segment the triangles and classified every pixel as "background". The training appeared to be going well, with training accuracies greater than 90%. However, the network learned to classify only the background class. To understand why this happened, you can count the occurrence of each pixel label across the data set.

tbl = countEachLabel(trainingData)
tbl=2x3 table
        Name        PixelCount    ImagePixelCount
    ____________    __________    _______________

    'triangle'           10326    2.048e+05      
    'background'    1.9447e+05    2.048e+05      

The majority of pixel labels are for the background. The poor results are due to the class imbalance. Class imbalance biases the learning process in favor of the dominant class. This bias is why every pixel is classified as "background". To fix the problem, use class weighting to balance the classes. There are several methods for computing class weights. One common method is inverse frequency weighting, where the class weights are the inverse of the class frequencies. This method increases the weight given to under-represented classes.

totalNumberOfPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / totalNumberOfPixels;
classWeights = 1./frequency
classWeights = 

   19.8334
    1.0531
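
Another option, shown here as a sketch, is median frequency balancing, which weights each class by the ratio of the median class frequency to the frequency of that class:

% Median frequency balancing (an alternative weighting scheme).
imageFreq = tbl.PixelCount ./ tbl.ImagePixelCount;
classWeightsMedian = median(imageFreq) ./ imageFreq;

This example continues with the inverse frequency weights computed above.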

You can specify class weights using pixelClassificationLayer. Update the last layer to use a pixelClassificationLayer with the inverse class weights.

layers(end) = pixelClassificationLayer('ClassNames',tbl.Name,'ClassWeights',classWeights);

Train the network again.

net = trainNetwork(trainingData,layers,opts);
Training on single CPU.
Initializing image normalization.
|=========================================================================================|
|     Epoch    |   Iteration  | Time Elapsed |  Mini-batch  |  Mini-batch  | Base Learning|
|              |              |  (seconds)   |     Loss     |   Accuracy   |     Rate     |
|=========================================================================================|
|            1 |            1 |        10.49 |       0.6939 |       48.57% |       0.0010 |
|           17 |           50 |       466.98 |       0.6651 |       73.35% |       0.0010 |
|           34 |          100 |       805.67 |       0.4365 |       79.28% |       0.0010 |
|           50 |          150 |      1130.17 |       0.3907 |       86.29% |       0.0010 |
|           67 |          200 |      1531.26 |       0.3343 |       87.49% |       0.0010 |
|           84 |          250 |      1946.20 |       0.3038 |       89.12% |       0.0010 |
|          100 |          300 |      2381.43 |       0.2825 |       90.85% |       0.0010 |
|=========================================================================================|

Try to segment the test image again.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
figure
imshow(B)

Using class weighting to balance the classes produced a better segmentation result. Additional steps to improve the results include increasing the number of epochs used for training, adding more training data, or modifying the network.

Configure the pixelLabelImageSource to augment data while training.

Load training images and pixel labels.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an imageDatastore holding the training images.

imds = imageDatastore(imageDir);

Define the class names and their associated label IDs.

classNames = ["triangle","background"];
labelIDs   = [255 0];

Create a pixelLabelDatastore holding the ground truth pixel labels for the training images.

pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);

Create an imageDataAugmenter. For example, randomly rotate and mirror the image data.

augmenter = imageDataAugmenter('RandRotation',[-10 10],'RandXReflection',true);

Create a data source for training the network with augmented data.

datasource = pixelLabelImageSource(imds,pxds,'DataAugmentation',augmenter);
datasource.DataAugmentation
ans = 
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 1
     RandYReflection: 0
        RandRotation: [-10 10]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [0 0]
    RandYTranslation: [0 0]

Tips

  • You cannot specify a pixelLabelImageSource with the 'ValidationData' name-value pair argument in the trainingOptions function. To use pixel labeled image sources as validation data, in the call to trainingOptions, specify 'CheckpointPath' to save a copy of the model every epoch. Also, specify an output function using 'OutputFcn'. Within the output function, load the saved checkpoint and run validation using evaluateSemanticSegmentation. If the results are acceptable, then stop the training by returning true from the output function, as in the sketch below.
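
    A hedged sketch of this workaround follows. The names checkpointDir, imdsVal, pxdsVal, and checkpointValidation are illustrative, and the stopping threshold is an assumption. In practice, throttle the validation check (for example, to once per epoch) rather than running it on every iteration.

    % Pass the validation data into the output function through an anonymous function.
    opts = trainingOptions('sgdm', ...
        'CheckpointPath',checkpointDir, ...
        'OutputFcn',@(info) checkpointValidation(info,checkpointDir,imdsVal,pxdsVal));

    function stop = checkpointValidation(info,checkpointDir,imdsVal,pxdsVal)
    % Validate the most recent checkpoint and decide whether to stop training.
    stop = false;
    if info.State == "iteration"
        % trainNetwork saves checkpoints named net_checkpoint__*.mat, each
        % containing a variable named net.
        files = dir(fullfile(checkpointDir,'net_checkpoint__*.mat'));
        if isempty(files)
            return
        end
        [~,idx] = max([files.datenum]);
        data = load(fullfile(checkpointDir,files(idx).name));

        % Segment the validation images and score them against the ground truth.
        pxdsResults = semanticseg(imdsVal,data.net,'WriteLocation',tempdir);
        metrics = evaluateSemanticSegmentation(pxdsResults,pxdsVal);

        % Stop once mean accuracy is acceptable (the threshold is illustrative).
        stop = metrics.DataSetMetrics.MeanAccuracy > 0.95;
    end
    end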

Introduced in R2017b