pixelLabelImageDatastore

Datastore for semantic segmentation networks

Description

Use pixelLabelImageDatastore to create a datastore for training a semantic segmentation network using deep learning.

Creation

Syntax

pximds = pixelLabelImageDatastore(gTruth)
pximds = pixelLabelImageDatastore(imds,pxds)
pximds = pixelLabelImageDatastore(___,Name,Value)

Description


pximds = pixelLabelImageDatastore(gTruth) returns a datastore for training a semantic segmentation network based on the input array of groundTruth objects. Use the output pixelLabelImageDatastore object with the Neural Network Toolbox™ function trainNetwork to train convolutional neural networks for semantic segmentation.

pximds = pixelLabelImageDatastore(imds,pxds) returns a datastore based on the input image datastore and the pixel label datastore objects. imds is an ImageDatastore object that represents the training input to the network. pxds is a PixelLabelDatastore object that represents the required network output.

pximds = pixelLabelImageDatastore(___,Name,Value) additionally uses name-value pairs to set the ColorPreprocessing, DataAugmentation, DispatchInBackground, OutputSize, and OutputSizeMode properties. You can specify multiple name-value pairs. Enclose each property name in quotes.

For example, pixelLabelImageDatastore(imds,pxds,'DataAugmentation',augmenter) creates a pixel label image datastore that applies the augmentations defined by the imageDataAugmenter object augmenter to the training data.

Input Arguments


gTruth: Ground truth data, specified as a groundTruth object or an array of groundTruth objects. You can use the Image Labeler to create a groundTruth object for training a semantic segmentation network.

imds: Collection of images, specified as an ImageDatastore object.

pxds: Collection of pixel labeled images, specified as a PixelLabelDatastore object. The object contains the pixel labeled images for each image contained in the imds input object.

Properties


Images: Image file names used as the source for ground truth images, specified as a character vector or a cell array of character vectors. This property is read-only.

PixelLabelData: Pixel label data file names used as the source for ground truth label images, specified as a character vector or a cell array of character vectors. This property is read-only.

ClassNames: Class names, specified as a cell array of character vectors. This property is read-only.

ColorPreprocessing: Color channel preprocessing, specified as 'none', 'gray2rgb', or 'rgb2gray'. Use this property when the image data created by the data source must be either all color or all grayscale, but the training set includes both. For example, suppose you need to train a network that expects color images, but some of your training images are grayscale. Set ColorPreprocessing to 'gray2rgb' to replicate the single channel of each grayscale image across three color channels. Using the 'gray2rgb' option creates M-by-N-by-3 output images.
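The channel replication described for 'gray2rgb' can be pictured with base MATLAB array operations. The following is a minimal sketch, not the datastore's internal implementation. The variable names grayImage and rgbImage and the 4-by-5 test image are assumptions made for illustration.

```matlab
% Hypothetical 4-by-5 grayscale image (M-by-N), built from magic(5).
grayImage = uint8(magic(5));
grayImage = grayImage(1:4,:);

% Replicate the single channel three times along the third dimension
% to obtain an M-by-N-by-3 image whose three channels are identical.
rgbImage = repmat(grayImage,[1 1 3]);

disp(size(rgbImage))
```

Because all three channels are copies of the same data, the result has the shape a color network expects without adding any new color information.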

DataAugmentation: Preprocessing applied to input images, specified as an imageDataAugmenter object or 'none'. When DataAugmentation is 'none', no preprocessing is applied to input images. Training data can be augmented in real time during training.

DispatchInBackground: Dispatch observations in the background during training, prediction, and classification, specified as false or true. To use background dispatching, you must have Parallel Computing Toolbox™. If DispatchInBackground is true and you have Parallel Computing Toolbox™, then pixelLabelImageDatastore asynchronously reads image and pixel label data and queues the resulting observations.

MiniBatchSize: Number of observations that are returned in each batch. For training, prediction, or classification, the MiniBatchSize property is set to the mini-batch size defined in trainingOptions. This property is read-only.

NumObservations: Total number of observations in the pixel label image datastore. The number of observations is the length of one training epoch. This property is read-only.

OutputSize: Size of output images, specified as a vector of two positive integers. The first element specifies the number of rows in the output images, and the second element specifies the number of columns. When you specify OutputSize, image sizes are adjusted as necessary. By default, this property is empty, which means that the images are not adjusted.

OutputSizeMode: Method used to resize output images, specified as one of the following. This property applies only when you set OutputSize to a value other than [].

  • 'resize' — Scale the image to fit the output size. For more information, see imresize.

  • 'centercrop' — Take a crop from the center of the training image. The crop has the same size as the output size.

  • 'randcrop' — Take a random crop from the training image. The random crop has the same size as the output size.

Data Types: char | string
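To make the two cropping modes concrete, the sketch below computes a center window and a random window with plain indexing and applies the same window to an image and its label matrix. It is an illustration under stated assumptions, not the datastore's implementation: the floor-based centering offset, the variable names, and the 6-by-8 test data are all hypothetical.

```matlab
% Hypothetical 6-by-8 training image and a label matrix of the same size.
I = reshape(1:48,6,8);
L = I > 24;            % stand-in for the pixel labels
outputSize = [4 4];    % [rows cols], as in the OutputSize property

% Center crop: place the window in the middle of the image.
% The floor-based offset is an assumption made for this sketch.
r0 = floor((size(I,1) - outputSize(1))/2) + 1;   % first row of the window
c0 = floor((size(I,2) - outputSize(2))/2) + 1;   % first column of the window
rows = r0:(r0 + outputSize(1) - 1);
cols = c0:(c0 + outputSize(2) - 1);
Icenter = I(rows,cols);
Lcenter = L(rows,cols);   % same window, so labels stay aligned with pixels

% Random crop: draw the top-left corner uniformly from the valid range.
r0 = randi(size(I,1) - outputSize(1) + 1);
c0 = randi(size(I,2) - outputSize(2) + 1);
rows = r0:(r0 + outputSize(1) - 1);
cols = c0:(c0 + outputSize(2) - 1);
Irand = I(rows,cols);
Lrand = L(rows,cols);
```

The point of the sketch is that the image and its labels must be cropped with identical row and column ranges; otherwise the labels no longer describe the pixels they are paired with.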

Object Functions

countEachLabel      Count occurrence of pixel label for data source images
hasdata             Determine if data is available to read
partitionByIndex    Partition pixelLabelImageDatastore according to indices
preview             Subset of data in datastore
read                Read data from pixelLabelImageDatastore
readall             Read all data in datastore
readByIndex         Read data specified by index from pixelLabelImageDatastore
reset               Reset datastore to initial state
shuffle             Shuffle data in pixelLabelImageDatastore
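As a rough illustration of how hasdata, read, and reset fit together, the following sketch iterates once over a datastore built from the triangleImages sample data used in the examples on this page. It assumes that the toolbox providing pixelLabelImageDatastore is installed and that the sample data is present. The counter numBatches is a hypothetical name, and the sketch makes no assumption about the format of the value returned by read.

```matlab
% Build the datastore from the sample data shipped with the toolbox.
dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imds = imageDatastore(fullfile(dataSetDir,'trainingImages'));
pxds = pixelLabelDatastore(fullfile(dataSetDir,'trainingLabels'), ...
    ["triangle","background"],[255 0]);
pximds = pixelLabelImageDatastore(imds,pxds);

% Read until the datastore is exhausted, counting the batches returned.
numBatches = 0;
while hasdata(pximds)
    batch = read(pximds);          % one batch of observations
    numBatches = numBatches + 1;
end

% Return to the first observation so the datastore can be read again.
reset(pximds)
```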

Examples


Load the training data.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an image datastore for the images.

imds = imageDatastore(imageDir);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle","background"];
labelIDs   = [255 0];
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Visualize training images and ground truth pixel labels.

I = read(imds);
C = read(pxds);

I = imresize(I,5);
L = imresize(uint8(C),5);
imshowpair(I,L,'montage')

Create a simple semantic segmentation network based on a downsampling and upsampling design.

numFilters = 64;
filterSize = 3;
numClasses = 2;
layers = [
    imageInputLayer([32 32 1])
    convolution2dLayer(filterSize,numFilters,'Padding',1)
    reluLayer()
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(filterSize,numFilters,'Padding',1)
    reluLayer()
    transposedConv2dLayer(4,numFilters,'Stride',2,'Cropping',1);
    convolution2dLayer(1,numClasses);
    softmaxLayer()
    pixelClassificationLayer()
    ]
layers = 
  10x1 Layer array with layers:

     1   ''   Image Input                  32x32x1 images with 'zerocenter' normalization
     2   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     3   ''   ReLU                         ReLU
     4   ''   Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
     6   ''   ReLU                         ReLU
     7   ''   Transposed Convolution       64 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
     8   ''   Convolution                  2 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
     9   ''   Softmax                      softmax
    10   ''   Pixel Classification Layer   Cross-entropy loss 
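The downsampling and upsampling design can be checked with a short size calculation: the network must return one prediction per input pixel, so the spatial size after the transposed convolution has to match the 32-by-32 input. The sketch below applies the standard output-size formulas for convolution, pooling, and transposed convolution to the layer parameters listed above. The variable names are illustrative assumptions.

```matlab
inputSize  = 32;   % rows (and columns) of imageInputLayer([32 32 1])
filterSize = 3;

% 3x3 convolution, stride 1, padding 1: spatial size is preserved.
conv1 = floor((inputSize + 2*1 - filterSize)/1) + 1;   % 32

% 2x2 max pooling, stride 2, no padding: spatial size is halved.
pool1 = floor((conv1 - 2)/2) + 1;                      % 16

% Second 3x3 convolution, stride 1, padding 1: size is preserved.
conv2 = floor((pool1 + 2*1 - filterSize)/1) + 1;       % 16

% 4x4 transposed convolution, stride 2, cropping 1:
% output = stride*(input - 1) + filter - 2*cropping, which doubles the size.
upsampled = 2*(conv2 - 1) + 4 - 2*1;                   % 32

% Final 1x1 convolution, stride 1, no padding: size is preserved.
finalSize = floor((upsampled - 1)/1) + 1;              % 32
```

With these parameters the pooling layer halves the resolution and the transposed convolution restores it, so the softmax and pixel classification layers operate on a 32-by-32-by-2 array.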

Set up training options.

opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',100, ...
    'MiniBatchSize',64);

Create a pixel label image datastore that contains training data.

trainingData = pixelLabelImageDatastore(imds,pxds);

Train the network.

net = trainNetwork(trainingData,layers,opts);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       56.29% |       0.6931 |          0.0010 |
|      17 |          50 |       00:00:04 |       95.07% |       0.5546 |          0.0010 |
|      34 |         100 |       00:00:09 |       95.38% |       0.4408 |          0.0010 |
|      50 |         150 |       00:00:13 |       94.53% |       0.3785 |          0.0010 |
|      67 |         200 |       00:00:18 |       95.07% |       0.3278 |          0.0010 |
|      84 |         250 |       00:00:22 |       95.38% |       0.2923 |          0.0010 |
|     100 |         300 |       00:00:27 |       94.53% |       0.2820 |          0.0010 |
|========================================================================================|
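The epoch and iteration columns in the log can be reproduced from the training options and the size of the dataset. The sketch below assumes the 200 observations reported for this dataset in the NumObservations display later on this page, and assumes that a final partial mini-batch in each epoch is discarded, which is consistent with the log. The variable names are hypothetical.

```matlab
numObservations = 200;   % assumed from the NumObservations display on this page
miniBatchSize   = 64;    % 'MiniBatchSize' in trainingOptions
maxEpochs       = 100;   % 'MaxEpochs' in trainingOptions

% Only complete mini-batches are counted (assumption consistent with the log).
iterationsPerEpoch = floor(numObservations/miniBatchSize);   % 3
totalIterations    = iterationsPerEpoch*maxEpochs;           % 300

% Epoch in which each logged iteration falls.
loggedIterations = [1 50 100 150 200 250 300];
loggedEpochs = ceil(loggedIterations/iterationsPerEpoch);
disp(loggedEpochs)
```

Under these assumptions each epoch uses 192 of the 200 observations, and the computed epochs match the Epoch column of the table.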

Read and display a test image.

testImage = imread('triangleTest.jpg');
imshow(testImage)

Segment the test image and display the results.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)

Improve the results

The network failed to segment the triangles and classified every pixel as "background". The training appeared to be going well with training accuracies greater than 90%. However, the network only learned to classify the background class. To understand why this happened, you can count the occurrence of each pixel label across the dataset.

tbl = countEachLabel(trainingData)
tbl=2×3 table
        Name        PixelCount    ImagePixelCount
    ____________    __________    _______________

    'triangle'           10326       2.048e+05   
    'background'    1.9447e+05       2.048e+05   

The majority of pixel labels are for the background. The poor results are due to this class imbalance, which biases the learning process in favor of the dominant class. That is why every pixel is classified as "background". To fix this, use class weighting to balance the classes. There are several methods for computing class weights. One common method is inverse frequency weighting, where the class weights are the inverse of the class frequencies. This increases the weight given to under-represented classes.

totalNumberOfPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / totalNumberOfPixels;
classWeights = 1./frequency
classWeights = 2×1

   19.8334
    1.0531
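The same numbers also explain why the first training run looked successful. The sketch below recomputes the weights from the pixel counts in the table and adds the accuracy that a classifier would reach by labeling every pixel as background. The background count is taken as 204800 - 10326, which assumes that every pixel is labeled (the table shows this value only in rounded form). The name majorityBaseline is hypothetical.

```matlab
% Pixel counts from the table: [triangle; background].
% The background count assumes every one of the 204800 pixels is labeled.
pixelCount = [10326; 204800 - 10326];

frequency    = pixelCount / sum(pixelCount);
classWeights = 1 ./ frequency;

% Accuracy obtained by predicting the most frequent class everywhere.
majorityBaseline = max(frequency);

fprintf('%.4f\n',classWeights)
fprintf('Always-background accuracy: %.2f%%\n',100*majorityBaseline)
```

The baseline is about 95%, which is where the unweighted run levelled off. A mini-batch accuracy at that level is therefore no evidence that the network has learned to find triangles.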

Class weights can be specified using the pixelClassificationLayer. Update the last layer to use a pixelClassificationLayer with inverse class weights.

layers(end) = pixelClassificationLayer('ClassNames',tbl.Name,'ClassWeights',classWeights);

Train the network again.

net = trainNetwork(trainingData,layers,opts);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       57.17% |       0.6945 |          0.0010 |
|      17 |          50 |       00:00:04 |       78.32% |       0.6677 |          0.0010 |
|      34 |         100 |       00:00:09 |       80.11% |       0.4283 |          0.0010 |
|      50 |         150 |       00:00:14 |       85.69% |       0.3923 |          0.0010 |
|      67 |         200 |       00:00:19 |       87.64% |       0.3395 |          0.0010 |
|      84 |         250 |       00:00:24 |       89.53% |       0.3038 |          0.0010 |
|     100 |         300 |       00:00:28 |       90.40% |       0.2855 |          0.0010 |
|========================================================================================|

Try to segment the test image again.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)

Using class weighting to balance the classes produced a better segmentation result. Additional steps to improve the results include increasing the number of epochs used for training, adding more training data, or modifying the network.

Configure a pixel label image datastore to augment data while training.

Load training images and pixel labels.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an imageDatastore object to hold the training images.

imds = imageDatastore(imageDir);

Define the class names and their associated label IDs.

classNames = ["triangle","background"];
labelIDs   = [255 0];

Create a pixelLabelDatastore object to hold the ground truth pixel labels for the training images.

pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);

Create an imageDataAugmenter object to randomly rotate and mirror image data.

augmenter = imageDataAugmenter('RandRotation',[-10 10],'RandXReflection',true)
augmenter = 
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 1
     RandYReflection: 0
        RandRotation: [-10 10]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [0 0]
    RandYTranslation: [0 0]

Create a pixelLabelImageDatastore object to train the network with augmented data.

plimds = pixelLabelImageDatastore(imds,pxds,'DataAugmentation',augmenter)
plimds = 
  pixelLabelImageDatastore with properties:

                  Images: {200x1 cell}
          PixelLabelData: {200x1 cell}
              ClassNames: {2x1 cell}
        DataAugmentation: [1x1 imageDataAugmenter]
      ColorPreprocessing: 'none'
              OutputSize: []
          OutputSizeMode: 'resize'
           MiniBatchSize: 1
         NumObservations: 200
    DispatchInBackground: 0

Introduced in R2018a