Create U-Net layers for semantic segmentation
lgraph = unetLayers(imageSize,numClasses)
lgraph = unetLayers(imageSize,numClasses,Name,Value)
Use unetLayers to create the network architecture for U-Net. You must train the network using the Deep Learning Toolbox function trainNetwork.
Create U-Net layers with an encoder/decoder depth of 3.
imageSize = [480 640 3];
numClasses = 5;
encoderDepth = 3;
lgraph = unetLayers(imageSize,numClasses,'EncoderDepth',encoderDepth)
lgraph = 
  LayerGraph with properties:

         Layers: [46x1 nnet.cnn.layer.Layer]
    Connections: [48x2 table]
Display the network using the plot function.
Load training images and pixel labels.
dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');
Create an imageDatastore holding the training images.
imds = imageDatastore(imageDir);
Define the class names and their associated label IDs.
classNames = ["triangle","background"];
labelIDs = [255 0];
Create a pixelLabelDatastore holding the ground truth pixel labels for the training images.
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);
Create the U-Net network.
imageSize = [32 32];
numClasses = 2;
lgraph = unetLayers(imageSize,numClasses)
lgraph = 
  LayerGraph with properties:

         Layers: [58×1 nnet.cnn.layer.Layer]
    Connections: [61×2 table]
Create data source for training a semantic segmentation network.
ds = pixelLabelImageDatastore(imds,pxds);
Set up training options.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',20, ...
    'VerboseFrequency',10);
Train the network.
net = trainNetwork(ds,lgraph,options)
Training on single CPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |      00:00:04  |       5.21%  |     15.1044  |         0.0010  |
|      10 |          10 |      00:00:43  |      96.09%  |      0.4845  |         0.0010  |
|      20 |          20 |      00:01:25  |      94.38%  |      0.7715  |         0.0010  |
|========================================================================================|
net = 
  DAGNetwork with properties:

         Layers: [58×1 nnet.cnn.layer.Layer]
    Connections: [61×2 table]
imageSize — Network input image size
Network input image size, specified as one of the following:
- 2-element vector in the format [height width].
- 3-element vector in the format [height width depth], where depth is the number of image channels. Set depth to 3 for RGB images, to 1 for grayscale images, or to the number of channels for multispectral and hyperspectral images.
numClasses — Number of classes
Number of classes in the semantic segmentation, specified as an integer greater than 1.
'EncoderDepth' — Encoder depth
4 (default) | positive integer
Encoder depth, specified as a positive integer. U-Net is composed of an encoder subnetwork and a corresponding decoder subnetwork. The depth of these subnetworks determines the number of times the input image is downsampled or upsampled as it is processed. The encoder network downsamples the input image by a factor of 2^D, where D is the value of EncoderDepth. The decoder network upsamples the encoder network output by the same factor of 2^D.
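As a quick sanity check of the downsampling factor (a sketch using an assumed 480-by-640 input, not toolbox output):

```matlab
% With EncoderDepth D, the encoder shrinks each spatial dimension by 2^D,
% so the input height and width should be divisible by 2^D.
D = 3;                      % encoder depth
inputSize = [480 640];      % assumed [height width]
bottleneckSize = inputSize ./ 2^D
% bottleneckSize is [60 80]: 480/8 = 60 and 640/8 = 80
```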
'NumOutputChannels' — Number of output channels
64 (default) | positive integer | vector of positive integers
Number of output channels for the first subsection in the U-Net encoder network, specified as a positive integer or vector of positive integers. Each subsequent encoder subsection doubles the number of output channels. unetLayers sets the number of output channels in the decoder sections to match the corresponding encoder sections.
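For example, with the default of 64 output channels and the default encoder depth of 4, the doubling rule gives the following channel counts per encoder section (a sketch of the rule, not toolbox output):

```matlab
% Each successive encoder section doubles the channel count.
numFirstEncoder = 64;       % 'NumOutputChannels' default
encoderDepth = 4;           % 'EncoderDepth' default
channels = numFirstEncoder * 2.^(0:encoderDepth-1)
% channels is [64 128 256 512]
```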
'FilterSize' — Convolutional layer filter size
3 (default) | positive odd integer | 2-element row vector of positive odd integers
Convolutional layer filter size, specified as a positive odd integer or a 2-element row vector of positive odd integers. Typical values are in the range [3, 7].
- scalar: The filter is square.
- 2-element row vector: The filter has the size [height width].
The sections within the U-Net encoder subnetworks consist of two sets of convolutional and ReLU layers, followed by a 2x2 max pooling layer. The decoder subnetworks consist of a transposed convolution layer for upsampling, followed by two sets of convolutional and ReLU layers.
Convolutional layers in unetLayers use 'same' padding, which retains the data size from input to output and enables a broad set of input image sizes. The original version of U-Net by Ronneberger et al. does not use padding and is constrained to a smaller set of input image sizes.
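The size-preserving effect of 'same' padding follows from the standard convolution output-size formula; a minimal check, assuming an odd filter size:

```matlab
% For an odd filterSize, 'same' padding adds (filterSize-1)/2 pixels on
% each side, so the output spatial size equals the input spatial size.
filterSize = 3;
pad = (filterSize - 1) / 2;
inputSize = [480 640];
outputSize = inputSize - filterSize + 1 + 2*pad
% outputSize equals inputSize, i.e. [480 640]
```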
The bias term of all convolutional layers is initialized to zero.
Convolution layer weights in the encoder and decoder subnetworks are initialized using the He weight initialization method.
Networks produced by unetLayers support GPU code generation for deep learning once they are trained with trainNetwork. See Deep Learning Code Generation (Deep Learning Toolbox) for details and examples.
[1] Ronneberger, O., P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." Medical Image Computing and Computer-Assisted Intervention (MICCAI). Vol. 9351, 2015, pp. 234–241.
[2] He, K., X. Zhang, S. Ren, and J. Sun. "Delving Deep Into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the IEEE International Conference on Computer Vision. 2015, pp. 1026–1034.