Deep learning transposed convolution
The transposed convolution operation upsamples feature maps.
dlY = dltranspconv(dlX,weights,bias) computes the deep learning transposed convolution of the input dlX using the filters defined by weights, and adds a constant bias. The input dlX is a formatted dlarray with dimension labels. Transposed convolution acts on dimensions that you specify as spatial dimensions using the 'S' dimension label. The output dlY is a formatted dlarray with the same dimension labels as dlX.
Convolve an image and then use transposed convolution to resize the convolved image to the same size as the original image.
Import the image data and convert it to a dlarray.
X = imread('sherlock.jpg');
dlX = dlarray(single(X),'SSC');
Display the image.
imshow(X)
Initialize the convolutional filters and bias term. Specify an ungrouped convolution that applies a single filter to all three channels of the input data.
filterHeight = 10;
filterWidth = 10;
numChannelsPerGroup = 3;
numFiltersPerGroup = 1;
numGroups = 1;
weights = rand(filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups);
bias = rand(numFiltersPerGroup*numGroups,1);
Perform the convolution. Use a 'Stride' value of 2 and a 'DilationFactor' value of 3.
dlY = dlconv(dlX,weights,bias,'Stride',2,'DilationFactor',3);
Display the convolved image.
Y = extractdata(dlY);
imshow(rescale(Y))
Initialize the transposed convolutional filters and bias. Specify an ungrouped transposed convolution that applies three filters to the input. Use the same filter height and filter width as for the convolution operation.
numChannelsPerGroupTC = 1;
numFiltersPerGroupTC = 3;
weightsTC = rand(filterHeight,filterWidth,numFiltersPerGroupTC,numChannelsPerGroupTC,numGroups);
biasTC = rand(numFiltersPerGroupTC*numGroups,1);
Perform the transposed convolution. Use the same stride and dilation factor as for the convolution operation.
dlZ = dltranspconv(dlY,weightsTC,biasTC,'Stride',2,'DilationFactor',3);
Display the image after the transposed convolution.
Z = extractdata(dlZ);
imshow(rescale(Z))
Compare the size of the original image, the convolved image, and the image after the transposed convolution.
sizeX = size(X)
sizeY = size(Y)
sizeZ = size(Z)
sizeX = 1×3
   640   960     3
sizeY = 1×2
   307   467
sizeZ = 1×3
   640   960     3
The transposed convolution upsamples the convolved data to the size of the original input data.
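The sizes in this example follow from the filter size, stride, and dilation factor. The following sketch reproduces that arithmetic, assuming the default of no padding in dlconv and no cropping in dltranspconv. The variable names effectiveFilterSize, convSize, and transpSize are illustrative only and are not part of either function's interface.

```matlab
% Spatial size of the original image and the settings used in the example.
inputSize      = [640 960];
filterSize     = [10 10];
stride         = 2;
dilationFactor = 3;

% Dilation spreads the filter taps apart, which enlarges its footprint.
effectiveFilterSize = filterSize + (filterSize-1)*(dilationFactor-1);   % [28 28]

% Convolution without padding shrinks each spatial dimension.
convSize = floor((inputSize - effectiveFilterSize)/stride) + 1;

% Transposed convolution without cropping expands each spatial dimension.
transpSize = stride*(convSize - 1) + effectiveFilterSize;               % [640 960]
```

The round trip recovers the original size exactly here because inputSize - effectiveFilterSize is divisible by the stride. When it is not, the floor in the convolution step discards the remainder and the transposed convolution returns a slightly smaller size.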
Apply transposed convolution to the input data in three groups of two channels each. Apply four filters per group.
Create the input data as ten observations of size 100-by-100 with six channels.
height = 100;
width = 100;
channels = 6;
numObservations = 10;
X = rand(height,width,channels,numObservations);
dlX = dlarray(X,'SSCB');
Initialize the filters for the transposed convolution operation. Specify three groups of transposed convolutions that each apply four filters to two channels of the input data.
filterHeight = 8;
filterWidth = 8;
numChannelsPerGroup = 2;
numFiltersPerGroup = 4;
numGroups = 3;
weights = rand(filterHeight,filterWidth,numFiltersPerGroup,numChannelsPerGroup,numGroups);
Initialize the bias term.
bias = rand(numFiltersPerGroup*numGroups,1);
Perform the transposed convolution.
dlY = dltranspconv(dlX,weights,bias);
size(dlY)
dims(dlY)
ans = 1×4
   107   107    12    10
ans = 'SSCB'
The 12 channels of the convolution output represent the three groups of transposed convolutions with four filters per group.
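The channel and spatial sizes reported above can be checked by hand. The following is a minimal sketch of that bookkeeping, assuming the default stride of 1, a dilation factor of 1, and no cropping. The names inputChannels, outputChannels, and outputSize are introduced here for illustration.

```matlab
% Settings taken from the grouped example.
inputSize          = [100 100];
inputChannels      = 6;
filterSize         = [8 8];
numFiltersPerGroup = 4;
numGroups          = 3;

% Each group receives an equal share of the input channels.
numChannelsPerGroup = inputChannels/numGroups;     % 2

% Each group contributes numFiltersPerGroup channels to the output.
outputChannels = numFiltersPerGroup*numGroups;     % 12

% With stride 1 and no cropping, each spatial dimension grows by filterSize-1.
outputSize = inputSize + filterSize - 1;           % [107 107]
```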
dlX — Input data
dlarray | numeric array
Input data, specified as a dlarray with or without dimension labels or a numeric array. When dlX is not a formatted dlarray, you must specify the dimension label format using 'DataFormat',FMT. If dlX is a numeric array, at least one of weights or bias must be a dlarray.
Convolution acts on dimensions that you specify as spatial dimensions using the 'S' dimension label. You can specify up to three dimensions in dlX as 'S' dimensions.
weights — Filters
dlarray | numeric array
Filters, specified as a dlarray with or without labels or a numeric array. The weights argument specifies the size and values of the filters, as well as the number of filters and the number of groups for grouped transposed convolutions.
Specify weights as a filterSize-by-numFiltersPerGroup-by-numChannelsPerGroup-by-numGroups array.
filterSize — Size of the convolutional filters. filterSize can have up to three dimensions, depending on the number of spatial dimensions in the input data.
|Input Data 'S' Dimensions|filterSize|
|1-D|h, where h corresponds to the height of the filter|
|2-D|h-by-w, where h and w correspond to the height and width of the filter, respectively|
|3-D|h-by-w-by-d, where h, w, and d correspond to the height, width, and depth of the filter, respectively|
numFiltersPerGroup — Number of filters to apply within each group.
numChannelsPerGroup — Number of channels within each group for grouped transposed convolutions. numChannelsPerGroup must equal the number of channels in the input data divided by numGroups, the number of groups. For ungrouped convolutions, where numGroups = 1, numChannelsPerGroup must equal the number of channels in the input data.
numGroups — Number of groups (optional). When numGroups > 1, the function performs grouped transposed convolutions. When numGroups = 1, the function performs ungrouped transposed convolutions; in this case, this dimension is singleton and can be omitted.
If weights is a formatted dlarray, it can have multiple spatial dimensions labeled 'S', one channel dimension labeled 'C', and up to two other dimensions labeled 'U'. The number of 'S' dimensions must match the number of 'S' dimensions of the input data. The labeled dimensions correspond to the filter specifications as follows.
|Filter Specification|Dimension Labels|
|filterSize|Up to three 'S' dimensions|
|numFiltersPerGroup|'C' dimension|
|numChannelsPerGroup|First 'U' dimension|
|numGroups (optional)|Second 'U' dimension|
bias — Bias constant
dlarray vector | dlarray scalar | numeric vector | numeric scalar
Bias constant, specified as a dlarray vector or dlarray scalar with or without labels, a numeric vector, or a numeric scalar.
If bias is a scalar or has only singleton dimensions, the same bias is applied to each entry of the output.
If bias has a nonsingleton dimension, each element of bias is the bias applied to the corresponding convolutional filter specified by weights. The number of elements of bias must match the number of filters specified by weights.
If bias is a formatted dlarray, the nonsingleton dimension must be a channel dimension labeled 'C'.
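The per-filter behavior of a vector bias is easiest to see when the filters contribute nothing. The following sketch sets all weights to zero, which is an artificial choice made only for illustration, so that each output channel contains just the bias of the corresponding filter.

```matlab
% Input with two channels, labeled as spatial-spatial-channel.
dlX = dlarray(rand(4,4,2),'SSC');

% Ungrouped 3-by-3 filters: five filters applied to two channels.
% Zero weights make the transposed convolution itself contribute nothing.
weights = zeros(3,3,5,2);

% One bias value per filter.
bias = (1:5)';

dlY = dltranspconv(dlX,weights,bias);
Y = extractdata(dlY);

% Output channel k is constant and equal to bias(k).
size(Y)                 % 6 6 5
unique(Y(:,:,3))        % 3
```

Replacing bias with a scalar, for example 0.5, would instead add the same value to every entry of the output.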
FMT — Dimension order of unformatted data
Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character array or string that provides a label for each dimension of the data. Each character in FMT must be one of the following:
'S' — Spatial
'C' — Channel
'B' — Batch (for example, samples and observations)
'T' — Time (for example, sequences)
'U' — Unspecified
You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.
You must specify 'DataFormat' when the input data dlX is not a formatted dlarray.
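As an illustration, the grouped example can be repeated with an unformatted dlarray by supplying the labels through 'DataFormat'. This is a minimal sketch that reuses the sizes from that example.

```matlab
% Ten observations of size 100-by-100 with six channels, without labels.
X = rand(100,100,6,10);
dlX = dlarray(X);                     % unformatted dlarray

% Three groups, each applying four 8-by-8 filters to two channels.
weights = rand(8,8,4,2,3);
bias = rand(12,1);

% The labels that dlX lacks are passed as a name-value pair.
dlY = dltranspconv(dlX,weights,bias,'DataFormat','SSCB');

size(dlY)                             % 107 107 12 10
```

Because the input is unformatted, the output is also unformatted and keeps the dimension order given in 'DataFormat'.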
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Stride',2 sets the stride of each filter to 2.
'Stride' — Step size for traversing input data
Step size for traversing the input data, specified as the comma-separated pair consisting of 'Stride' and a numeric scalar or numeric vector. If you specify 'Stride' as a scalar, the same value is used for all spatial dimensions. If you specify 'Stride' as a vector of the same size as the number of spatial dimensions of the input data, the vector values are used for the corresponding spatial dimensions.
The default value of 'Stride' is 1.
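The following sketch compares a scalar stride with a vector stride on a small input, assuming no cropping and a dilation factor of 1. Under those assumptions each spatial dimension of the output has size stride*(inputSize-1) + filterSize.

```matlab
% 10-by-10 input with two channels.
dlX = dlarray(rand(10,10,2),'SSC');

% Ungrouped 3-by-3 filters: two filters applied to two channels.
weights = rand(3,3,2,2);
bias = 0;

% Scalar stride: the same step along height and width.
dlY1 = dltranspconv(dlX,weights,bias,'Stride',2);
[size(dlY1,1) size(dlY1,2)]           % 21 21

% Vector stride: step 2 along height, step 3 along width.
dlY2 = dltranspconv(dlX,weights,bias,'Stride',[2 3]);
[size(dlY2,1) size(dlY2,2)]           % 21 30
```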
'DilationFactor' — Filter dilation factor
Filter dilation factor, specified as the comma-separated pair consisting of 'DilationFactor' and one of the following.
Numeric scalar — The same dilation factor value is applied for all spatial dimensions.
Numeric vector — A different dilation factor value is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the dilation factor applied to the ith spatial dimension.
Use the dilation factor to increase the receptive field of the filter (the area of the input that the filter can see) on the input data. Using a dilation factor corresponds to an effective filter size of
filterSize + (filterSize-1)*(dilationFactor-1).
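As a sketch of how the effective filter size affects the output, the following applies a different dilation factor along each spatial dimension, assuming a stride of 1 and no cropping. The names effectiveFilterSize and expectedSize are illustrative.

```matlab
% 10-by-10 input with two channels.
dlX = dlarray(rand(10,10,2),'SSC');

% Ungrouped 3-by-3 filters: two filters applied to two channels.
filterSize = [3 3];
weights = rand(3,3,2,2);

% No dilation along height, dilation factor 2 along width.
dilationFactor = [1 2];
dlY = dltranspconv(dlX,weights,0,'DilationFactor',dilationFactor);

% Effective filter size from the formula above.
effectiveFilterSize = filterSize + (filterSize-1).*(dilationFactor-1);   % [3 5]

% With stride 1 and no cropping, the output grows by effectiveFilterSize-1.
expectedSize = [10 10] + effectiveFilterSize - 1;                        % [12 14]
actualSize = [size(dlY,1) size(dlY,2)];                                  % [12 14]
```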
'Cropping' — Cropping applied to edges of data
'same' | numeric scalar | numeric vector | numeric matrix
Cropping applied to edges of data, specified as the comma-separated pair consisting of 'Cropping' and one of the following.
'same' — Cropping is set so that the output size is the same as the input size when the stride is 1. More generally, the output size of each spatial dimension is inputSize*stride, where inputSize is the size of the input along a spatial dimension.
Numeric scalar — The same cropping value is applied to both ends of all spatial dimensions.
Numeric vector — A different cropping value is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the cropping applied to the start and the end along the ith spatial dimension.
Numeric matrix — A different cropping value is applied to the start and end of each spatial dimension. Use a matrix of size 2-by-d, where d is the number of spatial dimensions of the input data. The element (1,d) specifies the cropping applied to the start of spatial dimension d. The element (2,d) specifies the cropping applied to the end of spatial dimension d. For example, in 2-D the format is [top, left; bottom, right].
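The cropping options can be compared on a small input. In the following sketch, a 10-by-10 input with 4-by-4 filters and a stride of 2 produces a 22-by-22 result before cropping, since 2*(10-1) + 4 = 22. The specific cropping values are chosen only for illustration.

```matlab
% 10-by-10 input with two channels.
dlX = dlarray(rand(10,10,2),'SSC');

% Ungrouped 4-by-4 filters: two filters applied to two channels.
weights = rand(4,4,2,2);

% 'same': output size is inputSize*stride along each spatial dimension.
dlYsame = dltranspconv(dlX,weights,0,'Stride',2,'Cropping','same');
[size(dlYsame,1) size(dlYsame,2)]     % 20 20

% Scalar: remove one element from both ends of every spatial dimension.
dlYscalar = dltranspconv(dlX,weights,0,'Stride',2,'Cropping',1);
[size(dlYscalar,1) size(dlYscalar,2)] % 20 20

% Matrix [top, left; bottom, right]: crop 0 and 2 rows, 1 and 3 columns.
dlYmatrix = dltranspconv(dlX,weights,0,'Stride',2,'Cropping',[0 1; 2 3]);
[size(dlYmatrix,1) size(dlYmatrix,2)] % 20 18
```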
dlY — Feature map
dlarray
Feature map, returned as a dlarray. The output dlY has the same underlying data type as the input dlX.
If the input data dlX is a formatted dlarray, then dlY has the same dimension labels as dlX. If the input data is not a formatted dlarray, then dlY is an unformatted dlarray or numeric array with the same dimension order as the input data.
The size of the 'C' channel dimension of dlY depends on the size of the weights input. The size of the 'C' dimension of output dlY is the product of the size of the dimensions numFiltersPerGroup and numGroups in the weights argument. If weights is a formatted dlarray, this product is the same as the product of the size of the 'C' dimension and the second 'U' dimension.
Usage notes and limitations:
When at least one of the input arguments dlX, weights, or bias is a dlarray with underlying data of type gpuArray, this function runs on the GPU.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
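A minimal sketch of GPU usage follows. It assumes Parallel Computing Toolbox is installed and falls back to the CPU when canUseGPU reports that no supported GPU is available. The sizes are arbitrary.

```matlab
% Single-precision data is the usual choice for GPU computation.
X = rand(16,16,3,'single');
weights = rand(4,4,2,3,'single');     % two filters applied to three channels
bias = zeros(2,1,'single');

% Move the data to the GPU only when one is available.
if canUseGPU
    X = gpuArray(X);
end

% The dlarray wraps the gpuArray, so dltranspconv runs on the GPU.
dlX = dlarray(X,'SSC');
dlY = dltranspconv(dlX,weights,bias,'Stride',2);

% Shows 'gpuArray' when the computation ran on the GPU, 'single' otherwise.
underlyingClass = class(extractdata(dlY))
```

Use gather on the extracted data to bring the result back to host memory when it is needed there.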