
Layer normalization layer

A layer normalization layer normalizes a mini-batch of data across all channels for each observation independently. To speed up training of recurrent and multilayer perceptron neural networks and reduce the sensitivity to network initialization, use layer normalization layers after the learnable layers, such as LSTM and fully connected layers.

After normalization, the layer scales the input with a learnable scale factor
*γ* and shifts it by a learnable offset
*β*.

`layer = layerNormalizationLayer` creates a layer normalization layer.

`layer = layerNormalizationLayer(Name,Value)` sets the optional `Epsilon`, Parameters and Initialization, Learning Rate and Regularization, and `Name` properties using one or more name-value arguments. For example, `layerNormalizationLayer('Name','layernorm')` creates a layer normalization layer with name `'layernorm'`.
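A minimal sketch, assuming Deep Learning Toolbox is installed, of creating the layer with default and name-value settings and placing it after a learnable layer (here an LSTM layer), as the description above recommends. The input size (12), number of hidden units (100), class count (9), and the `Epsilon` value `1e-4` are arbitrary illustrative choices, not recommended settings.

```matlab
% Layer normalization layer with default property values
layer = layerNormalizationLayer;

% Set the Name and Epsilon properties using name-value arguments
namedLayer = layerNormalizationLayer('Name','layernorm','Epsilon',1e-4);

% Place layer normalization directly after a learnable layer.
% All sizes below are illustrative values only.
layers = [
    sequenceInputLayer(12)
    lstmLayer(100,'OutputMode','last')
    layerNormalizationLayer
    fullyConnectedLayer(9)
    softmaxLayer
    classificationLayer];
```

The same pattern applies to multilayer perceptrons: insert `layerNormalizationLayer` after each `fullyConnectedLayer` whose outputs you want normalized.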

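The layer normalization operation normalizes the elements *x_i* of the input by first calculating the mean *μ_L* and variance *σ_L²* across all channels of each observation independently. Then, it normalizes each element using

$$\widehat{x}_{i}=\frac{x_{i}-\mu_{L}}{\sqrt{\sigma_{L}^{2}+\epsilon}},$$

where *ϵ* is a constant, set by the `Epsilon` property, that improves numerical stability when the variance is very small.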
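The normalization step can be sketched in base MATLAB as follows. This assumes feature data stored as a C-by-N matrix (channels by observations), so the statistics are computed down each column, one observation at a time. It illustrates the formula only and is not the layer's internal implementation; the data values and `epsilon` are hypothetical.

```matlab
% Hypothetical feature data: 4 channels (rows) by 3 observations (columns)
X = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
epsilon = 1e-5;

% Per-observation statistics across the channel dimension (dimension 1)
muL = mean(X, 1);          % 1-by-3 row vector of means
sigma2L = var(X, 1, 1);    % variance normalized by C (not C-1)

% Normalize each element; implicit expansion applies each column's
% statistics to every channel of that observation
Xhat = (X - muL) ./ sqrt(sigma2L + epsilon);
```

After this step each column of `Xhat` has zero mean and a variance slightly below one, because *ϵ* is added to the denominator.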

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow layer normalization, the layer normalization operation further shifts and scales the activations using the transformation

$${y}_{i}=\gamma {\widehat{x}}_{i}+\beta ,$$

where the offset *β* and scale factor
*γ* are learnable parameters that are updated during network
training.
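Continuing the same sketch, the scale-and-shift transformation can be written as below. Here *γ* and *β* are assumed to hold one value per channel (C-by-1 vectors), and the starting values of ones and zeros are an illustrative assumption; in a real network these parameters are updated during training.

```matlab
% Hypothetical input: 4 channels (rows) by 3 observations (columns)
X = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
epsilon = 1e-5;
Xhat = (X - mean(X, 1)) ./ sqrt(var(X, 1, 1) + epsilon);

% Learnable parameters, assumed here to be one value per channel
gamma = ones(4, 1);    % scale factor
beta = zeros(4, 1);    % offset

% y_i = gamma * xhat_i + beta, expanded across all observations
Y = gamma .* Xhat + beta;

% Other values of gamma and beta restore a nonzero mean and non-unit
% variance: here each observation gets mean 3 and variance close to 4
Y2 = 2 .* Xhat + 3;
```

With *γ* = 1 and *β* = 0 the transformation leaves the normalized values unchanged, so training can start from the plain normalized activations and learn a different scale and offset only if that helps the layers that follow.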


See also: `batchNormalizationLayer` | `trainNetwork` | `trainingOptions` | `reluLayer` | `convolution2dLayer` | `groupNormalizationLayer`