Word embedding layer for deep learning networks
A word embedding layer maps word indices to vectors.
Use a word embedding layer in a deep learning long short-term memory (LSTM) network. An LSTM network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. A word embedding layer maps a sequence of word indices to embedding vectors and learns the word embedding during training.
This layer requires Deep Learning Toolbox™.
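layer = wordEmbeddingLayer(dimension,numWords) creates a word embedding layer and specifies the embedding dimension and vocabulary size.
layer = wordEmbeddingLayer(dimension,numWords,Name,Value) sets optional properties using one or more name-value pair arguments, such as 'Weights' or 'WeightsInitializer'.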
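The following is a brief illustration of the name-value syntax. It creates a layer with a custom name and the Glorot initializer. The values 100, 2000, and 'emb' are arbitrary choices for this sketch. It assumes that Text Analytics Toolbox and Deep Learning Toolbox are installed.

```matlab
% Embedding dimension and vocabulary size, chosen arbitrarily for illustration.
dimension = 100;
numWords = 2000;

% Optional properties follow the two required inputs as name-value pairs.
layer = wordEmbeddingLayer(dimension,numWords, ...
    'Name','emb', ...
    'WeightsInitializer','glorot');
```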
Dimension— Dimension of word embedding
Dimension of the word embedding, specified as a positive integer.
NumWords— Number of words in model
Number of words in the model, specified as a positive integer. If the number of unique words in the training data is greater than NumWords, then the layer maps the out-of-vocabulary words to the same vector.
WeightsInitializer— Function to initialize weights
'narrow-normal' (default) | 'glorot' | 'he' | 'orthogonal' | 'zeros' | 'ones' | function handle
Function to initialize the weights, specified as one of the following:
'narrow-normal' – Initialize the weights by independently sampling from a normal distribution with zero mean and standard deviation 0.01.
'glorot' – Initialize the weights with the Glorot initializer [1] (also known as Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and variance 2/(numIn + numOut), where numIn = NumWords + 1 and numOut = Dimension.
'he' – Initialize the weights with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and variance 2/numIn, where numIn = NumWords + 1.
'orthogonal' – Initialize the weights with Q, the orthogonal matrix given by the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution [3].
'zeros' – Initialize the weights with zeros.
'ones' – Initialize the weights with ones.
Function handle – Initialize the weights with a custom function. If you
specify a function handle, then the function must be of the form
weights = func(sz), where
sz is the size
of the weights.
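The following base-MATLAB sketch shows what the built-in options compute. It assumes the weights are Dimension-by-(NumWords+1), with the extra column reserved for out-of-vocabulary words. This is not the toolbox's internal implementation. In particular, the way 'orthogonal' handles a non-square size here (factoring the transpose) is an assumption. The sizes are illustrative.

```matlab
% Illustrative sizes, kept small so the script runs quickly.
dimension = 50;
numWords = 1000;

% Assumption: one extra column for out-of-vocabulary words.
sz = [dimension numWords+1];
numIn = numWords + 1;
numOut = dimension;

% Each initializer has the form weights = func(sz).
% 'narrow-normal': zero mean, standard deviation 0.01.
narrowNormal = @(sz) 0.01*randn(sz);

% 'glorot': uniform on [-b,b] with b = sqrt(6/(numIn+numOut)),
% whose variance is b^2/3 = 2/(numIn+numOut).
glorot = @(sz) sqrt(6/(numIn+numOut))*(2*rand(sz) - 1);

% 'he': zero mean, variance 2/numIn.
he = @(sz) sqrt(2/numIn)*randn(sz);

Wn = narrowNormal(sz);
Wg = glorot(sz);
Wh = he(sz);

% 'orthogonal': Q from the QR decomposition of a unit-normal matrix Z.
% sz is wide here, so this sketch factors the tall transpose with an
% economy-size QR and transposes Q back, giving orthonormal rows.
Z = randn(fliplr(sz));
[Q,~] = qr(Z,0);
Wo = Q';
```

Each handle has the weights = func(sz) form described above. It could therefore also serve as a custom 'WeightsInitializer', although the built-in names already cover these cases.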
The layer only initializes the weights when the Weights property is empty.
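The following minimal sketch passes a custom initializer as a function handle. It assumes Text Analytics Toolbox and Deep Learning Toolbox are available. The uniform range is a hypothetical choice. Because Weights starts out empty, the handle is stored on the layer and applied later, when the software initializes the layer (for example, during training).

```matlab
% Hypothetical custom initializer: uniform values in [-0.05, 0.05].
customInit = @(sz) 0.1*(rand(sz) - 0.5);

layer = wordEmbeddingLayer(300,5000,'WeightsInitializer',customInit);

% Weights is still empty, so the initializer has not run yet.
isempty(layer.Weights)
```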
Weights— Layer weights
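Layer weights, specified as a Dimension-by-NumWords matrix or a Dimension-by-(NumWords+1) matrix. If Weights has only NumWords columns, then the software appends an extra column for out-of-vocabulary input when it trains the network.
For input integers i less than or equal to NumWords, the layer outputs the vector Weights(:,i). Otherwise, the layer outputs the vector Weights(:,NumWords+1).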
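The following minimal base-MATLAB sketch emulates the lookup rule. It assumes a Dimension-by-(NumWords+1) weight matrix whose last column holds the out-of-vocabulary vector, and positive integer indices. The sizes, weights, and indices are made up, and the code does not call the layer itself.

```matlab
% Illustrative sizes and weights.
dimension = 4;
numWords = 5;

% Dimension-by-(NumWords+1): column NumWords+1 is the out-of-vocabulary vector.
W = reshape(1:dimension*(numWords+1),dimension,numWords+1);

% A sequence of positive integer word indices; 7 and 9 exceed numWords.
indices = [2 5 7 1 9];

% Indices up to numWords select their own column; larger ones select column numWords+1.
cols = min(indices,numWords+1);

% Column t of Y is the embedding vector for time step t.
Y = W(:,cols);
```

Time steps 3 and 5 receive the same vector. This is how all words outside the vocabulary end up sharing one learned embedding, as described for NumWords.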
WeightLearnRateFactor— Learning rate factor for weights
Learning rate factor for the weights, specified as a nonnegative scalar.
The software multiplies this factor by the global learning rate to determine the
learning rate for the weights in this layer. For example, if
WeightLearnRateFactor is 2, then the learning rate for the
weights in this layer is twice the current global learning rate. The software determines
the global learning rate based on the settings specified with the
trainingOptions (Deep Learning Toolbox) function.
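The following sketch shows the arithmetic from the example above. It assumes Deep Learning Toolbox and Text Analytics Toolbox are available. The Adam solver and the learning rate of 1e-3 are illustrative choices. Only the initial global learning rate is used here, because a learning rate schedule can change the global rate during training.

```matlab
% Twice the global learning rate for the embedding weights.
layer = wordEmbeddingLayer(300,5000,'WeightLearnRateFactor',2);

% Global (initial) learning rate, set through trainingOptions.
options = trainingOptions('adam','InitialLearnRate',1e-3);

% Effective initial learning rate for this layer's weights.
effectiveRate = layer.WeightLearnRateFactor*options.InitialLearnRate;
```

Setting 'WeightLearnRateFactor' to 0 instead gives the weights a learning rate of 0. The weights then stay fixed during training, which is one way to keep a pretrained embedding unchanged.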
NumInputs— Number of inputs
Number of inputs of the layer. This layer accepts a single input only.
InputNames— Input names
Input names of the layer. This layer accepts a single input only.
NumOutputs— Number of outputs
Number of outputs of the layer. This layer has a single output only.
OutputNames— Output names
Output names of the layer. This layer has a single output only.
Create a word embedding layer with embedding dimension 300 and 5000 words.
layer = wordEmbeddingLayer(300,5000)
layer = 
  WordEmbeddingLayer with properties:

         Name: ''

   Hyperparameters
    Dimension: 300
     NumWords: 5000

   Learnable Parameters
      Weights: []
Include a word embedding layer in an LSTM network.
inputSize = 1;
embeddingDimension = 300;
numWords = 5000;
numHiddenUnits = 200;
numClasses = 10;

layers = [
    sequenceInputLayer(inputSize)
    wordEmbeddingLayer(embeddingDimension,numWords)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer]
layers = 
  6x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 1 dimensions
     2   ''   Word Embedding Layer    Word embedding layer with 300 dimensions and 5000 unique words
     3   ''   LSTM                    LSTM with 200 hidden units
     4   ''   Fully Connected         10 fully connected layer
     5   ''   Softmax                 softmax
     6   ''   Classification Output   crossentropyex
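The network above uses sequenceInputLayer(1) because each time step is a single word index. The following sketch shows one way to produce such sequences: build a wordEncoding from tokenized text, then convert each document with doc2sequence. It assumes Text Analytics Toolbox is available, and the two sentences are hypothetical.

```matlab
% Hypothetical example documents (a 2-by-1 string array).
documents = tokenizedDocument([
    "the cat sat on the mat"
    "a dog ran"]);

% Map each unique word to an integer index.
enc = wordEncoding(documents);

% Convert each document to a sequence of word indices (one row per sequence).
sequences = doc2sequence(enc,documents);

% The embedding layer's vocabulary size can come from the encoding.
embeddingLayer = wordEmbeddingLayer(300,enc.NumWords);
```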
To initialize a word embedding layer in a deep learning network with the weights from a pretrained word embedding, use the word2vec function to extract the layer weights and set the 'Weights' name-value pair of the wordEmbeddingLayer function. The word embedding layer expects columns of word vectors, so you must transpose the output of the word2vec function.
emb = fastTextWordEmbedding;

words = emb.Vocabulary;
dimension = emb.Dimension;
numWords = numel(words);

layer = wordEmbeddingLayer(dimension,numWords,...
    'Weights',word2vec(emb,words)')
layer = 
  WordEmbeddingLayer with properties:

         Name: ''

   Hyperparameters
    Dimension: 300
     NumWords: 999994

   Learnable Parameters
      Weights: [300×999994 single]
To create the corresponding word encoding from the word embedding, input the word embedding vocabulary to the
wordEncoding function as a list of words.
enc = wordEncoding(words)
enc = 
  wordEncoding with properties:

      NumWords: 999994
    Vocabulary: [1×999994 string]
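The full fastText vocabulary produces a very large weight matrix. The following sketch shows a variant that keeps only the words of a training vocabulary. It assumes Text Analytics Toolbox and the fastText support package are installed. It also assumes that word2vec returns NaN values for words missing from the embedding. The documents are hypothetical, and filling missing vectors with small random values is an illustrative choice, not a toolbox default.

```matlab
emb = fastTextWordEmbedding;

% Hypothetical training documents; replace with your own data.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);
enc = wordEncoding(documents);

% Dimension-by-NumWords matrix of pretrained vectors for the training vocabulary.
W = word2vec(emb,enc.Vocabulary)';

% Assumption: words absent from the embedding come back as NaN columns.
missing = any(isnan(W),1);
W(:,missing) = 0.01*randn(emb.Dimension,nnz(missing),'single');

layer = wordEmbeddingLayer(emb.Dimension,enc.NumWords,'Weights',W);
```

The same enc can then convert text to index sequences, so the layer's column order matches the encoding's vocabulary order.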
[1] Glorot, Xavier, and Yoshua Bengio. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256. Sardinia, Italy: AISTATS, 2010.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." In Proceedings of the 2015 IEEE International Conference on Computer Vision, 1026–1034. Washington, DC: IEEE Computer Vision Society, 2015.
[3] Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks." arXiv preprint arXiv:1312.6120 (2013).
lstmLayer (Deep Learning Toolbox) |
sequenceInputLayer (Deep Learning Toolbox) |
trainNetwork (Deep Learning Toolbox)