Increasing vocabulary of pre-trained word embeddings
MathWorks Support Team on 3 May 2019 (edited by MathWorks Support Team on 27 Sep 2021)
Can we extend the pre-trained word embeddings and increase the vocabulary?
Accepted Answer
MathWorks Support Team on 2 Sep 2021 (edited by MathWorks Support Team on 27 Sep 2021)
Yes. To add words to the vocabulary of the pre-trained embedding returned by 'fastTextWordEmbedding', you can do the following:
1. Obtain the wordEmbedding object from 'fastTextWordEmbedding':
>> emb = fastTextWordEmbedding;
2. Obtain the vocabulary from the wordEmbedding object:
>> vocab = emb.Vocabulary;
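3. Add the new words to the string array. Each new word also needs an embedding vector of the same length as the existing ones (emb.Dimension), because the file written in step 4 stores a vector for every word. For example: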
>> vocab(end+1) = 'Hi';
>> vocab(end+1) = 'Hello';
4. Write the words and their embedding vectors to a UTF-8 encoded text file in either the word2vec or GloVe text embedding format, or to a zip file containing a text file in this format. You can use fopen, fprintf, and fclose for this step.
5. Use 'readWordEmbedding' to read this file and obtain a new wordEmbedding object that includes the additional words. The documentation page for 'readWordEmbedding' explains the supported file formats in more detail.
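The steps above can be sketched end to end as follows. This is a minimal sketch, not the original answer's code. It assumes Text Analytics Toolbox and the fastText support package are installed, and it uses the word2vec text format (a header line with the word count and dimension, then one word followed by its vector components per line). The zero vectors for the new words are hypothetical placeholders; in practice those vectors would come from your own training or another source. The file name extendedEmbedding.vec is also hypothetical. Words that are already in the vocabulary are skipped so the file contains no duplicate entries.

```matlab
% Steps 1-2: load the pre-trained embedding, its vocabulary, and its vectors
emb = fastTextWordEmbedding;          % requires the fastText support package
vocab = emb.Vocabulary(:);            % column string array of existing words
M = word2vec(emb, vocab);             % one row of length emb.Dimension per word

% Step 3: choose the words to add, skipping any that already exist
newWords = ["Hi"; "Hello"];
newWords = newWords(~isVocabularyWord(emb, newWords));

% ASSUMPTION: zero vectors are placeholders only; supply real vectors
% (for example, from your own training) in practice
newVecs = zeros(numel(newWords), emb.Dimension);

vocab = [vocab; newWords];
M = [M; newVecs];

% Step 4: write a UTF-8 text file in word2vec text format:
% a header line "<numWords> <dimension>", then "word v1 v2 ... vD" per line
filename = "extendedEmbedding.vec";   % hypothetical file name
fid = fopen(filename, "w", "n", "UTF-8");
assert(fid ~= -1, 'Could not open the output file for writing.');
fprintf(fid, "%d %d\n", numel(vocab), size(M, 2));
for i = 1:numel(vocab)
    fprintf(fid, "%s", vocab(i));     % the word
    fprintf(fid, " %.6f", M(i, :));   % its vector components
    fprintf(fid, "\n");
end
fclose(fid);

% Step 5: read the file back as a new wordEmbedding object
newEmb = readWordEmbedding(filename);
```

Writing and re-reading the full fastText vocabulary this way produces a large text file and can take a while; the per-word fprintf loop is chosen for clarity rather than speed.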