To find clusters and extract features from high-dimensional text datasets, you can use machine learning techniques and models such as latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word embeddings. You can combine features created with Text Analytics Toolbox™ with features from other data sources. With these features, you can build machine learning models that take advantage of textual, numeric, and other types of data.
- Fit latent Dirichlet allocation (LDA) model
- Fit latent semantic analysis (LSA) model
- Resume fitting LDA model
- Document log-probabilities and goodness of fit of LDA model
- Predict top LDA topics of documents
- Transform documents into lower-dimensional space
- Latent Dirichlet allocation (LDA) model
- Latent semantic analysis (LSA) model
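Taken together, these functions support a fit-then-transform workflow. The sketch below is a minimal illustration using the toolbox functions `tokenizedDocument`, `bagOfWords`, `fitlda`, and `transform`; the toy documents and the two-topic setting are placeholder choices, not recommendations.

```matlab
% Tokenize two toy documents and count their words.
documents = tokenizedDocument([
    "the quick brown fox jumps over the lazy dog"
    "an apple a day keeps the doctor away"]);
bag = bagOfWords(documents);

% Fit an LDA model with two topics, then project the documents
% into the lower-dimensional topic space.
mdl = fitlda(bag,2,'Verbose',0);
topicMixtures = transform(mdl,bag);   % one row of topic probabilities per document
```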
- Pretrained fastText word embedding
- Word encoding model to map words to indices and back
- Convert documents to sequences for deep learning
- Word embedding layer for deep learning networks
- Map word to embedding vector
- Map word to encoding index
- Map embedding vector to word
- Map encoding index to word
- Test if word is member of word embedding or encoding
- Read word embedding from file
- Train word embedding
- Write word embedding to file
- Word embedding model to map words to vectors and back
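As a small sketch of the word-to-vector mappings, the example below loads the pretrained fastText embedding with `fastTextWordEmbedding` (which requires downloading its support package) and maps a word to its vector and back using `word2vec`, `vec2word`, and `isVocabularyWord`; the word "queen" is an arbitrary choice.

```matlab
% Load the pretrained fastText word embedding
% (requires the fastText support package download).
emb = fastTextWordEmbedding;

% Map a word to its embedding vector and the vector back to a word.
v = word2vec(emb,"queen");            % embedding vector for "queen"
w = vec2word(emb,v);                  % nearest vocabulary word to v
tf = isVocabularyWord(emb,"queen");   % true if "queen" is in the vocabulary
```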
- Add documents to bag-of-words or bag-of-n-grams model
- Remove documents from bag-of-words or bag-of-n-grams model
- Remove words with low counts from bag-of-words model
- Remove infrequently seen n-grams from bag-of-n-grams model
- Remove selected words from documents or bag-of-words model
- Remove n-grams from bag-of-n-grams model
- Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
- Most important words in bag-of-words model or LDA topic
- Most frequent n-grams
- Encode documents as matrix of word or n-gram counts
- Term frequency–inverse document frequency (tf-idf) matrix
- Combine multiple bag-of-words or bag-of-n-grams models
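These bag-of-words operations compose naturally. The sketch below is a minimal example using `bagOfWords`, `removeWords`, `tfidf`, and `encode` on two toy documents; the stop word removed is a placeholder choice.

```matlab
% Build a bag-of-words model from toy documents.
documents = tokenizedDocument([
    "the fox jumps over the dog"
    "the dog sleeps"]);
bag = bagOfWords(documents);

% Remove a selected word, then compute tf-idf weights and word counts.
bag = removeWords(bag,"the");
M = tfidf(bag);                   % tf-idf matrix, one row per document
counts = encode(bag,documents);   % word counts against the model's vocabulary
```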
This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.
This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.
This example shows how to classify text descriptions of weather reports using a deep learning long short-term memory (LSTM) network.
This example shows how to classify out-of-memory text data with a deep learning network using a custom mini-batch datastore.
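The deep learning workflows in these examples all begin by converting text into numeric sequences. The sketch below, assuming Deep Learning Toolbox is installed, shows the typical shape of that step using `wordEncoding`, `doc2sequence`, and `wordEmbeddingLayer`; the toy documents, embedding dimension, and layer sizes are placeholders.

```matlab
% Convert tokenized documents to sequences of word indices.
documents = tokenizedDocument([
    "heavy rain and flooding reported"
    "clear skies all day"]);
enc = wordEncoding(documents);
sequences = doc2sequence(enc,documents);   % cell array of index sequences

% A minimal LSTM classification architecture over the encoded words.
layers = [
    sequenceInputLayer(1)
    wordEmbeddingLayer(50,enc.NumWords)
    lstmLayer(80,'OutputMode','last')
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];
```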
This example shows how to analyze text using n-gram frequency counts.
This example shows how to use the latent Dirichlet allocation (LDA) topic model to analyze text data.
This example shows how to decide on a suitable number of topics for a latent Dirichlet allocation (LDA) model.
This example shows how to compare latent Dirichlet allocation (LDA) solvers by comparing the goodness of fit and the time taken to fit the model.
Generate Text Using Deep Learning (Deep Learning Toolbox): This example shows how to train a deep learning long short-term memory (LSTM) network to generate text.
This example shows how to train a deep learning LSTM network to generate text using character embeddings.
This example shows how to train a deep learning LSTM network to generate text word-by-word.