Text Analytics Toolbox Functions - By Category

Text Data Preparation

extractFileText - Read text from PDF, Microsoft Word, HTML, and plain text files
extractHTMLText - Extract text from HTML
readPDFFormData - Read data from PDF forms
writeTextDocument - Write documents to text file
eraseTags - Erase HTML and XML tags from text
eraseURLs - Erase HTTP and HTTPS URLs from text
erasePunctuation - Erase punctuation from text and documents
decodeHTMLEntities - Convert HTML and XML entities into characters
normalizeWords - Reduce words to common stems using the Porter stemmer
removeLongWords - Remove long words from documents or bag-of-words model
removeShortWords - Remove short words from documents or bag-of-words model
removeWords - Remove selected words from documents or bag-of-words model
stopWords - List of stop words
splitSentences - Split text into sentences
tokenDetails - Details of tokens in tokenized document array
addSentenceDetails - Add sentence numbers to documents
topLevelDomains - List of top-level domains
abbreviations - Table of common abbreviations
upper - Convert documents to uppercase
lower - Convert documents to lowercase
plus - Append documents
docfun - Apply function to words in documents
replace - Find and replace substrings in documents
regexprep - Replace text in words of documents using regular expression
addDocument - Add documents to bag-of-words or bag-of-n-grams model
removeDocument - Remove documents from bag-of-words or bag-of-n-grams model
removeEmptyDocuments - Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
removeInfrequentWords - Remove words with low counts from bag-of-words model
removeInfrequentNgrams - Remove infrequently seen n-grams from bag-of-n-grams model
removeNgrams - Remove n-grams from bag-of-n-grams model
topkwords - Most important words in bag-of-words model or LDA topic
topkngrams - Most frequent n-grams
encode - Encode documents as matrix of word or n-gram counts
tfidf - Term Frequency–Inverse Document Frequency (tf-idf) matrix
join - Combine multiple bag-of-words or bag-of-n-grams models
context - Search documents for word occurrences in context
doclength - Length of documents in document array
doc2cell - Convert documents to cell array of string vectors
joinWords - Convert documents to string by joining words
string - Convert scalar document to string vector
tokenizedDocument - Array of tokenized documents
bagOfWords - Bag-of-words model
bagOfNgrams - Bag-of-n-grams model

Modeling and Prediction

fitlda - Fit latent Dirichlet allocation (LDA) model
fitlsa - Fit latent semantic analysis (LSA) model
resume - Resume fitting LDA model
logp - Document log-probabilities and goodness of fit of LDA model
predict - Predict top LDA topics of documents
transform - Transform documents into lower-dimensional space
fastTextWordEmbedding - Pretrained fastText word embedding
readWordEmbedding - Read word embedding from file
trainWordEmbedding - Train word embedding
writeWordEmbedding - Write word embedding file
word2vec - Map word to embedding vector
vec2word - Map embedding vector to word
ismember - Test if word is member of word embedding
wordcloud - Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
textscatter - 2-D scatter plot of text
textscatter3 - 3-D scatter plot of text
bagOfWords - Bag-of-words model
bagOfNgrams - Bag-of-n-grams model
ldaModel - Latent Dirichlet allocation (LDA) model
lsaModel - Latent semantic analysis (LSA) model
wordEmbedding - Map words to vectors and back

Display and Presentation

wordcloud - Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
textscatter - 2-D scatter plot of text
textscatter3 - 3-D scatter plot of text
wordCloudCounts - Count words for word cloud creation