Text Analytics Toolbox™ includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. Use these tools to extract text from popular file formats, preprocess raw text, extract individual words, convert text into numerical representations, and build statistical models.
Text Analytics Toolbox supports the English language. Most Text Analytics Toolbox functions also work with text in other languages. For more information, see Language Support.
|Erase HTML and XML tags from text|
|Erase HTTP and HTTPS URLs from text|
|Erase punctuation from text and documents|
|Convert HTML and XML entities into characters|
|Reduce words to common stems using the Porter stemmer|
|Remove long words from documents or bag-of-words model|
|Remove short words from documents or bag-of-words model|
|Remove selected words from documents or bag-of-words model|
|List of stop words|
|Split text into sentences|
|Details of tokens in tokenized document array|
|Add sentence numbers to documents|
|List of top-level domains|
|Table of common abbreviations|
|Convert documents to uppercase|
|Convert documents to lowercase|
|Apply function to words in documents|
|Find and replace substrings in documents|
|Replace text in words of documents using regular expression|
|Add documents to bag-of-words or bag-of-n-grams model|
|Remove documents from bag-of-words or bag-of-n-grams model|
|Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model|
|Remove words with low counts from bag-of-words model|
|Remove infrequently seen n-grams from bag-of-n-grams model|
|Remove n-grams from bag-of-n-grams model|
|Most important words in bag-of-words model or LDA topic|
|Most frequent n-grams|
|Encode documents as matrix of word or n-gram counts|
|Term Frequency–Inverse Document Frequency (tf-idf) matrix|
|Combine multiple bag-of-words or bag-of-n-grams models|
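As a rough illustration of how several of the functions summarized above fit together, the following sketch tokenizes a few sentences, builds a bag-of-words model, prunes rare words, and computes tf-idf weights. The sample sentences and variable names are made up for illustration, and the pruning threshold is an arbitrary choice; check the function reference pages for the options available in your release.

```matlab
% Made-up sample text (an assumption for illustration only).
str = [
    "The pump failed after the pressure sensor reported an error."
    "The pressure sensor was replaced and the pump restarted."
    "Operators reported no further pump errors."];

% Tokenize, convert to lowercase, and erase punctuation.
documents = tokenizedDocument(str);
documents = lower(documents);
documents = erasePunctuation(documents);

% Count how often each word occurs in each document.
bag = bagOfWords(documents);

% Remove words that appear at most once across all documents.
bag = removeInfrequentWords(bag,1);

% Table of the most frequent remaining words and their counts.
tbl = topkwords(bag,5);

% Sparse documents-by-words matrix of tf-idf weights.
M = tfidf(bag);
```

Pruning before computing tf-idf keeps the matrix small; on a real corpus the threshold would normally be tuned to the vocabulary size.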
This example shows how to extract the text data from text, HTML, Microsoft® Word, PDF, CSV, and Microsoft Excel® files and import it into MATLAB for analysis.
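A minimal sketch of the import step, assuming plain text and CSV inputs. The file names, file contents, and the `Description` variable name are made up so that the snippet is self-contained; for PDF, Word, or HTML files, `extractFileText` is called the same way with the path to the file.

```matlab
% Write a small text file so the sketch is self-contained.
% File names and contents here are illustrative assumptions.
txtFile = fullfile(tempdir,'sampleReport.txt');
fid = fopen(txtFile,'w');
fprintf(fid,'Coolant leak detected near the mixer.');
fclose(fid);

% extractFileText returns the contents of the file as a string.
str = extractFileText(txtFile);

% For CSV data, read a table and take the column that holds the text.
csvFile = fullfile(tempdir,'sampleReports.csv');
reports = table([1;2],["Pump is noisy.";"Sensor reads zero."], ...
    'VariableNames',{'ID','Description'});
writetable(reports,csvFile);

tbl = readtable(csvFile,'TextType','string','ReadVariableNames',true);
textData = tbl.Description;
```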
This example shows how to create a function that cleans and preprocesses text data for analysis.
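One possible shape for such a function is sketched below. The function name `preprocessText`, the order of the steps, and the length thresholds are illustrative choices rather than a prescribed recipe; each step uses a toolbox function and can be dropped or reordered to suit the data.

```matlab
function documents = preprocessText(textData)
% preprocessText  Illustrative text-cleaning sketch (hypothetical helper).
%   textData is a string array or cell array of character vectors.

% Split the text into tokens and convert to lowercase.
documents = tokenizedDocument(textData);
documents = lower(documents);

% Remove common stop words such as "the" and "and".
documents = removeStopWords(documents);

% Erase punctuation from the remaining tokens.
documents = erasePunctuation(documents);

% Remove very short and very long words (thresholds are arbitrary).
documents = removeShortWords(documents,2);
documents = removeLongWords(documents,15);

% Reduce words to common stems (Porter stemmer by default for English).
documents = normalizeWords(documents);
end
```

Saved as `preprocessText.m`, the function can be called as `documents = preprocessText(textData)`. Applying the same function to training data and to new data keeps the two vocabularies consistent.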
This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.
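The workflow can be sketched roughly as follows. The training sentences and class labels are made up for illustration, and `fitcecoc` requires Statistics and Machine Learning Toolbox; the choice of a linear learner is an assumption, not the only option.

```matlab
% Tiny made-up training set (illustrative assumption).
textTrain = [
    "pump leaking oil"
    "oil leak near pump"
    "software crashed on startup"
    "startup error in software"];
YTrain = categorical(["mechanical";"mechanical";"software";"software"]);

% Build a bag-of-words model and use the word counts as predictors.
documentsTrain = tokenizedDocument(lower(textTrain));
bag = bagOfWords(documentsTrain);
XTrain = bag.Counts;

% Train a classifier on the word frequency counts.
mdl = fitcecoc(XTrain,YTrain,'Learners','linear');

% Encode new text with the same vocabulary, then predict its label.
documentsNew = tokenizedDocument(lower("oil leaking from pump"));
XNew = encode(bag,documentsNew);
YPred = predict(mdl,XNew);
```

On a realistic data set it usually helps to remove infrequent words and empty documents from the bag-of-words model before training, and to hold out part of the data to measure accuracy.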