removeLongWords

Remove long words from documents or bag-of-words model

Syntax

newDocuments = removeLongWords(documents,len)
newBag = removeLongWords(bag,len)

Description

newDocuments = removeLongWords(documents,len) removes words of length len or greater from documents.


newBag = removeLongWords(bag,len) removes words of length len or greater from the bagOfWords object bag.
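Because len is inclusive, a word whose length is exactly len is also removed. The following sketch assumes Text Analytics Toolbox is installed. It sets len to 5 so that the five-character word "short" is dropped, and it uses joinWords, another toolbox function, only to display the remaining tokens as one string.

```matlab
% Assumes Text Analytics Toolbox is installed
% (tokenizedDocument, removeLongWords, and joinWords are toolbox functions).
document = tokenizedDocument("An example of a short sentence");

% "short" has exactly 5 characters, so len = 5 removes it together with
% "example" (7 characters) and "sentence" (8 characters).
newDocument = removeLongWords(document,5);

% Join the remaining tokens into a single string for inspection.
remaining = joinWords(newDocument)
```

Only "An", "of", and "a" remain, because every other token has 5 or more characters.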


Examples

Remove words with seven or more characters from a document.

document = tokenizedDocument("An example of a short sentence");
newDocument = removeLongWords(document,7)
newDocument = 

   4 tokens: An of a short

Remove words with seven or more characters from a bag-of-words model.

documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeLongWords(bag,7)
newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["an"    "of"    "a"    "short"    "second"]
        NumWords: 5
    NumDocuments: 2

Input Arguments

documents: Input documents, specified as a tokenizedDocument array.

bag: Input bag-of-words model, specified as a bagOfWords object.

len: Minimum length of words to remove, specified as a positive integer. The function removes words with len or more characters.
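The removal criterion for len can be illustrated in plain MATLAB on a string array, without the toolbox. This is a minimal sketch of the rule as described above, not the toolbox implementation; the variable names words, isLong, and kept are hypothetical, and it assumes plain ASCII words, where strlength matches the intuitive character count.

```matlab
% Plain MATLAB sketch of the length rule (no toolbox needed).
% Hypothetical variable names; not the removeLongWords implementation.
words = ["An" "example" "of" "a" "short" "sentence"];
len = 7;

% A word is removed when it has len or more characters.
isLong = strlength(words) >= len;

% Keep only the words shorter than len characters.
kept = words(~isLong)
```

With len = 7, "example" and "sentence" are flagged as long, which matches the tokens removed in the document example.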

Output Arguments

newDocuments: Output documents, returned as a tokenizedDocument array.

newBag: Output bag-of-words model, returned as a bagOfWords object.

Introduced in R2017b