Remove short words from documents or bag-of-words model
Remove the words with two or fewer characters from a document.
document = tokenizedDocument("An example of a short sentence");
newDocument = removeShortWords(document,2)
newDocument = 
  tokenizedDocument:

   3 tokens: example short sentence
Remove the words with two or fewer characters from a bag-of-words model.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeShortWords(bag,2)
newBag = 
  bagOfWords with properties:

          Counts: [2x4 double]
      Vocabulary: ["example"    "short"    "sentence"    "second"]
        NumWords: 4
    NumDocuments: 2
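To check which words were dropped from the model, one option is to compare the vocabularies before and after filtering. The following is a minimal sketch that rebuilds the bag-of-words model from the example above; removedWords is a variable name chosen here for illustration, and setdiff is the base MATLAB set-difference function.

```matlab
% Rebuild the bag-of-words model from the example above.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeShortWords(bag,2);

% Words present in the original vocabulary but absent after filtering.
removedWords = setdiff(bag.Vocabulary,newBag.Vocabulary)
```

Every word listed in removedWords should have two or fewer characters.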
documents— Input documents
Input documents, specified as a
bag— Input bag-of-words model
Input bag-of-words model, specified as a
len— Maximum length of words to remove
Maximum length of words to remove, specified as a positive integer. The function removes words with len or fewer characters.
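The effect of the len threshold can be illustrated on a plain string array. This is a conceptual sketch only: it uses the base MATLAB function strlength and logical indexing to mimic the threshold, and is not a description of how removeShortWords is implemented. The variable names words, kept, and keptLonger are assumptions made for the illustration.

```matlab
% Conceptual sketch: mimic the len threshold on a plain string array.
words = ["an" "example" "of" "a" "short" "sentence"];
len = 2;

% Keep only words strictly longer than len, so that words with
% len or fewer characters are dropped.
kept = words(strlength(words) > len)

% A larger threshold drops more words: "short" has exactly five
% characters, so it is removed when the threshold is 5.
keptLonger = words(strlength(words) > 5)
```

With len set to 2, the words "an", "of", and "a" are dropped, which matches the first example on this page.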