Version 1.2, part of Release 2018b, includes the following enhancements:

  • Japanese Language Support: Perform text analytics on Japanese language text, including tokenization, stop word removal, lemmatization, and part-of-speech tagging
  • Word Normalization: Convert words to their dictionary form using lemmatization with parts of speech and other information
  • Part-of-Speech Tagging: Identify parts of speech, such as adjectives, adverbs, nouns, and verbs
  • Deep Learning: Train deep learning networks using word embedding layers (requires Deep Learning Toolbox)
  • HTML Parsing: Extract HTML from specific parts of a web page using HTML structure and CSS classes
  • Tokenization: Detect emoticons and emoji characters
  • Sentiment Analysis Example: Learn how to analyze sentiment in text
  • Deep Learning Examples: Learn about generating text and working with out-of-memory text data (requires Deep Learning Toolbox)

See the Release Notes for details.

Version 1.1, part of Release 2018a, includes the following enhancements:

  • Multiword Phrases: Extract and count multiword phrases (n-grams) from tokenized text
  • HTML Text: Extract text content from HTML pages
  • Deep Learning: Learn how to use deep learning LSTM networks for text classification (requires Deep Learning Toolbox)
  • Pattern Detection: Detect sentences, email addresses, and URLs in text
  • Stochastic LDA Model Training: Fit LDA models to large datasets

See the Release Notes for details.
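
For readers unfamiliar with the Multiword Phrases item in Version 1.1, the following is a minimal sketch of what extracting and counting n-grams from tokenized text means. It is written in plain Python for illustration only and does not use or represent the toolbox's own functions; the helper name `count_ngrams` and the sample sentence are hypothetical.

```python
from collections import Counter


def count_ngrams(tokens, n=2):
    """Count contiguous n-token phrases (n-grams) in a list of tokens.

    Hypothetical helper for illustration; not part of any toolbox API.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Zip n staggered views of the token list to form sliding windows.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


# Assumed sample input: whitespace tokenization of a short sentence.
tokens = "the quick brown fox jumps over the quick brown dog".split()
bigrams = count_ngrams(tokens, 2)

# Print each bigram that occurs more than once, with its count.
for phrase, count in bigrams.items():
    if count > 1:
        print(phrase, count)
```

Sliding a fixed-size window over the token sequence keeps word order, which is what lets a count of "quick brown" carry information that separate counts of "quick" and "brown" would not.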