Text Analytics Toolbox and MeCab

Question


I would like to add some words to the MeCab dictionary that I assume is used behind the MATLAB Text Analytics Toolbox.

The tokenization splits some words into pieces that are too short.

If you have any idea how to solve my problem, it would be appreciated.


Answer 1

Christopher Creutzig on 9 Mar 2020


Text Analytics Toolbox does not ship the tooling to compile an extended MeCab dictionary. But if you have one for your field (I know there are such compiled dictionaries for medical purposes, for example), you can use mecabOptions to have tokenizedDocument use it.

Alternatively, if you only have a handful of words you want to preserve, and are not worried about inflections, you can use "CustomTokens" to pass them to the tokenizer:

tokenizedDocument("日本睡眠学会のガイドライン")

ans =

tokenizedDocument:

5 tokens: 日本睡眠学会のガイドライン

tokenizedDocument("日本睡眠学会のガイドライン","CustomTokens","日本睡眠学会")

ans =

tokenizedDocument:

3 tokens: 日本睡眠学会のガイドライン
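If more than one term needs protecting, "CustomTokens" also accepts a string array. The sketch below is illustrative only: the second term, ガイドライン, is added here purely as an example of a list, and tokenDetails is used to inspect the tokens that result.

```matlab
str = "日本睡眠学会のガイドライン";

% Pass several words that the tokenizer must keep whole (example list only)
doc = tokenizedDocument(str,"CustomTokens",["日本睡眠学会" "ガイドライン"]);

% tokenDetails returns a table with one row per token; show the Token column
tdetails = tokenDetails(doc);
disp(tdetails.Token)
```

As with a single custom token, this only matches the exact strings given, so inflected forms of a word are not covered.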

1 Comment

Shuichi Obuchi on 10 Mar 2020

Thank you for your reply. I had already solved the problem by using the UserModel option. Even so, I am very happy to have this information.
