text preprocessing functions in MATLAB 2013a

I want to know about text preprocessing of a text dataset in MATLAB 2012b or 2013a, i.e. tokenization, stopword removal functions, and feature selection functions (tfidf, df, TS, MI, etc.). I searched in MATLAB 2012a but did not find any text mining toolbox for it. Can you please tell me whether a text mining toolbox with all of this functionality built in is available?

Answers (1)

Guru on 3 Jul 2013
Umm, MATLAB is built around the idea that the basic functionality is provided and you can fine-tune it for whatever task you want to focus on. Quite simply, there is no text mining toolbox in MATLAB because all of the functionality you need for these things is in base MATLAB itself.
You can look up regexp, strfind, strrep, and the other functions that deal with strings in MATLAB. These give you the means to program as many functions, as specific as you want, to accomplish all of these text mining tasks and more.
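As a rough illustration of how far those base functions go, here is a minimal sketch of tokenization, stopword removal, and term counting. The sample sentence and the stopword list are made up for the example, and splitting on runs of letters is just one simple tokenization choice, not the only reasonable one.

```matlab
% Sketch: tokenize a string and drop stopwords using only base MATLAB.
% The sample text and the stopword list are illustrative assumptions.
txt = 'The quick brown fox jumps over the lazy dog, and the dog sleeps.';

% Tokenization: lowercase the text, then extract runs of letters.
tokens = regexp(lower(txt), '[a-z]+', 'match');

% Stopword removal: keep tokens that are not in the (hypothetical) list.
stopwords = {'the', 'and', 'over', 'a', 'an', 'of'};
kept = tokens(~ismember(tokens, stopwords));

% Term counts: unique returns the sorted vocabulary and, for each kept
% token, its index into that vocabulary; accumarray tallies the indices.
[vocab, ~, idx] = unique(kept);
counts = accumarray(idx(:), 1);
```

Here vocab is a cell array of the distinct remaining words and counts(k) is how often vocab{k} occurs. A real stopword list would be much longer and could be read from a text file.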
In short, MATLAB is the tool that you can use to design the tools for text mining. You might also check the File Exchange to see if someone has already done that for you.
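For the feature weights mentioned in the question (df, tfidf), a hand-rolled version might look something like the sketch below. The three documents are invented data, and idf = log(N / df) is only one common variant of the weighting, assumed here for illustration. bsxfun is used for the row-wise scaling so that it runs on older releases.

```matlab
% Sketch: document frequency (df) and tf-idf with base MATLAB.
% The documents are made-up data; idf = log(N / df) is an assumed variant.
docs = {'matlab text mining', 'text mining with regexp', 'matlab regexp'};
nDocs = numel(docs);

% Tokenize each document on runs of letters.
tokens = cellfun(@(d) regexp(lower(d), '[a-z]+', 'match'), docs, ...
    'UniformOutput', false);

% Sorted vocabulary over all documents.
vocab = unique([tokens{:}]);
nTerms = numel(vocab);

% Term-frequency matrix: rows are documents, columns are vocabulary terms.
tf = zeros(nDocs, nTerms);
for i = 1:nDocs
    [~, loc] = ismember(tokens{i}, vocab);
    tf(i, :) = accumarray(loc(:), 1, [nTerms 1])';
end

% Document frequency: number of documents containing each term.
df = sum(tf > 0, 1);

% Inverse document frequency and the tf-idf weights.
idf = log(nDocs ./ df);
tfidf = bsxfun(@times, tf, idf);
```

Each row of tfidf is then a feature vector for one document. Terms that appear in every document get a weight of zero under this variant, which is a crude form of feature selection.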
