Extraction of text from document

Shivaprasad KM on 17 Jan 2016
Answered: Image Analyst on 17 Jan 2016
I want to extract the keywords from a document in order to find the term frequency. Can you help me by providing the code?

Answers (2)

Walter Roberson on 17 Jan 2016
There is no universally defined set of keywords. You will need to define more clearly what needs to be extracted from the document, and you will need to describe what "document" means to you.

Image Analyst on 17 Jan 2016
Have you done OCR yet, so that you have a list of strings in a cell array? If so, I think you could construct a histogram of word frequency using unique() and ismember(). I don't have a demo, so try it yourself.
You might also like allwords, which splits one long string of many words into a cell array of individual words that you can then use with unique() and ismember().
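For anyone landing here later, the unique() and ismember() idea could be sketched roughly as below. This is only a minimal illustration, not a tested solution for any particular document: the sample text in txt is made up, "word" is assumed to mean a run of letters or digits, and regexp() stands in for allwords to do the splitting.

```matlab
% Hypothetical sample text standing in for the OCR'd or loaded document.
txt = 'the cat sat on the mat and the dog sat too';

% Lowercase the text and split it into words. Assumption: a word is any
% run of letters or digits; adjust the pattern for your own documents.
words = regexp(lower(txt), '[a-z0-9]+', 'match');

% List of distinct words (sorted alphabetically).
uniqueWords = unique(words);

% For every word, find its position in the list of distinct words.
[~, idx] = ismember(words, uniqueWords);

% Count how many times each distinct word occurs.
counts = accumarray(idx(:), 1);

% Term frequency as a fraction of all words in the document.
tf = counts / sum(counts);

% Sort from most to least frequent and print the table.
[counts, order] = sort(counts, 'descend');
uniqueWords = uniqueWords(order);
tf = tf(order);
for k = 1:numel(uniqueWords)
    fprintf('%-10s %3d  %.3f\n', uniqueWords{k}, counts(k), tf(k));
end
```

Which of these words count as "keywords" is still up to you, as noted in the other answer. One common approach is to drop very frequent filler words such as "the" and "and" before counting.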
