Rating: 3.2 (5 ratings) | File Size: 3.89 KB | File ID: #19505

wordcount

08 Apr 2008

This function reads from a text file and displays the most frequently used words

File Information
Description

PURPOSE:
This function reads the alphanumeric words (e.g. 'Finance', 'recycle', 'M16') from a plain text document (.txt) and displays the most frequently used words in the document. For example, after processing a document containing pizza recipes, I got the following output from this function:

    'WORD' 'FREQ' 'REL. FREQ'
    'dough' [ 170] '1.1336%'
    'flour' [ 84] '0.5601%'
    'oven' [ 70] '0.4668%'
    'pizza' [ 49] '0.3268%'
    'sauce' [ 47] '0.3134%'
    'cheese' [ 39] '0.2601%'

The first column consists of the most frequently used words in this document. The second column consists of the frequency of the word (i.e. the number of times that the word appeared in the document). The last column contains the relative frequency of the word, which is simply the frequency of the word divided by the total number of words in the document. This function might be useful for statistical purposes such as analyzing the writing habits of a particular author. Please note that the words are case-sensitive, which means 'Great' and 'great' are treated as two different words.
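As a worked example of the relative frequency column, the sketch below reproduces the 'dough' row. The total of 14996 words is an assumption back-computed from the table (it is not stated in the description), and the variable names are illustrative only.

```matlab
% Relative frequency = word frequency / total number of words.
freq       = 170;     % frequency of 'dough' from the table above
totalWords = 14996;   % ASSUMED total, inferred from the table, not given by the author

% Format as a percentage with four decimals, as in the 'REL. FREQ' column.
relFreq = sprintf('%.4f%%', 100 * freq / totalWords);
disp(relFreq)
```

With the assumed total, this yields the same string as the first row of the table.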

INPUTS:
The first input, 'filename', is the name of the text file. The second input, 'num', is the number of words you want the function to display. For example, if you only want to see the top 10 most frequently used words, set 'num' to 10. Note, however, that this function only displays words that were used at least twice. If fewer than 'num' words were used more than once, only those words are displayed, and the output contains fewer words than you requested.
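The selection rule described above can be sketched as follows. This is a minimal illustration of the stated behaviour, not the code in wordcount.m; the word list, the counts for 'salt', and all variable names are hypothetical.

```matlab
% Hypothetical word counts; 'salt' appears only once and should be dropped.
words  = {'dough', 'flour', 'oven', 'salt'};
counts = [170, 84, 70, 1];
num    = 10;                      % requested number of words

% Sort by descending frequency.
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = words(order);

% Keep only words used at least twice.
keep         = sortedCounts >= 2;
sortedWords  = sortedWords(keep);
sortedCounts = sortedCounts(keep);

% Show at most 'num' words; fewer if not enough words qualify.
nShow       = min(num, numel(sortedWords));
shownWords  = sortedWords(1:nShow);
shownCounts = sortedCounts(1:nShow);
```

Here only three words qualify, so three rows would be shown even though 'num' is 10.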

OUTPUT:
The output, 'results', simply shows a table that looks like the output in the pizza recipe example described above.

HOW TO USE:
Say you want to find out the most frequently used words in an article you found on the web. Copy that article and paste it into Notepad. Save the text file with whatever name you want (e.g. 'article.txt'). Then navigate to the directory containing the text file in MATLAB and type:

    results = wordcount('article.txt', 10)

to see the top 10 most frequently used words in article.txt.

Acknowledgements

This file inspired Allwords.

MATLAB release: MATLAB 7.5 (R2007b)
Comments and Ratings (11)
10 Nov 2013 aim007 ali

Error: File: wordcount.m Line: 1 Column: 60
Unexpected MATLAB expression.
This error occurs when I change results = wordcount('article.txt', 10).
Could someone please help? Which part of the code should I change?

08 Aug 2012 Lee White

WAIT WAIT WAIT!

There is a serious problem with this function. Only one word for each integer frequency will be reported. This means if 'dog' and 'cat' both appear 3 times, only one will be reported.

Check out wordcount2 here on the file exchange for a fixed version.
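For illustration, one way to count words so that ties are all kept is to group tokens with unique and accumarray. This is a sketch under assumptions, not the code of wordcount or wordcount2; the sample string and variable names are made up, and the '~' output placeholder needs a MATLAB release newer than R2007b.

```matlab
% Hypothetical input in which 'dog' and 'cat' both appear 3 times.
sample = 'dog cat dog bird cat dog cat';

% Split into alphanumeric tokens (case-sensitive).
tokens = regexp(sample, '[A-Za-z0-9]+', 'match');

% Map every token to the index of its unique word, then count per index.
[uniqueWords, ~, idx] = unique(tokens);
counts = accumarray(idx(:), 1);

% Sort by descending count; words with equal counts are all retained.
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = uniqueWords(order);
```

Because counts are stored per word rather than per frequency value, both 'cat' and 'dog' appear in the result.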

08 Aug 2012 Lee White

The method used to read files is sensitive to the file's machine format and encoding. I tried to read in some files encoded as little-endian and it failed. This turned out to be because the text file encoding needed to be ANSI. It is not a problem with this function, but rather a MATLAB fopen problem. Anyway, this may explain some of the other weird behaviours experienced by users.

The program is pretty easy to follow and debug though.
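As a sketch of how an encoding can be stated explicitly, fopen accepts a machine format and an encoding name as its third and fourth arguments. The temporary file, its contents, and the choice of UTF-8 below are assumptions for the example; how wordcount.m itself opens files is not shown on this page.

```matlab
% Write a small UTF-8 file to a temporary location (hypothetical contents).
fname = [tempname '.txt'];
fid = fopen(fname, 'w', 'n', 'UTF-8');
fprintf(fid, '%s', 'pizza dough pizza');
fclose(fid);

% Read it back, naming the encoding instead of relying on the default.
fid = fopen(fname, 'r', 'n', 'UTF-8');
if fid == -1
    error('Could not open %s', fname);
end
contents = fread(fid, '*char')';   % characters as a row vector
fclose(fid);
delete(fname);
```

If a file was saved in a different encoding, re-saving it in one that MATLAB reads correctly, as the comment describes, is the other option.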

10 Apr 2012 Ilvana Dzafic

Hi, thank you for this function, it works great!
I was wondering if there is any way to count all the words used even if they have the same frequency (e.g. if "cat" and "dog" were both written 15 times, this function only mentions "cat")?

05 May 2011 Jeff ahrens

Didn't work with my text file. It failed to list quite a few words.

19 Jan 2011 Andrey Kan

there is a bug.

if the input file contains
--------- start of file ----------
9
9
9
9
9
9
9997
9997
9997
9997
9997
9997
---------- end of file ----------

then the output is

'WORD' 'FREQ' 'REL. FREQ'
'9' [ 6] '50.0000%'

the word '9997' is not reported.

24 Jun 2010 Nawaf Ali

It's a great application, but I am trying to use it on big text files and it runs forever; MATLAB stays busy for a very long time. Is that normal, or is there something that needs to be updated in the code?
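The cause of the slowdown cannot be determined from this page. As a general illustration only, counting in a single pass with a hash map avoids rescanning the word list for every token. The sketch below uses containers.Map, which requires R2008b or later; the token list and variable names are hypothetical and this is not the method used in wordcount.m.

```matlab
% Hypothetical token list; in practice this would come from the text file.
tokens = {'a', 'b', 'a', 'c', 'a', 'b'};

% Map from word to running count.
counts = containers.Map('KeyType', 'char', 'ValueType', 'double');

for k = 1:numel(tokens)
    w = tokens{k};
    if isKey(counts, w)
        counts(w) = counts(w) + 1;   % word seen before: increment
    else
        counts(w) = 1;               % first occurrence
    end
end

wordList = keys(counts);             % unique words found
```

Each token is handled once, so the work grows roughly in proportion to the length of the document.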

12 Mar 2009 Taha

Just what I was looking for. Code is easy to read and implement. Great work!

09 Apr 2008 nur w  
