Counting syllables in text from a txt file.

Question

D8RGL on 23 Apr 2017

https://www.mathworks.com/matlabcentral/answers/336796-counting-syllables-in-text-from-a-txt-file

Answered: suraj s on 9 Jan 2020

Hello, I'm trying to create a script that counts syllables in a .txt file. I am able to count the occurrences of vowels. However, to count syllables I need to count occurrences of [A,I,O,U,E] such that a run of two or more consecutive vowels counts as only one syllable. I also need to disregard an 'E' as a syllable if it occurs at the end of a word.

3 Comments

Walter Roberson on 27 Apr 2017

We volunteers get pretty disappointed when people remove their question. We are not free private consultants! The "cost" we charge for our advice is that the question and answers stay public so that everyone can learn from them.

Rena Berman on 28 Apr 2017

(Answers Dev) Restored edit

Answer 1

Walter Roberson on 24 Apr 2017

https://www.mathworks.com/matlabcentral/answers/336796-counting-syllables-in-text-from-a-txt-file#answer_264191

Counting syllables in English takes a lot of knowledge of the language. In some words, the number of syllables depends on how the word is being used. For example, "unionized" might be union-ized (organized into a union, 3 syllables) or un-ionized (not ionized, 4 syllables). The number of syllables in a word can depend on which part of speech it is acting as.

In English, the location of syllable breaks depends upon whether a syllable is stressed or not. It also depends upon whether vowels are long or not (which can determine whether a consonant run is split into pieces or not.) These two factors are influenced by the suffixes -- adding a suffix to a word can shift how the syllables are to be broken up in earlier parts of the word, which in turn can change how many syllables there are.

If you analyze the text mechanically, character by character, then you need to be able to deal with "ghoti" (a well-known respelling of "fish") being one syllable.

5 Comments

D8RGL on 26 Apr 2017

Hi Walter, I don't suppose you know if regexp would be capable of finding 3 instances of a "syllable" in a word and counting every occurrence of this?

Or would the best practice for this be to use a for loop?

Thanks.

Walter Roberson on 26 Apr 2017

"you know if regexp would be capable of finding 3 instances of a "syllable" in a word"

No, I am absolutely certain that it cannot do that.

http://www.tug.org/docs/liang/liang-thesis-hires.pdf

"The resulting hyphenation algorithm uses about 4500 patterns [...]"

Now if you instead wanted to do the completely different task of counting groups of vowels, then:

regexp(S, '\<[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]*([AEIOUaeiou]+[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]+){2}[AEIOUaeiou]+[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]*\>')

This is not at all the same as the number of syllables. Remember, in English, it is possible for there to be adjacent syllables that contain only vowels, provided that those vowels are long or stressed. (And this pattern once more neglects poor "y" ...)
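A minimal sketch of that vowel-group count, working word by word rather than with one long pattern. It assumes the text is already in a character vector S (the sample sentence is hypothetical) and flags words with three or more vowel groups. As stated above, this is a count of vowel runs, not of syllables, and "y" is ignored.

```matlab
% Hypothetical sample text; in practice S could come from fileread('filename.txt')
S = 'The beautiful animal ran quickly over a tremendous hill';

% Split into words (runs of letters only)
words = regexp(S, '[A-Za-z]+', 'match');

% Count the runs of consecutive vowels in each word (case-insensitive)
nGroups = cellfun(@(w) numel(regexp(w, '[aeiou]+', 'ignorecase')), words);

% Words with three or more vowel groups
polyWords = words(nGroups >= 3);
nPoly = numel(polyWords);
```

For this sample, polyWords holds 'beautiful', 'animal' and 'tremendous'. Splitting into words first keeps the pattern short and makes it easy to change the threshold.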

Answer 2

John D'Errico on 25 Apr 2017

https://www.mathworks.com/matlabcentral/answers/336796-counting-syllables-in-text-from-a-txt-file#answer_264439

Edited: John D'Errico on 27 Apr 2017

The English language is an aggregate mess, compiled from words taken from many languages. So you can virtually never be 100% correct in such a task. Accept that as fact, and then just aim for the lowest failure rate that you can achieve.

If I HAD to try to solve such a task, I'd use a dictionary approach. That is...

1. If possible, I'd find an online dictionary that includes syllabification. Even if the dictionary were limited in size, it would be a great starting point. Otherwise, you need to build it yourself.

2. Next, write some simple rule-based code. The goal is for it to be as good as possible, but I'd not invest a huge amount of time there. My target would be an initial success rate as high as possible, so you want to pick off the low-hanging fruit first.

3. Test the algorithm on your dictionary, looking for errors. Where possible, add new rules if you can see an obvious rule that you might have missed.

4. Next, test the tool on blocks of text pasted from any online sources you can find: books, articles, etc. Skip over words that already exist in the dictionary. To those that are missing from the dictionary, apply your algorithm. Now, check each word so identified. You will need to rely on either your own knowledge or, if your own language skills are limited, on a large dictionary resource like the OED. Add each word to your internal MATLAB dictionary, building and extending it one word at a time.

5. For the words that are syllabically ambiguous, like unionized, now you need to go back and use grammatical rules to identify the correct count for that word.

The quality of your result will depend on how much effort you are willing to invest, regardless of the approach you follow. Perfection will take a great deal of effort.
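The dictionary-first lookup with a rule-based fallback could be sketched roughly as follows. Everything here is illustrative: the dictionary entries, the variable names (syllableDict, ruleCount, unknown) and the fallback rule are assumptions, not part of the answer above.

```matlab
% Hypothetical hand-built dictionary: word -> syllable count
syllableDict = containers.Map( ...
    {'the', 'tree', 'people', 'idea'}, ...
    {1, 1, 2, 3});

% Fallback rule (an assumption): count runs of vowels, at least one per word
ruleCount = @(w) max(1, numel(regexp(w, '[aeiou]+', 'ignorecase')));

words   = {'the', 'idea', 'banana', 'people'};
counts  = zeros(size(words));
unknown = {};                          % words to check by hand and add later
for k = 1:numel(words)
    w = lower(words{k});
    if isKey(syllableDict, w)
        counts(k) = syllableDict(w);   % trusted dictionary value
    else
        counts(k) = ruleCount(w);      % rule-based estimate
        unknown{end+1} = w;            %#ok<AGROW>
    end
end
```

Collecting the words that fell through to the rule (here only 'banana') gives the list to verify and add to the dictionary, as in step 4.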

Answer 3

Sergey Kasyanov on 24 Apr 2017

https://www.mathworks.com/matlabcentral/answers/336796-counting-syllables-in-text-from-a-txt-file#answer_264187

If I understand you correctly, try this.

% Rough estimate only: counts groups of vowels, not true syllables
hF = fopen('filename.txt');
if hF == -1
    error('Could not open filename.txt');
end
% nSyl holds the running count ("sum" would shadow the built-in function)
nSyl = 0;
% repeat until the end of the file is reached
while ~feof(hF)
    % read one line; fgetl returns -1 (not char) if nothing is left
    str = fgetl(hF);
    if ~ischar(str)
        break
    end
    % convert the line to uppercase
    str = upper(str);
    % each run of consecutive vowels counts once
    nGroups = numel(regexp(str, '[AEIOU]+'));
    % an 'E' at the end of a word, preceded by a consonant, is not counted
    % (note: short words such as THE then contribute 0)
    nSilentE = numel(regexp(str, '[B-DF-HJ-NP-TV-Z]E\>'));
    % add this line's corrected count to the total
    nSyl = nSyl + nGroups - nSilentE;
end
fclose(hF);
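The two rules from the question (a run of vowels counts once, a final E is not counted) can also be applied one word at a time, which allows a floor of one per word. The sketch below is only an illustration: the helper names groups, silentE and estimate and the sample words are hypothetical, and the result is still an estimate rather than a syllable count.

```matlab
% Hypothetical helpers: vowel groups in one word, minus a trailing E
groups   = @(w) numel(regexp(w, '[aeiou]+', 'ignorecase'));
silentE  = @(w) ~isempty(regexp(w, '[^aeiou]e$', 'ignorecase', 'once'));
% Never report fewer than one per word
estimate = @(w) max(1, groups(w) - silentE(w));

sample = {'the', 'make', 'tree', 'table', 'animal'};
est = cellfun(estimate, sample);
```

Here est is [1 1 1 1 3]. The value for 'table' is too low (it has two syllables), which shows how quickly character-based rules run into exceptions.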

4 Comments

Sergey Kasyanov on 25 Apr 2017

Thanks for the note about the variable sum. Hmm... I'm checking this code on a test file. Can you attach your text to the question?

D8RGL on 25 Apr 2017

testtext.txt

I have attached it, thank you. I was also looking at counting 3 instances of a syllable in one word, to then increment a polysyllable count by one.

Answer 4

suraj s on 9 Jan 2020

https://www.mathworks.com/matlabcentral/answers/336796-counting-syllables-in-text-from-a-txt-file#answer_409351

Hello all, I am working on a MATLAB project for a caption generator based on lip gesture recognition. Please help me with the code and the functions that are used for this.
