String Reading Functions

Jan Donyada on 27 Sep 2011
Commented: NOR AZIEYANI on 4 Dec 2018
I am writing a Malay-language text-to-speech program. It's very simple: it reads a string from user input, matches each word against a database, and plays the corresponding .wav files in the order the user typed them. The database holds one .wav file per word; e.g. the word "saya" has a corresponding file saya.wav. So far my code is:
function pushbutton1_Callback(hObject, eventdata, handles)
words = get(handles.editbox, 'string');          % user input string from the edit box
wavdirectory = 'C:\Program Files\MATLAB\R2010b\Recordings\';
wordsstring = regexp(words, '\w+', 'match');     % words only; punctuation is ignored
for m = 1:numel(wordsstring)                     % one iteration per word
    thisfid = fullfile(wavdirectory, [wordsstring{m} '.wav']);
    try
        [y, fs] = wavread(thisfid);              % in R2012b and later, use audioread instead
        player = audioplayer(y, fs);
        playblocking(player);                    % finish this word before starting the next
    catch err
        fprintf(1, 'Failed to process wave file "%s" because: %s\n', thisfid, err.message);
    end
end
The database holds base words as well as various prefixes and suffixes. However, a word made of prefix + base + suffix (as extracted by regexp) cannot be matched in the database by its name unless it is first broken into its prefix, base word, and suffix components, and I am at a loss as to how to do that in MATLAB. As of now the program can only read out base words (without prefix or suffix). regexp separates the user's input sentence into individual words; from there, if a prefix-base-suffix word is read, the following pseudocode should kick in:
if no_match
    % Scan the FIRST FOUR characters of no_match for a matching prefix in the database.
    % 4 is chosen because prefixes in Malay are at most 4 letters long.
    i = 4;
    while i > 1                      % in Malay there are no 1-letter prefixes
        if the first i characters of no_match match a prefix
            % cut those characters from no_match, e.g. bercakap => "ber" (prefix), "cakap" (remainder):
            % store "ber" in word_prefix and "cakap" in word_remainder
            break
        end
        i = i - 1;                   % decrement i by 1 for every mismatch
    end
    sound(word_prefix);              % or perhaps throw it back into the try block? Is that possible?
    scan word_remainder
    % If there is still no match in the database for word_remainder, scan the LAST THREE
    % letters of word_remainder for suffixes. 3 is chosen because suffixes in Malay are at most 3 letters long.
    j = 3;
    while j > 0
        if the last j characters of word_remainder match a suffix
            % cut them from word_remainder and store them in word_suffix;
            % the remaining word is stored in base_word
            break
        end
        j = j - 1;                   % decrement j by 1 for every mismatch
    end
end
Examples of input/output for the pseudocode section should look like this:
menghapuskan => "meng" (word_prefix) "hapus" (base_word) "kan" (word_suffix)
menghadiri => "meng" (word_prefix) "hadir" (base_word) "i" (word_suffix)
sound(y, fs) should play the broken-up word in prefix-base-suffix order, in its place in the sentence; e.g. "Saya menghapuskan dia" should be played as "Saya" "meng" "hapus" "kan" "dia".
What function(s) do I use to do the scanning & separating of words with the prefix-base-suffix structure, as per my pseudocode?
Thanks in advance.
NOR AZIEYANI on 4 Dec 2018
Could I get the full code for this project? It's quite similar to what I'm working on right now.


Accepted Answer

Fangjun Jiang on 27 Sep 2011
I don't think there is a ready-made function for it. Separating a word is pretty easy, especially when the number of letters in the prefix and suffix is fixed:
Word = 'menghapuskan';
Prefix = Word(1:4);          % first 4 letters: 'meng'
Suffix = Word(end-2:end);    % last 3 letters: 'kan'
BaseWord = Word(5:end-3);    % everything in between: 'hapus'
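The fixed slicing above only works when every word has a 4-letter prefix and a 3-letter suffix; menghadiri, for example, ends in the 1-letter suffix "i". A minimal sketch using regexp() with the 'tokens' option, assuming hypothetical affix lists (prefixes and suffixes are not from the original post, and strjoin is only available in releases newer than the R2010b used in the question):

```matlab
% Hypothetical affix lists (assumptions, not from the original post).
% Alternatives are tried left to right, so a longer prefix must come
% before any shorter prefix it starts with.
prefixes = {'meng', 'ber', 'di'};
suffixes = {'kan', 'an', 'i'};

% Builds '^(meng|ber|di)?(.+?)(kan|an|i)?$': an optional prefix, the shortest
% possible base word, then an optional suffix that must end the word.
pattern = ['^(' strjoin(prefixes, '|') ')?(.+?)(' strjoin(suffixes, '|') ')?$'];

words = {'menghapuskan', 'menghadiri'};
splits = cell(size(words));
for m = 1:numel(words)
    % 'once' returns a single 1x3 cell: {prefix, base, suffix}
    splits{m} = regexp(words{m}, pattern, 'tokens', 'once');
end
```

A pattern like this cannot tell that a word such as dia is already a base word, so it would split it into di + a; checking the word against the database before splitting, as the question's pseudocode does, avoids that.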
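The most useful functions for scanning a word are still regexp(), strfind() and findstr(); for replacing, regexprep() and strrep(); for comparing, strcmp(), strcmpi() and strmatch(). (In newer releases findstr() and strmatch() are no longer recommended; strfind() and strncmp() are the suggested replacements.)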
Type help strfun at the MATLAB command line to find out more.
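The question's pseudocode can also be written as a runnable sketch with strcmp(). It assumes three hypothetical lists, baseWords, prefixes and suffixes, standing in for the clip names in the recordings folder. Prefix lengths 4 down to 2 and suffix lengths 3 down to 1 are tried longest first, and a word or remainder that already matches a base word is left alone, mirroring the pseudocode's no_match checks.

```matlab
% Hypothetical clip lists (assumptions, not from the original post); in the real
% program these would come from the .wav file names in the recordings folder.
baseWords = {'saya', 'dia', 'hapus', 'hadir', 'cakap'};
prefixes  = {'meng', 'ber', 'di'};
suffixes  = {'kan', 'an', 'i'};

words = {'saya', 'menghapuskan', 'dia', 'menghadiri', 'bercakap'};
parts = cell(size(words));   % parts{m} holds the pieces of words{m} in play order

for m = 1:numel(words)
    remainder = words{m};
    word_prefix = '';
    word_suffix = '';
    % Only split a word that has no match of its own (the pseudocode's no_match).
    if ~any(strcmp(remainder, baseWords))
        % Prefixes are 2 to 4 letters: try the longest first, and always
        % leave at least one letter for the base word.
        for i = min(4, numel(remainder) - 1):-1:2
            if any(strcmp(remainder(1:i), prefixes))
                word_prefix = remainder(1:i);
                remainder = remainder(i+1:end);
                break;
            end
        end
    end
    % Look for a suffix only if the remainder still has no match.
    if ~any(strcmp(remainder, baseWords))
        % Suffixes are 1 to 3 letters: again, try the longest first.
        for j = min(3, numel(remainder) - 1):-1:1
            if any(strcmp(remainder(end-j+1:end), suffixes))
                word_suffix = remainder(end-j+1:end);
                remainder = remainder(1:end-j);
                break;
            end
        end
    end
    pieces = {word_prefix, remainder, word_suffix};
    parts{m} = pieces(~cellfun(@isempty, pieces));   % drop affixes that were not found
end

playlist = [parts{:}];   % all pieces in play order, starting {'saya', 'meng', 'hapus', 'kan', 'dia', ...}
```

Checking baseWords first is what keeps dia from being split into di + a. Each entry of playlist could then be turned into a file name with fullfile(wavdirectory, [name '.wav']) and played in turn.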

More Answers (1)

msp on 22 Mar 2013
How can I match the user input against the words in a text file? Please help.
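The reply does not say how the text file is laid out. Assuming a plain-text word list with one word per line, a minimal sketch reads both the file and the input with fileread() and regexp(), then compares them with ismember(). The file name and its contents are made up here so the example runs on its own.

```matlab
% Hypothetical word-list file (name and contents are assumptions).
listFile = fullfile(tempdir, 'malay_words.txt');
fid = fopen(listFile, 'w');
fprintf(fid, 'saya\nhapus\ndia\n');
fclose(fid);

% Extract the words from the file and from the user input, ignoring case.
fileWords = regexp(lower(fileread(listFile)), '\w+', 'match');
userWords = regexp(lower('Saya menghapuskan dia'), '\w+', 'match');

% isKnown(k) is true when userWords{k} appears in the file.
isKnown = ismember(userWords, fileWords);
unknown = userWords(~isKnown);   % words that need affix splitting or a new recording
delete(listFile);
```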
