storing words

NUR KHAIRUNNISA rahimi on 22 Nov 2011
I have made a few changes, but my program still cannot store the processed words in a cell array, which causes an error when the next lines of code execute:
??? Subscript indices must either be real positive integers or logicals.
Error in ==> WordSearchtryout2>loadWordBank at 170
if strcmp(wordbank{iword}(end), 's')
Error in ==> WordSearchtryout2 at 59
loadWordBank();
I understand that this code might be super long, and you might not have any time to check any of it, but if you do, I would totally appreciate it.
function loadWordBank
    clc;
    [FILENAME, pathname] = uigetfile('*.wsb','Read Matlab Code File');
    if isequal(FILENAME,0) || isequal(pathname,0)
        fprintf('User pressed cancel');
    else
        fprintf('User selected %s \n', FILENAME);
    end
    fid = fopen(FILENAME,'rt');
    if fid<0
        %error could not find the file
        return,
    end
    total_no_words=0;
    lineNUM=1;
    wordbank=cell(250000,1);
    no_plurals=0;
    while ~feof(fid)
        tline = fgetl(fid);
        if(isempty(tline))
            %line is empty, skip it
            continue;
        end
        if(~ischar(tline))
            %if line does not contain character
            fclose(fid);
            break;
        end
        %we finally have a string
        tline=strtrim(tline);
        if(sum(isspace(tline)))==length(tline)
            continue;
        end
        is_letter=isletter(tline);
        is_space=isspace(tline);
        checkcount=sum(is_letter+is_space);
        if checkcount~=length(tline)
            % invalid line in puzzle
            return;
        end
        %tline only contain spaces and letters
        if strcmp(tline,lower(tline))==1 || strcmp(tline,upper(tline))==1
            total_no_words=total_no_words+1;
            wordbank(total_no_words,1)={tline};
        end
        lineNUM=lineNUM+1;
    end
    wordbank2=wordbank;
    for iword=1:length(wordbank)
        if strcmp(wordbank{iword}(end), 's')
            no_plurals=no_plurals+1;
            wordbank2{iword} = wordbank{iword}(1:(end-1));
        end
    end
    inv_words=line_NUM-length(wordbank);
    % this include empty lines, lower upper case letters, as required
    no_dup=length(wordbank2)-length(unique(wordbank2));
    fprintf('LOAD WORD BANK \n');
    fprintf('Loading word bank: none....started\n');
    fprintf('Loading word bank: %s\b\b\b\n',FILENAME);
    fprintf('Successfully loaded %d words from the word bank file\n',total_no_words);
    fprintf('Removing invalid words..%d words were successfully removed. \n',inv_words); %not complete
    fprintf('Removing duplicate words and sorting...done\n');
    fprintf('Removed %d duplicate words\n',no_dup);
    fprintf('Searching for and removing any plural forms of words ending in S:%%\n');
    fprintf('Removed %d plural word \n',no_plurals);
    fprintf('Building word indices and calculating beginning letter counts...done\n');
    fprintf('Calculating word length counts...done\n');
    fprintf('Final word count: %d \n');
end
  1 Comment
Walter Roberson on 22 Nov 2011
Time for the debugger. At the command prompt,
dbstop if error
then rerun. When it stops, look carefully at the values, make sure you are calling the functions you think you are, and so on.
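A minimal sketch of such a debugging session, assuming the program is started by calling WordSearchtryout2 (the name shown in the error trace). The inspection commands in the comments are only suggestions of what to look at.

```matlab
dbstop if error        % pause at the line that raises an error
% Rerun the program here, for example:
%   WordSearchtryout2
% When execution stops at the K>> prompt, inspect the values involved:
%   iword                      % which iteration failed?
%   class(wordbank{iword})     % 'char' for a stored word, 'double' for an unused cell
%   isempty(wordbank{iword})   % true for cells that were never filled
%   dbquit                     % leave debug mode
dbclear if error       % switch the breakpoint off again when finished
```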


Accepted Answer

Walter Roberson on 23 Nov 2011
Your line
for iword=1:length(wordbank)
should be
for iword=1:total_no_words
as you do not want to be trying to de-pluralize words that were never stored. length(wordbank) is fixed by your preallocation of the cell array with 250000 entries, no matter how many words were actually read.
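The failure can be reproduced in a few lines. This is a small sketch with made-up words, not the original word bank: a cell that was never filled holds an empty array, and indexing an empty array with (end) asks for element 0, which raises the subscript error.

```matlab
% Preallocate more cells than will be used, as the original code does.
wordbank = cell(5, 1);
wordbank(1:3) = {'cats'; 'dog'; 'trees'};   % illustrative words only
total_no_words = 3;

% Indexing an unused cell fails: wordbank{4} is [], so (end) means index 0.
failed = false;
try
    last_char = wordbank{4}(end); %#ok<NASGU>
catch
    failed = true;
end

% Option 1: loop only over the cells that were filled.
no_plurals = 0;
wordbank2 = wordbank;
for iword = 1:total_no_words
    if strcmp(wordbank{iword}(end), 's')
        no_plurals = no_plurals + 1;
        wordbank2{iword} = wordbank{iword}(1:end-1);
    end
end

% Option 2: discard the unused cells once reading is finished, so that
% length(wordbank) reports the number of stored words.
wordbank = wordbank(1:total_no_words);
```

Trimming before copying into wordbank2 would also keep the later unique(wordbank2) call from seeing empty cells, which, as far as I know, it does not accept in a cell array of strings.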
As an optimization, you can replace your checkcount lines that currently say
checkcount=sum(is_letter+is_space);
if checkcount~=length(tline)
with
if ~all(is_letter | is_space)
And after that your line
if strcmp(tline,lower(tline))==1 || strcmp(tline,upper(tline))==1
could be optimized to
if strcmp(tline,lower(tline)) || strcmp(tline,upper(tline))
However! This line checks that tline is all in upper-case or all in lower-case, and will not store any line that is in mixed-case. Your all-upper-case lines will be stored in upper-case. I suspect you do not intend either of these behaviours, and I cannot tell what you are wanting to check with that "if" statement. I suspect you should be using
tline = lower(tline);
and then storing that unconditionally.
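The two simplifications can be checked side by side on a few sample lines. This is a sketch with made-up inputs; the variable names old_invalid, new_invalid, single_case and normalized are introduced here only for illustration.

```matlab
samples = {'apple', 'BANANA', 'Cherry', 'ice cream', 'abc123'};  % made-up test lines
n = numel(samples);
old_invalid = false(1, n);
new_invalid = false(1, n);
single_case = false(1, n);

for k = 1:n
    tline = samples{k};
    is_letter = isletter(tline);
    is_space  = isspace(tline);

    % Original test: count the characters that are letters or spaces.
    old_invalid(k) = sum(is_letter + is_space) ~= length(tline);
    % Suggested test: every character must be a letter or a space.
    new_invalid(k) = ~all(is_letter | is_space);

    % strcmp already returns a logical, so comparing with 1 is redundant.
    single_case(k) = strcmp(tline, lower(tline)) || strcmp(tline, upper(tline));
end

% Normalizing instead of filtering keeps mixed-case words, in lower case.
normalized = lower(samples);
```

Both validity tests flag only 'abc123', while the case test rejects only 'Cherry'. Note that 'abc123' passes the case test on its own, which is why the validity test has to come first.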
  1 Comment
NUR KHAIRUNNISA rahimi on 23 Nov 2011
if strcmp(tline,lower(tline)) || strcmp(tline,upper(tline))
However! This line checks that tline is all in upper-case or all in lower-case, and will not store any line that is in mixed-case. Your all-upper-case lines will be stored in upper-case. I suspect you do not intend either of these behaviours, and I cannot tell what you are wanting to check with that "if" statement. I suspect you should be using
tline = lower(tline);
and then storing that unconditionally
Yes, I do intend to store the words that are either only lower case or only upper case, and not store mixed cases.
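With that intent, one point to watch is the later plural check: an all-upper-case word such as 'DOGS' ends in 'S', which strcmp(..., 's') does not match. Below is a hedged sketch of one way to keep the single-case filter while making the plural test case-insensitive. The sample words are made up, and strcmpi is swapped in for the strcmp used in the original code.

```matlab
candidates = {'cats', 'DOGS', 'Birds', 'tree'};   % illustrative input lines
wordbank = cell(numel(candidates), 1);
total_no_words = 0;

for k = 1:numel(candidates)
    tline = candidates{k};
    % Keep only lines written entirely in lower case or entirely in upper case.
    if strcmp(tline, lower(tline)) || strcmp(tline, upper(tline))
        total_no_words = total_no_words + 1;
        wordbank{total_no_words} = tline;
    end
end
wordbank = wordbank(1:total_no_words);   % drop unused cells

no_plurals = 0;
for iword = 1:total_no_words
    % strcmpi ignores case, so both 's' and 'S' count as a plural ending.
    if strcmpi(wordbank{iword}(end), 's')
        no_plurals = no_plurals + 1;
        wordbank{iword} = wordbank{iword}(1:end-1);
    end
end
```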


More Answers (1)

Fangjun Jiang on 22 Nov 2011
I am not sure if you follow your own question. There is a much easier way to do this.
  1 Comment
NUR KHAIRUNNISA rahimi on 23 Nov 2011
I decided to use a cell array because it is easier for me to manipulate in a way I understand. However, your suggestion has helped me understand more MATLAB functions, thank you. I will try to use it in the next part of the program.
