# Adding up words in matrices on Matlab

Blaise on 11 Apr 2013
Example
hello
My name is Kevin
Hello my name is Susan
u1=[1]
u2=[0,1,1,1,1]
u3=[1,1,1,1,0,1]
So u1 is [1] because the word 'hello' is in the first sentence. Then u2 is [0,1,1,1,1] because 'hello' is not in the second sentence, but 'my', 'name', 'is' and 'Kevin' are.
The same goes for u3: it contains the boolean values for 'hello', 'my', 'name', 'is', 'Kevin' and 'Susan' respectively, with 'Kevin' being 0 as it's not in this final sentence.
As there are 6 different words in my example, the last matrix should have 6 elements.
How would I go about implementing such an algorithm in MATLAB?
The sentences are in a file which I have to read into MATLAB. I'm able to read the sentences and split them into cell arrays of words:
while ~feof(file)
    eachLine = fgetl(file)
    if isempty(eachLine) || strncmp(eachLine, '%', 1) || ~ischar(eachLine) ...
    matrix = regexp(eachLine, ' ', 'split')
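The output described in the question can be sketched as follows. This is a minimal illustration, not a definitive implementation: it hard-codes the three example sentences instead of reading a file, lowercases every word so that 'Hello' and 'hello' match, and the names `sentences`, `vocab` and `u` are made up for the example.

```matlab
% Example sentences from the question (hard-coded here instead of read from a file).
sentences = {'hello', 'My name is Kevin', 'Hello my name is Susan'};

vocab = {};                    % distinct words seen so far, in first-seen order
u = cell(size(sentences));     % u{k} is the indicator vector for sentence k

for k = 1:numel(sentences)
    % Split on whitespace and lowercase (assumption: matching is case-insensitive).
    words = lower(regexp(strtrim(sentences{k}), '\s+', 'split'));
    % Add this sentence's words to the vocabulary; 'stable' keeps first-seen order
    % and drops repeats, so every word appears exactly once.
    vocab = unique([vocab, words], 'stable');
    % 1 where a vocabulary word occurs in this sentence, 0 otherwise.
    u{k} = double(ismember(vocab, words));
end
```

With these inputs, `u{1}`, `u{2}` and `u{3}` correspond to u1, u2 and u3 in the question, and each vector is as long as the vocabulary collected up to that sentence.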

Babak on 11 Apr 2013
b = {'Hello' 'my' 'name' 'is' 'kevin' 'Susan'};
a = strsplit('kevin, kevin my baby I am telling you Hello Hello my name is Susan not Susana');
% a holds the words of the string to be searched for b's keywords.
u = zeros(size(b));
for j = 1:length(b)
    counter = 0;
    for k = 1:length(a)
        if isequal(b{j}, a{k})
            counter = counter + 1;
        end
    end
    u(j) = counter;
end
u
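The same counts can probably be obtained without the nested loops. The sketch below is one possible alternative, not the poster's method: it uses `strcmp` against the whole cell array and `cellfun` to repeat that for each keyword, and it splits the string with `regexp` rather than `strsplit`. Matching stays case-sensitive and exact, so the first token 'kevin,' (with its comma) is not counted as 'kevin'.

```matlab
b = {'Hello' 'my' 'name' 'is' 'kevin' 'Susan'};
str = 'kevin, kevin my baby I am telling you Hello Hello my name is Susan not Susana';
a = regexp(str, '\s+', 'split');   % split on whitespace; punctuation stays attached

% strcmp(w, a) marks the entries of a that equal w exactly;
% sum counts them, and cellfun repeats this for every keyword in b.
u = cellfun(@(w) sum(strcmp(w, a)), b);
```

If only presence matters rather than the count, `ismember(b, a)` gives the logical version directly.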
Babak on 12 Apr 2013
This is how you can create the cell variable c that includes all the elements of both a and b:
b={'hello' 'my' 'name' 'is' 'kevin'};
a={'and' 'my' 'name' 'is' 'susan'};
c = [a b]
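Plain concatenation keeps repeated words ('my', 'name' and 'is' appear twice in c). If each word should appear only once, one option is to pass the result through `unique` with the 'stable' flag, which keeps the first occurrence of each word in its original position. A small sketch, assuming the words are already in a consistent case:

```matlab
b = {'hello' 'my' 'name' 'is' 'kevin'};
a = {'and' 'my' 'name' 'is' 'susan'};

% 'stable' preserves first-seen order instead of sorting alphabetically.
c = unique([a b], 'stable');
```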

Matt Kindig on 12 Apr 2013
Edited: Matt Kindig on 12 Apr 2013
Another approach might be to use ismember(). For example:
dictionary = {'hello', 'my', 'name', 'is', 'kevin', 'susan'}; % words to match
Results = false(nLines, length(dictionary)); % nLines = number of lines in the file; must be set beforehand
count = 1;
fid = fopen('your_file.txt');
while ~feof(fid)
    Line = strtrim(fgetl(fid)); % get line
    words = lower(regexp(Line, '\s+', 'split')); % split into (lowercase) words
    Results(count,:) = ismember(dictionary, words); % determine if present
    count = count + 1; % move on to the next row
end
fclose(fid);
% For each line k, Results(k,m) indicates whether the word dictionary{m} is present.
Blaise on 13 Apr 2013
EDIT: I've found a solution. I tried your code, but there's an error: nLines hasn't been declared.
I've sort of done it with the example I used above, without reading from a file, using ismember:
for(i=1:length(a))
    for(j=1:length(a))
        ismember(a,a)
    end
end
c=[a,b]
for(i=1:length(c))
    for(j=1:length(b))
        ismember(c,b)
    end
end
However, with this code, if a word is seen more than once, it outputs 1 for every entry it's found in. I want it to ignore the second instance and put 0 there instead of 1. How can I go about doing this?
I'm also trying to do this while reading from a file, but I'm having difficulty with it. I want to read the first line and compare it with itself, then combine the first AND second lines and compare them with the second, then combine the first, second AND third lines and compare them with the third, and so on.
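One way to approach both points is to keep a running list of distinct words while reading the file line by line. The sketch below is only an illustration under several assumptions: it writes its own small demo file (the file name and contents are hypothetical), skips blank lines and lines starting with '%' as in the original read loop, and lowercases words before comparing. Because `unique(..., 'stable')` stores each word once, a repeated word can only produce a single entry.

```matlab
% Write a small demo file (hypothetical content; replace with your own file).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'hello\nMy name is Kevin\n%% a comment line\nHello my name is Susan\n');
fclose(fid);

vocab = {};   % distinct words from all lines read so far, in first-seen order
u = {};       % u{k} is the indicator vector for the k-th kept line

fid = fopen(fname, 'r');
while true
    eachLine = fgetl(fid);
    if ~ischar(eachLine), break; end           % fgetl returns -1 at end of file
    eachLine = strtrim(eachLine);
    if isempty(eachLine) || strncmp(eachLine, '%', 1)
        continue;                               % skip blank and comment lines
    end
    words = lower(regexp(eachLine, '\s+', 'split'));
    vocab = unique([vocab, words], 'stable');   % lines 1..k combined, no repeats
    u{end+1} = double(ismember(vocab, words));  % compare combined words with line k
end
fclose(fid);
delete(fname);
```

Since the vocabulary grows as lines are read, the vectors in `u` have different lengths, which is why they are stored in a cell array rather than a preallocated matrix.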