Adding up words in matrices on Matlab

Question

0 votes

Example

hello

My name is Kevin

Hello my name is Susan

u1=[1]

u2=[0,1,1,1,1]

u3=[1,1,1,1,0,1]

So u1 is [1] because the word 'hello' is in the first sentence. u2 is [0,1,1,1,1] because 'hello' is not in the second sentence, but 'my', 'name', 'is' and 'Kevin' are.

And the same goes for u3, it contains the boolean value for 'hello' 'my' 'name' 'is' 'Kevin' 'Susan' respectively, with 'Kevin' being 0 as it's not in this final sentence.

As there are 6 different words in my example, the last matrix should have 6 entries.
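A minimal sketch of the scheme described above, assuming words are compared case-insensitively and the word list grows in order of first appearance. The names sentences, vocab and u are illustrative and not from the original post.

```matlab
% Presence vectors over a growing vocabulary (case-insensitive).
sentences = {'hello', 'My name is Kevin', 'Hello my name is Susan'};

vocab = {};                      % words seen so far, in order of first appearance
u = cell(size(sentences));       % u{k} is the presence vector for sentence k

for k = 1:numel(sentences)
    % Split the sentence into lowercase words.
    words = lower(regexp(strtrim(sentences{k}), '\s+', 'split'));

    % Append any word that has not been seen before.
    for w = 1:numel(words)
        if ~any(strcmp(words{w}, vocab))
            vocab{end+1} = words{w};
        end
    end

    % 1 where the vocabulary word occurs in sentence k, 0 otherwise.
    u{k} = double(ismember(vocab, words));
end
```

With the three example sentences, u{1}, u{2} and u{3} have 1, 5 and 6 entries, matching u1, u2 and u3 above.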

How would I go about implementing such an algorithm in MATLAB?

The sentences are in a file which I have to read into MATLAB. I'm able to read the sentences and split them into words:

while ~feof(file)
    eachLine = fgetl(file)
    if isempty(eachLine) || strncmp(eachLine, '%', 1) || ~ischar(eachLine)
    ...

matrix=regexp(eachLine, ' ', 'split')

Answer 1

Babak on 11 Apr 2013

0 votes

b = {'Hello' 'my' 'name' 'is' 'kevin' 'Susan'};   % keywords to look for
a = strsplit('kevin, kevin my baby I am telling you Hello Hello my name is Susan  not Susana');
% a holds the words of the sentence in which to look for b's keywords.
u = zeros(size(b));               % u(j) counts how often b{j} occurs in a
for j = 1:length(b)
    counter = 0;
    for k = 1:length(a)
        if isequal(b{j}, a{k})    % exact, case-sensitive match
            counter = counter + 1;
        end
    end
    u(j) = counter;
end
u
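As a rough alternative to the nested loops, the same counts can be sketched with strcmp and cellfun, and a plain yes/no presence vector with ismember. The shortened sample sentence here is a made-up example, not the one from the answer.

```matlab
b = {'Hello', 'my', 'name', 'is', 'kevin', 'Susan'};          % keywords
a = regexp('my baby Hello Hello my name is Susan', '\s+', 'split');

% Number of times each keyword occurs in a (case-sensitive, exact match).
counts = cellfun(@(w) sum(strcmp(w, a)), b);

% True where the keyword occurs at least once, regardless of repeats.
present = ismember(b, a);
```

Since the question asks for 0/1 values rather than counts, present is probably closer to what is wanted than counts.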

Blaise on 12 Apr 2013

My aim is to work out which words are in each sentence. I'm doing a dialogue act tagging report and have to use MATLAB for it, so I'm going to add up all these matrices at the end to work out the mean.
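Because the vectors have different lengths, they cannot be added directly. One possible sketch, assuming missing trailing entries count as 0 (a word not yet seen is treated as absent), pads them into one matrix first. The variable names are illustrative.

```matlab
u = {1, [0 1 1 1 1], [1 1 1 1 0 1]};   % presence vectors of differing lengths

n = max(cellfun(@numel, u));           % length of the longest vector
U = zeros(numel(u), n);                % one row per sentence, zero-padded
for k = 1:numel(u)
    U(k, 1:numel(u{k})) = u{k};
end

total = sum(U, 1);                     % how many sentences contain each word
avg = mean(U, 1);                      % fraction of sentences containing each word
```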

My first sentence is b, so I take the first word in the array b and compare it to all the other words in the array. I do the same with 'name' etc. and get a matrix of all 1's.

Then I search for the word 'hello' in a. It's not in a, so I assign 0 to 'hello'. Is the word 'my' in a? Yes, so I assign 1 to 'my'. Is 'name' in a? Yes, so assign 1 to it. Is 'is' in a? Yes, assign 1. Is 'Kevin' in a? No, so assign it 0, and so on. If a word repeats, you don't count it again.

    for i = 1:length(a)
        for j = 1:length(a)
            if isequal(a{i}, a{j})
                a{i} = 1
            end
        end
    end

That's the first part of my implementation. I'm having difficulty joining both matrices together and then comparing the new matrix with just b:

c = [a, b]
for i = 1:length(c)
    for j = 1:length(a)
        if isequal(c{i}, b{j})
            c{i} = 1
        end
    end
end

However, this doesn't work: it outputs a matrix with all the words, even the ones that come up in both sentences, and I can't seem to set entries to 0 when the if statement fails. If I add else b{i}=0, I also get a wrong answer.

Babak on 12 Apr 2013

This is how you can create the cell variable c that includes all the elements of both a and b:

b={'hello' 'my' 'name' 'is' 'kevin'};
a={'and' 'my' 'name' 'is' 'susan'};
c = [a b]
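Plain concatenation keeps words that occur in both a and b twice. A possible sketch, assuming neither a nor b contains repeats of its own, appends only the words of a that are not already in b and then tests membership with ismember, which yields 0 as well as 1 without an else branch. The names inA and inB are illustrative.

```matlab
b = {'hello', 'my', 'name', 'is', 'kevin'};
a = {'and', 'my', 'name', 'is', 'susan'};

% Combined word list: all of b, followed by the words of a not already in b.
c = [b, a(~ismember(a, b))];

inA = ismember(c, a);   % true where the word of c occurs in a
inB = ismember(c, b);   % true where the word of c occurs in b
```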

Answer 2

Matt Kindig on 12 Apr 2013

Edited: Matt Kindig on 12 Apr 2013

0 votes

Another approach might be to use ismember(). For example:

dictionary = {'hello', 'my', 'name', 'is', 'kevin', 'susan'};  %words to match
%nLines (the number of lines in the file) must be defined before this point
Results = false(nLines, length(dictionary));
count = 1;
fid = fopen('your_file.txt');
while ~feof(fid)
   Line = strtrim(fgetl(fid));  %get line
   words = lower(regexp(Line, '\s+', 'split'));  %split into (lowercase) words
   Results(count,:) = ismember( dictionary, words);  %determine if present
   count = count + 1;  %move on to the next row of Results
end
fclose(fid);
%for each line k, Results(k,m) will indicate if the word at dictionary{m} is present.
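If the number of lines is not known in advance, one option is to start with an empty Results matrix and grow it by one row per line. The sketch below is self-contained, so it first writes a small temporary file. The file contents and the name fname are assumptions made for illustration.

```matlab
% Create a small sample file (hypothetical content) so the sketch can run.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'hello\nMy name is Kevin\nHello my name is Susan\n');
fclose(fid);

dictionary = {'hello', 'my', 'name', 'is', 'kevin', 'susan'};
Results = false(0, numel(dictionary));   % no rows yet; one is added per line

fid = fopen(fname, 'r');
Line = fgetl(fid);
while ischar(Line)                       % fgetl returns -1 at end of file
    words = lower(regexp(strtrim(Line), '\s+', 'split'));
    Results(end+1, :) = ismember(dictionary, words);
    Line = fgetl(fid);
end
fclose(fid);
delete(fname);                           % remove the temporary file
```

Growing a matrix inside a loop is slow for large files, so preallocating is still preferable when the line count is available.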

Blaise on 13 Apr 2013

Edited: Blaise on 16 Apr 2013

EDIT: I've found a solution.

I tried your code, but there's an error: nLines hasn't been declared.

I've sort of done it using ismember, with the example I used above and without reading from a file:

    for i = 1:length(a)
        for j = 1:length(a)
            ismember(a, a)
        end
    end
    c = [a, b]
    for i = 1:length(c)
        for j = 1:length(b)
            ismember(c, b)
        end
    end

However, with this code, if a word is seen more than once, it outputs 1 for all the entries it's found in. I want it to ignore the second instance and put 0 there instead of 1. How can I go about doing this?
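One way this might be done is to combine ismember with the index output of unique, which identifies the first occurrence of each distinct word. This is a sketch on made-up data, and the names firstIdx, isFirst and flags are illustrative.

```matlab
c = {'hello', 'my', 'name', 'is', 'kevin', 'my', 'name', 'is', 'susan'};
b = {'my', 'name', 'is', 'susan'};

% Positions in c where each distinct word appears for the first time.
[~, firstIdx] = unique(c, 'first');
isFirst = false(size(c));
isFirst(firstIdx) = true;

% 1 only at the first occurrence of a word that is also in b; repeats get 0.
flags = ismember(c, b) & isFirst;
```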

I'm also trying to do it by reading from a file now, but I'm having difficulty with it. I want to read the first line and compare it with itself, then read the first AND second lines and compare them with the second, then read the first, second AND third lines and compare them with the third, etc.
