How to find the exact location of a word in a string?

Question

Yunfei Zhang on 13 Feb 2016

https://www.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string

Commented: Guillaume on 13 Feb 2016

I have the string 'chemical engineering is a challenge for electrical engineer'. I used the 'strfind' function to find the exact location of the word 'engineer'. However, the results also include the word 'engineering'. How can I get only the location of the word 'engineer' and not 'engineering'?

 list = 'chemical engineering is a challenge for electrical engineer';
 temp = strfind(list, 'engineer')   % strfind(text, pattern); findstr is not recommended

The result is

temp =
      10    52

Answer 1

Star Strider on 13 Feb 2016

https://www.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string#answer_209694

This regexp call will pick up only ‘engineer’:

Str = 'chemical engineering is a challenge for electrical engineer';
idxs = regexp(Str, 'engineer\>')
idxs =
    52
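
For context, `\>` anchors the match at the end of a word, which is why 'engineering' is skipped. It does not constrain the start of the word, so a longer word that merely ends in 'engineer' would still be reported. A minimal sketch, assuming you also want to exclude those cases by adding the start-of-word anchor `\<` (the second test string is made up for illustration):

```matlab
% Original string from the question.
Str = 'chemical engineering is a challenge for electrical engineer';
idxOrig = regexp(Str, '\<engineer\>');      % whole-word match only

% Made-up string containing a word that ends in 'engineer'.
Str2 = 'a bioengineer and an engineer';
idxEndOnly = regexp(Str2, 'engineer\>');    % also hits inside 'bioengineer'
idxWhole   = regexp(Str2, '\<engineer\>');  % only the standalone word
```

With both anchors, the pattern matches 'engineer' only when it stands as a complete word.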

Yunfei Zhang on 13 Feb 2016

Edited: Yunfei Zhang on 13 Feb 2016

Sorry for the confusion. I simplified the question before asking it. 'pre' is a cell array containing 20 documents, each of which is one long string. 'word' is a cell array containing the 1099 words that remain in these 20 documents after removing stopwords. What I want is a 20-by-1099 matrix showing how often each word occurs in each document. That led to the problem above: the count for 'engineer' can come out higher than the count for 'engineering', because every 'engineering' is also counted as 'engineer'. I think the function you suggested is the correct way to find the location of each word. Once I have the correct locations of a word like 'engineer', I can compute its frequency and store it at the corresponding position using the code below. Guillaume gave me a method that builds the regular expression for each word, and it works. However, it trades speed for accuracy and takes much longer when processing a large number of articles (when 'pre' contains a large number of long strings).

if ~isempty(temp)
    docum(i,j) = size(temp,2);   % number of matches found
end

Guillaume on 13 Feb 2016

Edited: Guillaume on 13 Feb 2016

You can prebuild the regular expressions before the loops if you wish.

word = strcat(word, '\>')
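
One caveat when prebuilding patterns from a vocabulary: if an entry contains a character that regexp treats specially (for example '.'), it is interpreted as a pattern rather than literal text. A minimal sketch, assuming you want every entry matched literally, using `regexptranslate('escape', ...)` before appending the anchor. The vocabulary entry 'v1.0' and the test string are made up for illustration:

```matlab
% Hypothetical vocabulary; 'v1.0' contains the regexp metacharacter '.'.
word = {'engineer', 'v1.0'};

% Escape each entry so it is matched literally, then add the end-of-word anchor.
escaped = cellfun(@(w) regexptranslate('escape', w), word, 'UniformOutput', false);
pattern = strcat(escaped, '\>');

doc = 'v1.0 and v120';
unescapedHits = regexp(doc, 'v1.0\>');     % '.' matches any character, so 'v120' is hit too
escapedHits   = regexp(doc, pattern{2});   % only the literal 'v1.0'
```

Entries without special characters, such as 'engineer', pass through the escaping step unchanged.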

Yunfei Zhang on 13 Feb 2016

Thank you! It helps a lot with keeping the processing time under control, as I also want to do feature selection and clustering on my data.

Answer 2

Guillaume on 13 Feb 2016

https://www.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string#answer_209710

Edited: Guillaume on 13 Feb 2016

Another option, since the words you're trying to match are always delimited by spaces or the end of the sentence (other punctuation marks are already embedded in the words), is to add a space to the end of each word and to the end of each sentence. That way 'engineer ' no longer matches 'engineering ':

tic
docum = zeros(numel(pre), numel(word));
word2 = strcat(word, {' '}); %strcat removes trailing ' ' if it's not in a cell array
pre2 = strcat(vertcat(pre{:}), {' '}); %why is your pre a cell array of 1x1 cell arrays?
for widx = 1:numel(word)
   docum(:, widx) = cellfun(@numel, strfind(pre2, word2{widx}));
end
toc

I'm not convinced it's going to be faster than regexp:

tic
docum = zeros(numel(pre), numel(word));
word2 = strcat(word, '\>');
pre2 = vertcat(pre{:}); %why is your pre a cell array of 1x1 cell arrays?
for widx = 1:numel(word)
   docum(:, widx) = cellfun(@numel, regexp(pre2, word2{widx}));
end
toc

In my testing, both take more or less the same time.
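
To make the counting step easy to try without the original data, here is a minimal self-contained sketch on two toy documents. The names `docs`, `words`, `patterns`, `counts` and `naive` are placeholders of mine (not the `pre` and `word` variables from the question), and I anchor both ends of each word, which is an assumption beyond the code above:

```matlab
% Toy stand-ins for the documents and the vocabulary.
docs  = {'chemical engineering is a challenge for electrical engineer';
         'an engineer likes engineering'};
words = {'engineer', 'engineering', 'challenge'};

% Prebuild one whole-word pattern per vocabulary entry.
patterns = strcat('\<', words, '\>');

% One loop over the words; each regexp call handles all documents at once.
counts = zeros(numel(docs), numel(words));
for widx = 1:numel(words)
    counts(:, widx) = cellfun(@numel, regexp(docs, patterns{widx}));
end

% For comparison: plain substring counts for 'engineer' include 'engineering'.
naive = cellfun(@numel, strfind(docs, 'engineer'));
```

Here `counts` comes out as [1 1 1; 1 1 0], whereas `naive` is [2; 2] because each 'engineering' is also counted as 'engineer'.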

Star Strider on 13 Feb 2016

@Guillaume — Thank you. I had to be away for a few minutes.

Guillaume on 13 Feb 2016

@Yunfei, what is probably having the most effect on the processing speed is that I apply the regexp or strfind to all the sentences at once. There is only one loop, looping over the individual words.
