Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Matching Character Phrases...
Date: Fri, 22 Feb 2008 15:16:34 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 41
Message-ID: <fpmp0i$fi6$1@fred.mathworks.com>
References: <fpkn42$3cg$1@fred.mathworks.com> <47bee1bc$0$294$b45e6eb0@senator-bedfellow.mit.edu>
Reply-To: <HIDDEN>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1203693394 15942 172.30.248.37 (22 Feb 2008 15:16:34 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Fri, 22 Feb 2008 15:16:34 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1138100
Xref: news.mathworks.com comp.soft-sys.matlab:453155



Hi, thank you so much for your help. I think this solution is very close to what
I need.

However, I do have a couple of questions:
How can I build an array of just the 'words' that are repeated in A?
And how can I build another array that shows the distances between
matching pairs?

This solution is 1000% quicker than my solution, so I am very interested in
hearing how I can put it into practice!

Thanks again!
> 
> All of the current suggestions search through the string multiple 
> times. I think you should run through it once and collect information 
> along the way. If your text is always letters (no spaces or numbers), 
> you can use the words as dynamic field names to quickly "hash" the 
> various words. For example, the following code will create (1) a 
> structure of locations of each "word" and (2) a structure of distances 
> between multiple occurrences of the words. However, this code could 
> become slow if you have *lots* of occurrences of a particular word, 
> because it keeps "growing" arrays [in the line that uses (end+1)]. 
> Really, this problem would be much easier in a language that had more 
> flexible hashes/dictionaries and supported linked lists.
> 
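> A = 'OPASKSGLBOJASLOPASNKMGLBOSDLASJSFLOPASHHASKSMLGLBO';
> num = 4;                      % word length
> locationStruct = struct;      % one field per word, holding its start positions
> for k=1:(numel(A)-num+1)      % +1 so the last word in A is included
>     word = A(k:(k+num-1));
>     if isfield(locationStruct, word)
>         locationStruct.(word)(end+1) = k;   % grow the list of positions
>     else
>         locationStruct.(word) = k;          % first occurrence of this word
>     end
> end
> % Distances between consecutive occurrences of each word
> distanceStruct = structfun(@diff, locationStruct, 'UniformOutput', false);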
> 
>
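
For reference, here is a minimal sketch of one way the two arrays asked about above might be pulled out of the location structure. It assumes, as in the quoted reply, that A contains only letters so each word is a valid field name. The names words, counts, repeatedWords and repeatedDistances are made up for illustration and do not come from the quoted code. The loop here runs to numel(A)-num+1 so that the final word in A is counted too.

```matlab
% Sketch: list the repeated words and the gaps between their occurrences.
A = 'OPASKSGLBOJASLOPASNKMGLBOSDLASJSFLOPASHHASKSMLGLBO';
num = 4;                                   % word length

% Collect the start positions of every num-letter word in one pass.
locationStruct = struct;
for k = 1:(numel(A) - num + 1)
    word = A(k:(k + num - 1));
    if isfield(locationStruct, word)
        locationStruct.(word)(end+1) = k;
    else
        locationStruct.(word) = k;
    end
end

% Keep only the words that occur more than once.
words = fieldnames(locationStruct);            % cell array of all words
counts = structfun(@numel, locationStruct);    % occurrences per word
repeatedWords = words(counts > 1);             % cell array of repeated words

% Distances between consecutive occurrences of each repeated word.
repeatedDistances = cellfun(@(w) diff(locationStruct.(w)), repeatedWords, ...
                            'UniformOutput', false);

% Print one line per repeated word (hypothetical display format).
for k = 1:numel(repeatedWords)
    fprintf('%s: %s\n', repeatedWords{k}, mat2str(repeatedDistances{k}));
end
```

repeatedWords and repeatedDistances are cell arrays of the same length, so entry k of one belongs to entry k of the other. Cell arrays are used because different words can repeat a different number of times, which would not fit in a rectangular numeric array.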