Path: news.mathworks.com!not-for-mail
From: "jay vaughan" <jvaughan5.nospam@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Matching Character Phrases...
Date: Fri, 22 Feb 2008 03:18:02 +0000 (UTC)
Organization: harvard
Lines: 76
Message-ID: <fpleta$cqu$1@fred.mathworks.com>
References: <fpkn42$3cg$1@fred.mathworks.com>
Reply-To: "jay vaughan" <jvaughan5.nospam@gmail.com>



Here is a brute-force solution. It runs in 10 seconds or so 
on my computer with num = 4, but it will scale badly for 
larger numbers, since the number of candidate strings to 
test grows as 26^num.

% a quoted string cannot be continued with ..., so join two pieces
A = ['OPASKSBBBBGLBOJASLOPASNKMGLBOSDLBBBBAS' ...
     'JSFLOPASHHASKSMLGLBOAAABFDFAAAB'];

v = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
num = 4; % user specifies this
SS = PermsRep(v,num); % all length-num strings over v (repeats allowed), one per row

% download subroutine PermsRep from URL below
% http://www.mathworks.com/support/solutions...
% /files/s36265/PermsRep.m

O = struct([]);
ctr = 1;
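for k = 1:size(SS,1)
    % start indices of the (non-overlapping) matches of candidate k
    s = regexp(A,SS(k,:),'start');
    if numel(s) > 1 % keep only phrases that occur more than once
        O(ctr).string = SS(k,:);
        O(ctr).positions = s;
        O(ctr).distances = diff(s); % gaps between consecutive starts
        ctr = ctr + 1;
    end
end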
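
If the brute-force loop is too slow, one possible alternative 
is to look only at the phrases that actually occur in A: take 
every length-num window of A and group identical windows with 
unique(...,'rows'). This is only a sketch, not tested on large 
inputs. The names W, U, j and maxResults are my own; the cap of 
30 comes from the original question, and results come out in 
alphabetical order rather than order of appearance. Unlike the 
regexp loop, this also counts overlapping occurrences.

```matlab
% Sketch: scan only substrings present in A (needs R2009b+ for the ~ output)
A = ['OPASKSBBBBGLBOJASLOPASNKMGLBOSDLBBBBAS' ...
     'JSFLOPASHHASKSMLGLBOAAABFDFAAAB'];
num = 4;          % phrase length chosen by the user
maxResults = 30;  % assumed cap, taken from the original question

nWin = length(A) - num + 1;                % number of length-num windows
idx  = bsxfun(@plus, (1:nWin)', 0:num-1);  % row k holds indices k:k+num-1
W    = A(idx);                             % nWin-by-num char matrix of windows

[U, ~, j] = unique(W, 'rows');   % U: distinct phrases, j: window -> row of U
counts = accumarray(j(:), 1);    % how often each distinct phrase occurs

O = struct('string', {}, 'positions', {}, 'distances', {});
for u = find(counts > 1)'        % phrases that occur more than once
    if numel(O) >= maxResults, break; end  % stop once the cap is reached
    s = find(j(:) == u)';        % start positions in ascending order
    O(end+1).string = U(u, :);
    O(end).positions = s;
    O(end).distances = diff(s);  % gaps between consecutive occurrences
end
```

The work here grows with the length of A rather than with 
26^num, so larger values of num should stay practical.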

Hope it helps.

J

"Jack Branning" <jbr.nospam@nospam.com> wrote in message 
<fpkn42$3cg$1@fred.mathworks.com>...
> Hi
> 
> Can anyone help me with figuring out what kind of loop would solve this 
> problem?
> 
> I have a variable 'text' that is a series of uppercase characters.  It looks 
> something like this:
> 
> OPASKSGLBOJASLOPASNKMGLBOSDLASJSFLOPASHHASKSMLGLBO...
> 
> The user enters a value, and based on this number, the program should look 
> for all matching phrases of that length.  For example, if they choose '4' the 
> loop should look through 'text' for all phrases of this size that occur more 
> than once.  It should also record the distance between the matching phrases 
> in another row of the array (or a separate array if this is easier). The output 
> array for the above 'text' should end up looking something like this:
> 
> OPAS  [14]
> OPAS  [20]
> GLBO  [15]
> GLBO  [23]
> ...etc...
> 
> Using strmatch doesn't seem to help me for this...
> 
> I have a loop that works, but it is very time consuming to run.  I only really 
> need to use the first '30' results from the output array so it would be ideal if 
> the loop could break when the output array is of length 30 (if it gets up to 30, 
> sometimes there will be less), otherwise it should end when there are no 
> more matches found.