Path: news.mathworks.com!not-for-mail
From: "jay vaughan" <jvaughan5.nospam@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Matching Character Phrases...
Date: Fri, 22 Feb 2008 05:13:02 +0000 (UTC)
Organization: harvard
Lines: 25
Message-ID: <fpllku$4oi$1@fred.mathworks.com>
References: <fpkn42$3cg$1@fred.mathworks.com>
Reply-To: "jay vaughan" <jvaughan5.nospam@gmail.com>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1203657182 4882 172.30.248.37 (22 Feb 2008 05:13:02 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Fri, 22 Feb 2008 05:13:02 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1215048
Xref: news.mathworks.com comp.soft-sys.matlab:453054



Driving home, I thought about it again and realized that my
brute-force suggestion only wins when length(A) > 26^num,
which is probably not usually true! When it isn't, you could
try the following instead.

-J


A = 'XXXXDEFGHXXXXIJKLYYYYMNOPQXXXXRSTUVWXYXXXXZYYYY';
num = 4; 

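O = struct([]);   % one entry per repeated substring
ctr = 1;
for k = 1:(length(A)-num)
   sub = A(k:(k+num-1));                  % candidate substring of length num
   pat = regexptranslate('escape',sub);   % match it literally, not as a regex
   % search from k onward; the starts in s are relative to A(k:end)
   [m, s] = regexp(A(k:end),pat,'match','start');
   occurs_again = length(s)>1;
   % a match starting before k means this substring was already recorded
   occurred_already =...
      ~isempty(regexp(A(1:(k+num-2)),pat,'once'));
   if occurs_again && ~occurred_already
      O(ctr).string = m{1};
      O(ctr).positions = s+k-1;   % convert to indices into A
      O(ctr).distance = diff(s);
      ctr = ctr+1;
   end
end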
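
If the repeated regexp calls turn out to be slow on long strings, one possible alternative is to collect the start positions of every length-num substring in a single pass. The sketch below is not the method from the loop above: it swaps in a containers.Map lookup, which assumes a MATLAB release that provides containers.Map (R2008b or later), and the names seen, subs and pos are made up for illustration. Unlike regexp, it also counts overlapping occurrences.

```matlab
% Sketch only: group start positions by substring in one pass.
% Assumes containers.Map is available (R2008b or later).
A = 'XXXXDEFGHXXXXIJKLYYYYMNOPQXXXXRSTUVWXYXXXXZYYYY';
num = 4;

seen = containers.Map('KeyType','char','ValueType','any');
for k = 1:(length(A)-num+1)
   sub = A(k:(k+num-1));
   if isKey(seen,sub)
      seen(sub) = [seen(sub) k];   % append this start index
   else
      seen(sub) = k;               % first time this substring is seen
   end
end

% report only the substrings that occur more than once
subs = keys(seen);
for i = 1:numel(subs)
   pos = seen(subs{i});
   if numel(pos) > 1
      fprintf('%s: positions %s, distances %s\n', ...
         subs{i}, mat2str(pos), mat2str(diff(pos)));
   end
end
```

Each substring is looked up once per position, so the work grows roughly linearly with length(A) instead of rescanning the rest of the string for every k.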