Path: news.mathworks.com!newsfeed-00.mathworks.com!panix!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!dreaderd!not-for-mail
From: Arthur G <gorramfreak+news@gmail.com>
Newsgroups: comp.soft-sys.matlab
Date: Fri, 22 Feb 2008 12:12:54 -0500
Message-ID: <47bf0296$0$294$b45e6eb0@senator-bedfellow.mit.edu>
References: <fpkn42$3cg$1@fred.mathworks.com> <47bee1bc$0$294$b45e6eb0@senator-bedfellow.mit.edu> <fpmp0i$fi6$1@fred.mathworks.com> <47beebf4$0$287$b45e6eb0@senator-bedfellow.mit.edu> <fpmsh4$nmi$1@fred.mathworks.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: Matching Character Phrases...
User-Agent: Unison/1.8
Lines: 43
NNTP-Posting-Host: 18.56.7.94
X-Trace: 1203700374 senator-bedfellow.mit.edu 294 18.56.7.94
Xref: news.mathworks.com comp.soft-sys.matlab:453192



On 2008-02-22 11:16:36 -0500, "Jack Branning" <jbr.nospam@nospam.com> said:

>> Once you have locationStruct and distanceStruct, there are lots of ways to
>> create the arrays. What's most efficient depends on the number of "single"
>> words, but here's what I think is a relatively robust solution:
>> 
>> % Total number of distances across all words (used to pre-allocate).
>> numWords = sum( structfun(@numel, distanceStruct) );
>> wordList = cell(numWords, 1);
>> distanceList = zeros(numWords, 1);
>> count = 0;
>> fn = fieldnames(distanceStruct);
>> for i=1:numel(fn)
>>     word = fn{i};
>>     % Assumes each field is a row vector; "for" iterates over columns.
>>     for distance=distanceStruct.(word)
>>         count = count + 1;
>>         wordList{count} = word;
>>         distanceList(count) = distance;
>>     end
>> end
>> 
> 
> Thanks again!!
> 
> I have tested this code, and it seems wordList contains every possible word.
> Is there a way to show just the words that match in an array?
> 
> Also, distanceList seems to be full of mostly the same number... For the
> ciphertext I tested it on, it shows mostly '480'. Is there a way to make
> this array show the distances between matches (e.g. 45, 95, 105, 25, or
> whatever the values may be)?
> 
> We are so nearly there though, thanks so much for all the effort!!

The code works as expected on all the text I've tested.

Here's what it does: it steps through every field in distanceStruct and 
records every distance stored there, together with the associated word. 
For efficiency, I first count the number of items and pre-allocate the 
cell array and the numeric array.
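
To make that concrete, here is a small self-contained sketch on made-up 
data. The field names and values are hypothetical, and I'm assuming each 
field of distanceStruct is a row vector of distances, empty for words 
that occur only once:

```matlab
% Hypothetical input: 'abc' repeats twice, 'xyz' once, 'qqq' never.
distanceStruct = struct('abc', [480 480], 'xyz', 45, 'qqq', []);

% Count every stored distance so the outputs can be pre-allocated.
numItems = sum(structfun(@numel, distanceStruct));
wordList = cell(numItems, 1);
distanceList = zeros(numItems, 1);

count = 0;
fn = fieldnames(distanceStruct);
for i = 1:numel(fn)
    word = fn{i};
    % An empty field gives zero iterations, so 'qqq' is skipped.
    for distance = distanceStruct.(word)
        count = count + 1;
        wordList{count} = word;
        distanceList(count) = distance;
    end
end
```

With that input, wordList only holds words that have at least one 
distance, one entry per distance. If your wordList has every word in it, 
then distanceStruct probably stores something for the single words too.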

You could also check locationStruct to see whether the same words really 
are occurring 480 characters apart.
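
A rough way to do that check, assuming each field of locationStruct is a 
sorted row vector of the start positions of one word (the data below is 
made up for illustration):

```matlab
% Hypothetical input: start positions of each repeated word.
locationStruct = struct('abc', [5 485 965], 'xyz', [12 57]);

fn = fieldnames(locationStruct);
for i = 1:numel(fn)
    word = fn{i};
    % Distances between consecutive occurrences of this word.
    gaps = diff(locationStruct.(word));
    fprintf('%s: %s\n', word, mat2str(gaps));
end
```

If the gaps printed for your ciphertext really are mostly 480, then 
distanceList is just reporting what is in the text.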