
Thread Subject: algorithms for pairwise comparison (large number of comparisons)

Subject: algorithms for pairwise comparison (large number of comparisons)

From: lee cronbach

Date: 31 Jul, 2002 20:03:41

Message: 1 of 5

Hi there:


I am looking for an efficient algorithm to perform a large number of
pairwise comparisons. Does anyone happen to know any utility
programs, libraries, or algorithms in MATLAB or in other languages
that can do this?


For example, say, if I have five strings of 128 bits. The goal is to
first do a bitwise comparison for every possible pair, followed by
counting how many bits that each pair has in common. This task can
become very intensive, as the number of strings increases (i.e., the
problem size is exponential, almost like N^2). Specifically, five
strings require 10 comparisons; 50 strings require 1225 comparisons;
and 50,000 strings require over one billion comparisons.


It will probably take forever using FOR LOOPS. Any insights or
comments to efficiently do a pairwise comparison for some 50,000 to
100,000 strings will be greatly appreciated.


Sincerely,
Chris

Subject: algorithms for pairwise comparison (large number of comparisons)

From: Stijn Helsen

Date: 1 Aug, 2002 01:56:20

Message: 2 of 5

lee cronbach wrote:
> For example, say, if I have five strings of 128 bits. The goal is to
> first do a bitwise comparison for every possible pair, followed by
> counting how many bits that each pair has in common. This task can
> become very intensive, as the number of strings increases (i.e., the
> problem size is exponential, almost like N^2). Specifically, five
> strings require 10 comparisons; 50 strings require 1225 comparisons;
> and 50,000 strings require over one billion comparisons.
(exponential is something like 2^N and not N^2, so the problem size
is not really exponential(!))
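
As a quick check of the growth rate: the number of unordered pairs among N items is N*(N-1)/2, which is quadratic in N. The short sketch below only reproduces the counts quoted above; the variable names are illustrative.

```matlab
% Number of unordered pairs among n items: n*(n-1)/2, i.e. O(n^2) growth.
n = [5 50 50000];
numPairs = n .* (n - 1) / 2;

% Print one line per problem size.
fprintf('%d strings -> %d pairs\n', [n; numPairs]);
```

So 50,000 strings give about 1.25 billion pairs, and doubling N roughly quadruples the work.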
The best approach depends on what exactly you want to do. In your example,
if the number of elements is not too high, I would do the following:
- split the strings into smaller numbers (bytes, int16's) so that
they can be handled as doubles.
- use meshgrid to make an array (!N^2 elements!)
- do the calculations on whole matrices
- use only the upper triangular part of the matrix, since each pair
appears twice.


x=....numberdata1
X=meshgrid(x);     % X(i,j)=x(j), so X' holds x(i)
andX=bitand(X,X'); % or other functions
numberofbits=0;
for i=0:nbits-1
   numberofbits=numberofbits+(bitand(andX,2^i)~=0); % add 1 where bit i is set
end


Of course if you have to do many many combinations, you will not have
enough memory.....
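
To put a rough number on the memory concern, here is a back-of-the-envelope estimate. It assumes each N-by-N matrix is stored as doubles (8 bytes per element) and counts only one such matrix; the code above needs several of them (X, X', andX) for every chunk of the 128-bit strings.

```matlab
% Memory for ONE n-by-n matrix of doubles (assumption: 8 bytes per element).
n = 50000;
bytesPerDouble = 8;
gigabytes = n^2 * bytesPerDouble / 1e9;

fprintf('One %d-by-%d double matrix needs about %g GB\n', n, n, gigabytes);
```

For N = 50,000 that is about 20 GB per matrix, so the full-matrix approach only suits much smaller N unless the work is split into blocks.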


Stijn

Subject: algorithms for pairwise comparison (large number of comparisons)

From: Stijn Helsen

Date: 1 Aug, 2002 01:59:51

Message: 3 of 5

Sorry, I forgot to add a line in my proposal:
> 1. x=....numberdata1
> 2. X=meshgrid(x);
> 3. andX=bitand(X,X'); % or other functions
> 4. numberofbits=0;
> 5. for i=0:nbits-1
> 6. numberofbits=numberofbits+(bitand(andX,2^i)~=0);
> 7. end
I forgot to extract the upper triangular part. So between lines 3
and 4 the following line can be added:
andX=triu(andX,1); %extract upper part without diagonal
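
Put together, the proposal might look like the sketch below. The sample values in x and the choice nbits = 8 are made up for illustration (the original data is not shown), and the loop adds 1 per set bit so that the result is a per-pair count of common 1-bits.

```matlab
% Hypothetical 8-bit sample values standing in for the real data.
x = [255 15 240 170];      % 11111111, 00001111, 11110000, 10101010
nbits = 8;

X = meshgrid(x);           % X(i,j) = x(j); X' holds x(i)
andX = bitand(X, X');      % bitwise AND of every pair (i,j)
andX = triu(andX, 1);      % keep each pair once, drop the diagonal

% Count the set bits of every entry, one bit position at a time.
common = zeros(size(andX));
for i = 0:nbits-1
    common = common + (bitand(andX, 2^i) ~= 0);
end

disp(common)               % common(i,j) = number of common 1-bits, for i < j
```

Note that this counts positions where both strings have a 1. Positions where both have a 0 are not counted; that would need a different function in place of bitand.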


Stijn

Subject: algorithms for pairwise comparison (large number of comparisons)

From: Steven Lord

Date: 1 Aug, 2002 09:10:36

Message: 4 of 5


"lee cronbach" <chiuwing@msu.edu> wrote in message
news:eeb05c3.-1@WebX.raydaftYaTP...
> Hi there:
>
>
> I am looking for an efficient algorithm to perform a large number of
> pairwise comparisons. Does anyone happen to know any utility
> programs, libraries, or algorithms in MATLAB or in other languages
> that can do this?
>
> For example, say, if I have five strings of 128 bits. The goal is to
> first do a bitwise comparison for every possible pair, followed by
> counting how many bits that each pair has in common. This task can
> become very intensive, as the number of strings increases (i.e., the
> problem size is exponential, almost like N^2). Specifically, five
> strings require 10 comparisons; 50 strings require 1225 comparisons;
> and 50,000 strings require over one billion comparisons.
>
>
> It will probably take forever using FOR LOOPS. Any insights or
> comments to efficiently do a pairwise comparison for some 50,000 to
> 100,000 strings will be greatly appreciated.

Hmm ... if you don't need to know _where_ the common bits are, you could
always try this:

A=[1 1 1 1;1 0 0 1;0 1 1 1;0 0 0 0]
NumOfMatches1=A*A';
NumOfMatches0=(1-A)*(1-A)';

If NumOfMatches1(i,j)+NumOfMatches0(i,j) equals 4 (or rather
LENGTH_OF_STRING), then strings i and j are identical.

I'll leave the n=50,000 case to you ;)
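
One possible way to approach larger n with the same matrix products is to process a block of rows at a time and keep only a summary, so the full n-by-n result never has to be stored. The sketch below is an assumption about how that could be done, not part of the suggestion above; blockSize and matchHist are hypothetical names, and the summary kept here is a histogram of how many pairs agree in k positions.

```matlab
A = [1 1 1 1; 1 0 0 1; 0 1 1 1; 0 0 0 0];   % bit matrix from the post
[n, len] = size(A);
blockSize = 2;                  % hypothetical; choose so blockSize*n doubles fit in memory
matchHist = zeros(1, len + 1);  % matchHist(k+1) = number of pairs agreeing in k positions

for first = 1:blockSize:n
    rows = first:min(first + blockSize - 1, n);
    % Agreeing positions = common 1-bits + common 0-bits, for this block of rows.
    matches = A(rows, :) * A' + (1 - A(rows, :)) * (1 - A)';
    % Keep only pairs (i,j) with j > i so each pair is counted once.
    upper = bsxfun(@gt, 1:n, rows(:));
    vals = matches(upper);
    for k = 0:len
        matchHist(k + 1) = matchHist(k + 1) + sum(vals == k);
    end
end

disp(matchHist)
```

Each pass needs only a blockSize-by-n matrix. The total work is still proportional to n^2, so this helps with memory, not with the number of comparisons.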

--
Steve Lord
slord@mathworks.com


Subject: algorithms for pairwise comparison (large number of comparisons)

From: steve.moffitt@mail.stuart.iit.edu (Steven Moffit)

Date: 21 Sep, 2004 15:39:16

Message: 5 of 5

I'd suggest the following algorithm:

(1) Create an array having 2^16 entries mapping each 16 bit word into
    the number of bits (for speed),
(2) Apply this array to sixteen bit segments of each 128 bit value
    obtained by doing a "bitwise and" on the two operands,
(3) Do (1) & (2) for a random sample of operands; you can get
    arbitrarily close to the true population value by selecting a
    sufficiently large sample size.
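
The three steps above could be sketched roughly as follows. This assumes each 128-bit value is stored as a row of eight uint16 words, which is a choice made here for illustration; the names bitCount, data, and sampleSize and the random test data are likewise hypothetical.

```matlab
% Step (1): bitCount(w+1) = number of set bits in the 16-bit word w.
words = uint16(0:65535);
bitCount = zeros(1, 65536, 'uint8');
for b = 1:16
    bitCount = bitCount + uint8(bitget(words, b));
end

% Step (2): common 1-bits of two 128-bit values, each stored as 8 uint16 words.
a = uint16([65535 0 255 1 0 0 0 43690]);
c = uint16([65535 65535 15 3 0 0 0 21845]);
common = bitand(a, c);
% Convert to double before adding 1: uint16 arithmetic would saturate at 65535.
numCommon = sum(double(bitCount(double(common) + 1)));

% Step (3): estimate the mean over all pairs from a random sample of pairs.
N = 1000;                                   % hypothetical population size
data = uint16(randi([0 65535], N, 8));      % hypothetical random test data
sampleSize = 500;
i = randi(N, sampleSize, 1);
j = randi(N, sampleSize, 1);
keep = i ~= j;                              % drop self-pairs
sampled = bitand(data(i(keep), :), data(j(keep), :));
counts = sum(double(bitCount(double(sampled) + 1)), 2);
estimate = mean(counts);                    % estimated mean number of common 1-bits
```

The table lookup replaces a loop over 128 bits with eight indexed reads per pair, and the sampling step trades exactness for a much smaller number of comparisons.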

Your message is old, so you probably already solved the problem.

Another idea: add up the number of set bits in each position over the
entire population. Then calculate the number of set bits in every pair
of positions. Test the result for "independence."

On Wed, 31 Jul 2002 20:03:41 -0400, lee cronbach wrote:
>Hi there:
>
>
>I am looking for an efficient algorithm to perform a large number of
>pairwise comparisons. Does anyone happen to know any utility
>programs, libraries, or algorithms in MATLAB or in other languages
>that can do this?
>
>
>For example, say, if I have five strings of 128 bits. The goal is to
>first do a bitwise comparison for every possible pair, followed by
>counting how many bits that each pair has in common. This task can
>become very intensive, as the number of strings increases (i.e., the
>problem size is exponential, almost like N^2). Specifically, five
>strings require 10 comparisons; 50 strings require 1225 comparisons;
>and 50,000 strings require over one billion comparisons.
>
>
>It will probably take forever using FOR LOOPS. Any insights or
>comments to efficiently do a pairwise comparison for some 50,000 to
>100,000 strings will be greatly appreciated.
>
>
>Sincerely,
>Chris
