Thread Subject:
Comparing two series of data

Subject: Comparing two series of data

From: TraderMan

Date: 25 Nov, 2011 14:50:09

Message: 1 of 20

Hello everybody.

I want to compare and match two long series of data using Matlab.
To give a short example:
Column A Column B
25 1
25 130
50 85
27 15
65 11
130 75
5 7

What I would like to do is to match the values of the column A with the values of column B. In this case, it is easy and it is possible to do it mentally:

25 + 50 from Column A MATCHED WITH 75 from Column B
25 + 65 from Column A MATCHED WITH 85 from Column B
130 from Column A MATCHED WITH 130 from Column B
27 from Column A MATCHED WITH 15+11+1 from Column B
5 from Column A and 7 from Column B will be left unmatched.


Since the lists are going to be long, is there a way to do it automatically with a function/script/etc...?

I will appreciate any kind of help.

Thanks!

Subject: Comparing two series of data

From: Rune Allnor

Date: 25 Nov, 2011 15:13:36

Message: 2 of 20

On 25 Nov, 15:50, "TraderMan " <gius.felice...@gmail.com> wrote:
> Hello everybody.
>
> I want to compare and match two long series of data using Matlab.
> To give a short example:
> Column A                    Column B
> 25                               1
> 25                               130
> 50                                85
> 27                               15
> 65                               11
> 130                             75
> 5                                 7
>
> What I would like to do is to match the values of the column A with the values of column B. In this case, it is easy and it is possible to do it mentally:
>
> 25 + 50 from Column A  MATCHED WITH  75 from Column B
> 25 + 65 from Column A  MATCHED WITH  85 from Column B
> 130 from Column A MATCHED WITH 130 from Column B
> 27 from Column A MATCHED WITH 15+11+1 from Column B
> 5 from Column A and 7 from Column B will be left unmatched.
>
> Since the lists are going to be long, is there a way to do it automatically with a function/script/etc...?
>
> I will appreciate any kind of help.

This isn't comparisions, this is numerology.
Try a news group on astrology.

Rune

Subject: Comparing two series of data

From: ade77

Date: 25 Nov, 2011 15:25:08

Message: 3 of 20

> This isn't comparisions, this is numerology.
> Try a news group on astrology.
>
> Rune


very funny

Subject: Comparing two series of data

From: TraderMan

Date: 25 Nov, 2011 15:31:08

Message: 4 of 20

It is just how to compare all the possible combinations of one column with all the possible combinations of the other column in an efficient way.

It is not astrology.......

Rune Allnor <allnor@tele.ntnu.no> wrote in message <cfdd8742-ce34-48ff-ba7e-b600c1af820e@y12g2000vba.googlegroups.com>...
> On 25 Nov, 15:50, "TraderMan " <gius.felice...@gmail.com> wrote:
> > Hello everybody.
> >
> > I want to compare and match two long series of data using Matlab.
> > To give a short example:
> > Column A                    Column B
> > 25                               1
> > 25                               130
> > 50                                85
> > 27                               15
> > 65                               11
> > 130                             75
> > 5                                 7
> >
> > What I would like to do is to match the values of the column A with the values of column B. In this case, it is easy and it is possible to do it mentally:
> >
> > 25 + 50 from Column A  MATCHED WITH  75 from Column B
> > 25 + 65 from Column A  MATCHED WITH  85 from Column B
> > 130 from Column A MATCHED WITH 130 from Column B
> > 27 from Column A MATCHED WITH 15+11+1 from Column B
> > 5 from Column A and 7 from Column B will be left unmatched.
> >
> > Since the lists are going to be long, is there a way to do it automatically with a function/script/etc...?
> >
> > I will appreciate any kind of help.
>
> This isn't comparisions, this is numerology.
> Try a news group on astrology.
>
> Rune

Subject: Comparing two series of data

From: Rune Allnor

Date: 25 Nov, 2011 16:06:49

Message: 5 of 20

On 25 Nov, 16:31, "TraderMan " <gius.felice...@gmail.com> wrote:
> It is just how to compare all the possible combinations of one column with all the possible combinations of the other column in an efficient way.

Well, let's use some combinatorics:

N numbers can be ordered in N! ways. You want to compare
two N-length sequences, which might give as many as (N!)^2
combinations to check, depending on the details of the task.
Even worse, you want to compare subsequences of arbitrary
length, which might give something like (haven't studied
the exact details) (N!)! possible combinations.

In the the worst-case scenario you have something like
((N!)!)^2 combinations to check.

> It is not astrology.......

It might not be. But astrology is probaly your
best bet anyway.

Rune

Subject: Comparing two series of data

From: TraderMan

Date: 25 Nov, 2011 16:37:08

Message: 6 of 20

I had already understood you are not capable of writing something useful to solve the problem..... so please leave the space for the guys who can.
Thanks


Rune Allnor <allnor@tele.ntnu.no> wrote in message <e7f41ba5-de8f-41fa-8d40-005cc6a9d7f7@p16g2000yqd.googlegroups.com>...
> On 25 Nov, 16:31, "TraderMan " <gius.felice...@gmail.com> wrote:
> > It is just how to compare all the possible combinations of one column with all the possible combinations of the other column in an efficient way.
>
> Well, let's use some combinatorics:
>
> N numbers can be ordered in N! ways. You want to compare
> two N-length sequences, which might give as many as (N!)^2
> combinations to check, depending on the details of the task.
> Even worse, you want to compare subsequences of arbitrary
> length, which might give something like (haven't studied
> the exact details) (N!)! possible combinations.
>
> In the the worst-case scenario you have something like
> ((N!)!)^2 combinations to check.
>
> > It is not astrology.......
>
> It might not be. But astrology is probaly your
> best bet anyway.
>
> Rune

Subject: Comparing two series of data

From: Rune Allnor

Date: 25 Nov, 2011 17:03:39

Message: 7 of 20

On 25 Nov, 17:37, "TraderMan " <gius.felice...@gmail.com> wrote:
> I had already understood you are not capable of writing something useful to solve the problem..... so please leave the space for the guys who can.

...which proves that your poster name indicates correctly;
you are not an engineer. I have just explained why _no_one_
can write a program that does what you want.

Rune

Subject: Comparing two series of data

From: TraderMan

Date: 25 Nov, 2011 17:21:08

Message: 8 of 20

Please stop posting useless stuff....
You have the answer you post, you don't have it you go to write to another post....not mine!

Rune Allnor <allnor@tele.ntnu.no> wrote in message <48d2039b-b2c6-4a52-b046-f7a114498b9b@cu3g2000vbb.googlegroups.com>...
> On 25 Nov, 17:37, "TraderMan " <gius.felice...@gmail.com> wrote:
> > I had already understood you are not capable of writing something useful to solve the problem..... so please leave the space for the guys who can.
>
> ...which proves that your poster name indicates correctly;
> you are not an engineer. I have just explained why _no_one_
> can write a program that does what you want.
>
> Rune

Subject: Comparing two series of data

From: Roger Stafford

Date: 25 Nov, 2011 19:56:08

Message: 9 of 20

"TraderMan" wrote in message <jao9v1$fdv$1@newscl01ah.mathworks.com>...
> I want to compare and match two long series of data using Matlab.
> ........
> It is just how to compare all the possible combinations of one column with all the possible combinations of the other column in an efficient way.
> ........
- - - - - - - - -
  It is important that you realize the difficulties involved with the problem you pose if the two series are "long", as you have said. The number of different sum combinations to try in a series of length n is 2^n-1 if we exclude the empty set, and for large n this can be a truly enormous quantity. If n is only as large as 333 that puts you over the infamous googol of Kasner and Newman which is larger than the number of atoms in the known universe. With two series of integers that long it makes for an impossibly long list of comparisons to make. Perhaps you had better consider placing some reasonable limit in the series' lengths before you ask people to work on your problem.

Roger Stafford

Subject: Comparing two series of data

From: TraderMan

Date: 25 Nov, 2011 20:36:08

Message: 10 of 20

"Roger Stafford" wrote in message <jaorso$63d$1@newscl01ah.mathworks.com>...
> "TraderMan" wrote in message <jao9v1$fdv$1@newscl01ah.mathworks.com>...
> > I want to compare and match two long series of data using Matlab.
> > ........
> > It is just how to compare all the possible combinations of one column with all the possible combinations of the other column in an efficient way.
> > ........
> - - - - - - - - -
> It is important that you realize the difficulties involved with the problem you pose if the two series are "long", as you have said. The number of different sum combinations to try in a series of length n is 2^n-1 if we exclude the empty set, and for large n this can be a truly enormous quantity. If n is only as large as 333 that puts you over the infamous googol of Kasner and Newman which is larger than the number of atoms in the known universe. With two series of integers that long it makes for an impossibly long list of comparisons to make. Perhaps you had better consider placing some reasonable limit in the series' lengths before you ask people to work on your problem.
>
> Roger Stafford

Hi Roger.
Thank you very much for your answer.
I am not saying at all it is easy and I know the number can grow exponentially.
Anyway, the series can have 50-60 observations, 100 at max.
Do you think is it possible to write a code capable of matching the two series as I have showed in my problem formulation?

Subject: Comparing two series of data

From: Bruno Luong

Date: 25 Nov, 2011 21:00:09

Message: 11 of 20

Read on:

http://en.wikipedia.org/wiki/Wheat_and_chessboard_problem

Bruno

Subject: Comparing two series of data

From: Rune Allnor

Date: 25 Nov, 2011 22:01:48

Message: 12 of 20

On 25 Nov, 21:36, "TraderMan " <gius.felice...@gmail.com> wrote:
> "Roger Stafford" wrote in message <jaorso$63...@newscl01ah.mathworks.com>...
> > "TraderMan" wrote in message <jao9v1$fd...@newscl01ah.mathworks.com>...
> > > I want to compare and match two long series of data using Matlab.
> > > ........
> > > It is just how to compare all the possible combinations of one column with all the possible combinations of the other column in an efficient way.
> > > ........
> > - - - - - - - - -
> >   It is important that you realize the difficulties involved with the problem you pose if the two series are "long", as you have said.  The number of different sum combinations to try in a series of length n is 2^n-1 if we exclude the empty set, and for large n this can be a truly enormous quantity.  If n is only as large as 333 that puts you over the infamous googol of Kasner and Newman which is larger than the number of atoms in the known universe.  With two series of integers that long it makes for an impossibly long list of comparisons to make.  Perhaps you had better consider placing some reasonable limit in the series' lengths before you ask people to work on your problem.
>
> > Roger Stafford
>
> Hi Roger.
> Thank you very much for your answer.
> I am not saying at all it is easy and I know the number can grow exponentially.
> Anyway, the series can have 50-60 observations, 100 at max.
> Do you think is it possible to write a code capable of matching the two series as I have showed in my problem formulation?

It might be possible to write the *code*. What you
haven't understood yet is that it will never finish
if you run it.

To get a sense of numbers, 1e10 (ten billion) seconds
is roughly 300 years. 2^100 is a number with 30 digits,
meaning 30,000 billion billion years.

Even if your computer is fast, say, a quad core with
10 GHz per core (I have no idea how available such
a thing might be) you mught expect your program
to fins in maybe 300 billion billion years.

Which really doesn't help much.

Rune

Subject: Comparing two series of data

From: Roger Stafford

Date: 25 Nov, 2011 22:05:09

Message: 13 of 20

"TraderMan" wrote in message <jaou7o$cc0$1@newscl01ah.mathworks.com>...
> Hi Roger.
> Thank you very much for your answer.
> I am not saying at all it is easy and I know the number can grow exponentially.
> Anyway, the series can have 50-60 observations, 100 at max.
> Do you think is it possible to write a code capable of matching the two series as I have showed in my problem formulation?
- - - - - - - - -
  I can very briefly sketch an approach, but I warn you that for it to be at all practical you would need your sizes to be far lower than those you quote. This difficulty is inherent in the very nature of your posed problem as I discussed earlier.

% First step
 n = size(A,1);
 bA = dec2bin((1:2^n-1).',n)-'0'; % Convert 1:2^n-1 to n-digit binary numbers
 [sA,pA] = sort(sum(bsxfun(@times,A.',bA),2)); % Sum A over the 1-digits

% Second step - similar action with B

  You now have two (very long) lists of combination sums, sA and sB, each in ascending order. The indices in pA and pB refer, by way of the corresponding binary numbers in 'bA', to the set combinations involved in each sum. Now a single 'while' loop can be written which scans through sA and sB in linear time (because they are sorted,) identifying each pair of matching sums. This is not hard to do, (and I prefer to leave it to you,) but I have no idea how you wish to store your results as you proceed.

Roger Stafford

Subject: Comparing two series of data

From: TraderMan

Date: 25 Nov, 2011 22:33:08

Message: 14 of 20

"Roger Stafford" wrote in message <jap3el$qb9$1@newscl01ah.mathworks.com>...
> "TraderMan" wrote in message <jaou7o$cc0$1@newscl01ah.mathworks.com>...
> > Hi Roger.
> > Thank you very much for your answer.
> > I am not saying at all it is easy and I know the number can grow exponentially.
> > Anyway, the series can have 50-60 observations, 100 at max.
> > Do you think is it possible to write a code capable of matching the two series as I have showed in my problem formulation?
> - - - - - - - - -
> I can very briefly sketch an approach, but I warn you that for it to be at all practical you would need your sizes to be far lower than those you quote. This difficulty is inherent in the very nature of your posed problem as I discussed earlier.
>
> % First step
> n = size(A,1);
> bA = dec2bin((1:2^n-1).',n)-'0'; % Convert 1:2^n-1 to n-digit binary numbers
> [sA,pA] = sort(sum(bsxfun(@times,A.',bA),2)); % Sum A over the 1-digits
>
> % Second step - similar action with B
>
> You now have two (very long) lists of combination sums, sA and sB, each in ascending order. The indices in pA and pB refer, by way of the corresponding binary numbers in 'bA', to the set combinations involved in each sum. Now a single 'while' loop can be written which scans through sA and sB in linear time (because they are sorted,) identifying each pair of matching sums. This is not hard to do, (and I prefer to leave it to you,) but I have no idea how you wish to store your results as you proceed.
>
> Roger Stafford


Ok Roger.
Thanks for the code. Tomorrow I will try it.
By the way, I can insert some constraints on the size to keep it lower.
What would be, according to you, a feasible size?

Subject: Comparing two series of data

From: Rune Allnor

Date: 25 Nov, 2011 22:39:24

Message: 15 of 20

On 25 Nov, 23:33, "TraderMan " <gius.felice...@gmail.com> wrote:
> "Roger Stafford" wrote in message <jap3el$qb...@newscl01ah.mathworks.com>...
> > "TraderMan" wrote in message <jaou7o$cc...@newscl01ah.mathworks.com>...
> > > Hi Roger.
> > > Thank you very much for your answer.
> > > I am not saying at all it is easy and I know the number can grow exponentially.
> > > Anyway, the series can have 50-60 observations, 100 at max.
> > > Do you think is it possible to write a code capable of matching the two series as I have showed in my problem formulation?
> > - - - - - - - - -
> >   I can very briefly sketch an approach, but I warn you that for it to be at all practical you would need your sizes to be far lower than those you quote.  This difficulty is inherent in the very nature of your posed problem as I discussed earlier.
>
> > % First step
> >  n = size(A,1);
> >  bA = dec2bin((1:2^n-1).',n)-'0';  % Convert 1:2^n-1 to n-digit binary numbers
> >  [sA,pA] = sort(sum(bsxfun(@times,A.',bA),2)); % Sum A over the 1-digits
>
> > % Second step - similar action with B
>
> >   You now have two (very long) lists of combination sums, sA and sB, each in ascending order.  The indices in pA and pB refer, by way of the corresponding binary numbers in 'bA', to the set combinations involved in each sum.  Now a single 'while' loop can be written which scans through sA and sB in linear time (because they are sorted,) identifying each pair of matching sums.  This is not hard to do, (and I prefer to leave it to you,) but I have no idea how you wish to store your results as you proceed.
>
> > Roger Stafford
>
> Ok Roger.
> Thanks for the code. Tomorrow I will try it.
> By the way, I can insert some constraints on the size to keep it lower.
> What would be, according to you, a feasible size?

Try it and see. The way Roger structured the code, you need to
have all the data available in the computer's memory at any
one time. Don't be surprised if you end up with sequence lengths
on the order of 10 or so.

Rune

Subject: Comparing two series of data

From: Roger Stafford

Date: 25 Nov, 2011 23:16:07

Message: 16 of 20

Rune Allnor <allnor@tele.ntnu.no> wrote in message
> Try it and see. The way Roger structured the code, you need to
> have all the data available in the computer's memory at any
> one time. Don't be surprised if you end up with sequence lengths
> on the order of 10 or so.
>
> Rune
- - - - - - - - - -
  Yes, you're right to warn of the structuring, Rune. In view of memory limitations it would be better to go through a for-loop generating one binary number and value of sA at a time so as to avoid the huge 2^n-1 by n arrays. Maybe something like this:

 n = length(A);
 sA = zeros(2^n-1,1);
 for k = 1:2^n-1
  sA(k) = sum(A(dec2bin(k,n)=='1'));
 end
 [sA,pA] = sort(sA);

  Also I neglected to warn TraderMan that if the two series have elements of a reasonably restricted integer range, there are likely to be very high numbers of repetitions with many of the central possible sum values, which will slow down the later processing substantially because of the large numbers of possible pairings.

Roger Stafford

Subject: Comparing two series of data

From: Rune Allnor

Date: 25 Nov, 2011 23:34:10

Message: 17 of 20

On 26 Nov, 00:16, "Roger Stafford"
<ellieandrogerxy...@mindspring.com.invalid> wrote:
> Rune Allnor <all...@tele.ntnu.no> wrote in message
> > Try it and see. The way Roger structured the code, you need to
> > have all the data available in the computer's memory at any
> > one time. Don't be surprised if you end up with sequence lengths
> > on the order of 10 or so.
>
> > Rune
>
> - - - - - - - - - -
>   Yes, you're right to warn of the structuring, Rune.  In view of memory limitations it would be better to go through a for-loop generating one binary number and value of sA at a time so as to avoid the huge 2^n-1 by n arrays.

It doesn't really matter. Even if you can make do with
a small amout of RAM, the combinatorics ensures that
whatever the OP is up to, takes a very long time.

Do note that extracting 'all possible' subsequences
grows factorially with N, not exponentially (ref the
binomial formula in statistics, that treats the
probability of extracting a given subsequence).

Rune

Subject: Comparing two series of data

From: Roger Stafford

Date: 26 Nov, 2011 00:55:08

Message: 18 of 20

Rune Allnor <allnor@tele.ntnu.no> wrote in message <09897430-e730-4cdb-9d6c-bdfdbf8240a3@h21g2000yqg.googlegroups.com>...
> ........
> Do note that extracting 'all possible' subsequences
> grows factorially with N, not exponentially (ref the
> binomial formula in statistics, that treats the
> probability of extracting a given subsequence).
> .......
- - - - - - - - - -
  I disagree with your statement about factorial growth, Rune. Getting all possible non-empty combinations of elements in a given sequence is an exponential matter, namely 2^n-1 for n elements. That is the number of possible non-empty subsets of n elements and this is what is significant here.

  The fact that factorials are used to express the coefficients in a binomial expansion does not imply that the total number grows factorially. Partially compensating factorials are also present in the coefficients' denominators. For example, a set of 20 elements has about a million subsets, whereas 20 factorial is about 2.4 million million millions, which is far, far greater.

  It looks to me as though TraderMan will probably run into memory trouble with about 20 elements in each series, though if the range on the elements is too restricted the list of pairings could become overly long because of numerous occurrence of matches in both series for the same sum value (as I mentioned before.)

Roger Stafford

Subject: Comparing two series of data

From: Rune Allnor

Date: 26 Nov, 2011 01:38:34

Message: 19 of 20

On 26 Nov, 01:55, "Roger Stafford"
<ellieandrogerxy...@mindspring.com.invalid> wrote:
> Rune Allnor <all...@tele.ntnu.no> wrote in message <09897430-e730-4cdb-9d6c-bdfdbf824...@h21g2000yqg.googlegroups.com>...
> > ........
> > Do note that extracting 'all possible' subsequences
> > grows factorially with N, not exponentially (ref the
> > binomial formula in statistics, that treats the
> > probability of extracting a given subsequence).
> > .......
>
> - - - - - - - - - -
>   I disagree with your statement about factorial growth, Rune.  Getting all possible non-empty combinations of elements in a given sequence is an exponential matter, namely 2^n-1 for n elements.  That is the number of possible non-empty subsets of n elements and this is what is significant here.

Actually, I don't know what is significant since I don't
know what the OP is up to. You are right in that a set
of N elements have 2^N-1 non-empty unique subsets. I
interpreted 'all possible' somewhat differently, as all
*ordered* subsets. In which case the factorial kicks in.

>   The fact that factorials are used to express the coefficients in a binomial expansion does not imply that the total number grows factorially.  Partially compensating factorials are also present in the coefficients' denominators.  For example, a set of 20 elements has about a million subsets, whereas 20 factorial is about 2.4 million million millions, which is far, far greater.

It doesn't really matter, as memory requirements and
excution times will blow up anyway, with only slightly
larger sets.

>   It looks to me as though TraderMan will probably run into memory trouble with about 20 elements in each series, though if the range on the elements is too restricted the list of pairings could become overly long because of numerous occurrence of matches in both series for the same sum value (as I mentioned before.)

Again, I don't know what the OP is up to, so
I don't know whether order is important. Or
whatever else might become important. As far
as I am concerned, we are discussing hypothetical
solutions to a question that should never have
been asked.

Rune

Subject: Comparing two series of data

From: Steven_Lord

Date: 28 Nov, 2011 05:50:48

Message: 20 of 20



"TraderMan " <gius.felicetto@gmail.com> wrote in message
news:jao9v1$fdv$1@newscl01ah.mathworks.com...
> Hello everybody.
>
> I want to compare and match two long series of data using Matlab.
> To give a short example:
> Column A Column B
> 25 1
> 25 130
> 50 85
> 27 15
> 65 11 130 75
> 5 7
>
> What I would like to do is to match the values of the column A with the
> values of column B. In this case, it is easy and it is possible to do it
> mentally:
>
> 25 + 50 from Column A MATCHED WITH 75 from Column B
> 25 + 65 from Column A MATCHED WITH 85 from Column B

Why? Last time I checked, 25 + 65 = 90 not 85.

*snip*

Solving this type of problem efficiently for large data sets could be
extremely difficult for a computer. You probably want to study the knapsack
and bin packing problem literature for algorithm suggestions. The very first
MATLAB programming contest may also be of interest to you:

http://www.mathworks.com/matlabcentral/contest/

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
matching TraderMan 25 Nov, 2011 09:54:11
comparing TraderMan 25 Nov, 2011 09:54:11
series of data TraderMan 25 Nov, 2011 09:54:11
rssFeed for this Thread

Contact us