Thread Subject:
"Unique" row comparison with duplicate row counter

From: Josh

Date: 9 Sep, 2010 22:20:23

Message: 1 of 9

Hi Everyone,

I have a large sorted array (more than 100,000 rows) comprising three columns. The first two columns represent X and Y coordinates, and the third column is a counter used to keep track of duplicate points.

What I want to do is keep only unique rows (the "unique" command is likely to come in handy). However, I only want to compare the first two columns for uniqueness, remove any duplicates, and increase column 3 by the number of duplicates I have just removed. (Note: a previous accumulation of duplicates may already be present in column 3 of the first row of each sorted group.)

For example, if I have the following array:

A=
[ 3 7 2
  3 7 1
  3 1 1
  2 6 1
  2 -5 3
  2 -5 1
  2 -5 1
 -1 9 1
 -1 9 1
 -1 -9 1]

The result should be:
[ 3 7 3
  3 1 1
  2 6 1
  2 -5 5
 -1 9 2
 -1 -9 1]

I recognize that it is probably easier to separate the first two columns from the counter (3rd column) for the uniqueness comparison, which is fine as long as they stay indexed correctly.

If anyone has a quick method for doing this, it would be greatly appreciated.

Thanks,
     Josh

From: Darren Rowland

Date: 10 Sep, 2010 03:58:06

Message: 2 of 9

Josh,

Try this

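[b,u,v] = unique(A(:,1:2),'rows');  % b: unique (X,Y) pairs; v maps each row of A to its row in b
c = accumarray(v,A(:,3));           % sum column 3 over each group of identical (X,Y) rows
A2 = [b c]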

With your example the result is

A2 =

    -1 -9 1
    -1 9 2
     2 -5 5
     2 6 1
     3 1 1
     3 7 3

So the rows are now in ascending order, but otherwise the result is what you want.

Hth
Darren
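The same grouping can be sketched outside MATLAB. The Python below is an illustration, not Darren's code: the hypothetical `collapse_rows` collects rows by their (X, Y) pair, sums column 3 within each group, and returns the groups sorted ascending, which mirrors what `unique(...,'rows')` plus `accumarray` produce above.

```python
from collections import defaultdict


def collapse_rows(rows):
    """Merge rows that share (x, y), summing their counters (column 3).

    The result is sorted ascending by (x, y), like MATLAB's unique(..., 'rows').
    """
    totals = defaultdict(int)
    for x, y, count in rows:
        totals[(x, y)] += count
    return [[x, y, totals[(x, y)]] for x, y in sorted(totals)]


A = [[3, 7, 2], [3, 7, 1], [3, 1, 1], [2, 6, 1], [2, -5, 3],
     [2, -5, 1], [2, -5, 1], [-1, 9, 1], [-1, 9, 1], [-1, -9, 1]]
print(collapse_rows(A))
# [[-1, -9, 1], [-1, 9, 2], [2, -5, 5], [2, 6, 1], [3, 1, 1], [3, 7, 3]]
```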

From: Josh

Date: 10 Sep, 2010 04:11:21

Message: 3 of 9

Brilliant!
Thanks Darren, it works perfectly.

Josh

From: Roger Stafford

Date: 10 Sep, 2010 11:12:04

Message: 4 of 9

"Josh " <joshinbox@hotmail.com> wrote in message <i6bmj7$t2d$1@fred.mathworks.com>...
> I have a large sorted array (More than 100,000 rows), comprized of three columns. The first two columns represent X and Y coordinates, and the third column is a counter used to keep track of duplicate points.
> ...........
- - - - - - -
  Since we have your assurance that A is already sorted (presumably with respect to its first two columns), it would perhaps be a waste of CPU time to sort it again with the unique function. Try this:

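 % Row indices where a new (X,Y) group starts, plus one index past the last row
 f = find([true;diff(A(:,1))~=0|diff(A(:,2))~=0;true]);
 % Keep the first row of each group
 B = A(f(1:end-1),:);
 % Add the number of removed duplicates (group size minus 1) to the counter
 B(:,3) = B(:,3)+diff(f)-1;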
 
Roger Stafford
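Roger's version relies on the rows already being sorted, so duplicates sit next to each other and one linear pass suffices. A Python sketch of that run-based idea under the same sortedness assumption (the name `collapse_sorted` is hypothetical). Like Roger's code, it adds the number of removed rows rather than summing their counters; the two coincide on Josh's example because every removed duplicate carries a 1.

```python
from itertools import groupby


def collapse_sorted(rows):
    """Collapse runs of equal (x, y) in rows that are already sorted.

    Keeps the first row of each run and adds (run length - 1) to its counter,
    as in Roger's find/diff version; nothing is re-sorted, so row order is kept.
    """
    result = []
    for _, run in groupby(rows, key=lambda r: (r[0], r[1])):
        run = list(run)
        x, y, count = run[0]
        result.append([x, y, count + len(run) - 1])
    return result


A = [[3, 7, 2], [3, 7, 1], [3, 1, 1], [2, 6, 1], [2, -5, 3],
     [2, -5, 1], [2, -5, 1], [-1, 9, 1], [-1, 9, 1], [-1, -9, 1]]
print(collapse_sorted(A))
# [[3, 7, 3], [3, 1, 1], [2, 6, 1], [2, -5, 5], [-1, 9, 2], [-1, -9, 1]]
```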

From: Josh

Date: 24 Sep, 2010 19:29:05

Message: 5 of 9
Hi Everyone,

Now I want to change things up a bit. Suppose I take the result from the previous example and compare a new array against it (in practice I will be comparing hundreds of arrays against it). Again, if the X and Y coordinates (columns 1 and 2) are the same, I want to increment the number in the third column, but I do not want to add any new points to the array.

For example:
Start with the base array
A=
 [ 3 7 3
   3 1 1
   2 6 1
   2 -5 5
  -1 9 2
  -1 -9 1]

Compare new array (B) against the array above:
B=
[ 2 6 1
  0 1 5
 -3 7 2
 -1 9 1
 -2 6 2
 -3 7 1
  5 3 2]

Now add the column 3 value of the new array (B) for each instance where columns 1 and 2 of a row in B are identical to those of a row in A.

Solution should yield:
A=
 [ 3 7 6
   3 1 1
   2 6 2
   2 -5 5
  -1 9 3
  -1 -9 1]

 Thanks,
     Josh

From: Josh

Date: 24 Sep, 2010 20:21:21

Message: 6 of 9

Oops! I made a mistake in the results: I didn't notice the minus sign in front of the 3 (column 1 of the B matrix). Therefore the first row of the result should have been 3 7 3, as shown below:

 Solution should yield:
 A=
  [ 3 7 3
    3 1 1
    2 6 2
    2 -5 5
   -1 9 3
   -1 -9 1]
 
Sorry for the confusion,
      Josh
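One way to implement the rule in messages 5 and 6 without any N-by-M comparison is a hash lookup: index the base rows by their (X, Y) pair, then walk the new array once and update only the rows that match. A Python sketch, assuming the base array already has unique (X, Y) pairs (the function name `add_matches` is hypothetical):

```python
def add_matches(base, new):
    """Add each new row's counter to the base row with the same (x, y).

    Rows of `new` whose (x, y) is not in `base` are ignored, so no points are
    added. A dict gives one average O(1) lookup per new row, avoiding any
    N-by-M comparison. Assumes the (x, y) pairs in `base` are unique.
    """
    index = {(x, y): i for i, (x, y, _) in enumerate(base)}
    result = [list(row) for row in base]  # work on a copy of base
    for x, y, count in new:
        i = index.get((x, y))
        if i is not None:
            result[i][2] += count
    return result


A = [[3, 7, 3], [3, 1, 1], [2, 6, 1], [2, -5, 5], [-1, 9, 2], [-1, -9, 1]]
B = [[2, 6, 1], [0, 1, 5], [-3, 7, 2], [-1, 9, 1], [-2, 6, 2], [-3, 7, 1], [5, 3, 2]]
print(add_matches(A, B))
# [[3, 7, 3], [3, 1, 1], [2, 6, 2], [2, -5, 5], [-1, 9, 3], [-1, -9, 1]]
```

Since only the counters change, the index built from `base` could be kept and reused across the hundreds of comparison arrays instead of being rebuilt on every call.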

From: Sean

Date: 24 Sep, 2010 20:46:19

Message: 7 of 9

One way:
 A=...
  [ 3 7 3
    3 1 1
    2 6 2
    2 -5 5
   -1 9 3
   -1 -9 1];

B=...
[ 2 6 1
  0 1 5
 -3 7 2
 -1 9 1
 -2 6 2
 -3 7 1
  5 3 2] ;

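% Shift all coordinates by one common offset so every subscript is >= 1
offset = abs(min(reshape([A(:,1:2);B(:,1:2)],[],1)))+1;
% Sum the counters of A and B into a grid indexed by the shifted (X,Y)
Allofit = accumarray(offset+[A(:,1:2);B(:,1:2)],[A(:,3);B(:,3)]);
% Read back the total for each (X,Y) point of A; points only in B are ignored
A(:,3) = Allofit(sub2ind(size(Allofit),A(:,1)+offset,A(:,2)+offset));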

I think your expected output was wrong again:
A =

     3 7 3
     3 1 1
     2 6 3
     2 -5 5
    -1 9 4
    -1 -9 1
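Sean's trick is to turn coordinates into matrix subscripts: shift everything by one offset so no subscript is below 1, let `accumarray` sum A's and B's counters into a grid, then read A's cells back out. Below is a Python sketch of the same dense-grid idea, using 0-based indexing so the offset maps the smallest coordinate to 0 (`accumulate_dense` is a hypothetical name). Run on the arrays exactly as posted in this message, it reproduces the output listed above.

```python
def accumulate_dense(base, new):
    """Dense-grid version of Sean's accumarray/sub2ind approach (0-based).

    Coordinates must be integers, as ACCUMARRAY subscripts must be. Every
    coordinate is shifted by one common offset so the smallest becomes index 0;
    all counters of base and new are summed into a square grid, and totals are
    read back only for the points of base.
    """
    rows = base + new
    coords = [v for row in rows for v in row[:2]]
    offset = -min(coords)                # smallest coordinate -> index 0
    size = max(coords) + offset + 1      # one side of the square grid
    grid = [[0] * size for _ in range(size)]
    for x, y, count in rows:
        grid[x + offset][y + offset] += count
    return [[x, y, grid[x + offset][y + offset]] for x, y, _ in base]


A = [[3, 7, 3], [3, 1, 1], [2, 6, 2], [2, -5, 5], [-1, 9, 3], [-1, -9, 1]]
B = [[2, 6, 1], [0, 1, 5], [-3, 7, 2], [-1, 9, 1], [-2, 6, 2], [-3, 7, 1], [5, 3, 2]]
print(accumulate_dense(A, B))
# [[3, 7, 3], [3, 1, 1], [2, 6, 3], [2, -5, 5], [-1, 9, 4], [-1, -9, 1]]
```

Because the grid spans the whole coordinate range, its size grows with that range rather than with the number of rows, which matters once coordinates are scaled to ten-thousandths of an inch.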

From: Josh

Date: 29 Sep, 2010 19:17:26

Message: 8 of 9

Thanks Sean, works great!
(And you were right about the correction; unfortunately, I need more sleep.)

I now realize that I have another problem, however: my data is actually decimal (the points represent spatial locations to an accuracy of one ten-thousandth of an inch). Also, keep in mind that my base array is commonly around 500,000 to 1.5 million rows long, and the arrays I am comparing are normally 100,000+ rows long as well, so I have to avoid any N-by-N duplication and the like so as not to run out of memory.

I tried to make your code work on my data by multiplying it all by 10000 first (to eliminate the decimals), but when I run the code it errors out, saying:

??? Error using ==> accumarray
First input SUBS must contain positive integer subscripts.

I then tried to e-mail you two files called A and B containing sample data (about 150,000 lines each) to see if you could get your code to work on those, but I received a delivery error.

If you or anyone else has any suggestions on how to make this work with large files of decimal data, I would be very appreciative.
Thanks,

   Josh

From: Sean

Date: 29 Sep, 2010 20:40:22

Message: 9 of 9

If you look at the first part after the @ in my email address, you'll see why delivery failed.

ACCUMARRAY does need all subscripts to be positive integers, because they act as indices into a matrix. That's why I used an offset in the example. You probably also haven't rounded to integers if you just multiplied by 10000; try rounding the data at that point.
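A sketch of the rounding-plus-offset step described above, in Python for illustration (`to_grid_indices` is a hypothetical helper; in MATLAB the rounding step would be `round(A(:,1:2)*10000)`). Scaling a decimal by 10000 in floating point is not guaranteed to give an exact integer, so each value is rounded before everything is shifted so that the smallest value maps to 1. Passing X and Y through together gives them one shared offset, as in Sean's code.

```python
def to_grid_indices(values, scale=10000):
    """Map decimal coordinates to positive integer subscripts (1-based).

    Scaling in floating point need not give exact integers, so each scaled
    value is rounded first; then one common offset shifts the smallest
    value to 1, the lowest subscript MATLAB accepts.
    """
    ints = [round(v * scale) for v in values]
    offset = 1 - min(ints)
    return [i + offset for i in ints], offset


idx, offset = to_grid_indices([1.2345, -0.0003, 2.5])
print(idx, offset)  # [12349, 1, 25004] 4
```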

You can also look at the last input argument to ACCUMARRAY, which makes the output a sparse matrix and saves some memory.
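The sparse idea can be pushed further by storing only the points that actually occur, keyed by their rounded, scaled coordinates. In this Python sketch a dictionary plays the role of the sparse matrix. It is an illustration, not ACCUMARRAY's own behavior; the name `accumulate_sparse` and the default `scale=10000` (taken from Josh's 1/10,000-inch resolution) are assumptions.

```python
from collections import defaultdict


def accumulate_sparse(base, new, scale=10000):
    """Sparse, dictionary-based analogue of the accumarray approach.

    Points are keyed by their rounded, scaled coordinates, so negative values
    need no offset, and memory grows with the number of distinct points
    rather than with the area of a dense grid. Only totals for points of
    `base` are returned; points found only in `new` are summed but never read.
    """
    def key(x, y):
        return (round(x * scale), round(y * scale))

    totals = defaultdict(int)
    for x, y, count in base + new:
        totals[key(x, y)] += count
    return [[x, y, totals[key(x, y)]] for x, y, _ in base]


base = [[0.1234, -0.5, 1], [2.0, 3.0001, 2]]
new = [[0.1234, -0.5, 3], [9.9, 9.9, 4], [2.0, 3.0001, 1]]
print(accumulate_sparse(base, new))
# [[0.1234, -0.5, 4], [2.0, 3.0001, 3]]
```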

Overall, I would guess there is a better way to go about this. Check out John's Consolidator ( http://www.mathworks.com/matlabcentral/fileexchange/8354-consolidator ). I've never used it myself, but it looks like it may address all of the problems you are seeing at once!
