http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278
MATLAB Central Newsreader - "Unique" row comparison with duplicate row counter

Thu, 09 Sep 2010 22:20:23 +0000
"Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#778331
Josh
Hi Everyone,<br>
<br>
I have a large sorted array (more than 100,000 rows) comprising three columns. The first two columns represent X and Y coordinates, and the third column is a counter used to keep track of duplicate points. <br>
<br>
What I want to do is keep only unique rows (the "unique" command is likely to come in handy). However, I only want to compare the first two columns for uniqueness, remove any duplicates, and increase column 3 by the number of duplicates that I have just removed (note: a previously accumulated duplicate count may already be present in column 3 of the first row of each sorted group).<br>
<br>
For example, if I have the following array:<br>
<br>
A=<br>
[ 3 7 2<br>
3 7 1<br>
3 1 1<br>
2 6 1<br>
2 5 3<br>
2 5 1<br>
2 5 1<br>
1 9 1<br>
1 9 1<br>
1 9 1]<br>
<br>
The result should be:<br>
[ 3 7 3<br>
3 1 1<br>
2 6 1<br>
2 5 5<br>
1 9 2<br>
1 9 1]<br>
<br>
I recognize that it is probably easier to separate the first two columns from the counter (3rd column) for the uniqueness comparison, which is fine as long as they are indexed correctly.<br>
<br>
If anyone has a quick method for doing this, it would be greatly appreciated.<br>
<br>
Thanks,<br>
Josh

Fri, 10 Sep 2010 03:58:06 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#778395
Darren Rowland
Josh,<br>
<br>
Try this<br>
<br>
[b,u,v] = unique(A(:,1:2),'rows');<br>
c = accumarray(v,A(:,3));<br>
A2 = [b c]<br>
<br>
With your example the result is <br>
<br>
A2 =<br>
<br>
1 9 1<br>
1 9 2<br>
2 5 5<br>
2 6 1<br>
3 1 1<br>
3 7 3<br>
<br>
So the order of the rows has changed to ascending order but is otherwise as you want.<br>
<br>
Hth<br>
Darren
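
A minimal sketch of a variant that keeps the rows in their original order instead of sorting them ascending. It assumes a MATLAB release in which unique accepts the 'stable' option (R2012a or later, so not available when this thread was written). The array P is made-up sample data, not Josh's array.

```matlab
% Hypothetical sample data: columns are X, Y, counter.
P = [ 3 7 2
      3 7 1
      2 5 3
      2 5 1
      2 5 1
     -1 9 1];

% 'stable' keeps the unique (X,Y) pairs in order of first appearance;
% v maps every row of P to its pair in b.
[b, ~, v] = unique(P(:,1:2), 'rows', 'stable');

% Sum the counters of all rows that share the same pair.
c = accumarray(v, P(:,3));

P2 = [b c];
```

With this sample data, P2 is [3 7 3; 2 5 5; -1 9 1], in the same order as the input.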

Fri, 10 Sep 2010 04:11:21 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#778398
Josh
Brilliant!<br>
Thanks Darren, it works perfectly.<br>
<br>
Josh

Fri, 10 Sep 2010 11:12:04 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#778462
Roger Stafford
"Josh " <joshinbox@hotmail.com> wrote in message <i6bmj7$t2d$1@fred.mathworks.com>...<br>
> I have a large sorted array (more than 100,000 rows) comprising three columns. The first two columns represent X and Y coordinates, and the third column is a counter used to keep track of duplicate points. <br>
> ...........<br>
      <br>
Since we have your assurance that A is already in sorted order (presumably with respect to its first two columns), it would perhaps be a waste of CPU time to sort it again with the unique function. Try this:<br>
<br>
f = find([true;diff(A(:,1))~=0|diff(A(:,2))~=0;true]);<br>
B = A(f(1:end-1),:);<br>
B(:,3) = B(:,3)+diff(f)-1;<br>
<br>
Roger Stafford
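
A small worked sketch of the same run-boundary idea on made-up sorted data (P is hypothetical, not Josh's array). This variant sums the counters within each run using a running total, whereas the snippet above adds the number of removed rows to the first row's counter; the two agree whenever every removed row has a counter of 1.

```matlab
% Hypothetical sample data, already sorted so equal (X,Y) pairs are adjacent.
P = [ 3 7 2
      3 7 1
      2 5 3
      2 5 1
      2 5 1
     -1 9 1];

% A row starts a new run when X or Y differs from the row above it.
isStart = [true; diff(P(:,1)) ~= 0 | diff(P(:,2)) ~= 0];
f = find([isStart; true]);      % run starts, plus one index past the last row

% Counter total per run, read from a running sum at each run's last row.
cs = cumsum(P(:,3));
totals = diff([0; cs(f(2:end) - 1)]);

% First row of each run, with the summed counter.
Q = [P(f(1:end-1), 1:2) totals];
```

Here f is [1; 3; 6; 7] and Q is [3 7 3; 2 5 5; -1 9 1]. No sorting is performed, so the cost is linear in the number of rows.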

Fri, 24 Sep 2010 19:29:05 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#782370
Josh
Hi Everyone,<br>
<br>
Now I want to change things up a bit. Suppose I take the result from the previous example and compare a new array to it (in practice I will be comparing hundreds of arrays to it). As before, if the X and Y coordinates (columns 1 and 2) match, I want to increment the number in the third column, but I do not want to add any new points to the array. <br>
<br>
For example:<br>
Start with the base array<br>
A=<br>
[ 3 7 3<br>
3 1 1<br>
2 6 1<br>
2 5 5<br>
1 9 2<br>
1 9 1]<br>
<br>
Compare new array (B) against the array above:<br>
B=<br>
[ 2 6 1<br>
0 1 5<br>
3 7 2<br>
1 9 1<br>
2 6 2<br>
3 7 1<br>
5 3 2] <br>
<br>
Now add the column 3 value of each row in B to column 3 of the row in A whose columns 1 and 2 are identical.<br>
<br>
Solution should yield:<br>
A=<br>
[ 3 7 6<br>
3 1 1<br>
2 6 2<br>
2 5 5<br>
1 9 3<br>
1 9 1]<br>
<br>
Thanks,<br>
Josh

Fri, 24 Sep 2010 20:21:21 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#782389
Josh
Oops! I screwed up on the results, as I didn't pay attention to the minus sign in front of the 3 (column 1 of the B matrix). Therefore the first row of the result should have been 3 7 3, as shown below:<br>
<br>
Solution should yield:<br>
A=<br>
[ 3 7 3<br>
3 1 1<br>
2 6 2<br>
2 5 5<br>
1 9 3<br>
1 9 1]<br>
<br>
Sorry for the confusion,<br>
Josh

Fri, 24 Sep 2010 20:46:19 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#782395
Sean
One way:<br>
A=...<br>
[ 3 7 3<br>
3 1 1<br>
2 6 2<br>
2 5 5<br>
1 9 3<br>
1 9 1];<br>
<br>
B=...<br>
[ 2 6 1<br>
0 1 5<br>
3 7 2<br>
1 9 1<br>
2 6 2<br>
3 7 1<br>
5 3 2] ;<br>
<br>
offset = abs(min(reshape([A(:,1:2);B(:,1:2)],[],1)))+1;<br>
Allofit = accumarray(offset+[A(:,1:2);B(:,1:2)],[A(:,3);B(:,3)]);<br>
A(:,3) = Allofit(sub2ind(size(Allofit),A(:,1)+offset,A(:,2)+offset));<br>
<br>
I think your expected output was wrong again:<br>
A =<br>
<br>
3 7 3<br>
3 1 1<br>
2 6 3<br>
2 5 5<br>
1 9 4<br>
1 9 1
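
An alternative sketch that avoids building a full grid: look up each row of B in A with ismember(...,'rows') and accumulate only the matches. It assumes the (X,Y) pairs in A are unique and that the coordinates compare exactly. The arrays below are sample data loosely modeled on the thread's; the placement of the minus signs is an assumption made for illustration.

```matlab
% Hypothetical sample data; the minus signs are assumptions for illustration.
A = [ 3 7 3
      3 1 1
      2 6 1
      2 5 5
      1 9 2
     -1 9 1];
B = [ 2 6 1
      0 1 5
     -3 7 2
      1 9 1
     -2 6 2
     -3 7 1
      5 3 2];

% For each row of B, find the matching (X,Y) row of A, if any.
[tf, loc] = ismember(B(:,1:2), A(:,1:2), 'rows');

% Sum B's counters per matched row of A; rows of B without a match are dropped.
add = accumarray(loc(tf), B(tf,3), [size(A,1) 1]);

A(:,3) = A(:,3) + add;
```

With this sample data only the B rows 2 6 and 1 9 match, so column 3 of A becomes [3; 1; 2; 5; 3; 1]. Negative coordinates need no offset here, and memory use scales with the number of rows rather than with the coordinate range.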

Wed, 29 Sep 2010 19:17:26 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#783741
Josh
<br>
Thanks Sean, works great! <br>
(And you were right on the correction, unfortunately I need more sleep).<br>
<br>
I now realize that I have another problem, however: my data is actually in decimal format (the points represent spatial locations to an accuracy of 1/10,000 of an inch). Also, keep in mind that my base array is commonly around 500,000 to 1.5 million rows long, and the arrays I am comparing are normally 100,000+ rows long as well, so I have to avoid any NxN allocation so as not to run out of memory.<br>
<br>
I tried to make your code work on my data by multiplying it all by 10000 first (to eliminate the decimals), but when I run the code after doing so, it errors out, saying:<br>
<br>
??? Error using ==> accumarray<br>
First input SUBS must contain positive integer subscripts.<br>
<br>
I then tried to email you two files called A and B, which had sample data (about 150,000 lines each), to see if you could get your code to work on those, but I received a delivery error. <br>
<br>
If you or anyone else has any suggestions on what I can do to make large files of decimal data work, I would be very appreciative.<br>
Thanks,<br>
<br>
Josh

Wed, 29 Sep 2010 20:40:22 +0000
Re: "Unique" row comparison with duplicate row counter
http://www.mathworks.com/matlabcentral/newsreader/view_thread/291278#783757
Sean
If you look at the first part after the @ in my email address you'll see why delivery failed.<br>
<br>
ACCUMARRAY does need every subscript to be positive, because the subscripts are the equivalent of indices into a matrix. That's why I used an offset in the example. You probably also haven't rounded to integers if you just multiplied by 10000; try rounding the data at that point.<br>
<br>
You can also look at the last input argument to ACCUMARRAY, which makes the output a sparse matrix and saves some memory.<br>
<br>
Overall I would guess there is a better way to go about this. Check out John's Consolidator ( <a href="http://www.mathworks.com/matlabcentral/fileexchange/8354-consolidator">http://www.mathworks.com/matlabcentral/fileexchange/8354-consolidator</a> ). I've never used it myself, but it looks like it may address all the problems you are seeing at once!
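
The rounding, offset, and sparse suggestions above can be sketched roughly as follows. The data and the scale factor are assumptions for illustration; the sixth argument of accumarray is its issparse flag.

```matlab
% Hypothetical decimal coordinates with counters in column 3.
A = [ 1.5    -2.25 1
      0.0001  3    2];

scale = 1e4;                          % assumed resolution: 1/10,000 inch
ij = round(A(:,1:2) * scale);         % integer grid coordinates
ij = ij - min(ij(:)) + 1;             % shift so the smallest subscript is 1

% Sixth argument true requests a sparse result; sz, fun and fillval
% are left at their defaults with [].
S = accumarray(ij, A(:,3), [], [], [], true);
```

Here S is a 37501-by-52501 sparse matrix holding only two nonzero entries, so the large coordinate range does not translate into a large dense allocation. Multiplying without rounding can leave values such as 12344.999999999998, which is one way to trigger the "positive integer subscripts" error.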