Thread Subject:
Removing duplicates

Subject: Removing duplicates

From: Mary Thompson

Date: 16 Apr, 2012 02:25:07

Message: 1 of 9

I was wondering if it would be possible to do the following.

I have a set of data in one column with ID numbers:

ID:
22
22
33
33
44
44
55
55
66
66
66
77
77
88
88
88

The first and second row should be the same. However, in some cases, such as 66 and 88, the identifier and the data that come with it repeat three times. I would like to remove the middle duplicate. I have not been able to do this in Excel and was wondering whether there is any way of checking for and removing these rows in MATLAB?

thanks.

Subject: Removing duplicates

From: Nasser M. Abbasi

Date: 16 Apr, 2012 02:51:38

Message: 2 of 9

On 4/15/2012 9:25 PM, Mary Thompson wrote:
> I was wondering if it would be possible to do the following.
>
> I have a set of data in one column with ID numbers:
>
> ID:
> 22
> 22
> 33
> 33
> 44
> 44
> 55
> 55
> 66
> 66
> 66
> 77
> 77
> 88
> 88
> 88
>
> The first and second row should be the same.
>However, there are scenarios like with 66 and 88 that the identifier and
>the data that comes along with it repeats 3x. I would like to remove the
>middle duplicate -i am not able to do anything in excel and was wondering
>if there's any type of checking/verifying in matlab?

What do you mean by "middle duplicate"?

You can use the unique() command in MATLAB to remove duplicates.

If you want to start this after some index, say after the second
index, then you can. But you need to be clearer about what you mean
by "middle duplicate".

Using your data:

EDU>> unique(A)

ans =

     22
     33
     44
     55
     66
     77
     88
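
As a rough sketch of the "checking/verifying" part of the question, the third output of unique() can be combined with accumarray() to count how often each ID occurs. This assumes the IDs are stored in a numeric column vector called ID; the names u, g, counts, and tripled are only illustrative.

```matlab
% Sample data from the original post (assumed to be a numeric column vector).
ID = [22 22 33 33 44 44 55 55 66 66 66 77 77 88 88 88]';

% u holds the distinct IDs; g maps each row of ID to its position in u.
[u, ~, g] = unique(ID);

% Count how many rows carry each distinct ID.
counts = accumarray(g(:), 1);

% IDs that appear more than twice.
tripled = u(counts > 2);
disp(tripled)
```

This only reports which IDs repeat more than twice; it does not remove anything.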

--Nasser

Subject: Removing duplicates

From: Siva

Date: 16 Apr, 2012 03:03:06

Message: 3 of 9

"Mary Thompson" wrote in message <jmfvu3$sl3$1@newscl01ah.mathworks.com>...
> I was wondering if it would be possible to do the following.
>
> I have a set of data in one column with ID numbers:
>
> ID:
> 22
> 22
> 33
> 33
> 44
> 44
> 55
> 55
> 66
> 66
> 66
> 77
> 77
> 88
> 88
> 88
>
> The first and second row should be the same. However, there are scenarios like with 66 and 88 that the identifier and the data that comes along with it repeats 3x. I would like to remove the middle duplicate -i am not able to do anything in excel and was wondering if there's any type of checking/verifying in matlab?
>
> thanks.

Not sure how big your data sets are but for small data sets this might work:

% Assume DATA is a matrix where column 1 contains ID, and the rest of the columns
% contain the associated data.
uniqueIDs = unique(DATA(:, 1));            % identify all the unique IDs
for i = 1:length(uniqueIDs)
  % identify the rows corresponding to the i'th unique ID
  idx = find(DATA(:, 1) == uniqueIDs(i));
  if length(idx) == 3                      % check if we have three of the same ID
    DATA(idx(2), :) = [];                  % discard the second row for that ID
  end
end

At the end of this code segment, the matrix DATA should be stripped of the second row when there were three rows for an ID.
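
Here is a possible variation of the loop above run on a small made-up DATA matrix. It is only a sketch: the second column (row numbers) is invented for illustration, and the test has been loosened from "exactly three" to "more than two" so that longer groups also lose all of their middle rows.

```matlab
% Hypothetical sample: column 1 is the ID, column 2 is the original row number.
ID   = [22 22 66 66 66 77 77 88 88 88 88]';
DATA = [ID (1:numel(ID))'];

uniqueIDs = unique(DATA(:, 1));
for i = 1:length(uniqueIDs)
  % Rows of the current DATA that carry the i'th unique ID.
  idx = find(DATA(:, 1) == uniqueIDs(i));
  if numel(idx) > 2
    % Drop everything between the first and last row of this ID.
    DATA(idx(2:end-1), :) = [];
  end
end
disp(DATA)
```

Because idx is recomputed from the current DATA on every pass, deleting rows inside the loop does not shift later lookups.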

Subject: Removing duplicates

From: Roger Stafford

Date: 16 Apr, 2012 04:33:08

Message: 4 of 9

"Mary Thompson" wrote in message <jmfvu3$sl3$1@newscl01ah.mathworks.com>...
> .... However, there are scenarios like with 66 and 88 that the identifier and the data that comes along with it repeats 3x. I would like to remove the middle duplicate
- - - - - - - - - -
  If 'ID' is the name of the column vector, do this:

 t = [true;diff(ID)~=0];
 ID = ID(t|[diff(t)~=0;true]);

  It should reduce any consecutive sequence of more than two like numbers to just two of them, but a sequence of two is left unchanged. Is that what you wanted?
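
Since the original question mentions data that comes along with each ID, the same logical mask can be used to select whole rows of a matrix. The following is a sketch under the assumption that the IDs sit in column 1 of a matrix named DATA; the sample values and the name keep are made up for illustration.

```matlab
% Hypothetical sample: column 1 is the ID, column 2 is the original row number.
ID   = [22 22 33 33 66 66 66 77 77 88 88 88]';
DATA = [ID (1:numel(ID))'];

% t is true at the first row of every run of equal IDs.
t = [true; diff(DATA(:, 1)) ~= 0];

% The second term is true wherever t changes between a row and the next,
% which covers the last row of every run of two or more.
keep = t | [diff(t) ~= 0; true];

DATA = DATA(keep, :);
disp(DATA)
```

Rows 6 and 11 of the sample (the middle 66 and the middle 88) are the ones dropped.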

Roger Stafford

Subject: Removing duplicates

From: Parag Shridhar

Date: 17 Apr, 2012 04:48:08

Message: 5 of 9

"Mary Thompson" wrote in message <jmfvu3$sl3$1@newscl01ah.mathworks.com>...
> I was wondering if it would be possible to do the following.
>
> I have a set of data in one column with ID numbers:
>
> ID:
> 22
> 22
> 33
> 33
> 44
> 44
> 55
> 55
> 66
> 66
> 66
> 77
> 77
> 88
> 88
> 88
>
> The first and second row should be the same. However, there are scenarios like with 66 and 88 that the identifier and the data that comes along with it repeats 3x. I would like to remove the middle duplicate -i am not able to do anything in excel and was wondering if there's any type of checking/verifying in matlab?
>
> thanks.

Roger Stafford just provided a brilliant solution.
If you want a more general but comparatively slow solution, you can group the numbers using "splitvec" (a file on the MATLAB File Exchange) and then remove the numbers you want to.

- Parag Shridhar Chandakkar.

Subject: Removing duplicates

From: venkat vasu

Date: 18 Apr, 2012 11:12:07

Message: 6 of 9

This code should be helpful for you:


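a=[1 1 2 2 3 3 3 5 5 5 5 5 7 7 4 4 8 8 9 9 9 4 4 ];
c=length(a);
d=[];   % clear any earlier result before filling it
j=1;    % current position in a
l=1;    % next free position in d
while j<c
    % copy the first two elements of the current run
    % (assumes every run has at least two elements)
    d(l)=a(j);
    j=j+1; l=l+1;
    d(l)=a(j);
    l=l+1;
    % skip the remaining elements of the run
    for k=j+1:c-1
        if a(j)~=a(k)
            break;
        end
    end
    j=k;   % start of the next run when the loop ended via break
end
disp(d);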

Subject: Removing duplicates

From: Nasser M. Abbasi

Date: 18 Apr, 2012 11:21:50

Message: 7 of 9

On 4/18/2012 6:12 AM, venkat vasu wrote:
> This code surely helpful for you....
>
>
> a=[1 1 2 2 3 3 3 5 5 5 5 5 7 7 4 4 8 8 9 9 9 4 4 ];
> c=length(a);
> j=1;
> l=1;
> while j<c
>
> d(l)=a(j);
> j=j+1;l=l+1;
> d(l)=a(j);
> l=l+1;
> for k=j+1:c-1
>
> if a(j)==a(k)
> continue;
> else
>
> break;
> end
> end
> j=k;
>
> end
> disp(d);


I have not examined your algorithm in detail, but it does
not seem to work on my MATLAB 2012a.

When I look at 'd' at the end, it prints the same as 'a'.
Maybe there is a bug somewhere?

By the way, why not just use the MATLAB function

     unique(a)

instead?

--Nasser

Subject: Removing duplicates

From: venkat vasu

Date: 18 Apr, 2012 11:46:08

Message: 8 of 9



Yes, we can use the MATLAB function unique(a). It gives the following output:
a=[1 1 2 2 3 3 3 5 5 5 5 5 7 7 4 4 8 8 9 9 9 4 4 ];

 b=[1 2 3 4 5 7 8 9];


I thought the required output was
  b=[1 1 2 2 3 3 5 5 7 7 4 4 8 8 9 9 4 4];
My code gives output like this.
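
For comparison, the same "at most two per run" output can be sketched without loops by dropping every element that equals both of its neighbours. This is not the code from the post above, just one possible alternative; the names keep and inner are illustrative.

```matlab
a = [1 1 2 2 3 3 3 5 5 5 5 5 7 7 4 4 8 8 9 9 9 4 4];

% Start by keeping everything; the first and last elements are never "middle".
keep  = true(size(a));
inner = 2:numel(a)-1;

% An inner element is a middle duplicate when it equals both neighbours.
keep(inner) = ~(a(inner) == a(inner-1) & a(inner) == a(inner+1));

b = a(keep);
disp(b)
```

Unlike unique(a), this keeps the two separate runs of 4 apart, because only adjacent elements are compared.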

Subject: Removing duplicates

From: Shanmugam Kannappan

Date: 18 Apr, 2012 12:46:10

Message: 9 of 9

"Mary Thompson" wrote in message <jmfvu3$sl3$1@newscl01ah.mathworks.com>...
> I was wondering if it would be possible to do the following.
>
> I have a set of data in one column with ID numbers:
>
> ID:
> 22
> 22
> 33
> 33
> 44
> 44
> 55
> 55
> 66
> 66
> 66
> 77
> 77
> 88
> 88
> 88
>
> The first and second row should be the same. However, there are scenarios like with 66 and 88 that the identifier and the data that comes along with it repeats 3x. I would like to remove the middle duplicate -i am not able to do anything in excel and was wondering if there's any type of checking/verifying in matlab?
>
> thanks.

Hi,

See if this helps.

id = [2 2 4 4 5 5 5 6 6 7 7 8 8 8 8 9 9 9 9 9 10 10]';
val = (1:length(id))';
x =[id val];
[a index_frst] = unique(x(:,1), 'first');
[b index_last] = unique(x(:,1), 'last');
x1 = zeros(2*length(index_frst), 2);
x1(1:2:end,:) = x(index_frst,:);
x1(2:2:end,:) = x(index_last,:);
disp(x1)
disp(x) % Just for comparison.

It extracts the rows at the first and last index of each ID.
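
A possible variation, sketched here under the same assumptions as the code above (IDs in column 1 of x), merges the two index lists instead of interleaving them. The name keepIdx is illustrative. The idea is that an ID occurring only once is not written out twice, and the surviving rows stay in their original order.

```matlab
id  = [2 2 4 4 5 5 5 6 6 7 7 8 8 8 8 9 9 9 9 9 10 10]';
val = (1:length(id))';
x   = [id val];

% Row index of the first and of the last occurrence of each ID.
[~, index_frst] = unique(x(:, 1), 'first');
[~, index_last] = unique(x(:, 1), 'last');

% Sorted union of both index lists; unique() removes repeated indices.
keepIdx = unique([index_frst(:); index_last(:)]);

x1 = x(keepIdx, :);
disp(x1)
```

As with the original, all rows sharing an ID are treated as one group even when they are not adjacent, so an ID that reappears later in the column would be merged with its earlier occurrence.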

HTH,
Shan!
