From: "Siva " <sivaathome@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Removing duplicates
Date: Mon, 16 Apr 2012 03:03:06 +0000 (UTC)
Organization: Roche Diagnostics

"Mary Thompson" wrote in message <jmfvu3$sl3$1@newscl01ah.mathworks.com>...
> I was wondering if it would be possible to do the following.
>
> I have a set of data in one column with ID numbers:
>
> ID:
> 22
> 22
> 33
> 33
> 44
> 44
> 55
> 55
> 66
> 66
> 66
> 77
> 77
> 88
> 88
> 88
>
> The first and second rows should be the same. However, there are scenarios, like with 66 and 88, where the identifier and the data that comes along with it repeat 3x. I would like to remove the middle duplicate. I am not able to do anything in Excel and was wondering if there's any type of checking/verifying in MATLAB?
>
> Thanks.

I'm not sure how big your data sets are, but for small data sets this might work:

  % Assume DATA is a matrix where column 1 contains the ID and the rest of the columns
  % contain the associated data.
  uniqueIDs = unique(DATA(:, 1));              % identify all the unique IDs
  for k = 1:numel(uniqueIDs)
      idx = find(DATA(:, 1) == uniqueIDs(k));  % rows belonging to the k-th unique ID
      if numel(idx) == 3                       % does this ID occur exactly three times?
          DATA(idx(2), :) = [];                % discard the second (middle) row for that ID
      end
  end

After this loop runs, DATA no longer contains the second row of any ID that occurred exactly three times; IDs that occur only twice are left untouched. Because idx is recomputed from the current DATA on every pass, deleting a row inside the loop does not throw off the row indices for later IDs.
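If your data sets are large, note that the loop scans the whole ID column once per unique ID and shrinks DATA one row at a time. A vectorized alternative (my own sketch, not tested on your real data) builds one logical mask and deletes all the middle rows in a single step. It assumes, as above, that the IDs sit in column 1 of a numeric matrix DATA; the values in column 2 of the example are made up. As a simple check, it also leaves tripledIDs, the list of IDs that had three rows, in the workspace so you can see what will be touched before (or after) deleting.

```matlab
% Hypothetical example data: column 1 = ID, column 2 = a made-up data value.
DATA = [22 1; 22 2; 33 3; 33 4; 66 5; 66 6; 66 7; 88 8; 88 9; 88 10];

% grp(r) is the position of row r's ID within the sorted list of unique IDs.
[uniqueIDs, ~, grp] = unique(DATA(:, 1));
grp = grp(:);                      % force a column so accumarray sees one subscript per row

% Number of rows sharing each ID, then expanded back to one count per row.
counts = accumarray(grp, 1);
rowCount = counts(grp);
tripledIDs = uniqueIDs(counts == 3);   % IDs that will lose their middle row (inspect to verify)

% Occurrence number of each row within its ID: 1st, 2nd, 3rd, ...
% sort keeps equal values in their original order, so ties stay top-to-bottom.
[sortedGrp, order] = sort(grp);
isRunStart = [true; diff(sortedGrp) ~= 0];   % first row of each ID in sorted order
runStartPos = find(isRunStart);
runIndex = cumsum(isRunStart);
occSorted = (1:numel(grp))' - runStartPos(runIndex) + 1;
occ = zeros(size(grp));
occ(order) = occSorted;                      % map occurrence numbers back to original rows

% Drop the second row of every ID that appears exactly three times.
DATA(rowCount == 3 & occ == 2, :) = [];
```

Because the occurrence numbers follow the original row order, "second row" here means the second occurrence from the top, which is the same row idx(2) picks out in the loop version, even if the rows for an ID are not adjacent.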