Finding duplicate string values in two 22124x1 cell arrays
I have a 22124x1 cell array that contains duplicate values. I want to know how many times each value is duplicated and at which indices it occurs.
The first cell array, Datacell, contains values like these:
'221853_s_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'222031_at'
'222031_at'
'31637_s_at'
'37796_at'
'38340_at'
'39854_r_at'
'53202_at'
'53202_at'
'60528_at'
'60528_at'
'90610_at'
'90610_at'
The second cell array, symbol, contains:
'OR1D4 '
' OR1D5'
' HLA-DRB4 '
' HLA-DRB5 '
' LOC100133661 '
' LOC100294036'
'UTP14A '
' UTP14C'
'GTF2H2 '
'ZNF324B '
' LOC644504'
'JMJD7 '
'ZNF324B '
' JMJD7-PLA2G4B'
'OR2A20P '
' OR2A5 '
' OR2A9P'
'ZNF324B '
' ZNF584'
'WHAMM '
' WHAMML1 '
'LOC100290658 '
' WHAMML2'
'NR1D1 '
' THRA'
'C7orf25 '
' PRR5 '
' PRR5-ARHGAP8'
'LOC100290658 '
'C7orf25 '
' SAP25'
'HIP1R '
' LOC100294412'
Any help would be highly appreciated.
1 Comment
Chuck Olosky
on 3 Oct 2017
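Adding two lines to the code in the answer below returns the duplicate names and their indices:
function [dupNames, dupNdxs] = getDuplicates(aList)
% getDuplicates  Find duplicate entries in a cell array of strings.
%   dupNames: the values that occur more than once in aList
%   dupNdxs:  cell array; dupNdxs{i} holds the indices of dupNames{i} in aList
[uniqueList,~,uniqueNdx] = unique(aList);      % aList is uniqueList(uniqueNdx)
N = histc(uniqueNdx,1:numel(uniqueList));     % number of occurrences of each unique value
dupNames = uniqueList(N>1);                    % values seen more than once
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1), ...
    'UniformOutput',false);                    % positions of each duplicated value
end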
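As a usage sketch, here are the same steps run inline as a script (so no separate function file is needed) on the first eight Datacell values from the question. The fprintf report and the mat2str formatting are illustrative additions, not part of the original function:

```matlab
% First eight Datacell values from the question
Datacell = {'221853_s_at'; '221971_x_at'; '221971_x_at'; '221971_x_at'; ...
            '221971_x_at'; '222031_at'; '222031_at'; '31637_s_at'};

% Same steps as getDuplicates, written inline so the block runs as a script
[uniqueList,~,uniqueNdx] = unique(Datacell);
N = histc(uniqueNdx,1:numel(uniqueList));
dupNames = uniqueList(N>1);
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1), 'UniformOutput',false);

% Report each duplicated value, how often it occurs, and at which rows
for i = 1:numel(dupNames)
    fprintf('%s appears %d times at rows %s\n', dupNames{i}, ...
        numel(dupNdxs{i}), mat2str(dupNdxs{i}(:)'));
end
```

With these values the loop should print `221971_x_at appears 4 times at rows [2 3 4 5]` followed by `222031_at appears 2 times at rows [6 7]`. Note that unique returns the values in sorted order, so dupNames is sorted rather than in order of first appearance.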
Answers (1)
Jos (10584)
on 26 Jan 2016
Let C be your cell array of strings, then
[UniqueC,~,k] = unique(C)
N = histc(k,1:numel(UniqueC))
gives you the unique elements in UniqueC and the number of times each one occurs in N.
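MATLAB's documentation lists histc as not recommended (histcounts is the suggested replacement). For counting how often each index in k occurs, accumarray(k,1) is a common alternative; it gives the same counts here because k contains every integer from 1 to numel(UniqueC). A minimal sketch on a few Datacell values from the question, which also uses N(k) to map each count back to its original row (the names countPerRow and isDup are illustrative):

```matlab
% A few Datacell values from the question
C = {'53202_at'; '53202_at'; '60528_at'; '60528_at'; '90610_at'; '90610_at'; '38340_at'};

[UniqueC,~,k] = unique(C);
N = accumarray(k(:),1);      % count of each unique value (same as histc(k,1:numel(UniqueC)))
countPerRow = N(k);          % for each element of C, how many times its value occurs in C
isDup = countPerRow > 1;     % true for every row whose value appears more than once
```

Here UniqueC is {'38340_at'; '53202_at'; '60528_at'; '90610_at'}, N is [1; 2; 2; 2], and only the last row ('38340_at') has isDup false. find(isDup) then lists every row that belongs to a duplicated value.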
2 Comments
Ansam Al-Sabti
on 26 Jan 2016
Sulaymon Eshkabilov
on 4 Jul 2021
Chuck Olosky's code returns the duplicate string names and their indices:
...
dupNames = uniqueList(N>1); % Names
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1),'UniformOutput',false); % Indexes
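The symbol values in the question carry leading and/or trailing spaces (for example 'OR1D4 ' and ' OR1D5'). unique compares strings exactly, so the same symbol padded differently would be counted as two separate values. A sketch that trims with strtrim before looking for duplicates, assuming the spaces are not meaningful; the last entry (' ZNF324B') is hypothetical, added only to show the effect:

```matlab
% Values from the symbol list in the question, plus one hypothetical
% entry (' ZNF324B') padded differently to show why trimming matters
symbol = {'ZNF324B '; 'JMJD7 '; 'ZNF324B '; ' JMJD7-PLA2G4B'; ...
          'ZNF324B '; 'C7orf25 '; 'C7orf25 '; ' ZNF324B'};

symbolTrim = strtrim(symbol);    % remove leading/trailing whitespace from every cell

% Duplicate names and indices, computed on the trimmed values
[uniqueList,~,uniqueNdx] = unique(symbolTrim);
N = histc(uniqueNdx,1:numel(uniqueList));
dupNames = uniqueList(N>1);
dupNdxs = arrayfun(@(x) find(uniqueNdx==x), find(N>1), 'UniformOutput',false);

% Exact matching on the raw strings misses the differently padded entry
countRaw  = sum(strcmp(symbol, 'ZNF324B '));    % 3
countTrim = sum(strcmp(symbolTrim, 'ZNF324B')); % 4
```

After trimming, dupNames is {'C7orf25'; 'ZNF324B'}, with 'ZNF324B' found at rows 1, 3, 5 and 8.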