# Find double repetitions in a (sorted) array.

2 views (last 30 days)
bbb_bbb on 20 Oct 2017
Commented: Andrei Bobrov on 23 Oct 2017
Given an array submitted in a form of struct field, containing integer numbers. For convenience, let's assume that the numbers are already sorted in ascending order:
>> s.x
ans =
2
ans =
2
ans =
5
ans =
5
ans =
5
ans =
8
ans =
8
Find indexes of elements, which occur exact 2 times:
ind =
1 2 6 7
bbb_bbb on 21 Oct 2017
This does not catch the case, if the last two elements are equal.
if d(end)==0, d(end+1)=1; end
This variant seems to be the fastest.

Andrei Bobrov on 20 Oct 2017
Edited: Andrei Bobrov on 23 Oct 2017
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
[~,~,g] = unique(x); % OR for last versions of MATLAB: g = findgroups(x)
c = accumarray(g,1:numel(x),[],@(x){x});
out = cell2mat(c(cellfun(@numel,c) == 2));
or
[a,~,g] = unique(x);
out = find(ismember(x,a(accumarray(g,1) == 2)));
or (FIXED)
out = reshape(strfind([1,diff(x(:)')~=0,1],[1 0 1]) + [0;1],[],1);
out = reshape(bsxfun(@plus,strfind([1,diff(x(:)')~=0,1],[1 0 1]),[0;1]),[],1); % for old MATLAB
Andrei Bobrov on 23 Oct 2017
Thank you mister Bbb_bbb.

Rik on 20 Oct 2017
Edited: Rik on 20 Oct 2017
It always pays off to get rid of loops and/or pre-allocating your output.
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
x=[s.x];
tic
count=histcounts(x,0.5 : max(x)+0.5);
ind=find(sum(x==find(count==2)'));
toc
%should work on most releases: 0.000628 seconds
tic
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
toc
tic
d=diff(x); j=0; ind=[];
for i=1:numel(d)
if d(i)==0
j=j+1;
else
if j==1
ind(end+1)=i-1;
ind(end+1)=i;
end
j=0;
end
end
toc
bbb_bbb on 21 Oct 2017
This variant isn't working at big arrays:
x=randi([1,1e6],1e5,1); x=sort(x)';
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
Error using repmat
Maximum variable size allowed by the program is exceeded.

Image Analyst on 20 Oct 2017
You didn't tag it as homework. Is it? This will do it:
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
numbers = [s.x]
[groupNumber, groupValue] = findgroups(numbers)
counts = histcounts(groupNumber)
ofGroupSize2 = find(counts == 2) % Find those only if they have a length of 2.
values = groupValue(ofGroupSize2)
indexes = find(ismember(numbers, values))
##### 2 CommentsShowHide 1 older comment
Image Analyst on 20 Oct 2017
You can use regionprops() instead of findgroups() if you have an old version and have the Image Processing Toolbox. See my separate answer with demo code.

Jan on 21 Oct 2017
Your code looks like the input is sorted. The other approaches do not have this limitation. If it is really sorted:
d = [true; diff(x) ~= 0]; % TRUE if values change
b = x(d); % Elements without repetitions
k = find([d', true]); % Indices of changes
n = diff(k);
is2 = find(n==2);
ind4 = reshape([k(is2); k(is2)+1], 1, []);
Code taken from FEX: RunLength.

Image Analyst on 21 Oct 2017
Edited: Image Analyst on 21 Oct 2017
If you have the Image Processing Toolbox, you can use regionprops():
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
numbers = [s.x] % A labeled "image"
% Find lengths of each run of numbers plus the indexes where they occur.
props = regionprops(numbers, 'Area', 'PixelIdxList')
% Extract from structure into one vector.
allLengths = [props.Area]
% Find those only if they have a length of 2.
ofGroupSize2 = find(allLengths == 2)
% Find indexes of those runs with length 2.
indexes = [props(ofGroupSize2).PixelIdxList]
% Shape into row vector
indexes = reshape(sort(indexes(:)), 1, [])
bbb_bbb on 21 Oct 2017
This works, but the variant is the longest (6.2 sec on 1e6 elements vector). The fastest is 0.03 sec.