MATLAB Answers


Find double repetitions in a (sorted) array.

Asked by bbb_bbb
on 20 Oct 2017
Latest activity Commented on by Andrei Bobrov
on 23 Oct 2017
Given a struct array whose field x contains integers. For convenience, let's assume that the numbers are already sorted in ascending order:
>> s.x
ans =
2
ans =
2
ans =
5
ans =
5
ans =
5
ans =
8
ans =
8
Find the indices of the elements that occur exactly 2 times:
ind =
1 2 6 7

  4 Comments

I have built a for loop, but I see that it is not optimal for speed, first of all because ind changes size on every loop iteration. I think there is a more concise way...
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
% Finding double repetitions
d = diff([s.x]);
j = 0;
ind = [];
for i = 1:numel(d)
    if d(i) == 0
        j = j + 1;
    else
        if j == 1
            ind(end+1) = i-1;
            ind(end+1) = i;
        end
        j = 0;
    end
end
Jan
on 21 Oct 2017
Growing an array iteratively is a standard mistake from the viewpoint of efficiency. Simply pre-allocate:
d = diff(x);
j = 0;
ind = zeros(1, numel(d));
indi = 1;
for i = 1:numel(d)
    if d(i) == 0
        j = j + 1;
    else
        if j == 1
            ind(indi) = i-1;
            ind(indi+1) = i;
            indi = indi + 2;
        end
        j = 0;
    end
end
ind = ind(1:indi-1);
This does not catch the case where the last two elements are equal.
Adding this line before the loop repairs it:
if d(end)==0, d(end+1)=1; end
This variant seems to be the fastest.
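For reference, the pre-allocated loop and the repair for a trailing pair can be combined roughly as follows. This is only a sketch under the thread's assumptions (sorted input); it always appends the sentinel instead of testing d(end) first, which is a small variation on the line above, and it sizes ind by numel(x) as a safe upper bound.

```matlab
% Example data from the question (sorted ascending); the last pair is a double.
x = [2; 2; 5; 5; 5; 8; 8];

d = diff(x(:).');          % row vector of differences
d(end+1) = 1;              % sentinel: closes a run that ends at the last element
ind  = zeros(1, numel(x)); % pre-allocate the maximum possible size
indi = 1;                  % next free slot in ind
j    = 0;                  % number of zero differences in the current run
for i = 1:numel(d)
    if d(i) == 0
        j = j + 1;
    else
        if j == 1          % exactly one zero difference: a run of length 2
            ind(indi)   = i - 1;
            ind(indi+1) = i;
            indi = indi + 2;
        end
        j = 0;
    end
end
ind = ind(1:indi-1);       % trim the unused tail
disp(ind)                  % 1 2 6 7
```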


5 Answers

Answer by Andrei Bobrov
on 20 Oct 2017
Edited by Andrei Bobrov
on 23 Oct 2017
 Accepted Answer

x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
[~,~,g] = unique(x); % or, in newer MATLAB releases: g = findgroups(x)
c = accumarray(g,1:numel(x),[],@(x){x});
out = cell2mat(c(cellfun(@numel,c) == 2));
or
[a,~,g] = unique(x);
out = find(ismember(x,a(accumarray(g,1) == 2)));
or (FIXED)
out = reshape(strfind([1,diff(x(:)')~=0,1],[1 0 1]) + [0;1],[],1); % R2016b and later (implicit expansion)
out = reshape(bsxfun(@plus,strfind([1,diff(x(:)')~=0,1],[1 0 1]),[0;1]),[],1); % for older MATLAB releases
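To make the strfind variant easier to follow, here is a step-by-step sketch on the example data. The variable names starts and first are mine, not from the answer. Like the answer's code, it relies on strfind accepting numeric row vectors.

```matlab
x = [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];

% 1 marks a position where a new run starts; the trailing 1 closes the last run.
starts = [1, diff(x(:).') ~= 0, 1];   % [1 0 1 0 0 1 0 1 0 0 0 1]

% A run of exactly two equal values appears as the pattern 1 0 1.
first = strfind(starts, [1 0 1]);     % [1 6]

% Each match is the first index of a pair; the second index follows it.
out = reshape([first; first + 1], 1, []);
disp(out)                             % 1 2 6 7
```

Longer runs produce 1 0 0 ... 1 and single values produce 1 1, so neither matches the pattern.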

  13 Comments

:), fixed!
This is ok. I slightly modified it for speed. The fastest and most concise algorithm of all suggested!
out = strfind([true,diff(x')~=0,true],[1 0 1]);
out = reshape([out;out+1],1,[]);
Thank you mister Bbb_bbb.



Answer by Rik
on 20 Oct 2017
Edited by Rik
on 20 Oct 2017

It always pays off to get rid of loops and/or pre-allocate your output.
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
x=[s.x];
%only newer releases: 0.000778 seconds
tic
count=histcounts(x,0.5 : max(x)+0.5);
ind=find(sum(x==find(count==2)'));
toc
%should work on most releases: 0.000628 seconds
tic
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
toc
%your loop: 0.001100 seconds
tic
d = diff(x);
j = 0;
ind = [];
for i = 1:numel(d)
    if d(i) == 0
        j = j + 1;
    else
        if j == 1
            ind(end+1) = i-1;
            ind(end+1) = i;
        end
        j = 0;
    end
end
toc

  8 Comments

x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
tic
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
toc
Error using ==
Matrix dimensions must agree.
repmat(x,length(count),1) % 22x1 double
repmat(count',1,length(x)) % 2x11 double
Rik
on 21 Oct 2017
That's because x has a different shape:
x1= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x1));
x2=[s.x];
x1 is 11x1 and x2 is 1x11
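A quick way to see the difference, and one possible way to make code independent of the input orientation (the name xRow is just for illustration):

```matlab
x1 = [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s  = struct('x', num2cell(x1));
x2 = [s.x];

disp(size(x1))   % 11-by-1 column
disp(size(x2))   % 1-by-11 row

% x(:).' always yields a row vector, whatever the shape of x.
xRow = x1(:).';
disp(isequal(xRow, x2))
```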
This variant does not work on large arrays:
x=randi([1,1e6],1e5,1); x=sort(x)';
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
Error using repmat
Maximum variable size allowed by the program is exceeded.
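The repmat call builds a matrix of size length(count)-by-length(x), which is what exceeds the memory limit here. One possible workaround, sketched below, keeps the histcounts step but replaces the repmat comparison with ismember, as other answers in this thread do. It assumes positive integer values, as in the example.

```matlab
x = sort(randi([1, 1e6], 1e5, 1)).';        % sorted row vector, as in the comment above

count  = histcounts(x, 0.5 : max(x) + 0.5); % count(v) = number of occurrences of v
values = find(count == 2);                  % values that occur exactly twice
ind    = find(ismember(x, values));         % their positions in x
```

Memory use then grows with max(x) and numel(x) separately, not with their product.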



Answer by Image Analyst
on 20 Oct 2017

You didn't tag it as homework. Is it? This will do it:
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
numbers = [s.x]
[groupNumber, groupValue] = findgroups(numbers)
counts = histcounts(groupNumber)
ofGroupSize2 = find(counts == 2) % Find those only if they have a length of 2.
values = groupValue(ofGroupSize2)
indexes = find(ismember(numbers, values))

  2 Comments

No, it's not homework; it's a so-called "just-for-fun project".
[groupNumber, groupValue] = findgroups(numbers)
Undefined function or variable 'findgroups'.
Matlab 2015a
You can use regionprops() instead of findgroups() if you have an old version and have the Image Processing Toolbox. See my separate answer with demo code.
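Without findgroups and without any toolbox, the same steps can also be sketched with the third output of unique plus accumarray, along the lines of the accepted answer. The variable names follow the code in this answer.

```matlab
numbers = [2 2 5 5 5 8 8 13 13 13 13];

% unique returns the distinct values and, for each element, its group number.
[groupValue, ~, groupNumber] = unique(numbers);

% Count the members of each group.
counts = accumarray(groupNumber(:), 1).';

% Keep the values that occur exactly twice and locate them.
values  = groupValue(counts == 2);
indexes = find(ismember(numbers, values));
disp(indexes)   % 1 2 6 7
```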



Answer by Jan
on 21 Oct 2017

Your code looks like it assumes the input is sorted. The other approaches do not have this limitation. If it is really sorted:
d = [true; diff(x) ~= 0]; % TRUE if values change
b = x(d); % Elements without repetitions
k = find([d', true]); % Indices of changes
n = diff(k);
is2 = find(n==2);
ind4 = reshape([k(is2); k(is2)+1], 1, []);
Code taken from FEX: RunLength.
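If the input is not sorted, one option is to sort first and map the result back through the sort permutation. The following is only a sketch of that idea, with a made-up input vector; it is not part of RunLength.

```matlab
x = [8; 2; 5; 13; 5; 2; 8; 5];       % hypothetical unsorted input

[xs, order] = sort(x(:));            % order maps sorted positions back to x
d   = [true; diff(xs) ~= 0];         % true where a new run starts
k   = find([d; true]);               % run starts, plus one past the end
is2 = find(diff(k) == 2);            % runs of length exactly 2
indSorted = [k(is2), k(is2) + 1].';  % 2-by-m: both positions of each pair
ind = sort(order(indSorted(:))).';   % positions in the original x
disp(ind)                            % 1 2 6 7
```

Here the values 2 and 8 each occur twice (at positions 2, 6 and 1, 7), while 5 occurs three times and 13 once.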



Answer by Image Analyst
on 21 Oct 2017
Edited by Image Analyst
on 21 Oct 2017

If you have the Image Processing Toolbox, you can use regionprops():
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
numbers = [s.x] % A labeled "image"
% Find lengths of each run of numbers plus the indexes where they occur.
props = regionprops(numbers, 'Area', 'PixelIdxList')
% Extract from structure into one vector.
allLengths = [props.Area]
% Find those only if they have a length of 2.
ofGroupSize2 = find(allLengths == 2)
% Find indexes of those runs with length 2.
indexes = [props(ofGroupSize2).PixelIdxList]
% Shape into row vector
indexes = reshape(sort(indexes(:)), 1, [])

  1 Comment

This works, but this variant is the slowest (6.2 sec on a vector of 1e6 elements). The fastest takes 0.03 sec.
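For anyone who wants to repeat such comparisons, timeit tends to give more stable numbers than a single tic/toc pair. A rough sketch with two of the approaches from this thread follows; the test vector and the handle names are my own choices, and the timings will differ between machines and releases.

```matlab
x = sort(randi([1, 1e6], 1e5, 1)).';   % hypothetical test vector (sorted row)

% Two of the approaches from this thread, wrapped as function handles.
viaStrfind  = @() reshape(bsxfun(@plus, ...
    strfind([1, diff(x) ~= 0, 1], [1 0 1]), [0; 1]), 1, []);
viaIsmember = @() find(ismember(x, ...
    find(histcounts(x, 0.5 : max(x) + 0.5) == 2)));

% timeit calls each handle several times and returns a typical run time.
tStrfind  = timeit(viaStrfind);
tIsmember = timeit(viaIsmember);
fprintf('strfind: %.4g s, ismember: %.4g s\n', tStrfind, tIsmember);
```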
