How to find repeated sequences, and the number of times they repeat, in an array with MATLAB?

I want to input an array, e.g. 12341212356,
and find all repeated sequences (and how many times each one repeats) in that array, like: 12: repeats 3 times; 123: repeats 2 times.
How can I do this efficiently in MATLAB?
Thanks!
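As a starting point, counting one known pattern takes a single call to strfind, which returns the start index of every match, including overlapping ones. A minimal sketch, assuming the number is first converted to a string of digits with sprintf (the variable names are illustrative):

```matlab
% Count occurrences of specific patterns in the digits of a number.
s = sprintf('%.0f', 12341212356);   % '12341212356' as a character row vector
n12  = numel(strfind(s, '12'));      % strfind returns every start index
n123 = numel(strfind(s, '123'));
fprintf('12 : %d times\n123 : %d times\n', n12, n123);
```

This prints 3 for 12 and 2 for 123. Finding all repeated sequences amounts to doing this for every candidate substring, which is what the answers below do.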
---
Thanks for the comments! Some additional explanation of my question:
1. The length of a sequence must be greater than 1; otherwise every single element would be a pattern, which is not what I want.
2. Every sequence that appears more than once (at least twice) should be recorded.
3. I want to find every pattern that satisfies the rules above.
E.g. 121212 gives 12: three times, 21: twice, 121: twice, 212: twice, 1212: twice (overlapping occurrences count). E.g. 123234345 gives 23: twice, 34: twice.
P.S. I don't really care about the output format right now; a matrix that records this information would be fine.
Thanks
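These rules can be implemented directly by counting every contiguous subsequence of length 2 or more in a hash map and keeping those seen more than once. The sketch below uses containers.Map for that; it is an illustration of the technique, not one of the posted answers. It assumes the input is a character row vector, and because it enumerates all O(L^2) subsequences it suits modest input lengths.

```matlab
% Hash-map counting of every contiguous subsequence of length >= 2.
s = '121212';                                  % input as a character row vector
L = numel(s);
counts = containers.Map('KeyType', 'char', 'ValueType', 'double');
for startIdx = 1:L-1
    for stopIdx = startIdx+1:L                 % rule 1: length >= 2
        key = s(startIdx:stopIdx);             % one occurrence of this pattern
        if isKey(counts, key)
            counts(key) = counts(key) + 1;
        else
            counts(key) = 1;
        end
    end
end
pats = keys(counts);                           % patterns, sorted by key
n = cell2mat(values(counts));                  % counts in the same order
keep = n > 1;                                  % rule 2: seen more than once
result = [pats(keep); num2cell(n(keep))].'     % m-by-2 cell: {pattern, count}
```

For 121212 this yields 12 (3), 121 (2), 1212 (2), 21 (2) and 212 (2), matching the hand-worked example.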
Daniel Shub on 21 May 2011
You need to supply more information.
Do you want the repeats for sequences you specify or all sequences? What about sequences of length 1?
What do you expect the output to look like?
Can you give a couple of examples (worked out by hand) of inputs and outputs?


Answers (2)

Matt Fig on 21 May 2011
Here is another version, which returns the results as strings and returns only the patterns that repeat and have length > 1:
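% This code works with numbers or vectors of digits.
% M = round(rand(1,500)*3); % A bigger vector, try this out!
M = 123412123562356; % An example number to try...
As = sprintf('%.0f',M); % Convert to a string of digits
T = {}; % Start empty so unique(T) works even when nothing repeats
cnt = 1;
L = length(As);
for jj = 1:L-2 % The first occurrence of a repeat starts no later than L-2
    for ii = jj+1:L
        I = strfind(As,As(jj:ii)); % Start indices of all (overlapping) matches
        if length(I)>1
            T{cnt} = sprintf('%s_%i',As(jj:ii),length(I)); % 'pattern_count'
            cnt = cnt + 1;
        end
    end
end
T = unique(T) % Each repeated pattern is listed once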
Note that this could easily be modified to return numeric results instead.
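One possible numeric modification is sketched below (an illustration, not the author's code): it collects the repeated patterns, then returns them as digit vectors together with a numeric vector of counts. It assumes M is a number small enough to be stored exactly, or a vector of single digits.

```matlab
% Numeric-output variant: digit vectors plus numeric counts.
M = 123412123562356;
As = sprintf('%.0f', M);                       % digits as a character row vector
pats = {};
for jj = 1:length(As)-1
    for ii = jj+1:length(As)
        if numel(strfind(As, As(jj:ii))) > 1   % pattern occurs at least twice
            pats{end+1} = As(jj:ii); %#ok<AGROW>
        end
    end
end
pats = unique(pats);                           % each repeated pattern once
counts = cellfun(@(p) numel(strfind(As, p)), pats);               % numeric counts
patDigits = cellfun(@(p) p - '0', pats, 'UniformOutput', false);  % '123' -> [1 2 3]
```

Subtracting '0' from a character digit string is a common MATLAB idiom for turning it into a numeric digit vector.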
EDIT: Updated with a function version.
I made this into a function, which is more general. For instance, you will want to switch to a vector of digits once the number gets long enough, because a double can store integers exactly only up to 2^53 (about 9.007e15), i.e. roughly 15 to 16 digits. For example:
M = 123412123562356778834239877;
would be better represented as a vector of digits, because it cannot be stored exactly as a double. The following function works quickly on vectors of single digits or characters, and on scalar numbers like M above (subject to the limit just discussed).
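The precision limit can be seen directly, and the digits can be preserved by starting from text instead. A short sketch (the variable names are illustrative); it relies on the fact that doubles above 2^53 are all even integers, so an odd 27-digit number cannot be stored exactly:

```matlab
% Doubles hold every integer exactly only up to flintmax = 2^53.
big = 123412123562356778834239877;         % 27 digits: rounded when parsed
printed = sprintf('%.0f', big);            % digits of the stored double
isExact = strcmp(printed, '123412123562356778834239877')   % false
% Keeping the digits as text and subtracting '0' gives an exact digit vector.
digitsVec = '123412123562356778834239877' - '0';   % [1 2 3 4 1 2 ...]
```

digitsVec can then be passed to a function that accepts vectors of digits.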
function T = find_patterns(M)
%FIND_PATTERNS finds all repeating patterns in a vector or number.
% T = FIND_PATTERNS(M) returns an n-by-2 cell array. For each row of the
% cell array, column one has the pattern, and column two has the number of
% times the pattern occurs.
% Takes as input either a vector of digits or characters, or a scalar number,
% and finds all repeating patterns of length greater than one.
% If the input argument is numeric, the patterns in the output will be
% numeric (digit vectors), and if the input argument is a vector of
% characters, the patterns will be character vectors.
%
% Examples:
%
% M = 123412123562356; % A scalar number.
% T = find_patterns(M);
% T{8,:} % Show that [2 3 5 6] occurs 2 times.
%
% M = 'c31a234121a23562c356'; % A character array.
% T = find_patterns(M);
%
% M = round(rand(1,300)*3); % A vector of digits.
% T = find_patterns(M);
% T{1,:}
%
%
% Author: Matt Fig
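if ~isscalar(M) && ~isvector(M)
    error('Only scalar and vector arguments are allowed.');
end
flg = 0;
if ~ischar(M)
    M = sprintf('%.0f',M); % Numeric input: convert to a string of digits
    flg = 1;
else
    M = M(:).'; % Make sure we have a row vector...
end
cnt = 1;
L = length(M);
T = {};
for jj = 1:L-2
    for ii = jj+1:L-1
        I = strfind(M,M(jj:ii)); % All (overlapping) occurrences
        if length(I)>1
            T{cnt} = sprintf('%s %i',M(jj:ii),length(I));%#ok
            cnt = cnt + 1;
        end
    end
end
if ~isempty(T)
    % Split each 'pattern count' string at the space
    % (so character input must not itself contain spaces).
    T = regexp(unique(T),'\ ','split');
    T = cat(1,T{:});
    if flg
        % Numeric input: turn each digit string into a digit vector
        T(:,1) = cellfun(@(x) str2num(x.').',T(:,1),'UniformOutput',false);%#ok
    end
    % Return the counts as numbers for both numeric and character input
    T(:,2) = cellfun(@str2double,T(:,2),'UniformOutput',false);
    [~,J] = sort(cellfun('length',T(:,1))); % Shortest patterns first
    T = T(J,:);
end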
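Since the question asks for efficiency, one possible speedup (a sketch, not part of find_patterns) uses the fact that if s(jj:ii) occurs only once, no longer pattern starting at jj can occur twice, so the inner loop can stop early. The variable names are illustrative, and the input is assumed to be a character row vector such as the digit string find_patterns builds internally.

```matlab
% Pruned enumeration: stop extending a pattern once it occurs only once.
s = sprintf('%.0f', 123412123562356);   % digit string (or any char row vector)
L = numel(s);
pats = {};
cnts = [];
for jj = 1:L-1
    for ii = jj+1:L
        n = numel(strfind(s, s(jj:ii)));
        if n < 2
            break;                       % no longer pattern from jj can repeat
        end
        if ~any(strcmp(pats, s(jj:ii)))  % record each pattern only once
            pats{end+1} = s(jj:ii);      %#ok<AGROW>
            cnts(end+1) = n;             %#ok<AGROW>
        end
    end
end
[~, order] = sort(cellfun('length', pats));   % stable sort, shortest first
pats = pats(order);
cnts = cnts(order);
```

On the example number this reports 12 and 23 (3 times each), then 35, 56, 123, 235, 356 and 2356 (2 times each), the same patterns as find_patterns, while skipping the strfind calls for extensions of substrings that occur only once.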
Xiaonan Xing on 15 Jan 2019
Your function is just too beautiful and I have to log in to my MATLAB account to give you an upvote, also my very first upvote. Thanks for the code!



Andrei Bobrov on 21 May 2011
A variant:
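A = str2num(num2str(12341212356)') % Digits as a column vector
N = length(A);
out = cell(N-1,1);
for j = 1:N-1 % j = sequence length
    % Each row of the index matrix selects one window A(k:k+j-1)
    [a, ~, n] = unique(A(bsxfun(@plus,1:j,(0:N-j)')),'rows');
    % Count how many windows map to each distinct row of a
    out{j} = [a sum(bsxfun(@eq,1:size(a,1),n))'];
end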
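To keep only the repeated combinations of length 2 or more (out{1} holds single elements, so it is skipped):
o2 = cellfun(@(x)x(x(:,end) > 1,:),out(2:end),'UniformOutput',false)
out2 = o2(~cellfun('isempty',o2))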
In each cell of out2, the last column holds the number of occurrences, and the columns to its left hold the sequence itself.
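Inside the loop above, the occurrence count of each distinct window can also be obtained with accumarray on the third output of unique, which avoids building the bsxfun comparison matrix. A sketch for a single window length j, assuming A is a column vector of digits (this is an alternative, not Andrei's code):

```matlab
% Count sliding windows of one length with unique(...,'rows') and accumarray.
A = [1 2 3 4 1 2 1 2 3 5 6].';            % digits of 12341212356 as a column
j = 2;                                    % window (sequence) length
N = numel(A);
W = A(bsxfun(@plus, 1:j, (0:N-j).'));     % each row is one window of length j
[a, ~, n] = unique(W, 'rows');            % distinct windows and row-to-group map
counts = accumarray(n(:), 1);             % occurrences of each distinct window
repeated = [a(counts > 1, :), counts(counts > 1)]
```

For j = 2 this gives the rows [1 2 3] and [2 3 2], i.e. 12 occurs 3 times and 23 occurs 2 times.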
