Best solution to finding repeating characters on a line.
I am looking for any instances of two characters (e and d) repeated in a row 10 or more times. I want to either print every line on which this occurs to the command line, or stop and print the location every time it is detected. Basically, I am trying to find where e and d appear ten or more times grouped together in a large data file. For example:
asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs
asseefadfefeeedddeeedddasdfsdf
asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs
asseefadfefeeedddeeedddasdfsdf
The script would then print out line 2 and line 4 in the command line.
Thank you for your help
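One way to cover both options in the question (report the line, and report where the run sits) is to test each line with `regexp` and the pattern `[de]{10,}`, which matches ten or more consecutive characters that are each 'd' or 'e'. This is a minimal sketch, assuming the lines are already in a cell array of char vectors (here, the four sample lines above); the variable names are illustrative only.

```matlab
% Sample lines copied from the question; in practice these would come
% from the data file (e.g. fileread + split, as in the answers below).
dataLines = {'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs'
             'asseefadfefeeedddeeedddasdfsdf'
             'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs'
             'asseefadfefeeedddeeedddasdfsdf'};

hits = [];   % line numbers that contain at least one qualifying run
for k = 1:numel(dataLines)
    % [de]{10,} matches a run of 10 or more characters, each 'd' or 'e'
    [runStart, runEnd] = regexp(dataLines{k}, '[de]{10,}', 'start', 'end');
    if ~isempty(runStart)
        hits(end+1) = k; %#ok<AGROW>
        fprintf('line %d: %s\n', k, dataLines{k});
        % Report the column range of every run found on this line
        for r = 1:numel(runStart)
            fprintf('    run at columns %d-%d\n', runStart(r), runEnd(r));
        end
    end
end
```

To stop at the first hit instead of listing them all, a `break` after the `fprintf` calls would end the loop.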
Rena Berman
on 26 Sep 2023
(Answers Dev) Restored edit
Accepted Answer
More Answers (1)
You say "10 or over", so does the program need to find all possible repeated patterns? For example, should
'adadadadaaaadadadadaaa'
(length 22) be located if it exists?
S = {'asseefadfefaaadddaaadddasdfsdf', 'asseeadadadadaaaadadadadaaadfsdf'}
matches = regexp(S, '([ad]{5,})\1', 'match');
celldisp(matches)
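The pattern in this answer relies on a backreference: `([ad]{5,})` captures a run of at least five characters drawn from 'a' and 'd', and `\1` requires that exact run to appear again immediately afterwards, so every match is at least 10 characters long. A minimal sketch, using the 22-character example string from above, that also asks `regexp` for the captured token so the repeating unit is visible (variable names are illustrative only):

```matlab
% Example string from above: 'adadadadaaa' written twice in a row
str = 'adadadadaaaadadadadaaa';

% 'match' returns the whole matched text; 'tokens' returns the captured group.
% With 'once', each output describes only the first match.
[whole, tok] = regexp(str, '([ad]{5,})\1', 'match', 'tokens', 'once');

unit = tok{1};   % the repeated unit captured by ([ad]{5,})
fprintf('matched %d characters; repeating unit ''%s'' (%d characters)\n', ...
        numel(whole), unit, numel(unit));
```

Because `{5,}` is greedy, the engine tries the longest capture first and backs off until `\1` can match, which is why the whole 22-character string is found here.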
Matthew Worker
on 13 Jul 2021
S = {'asseefadfefaaadddaaadddasdfsdf', 'asseeadadadadaaaadadadadaaadfsdf', 'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', 'asseefadfefaaadddaaadddasdfsdf', 'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', 'asseefadfefaaadddaaadddasdfsdf'}
matchidx = regexp(S, '([ad]{5,})\1', 'once')
S(~cellfun(@isempty, matchidx))
Walter Roberson
on 13 Jul 2021
... Wait, any two characters, or two specific characters?
Matthew Worker
on 13 Jul 2021
Example of reading from file:
%create a file for demonstration purposes only
tname = [tempname() '.txt'];
fid = fopen(tname, 'w');
T = regexprep('asseefadfefaaadddaaadddasdfsdf\nasseeadadadadaaaadadadadaaadfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\n', 'a', 'e');
fprintf(fid, T);
fclose(fid);
%okay, main function
filename = tname;
S = readlines(filename);
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
%alternative without readlines
S = regexp(fileread(filename), '\r?\n', 'split');
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
%alternative without splitting
S = fileread(filename);
matches = regexp(S, '^.*[de]{10}.*$', 'match', 'dotexceptnewline', 'lineanchors');
matches
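The single-string approach above returns the matching lines but not their line numbers. A minimal sketch of recovering them, assuming the line number of a match is one more than the number of newline characters before its start index; the sample text is a stand-in for `fileread(filename)`, and the variable names are illustrative only.

```matlab
% Stand-in for fileread(filename): the four sample lines from the question
txt = sprintf(['asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\n' ...
               'asseefadfefeeedddeeedddasdfsdf\n' ...
               'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\n' ...
               'asseefadfefeeedddeeedddasdfsdf\n']);

% Same pattern as above, but also request the start index of each match.
% Outputs come back in the order of the 'start' and 'match' keywords.
[startIdx, lineText] = regexp(txt, '^.*[de]{10}.*$', 'start', 'match', ...
                              'dotexceptnewline', 'lineanchors');

% Positions of the newline characters (char(10)) in the text
nlPos = find(txt == char(10));

% Line number of each match = 1 + number of newlines before its start
lineNum = arrayfun(@(s) 1 + sum(nlPos < s), startIdx);

for k = 1:numel(lineNum)
    fprintf('line %d: %s\n', lineNum(k), lineText{k});
end
```

If the file uses Windows (`\r\n`) line endings, `dotexceptnewline` still lets `.` match the `\r`, so each matched line may carry a trailing carriage return.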