not for a string in regular expressions

Hadas Lewinsky on 8 Feb 2018
Edited: per isakson on 22 Apr 2018
How do I search a sequence for a match that does not contain a certain substring?
For example, I want to search an RNA sequence for a match that starts with CG, does not contain AG in the middle, and then ends with AG. When I run
regexp(mRNA, 'GU\w+[^AG]AG');
it gives me the locations of matches that don't contain either A or G in the middle, rather than matches that don't contain the 'AG' substring.
Would really appreciate the help!

Answers (1)

Walter Roberson on 8 Feb 2018
Presuming that mRNA is a cell array of character vectors, then
mask = ~cellfun(@isempty, regexp(mRNA, '^CG([^A]|A[^G])*AG$', 'lineanchors', 'once'));
matched_strings = mRNA(mask);
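
To spell out how the pattern above excludes 'AG': a character class such as [^AG] only rejects single characters, so the middle is instead built from repeated units that are either a non-A character or an A followed by a non-G character. One edge case that may be worth checking is a run of A's directly before the closing AG (for example 'CGAAG'), where A[^G] consumes the A that the final AG needs. The sketch below compares the alternation with a negative-lookahead variant; the sample sequences and the variable names patAlt, patLook, maskAlt and maskLook are made up for illustration and are not from the original posts.

```matlab
% Hypothetical sample data, invented for this illustration only.
mRNA = {'CGUUCAG', 'CGUAGCAG', 'CGAAG', 'GUCAG'};

% Alternation form from the answer: each repeated unit is either a
% non-A character, or an A followed by a non-G character.
patAlt = '^CG([^A]|A[^G])*AG$';

% Lookahead variant (an alternative, not the answer's method): consume one
% character at a time, but only at positions where 'AG' does not start.
patLook = '^CG((?!AG).)*AG$';

% With 'once' and a cell array input, regexp returns one cell per sequence
% that holds either the start index of the match or an empty array.
maskAlt  = ~cellfun(@isempty, regexp(mRNA, patAlt,  'lineanchors', 'once'));
maskLook = ~cellfun(@isempty, regexp(mRNA, patLook, 'lineanchors', 'once'));

disp(maskAlt)    % 1 0 0 0 : 'CGAAG' is rejected by the alternation form
disp(maskLook)   % 1 0 1 0 : 'CGAAG' is accepted by the lookahead form

matched_strings = mRNA(maskLook);
```

Both forms reject 'CGUAGCAG' (an AG in the middle) and 'GUCAG' (wrong prefix). Which behaviour is wanted for 'CGAAG' depends on whether an A immediately before the closing AG counts as part of an allowed middle.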
