seq2regexp

Convert sequence with ambiguous characters to regular expression

Syntax

RegExp = seq2regexp(Seq)
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...)
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...)

Input Arguments

Seq

Either of the following:

AlphabetValue

Character vector specifying the sequence alphabet. Choices are:

  • 'NT' (default) — Nucleotide

  • 'AA' — Amino acid

AmbiguousValue

Controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are:

  • true (default) — Include ambiguous characters in the return value

  • false — Return only unambiguous characters

Output Arguments

RegExp

Character vector of codes specifying an amino acid or nucleotide sequence in regular expression format using IUB/IUPAC codes.

Description

RegExp = seq2regexp(Seq) converts ambiguous amino acid or nucleotide symbols in a sequence to a regular expression format using IUB/IUPAC codes.

RegExp = seq2regexp(Seq, ...'PropertyName', PropertyValue, ...) calls seq2regexp with optional properties specified as property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks; property names are case-insensitive. The property name/property value pairs are as follows:

RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...) specifies the sequence alphabet. AlphabetValue can be either 'NT' for nucleotide sequences or 'AA' for amino acid sequences. Default is 'NT'.

RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...) controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. For example:

  • If Seq = 'ACGTK' and AmbiguousValue is true, the MATLAB® software returns ACGT[GTK], which contains the unambiguous characters G and T and the ambiguous character K.

  • If Seq = 'ACGTK' and AmbiguousValue is false, the MATLAB software returns ACGT[GT], which contains only the unambiguous characters.
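The effect of the Ambiguous option can be illustrated with a minimal Python sketch. It is not the toolbox implementation: the function name seq2regexp_nt is hypothetical, and appending the ambiguous code itself to each character class is an assumption generalized from the 'ACGTK' example above (the toolbox may include additional codes in some classes).

```python
# Nucleotide ambiguity codes and the unambiguous bases they stand for,
# taken from the "Nucleotide Conversion" table on this page.
NT_AMBIGUITY = {
    "R": "AG", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def seq2regexp_nt(seq: str, ambiguous: bool = True) -> str:
    """Hypothetical Python analogue of seq2regexp for nucleotide sequences.

    Each ambiguity code becomes a character class of the bases it represents.
    Assumption: with ambiguous=True the code itself is appended to the class,
    as in 'ACGTK' -> 'ACGT[GTK]'. All other characters pass through unchanged.
    """
    parts = []
    for ch in seq:
        bases = NT_AMBIGUITY.get(ch)
        if bases is None:
            parts.append(ch)               # A, C, G, T, U, '-', '?' stay as-is
        elif ambiguous:
            parts.append(f"[{bases}{ch}]")  # keep the ambiguous code too
        else:
            parts.append(f"[{bases}]")      # unambiguous bases only
    return "".join(parts)


print(seq2regexp_nt("ACGTK"))                   # ACGT[GTK]
print(seq2regexp_nt("ACGTK", ambiguous=False))  # ACGT[GT]
```

The two printed results reproduce the two bullet examples above.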

Nucleotide Conversion

Code  Nucleotide                      Conversion
A     Adenosine                       A
C     Cytosine                        C
G     Guanine                         G
T     Thymidine                       T
U     Uridine                         U
R     Purine                          [AG]
Y     Pyrimidine                      [TC]
K     Keto                            [GT]
M     Amino                           [AC]
S     Strong interaction (3 H bonds)  [GC]
W     Weak interaction (2 H bonds)    [AT]
B     Not A                           [CGT]
D     Not C                           [AGT]
H     Not G                           [ACT]
V     Not T or U                      [ACG]
N     Any nucleotide                  [ACGT]
-     Gap of indeterminate length     -
?     Unknown                         ?
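The conversions in this table are what make an ambiguous motif searchable with an ordinary regular-expression engine. The following Python sketch expands the codes with the unambiguous conversions from the table and then finds motif occurrences in a sequence. The helper names motif_to_pattern and find_motif are hypothetical, and escaping non-code characters such as '?' (a quantifier in Python regular expressions) is a design choice of this sketch, not documented seq2regexp behavior.

```python
import re

# Unambiguous conversions from the "Nucleotide Conversion" table above.
NT_CONVERSION = {
    "R": "[AG]", "Y": "[TC]", "K": "[GT]", "M": "[AC]",
    "S": "[GC]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_pattern(motif: str) -> str:
    """Expand ambiguity codes; escape every other character so it is literal."""
    return "".join(NT_CONVERSION.get(ch, re.escape(ch)) for ch in motif)


def find_motif(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = re.compile(motif_to_pattern(motif))
    return [m.start() for m in pattern.finditer(sequence)]


print(motif_to_pattern("GGNCC"))               # GG[ACGT]CC
print(find_motif("GGNCC", "AAGGACCTTGGTCCA"))  # [2, 9]
```

re.finditer reports non-overlapping matches; overlapping hits would need a lookahead pattern instead.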

Amino Acid Conversion

Code  Amino acid                                 Conversion
B     Asparagine or Aspartic acid (Aspartate)    [DN]
Z     Glutamine or Glutamic acid (Glutamate)     [EQ]
X     Any amino acid                             [ARNDCQEGHILKMFPSTWYV]
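For amino acid sequences ('Alphabet', 'AA'), the same idea applies with the three codes in this table. A minimal Python sketch follows; the name seq2regexp_aa is hypothetical, and adding the ambiguous code to its class when ambiguous=True is an assumption carried over from the nucleotide example, not confirmed toolbox output.

```python
# The 20 standard residues that X stands for, per the table above.
STANDARD_AA = "ARNDCQEGHILKMFPSTWYV"
AA_AMBIGUITY = {"B": "DN", "Z": "EQ", "X": STANDARD_AA}


def seq2regexp_aa(seq: str, ambiguous: bool = True) -> str:
    """Hypothetical Python analogue of seq2regexp(..., 'Alphabet', 'AA')."""
    parts = []
    for ch in seq:
        residues = AA_AMBIGUITY.get(ch)
        if residues is None:
            parts.append(ch)  # unambiguous residues pass through unchanged
        else:
            # Assumption: with ambiguous=True the code itself joins the class.
            parts.append(f"[{residues}{ch if ambiguous else ''}]")
    return "".join(parts)


print(seq2regexp_aa("MBZ", ambiguous=False))  # M[DN][EQ]
print(seq2regexp_aa("MBZ"))                   # M[DNB][EQZ]
```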

Examples

  1. Convert a nucleotide sequence to a regular expression.

    seq2regexp('ACWTMAN')
    ans =

  2. Convert the same nucleotide sequence, but remove ambiguous characters from the regular expression.

    seq2regexp('ACWTMAN', 'ambiguous', false)
    ans =
    AC[AT]T[AC]A[ACGT]
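One way to sanity-check an unambiguous result is to enumerate every concrete sequence that 'ACWTMAN' stands for and confirm that each one fully matches the regular expression. The Python sketch assumes the expansions W to [AT], M to [AC], and N to [ACGT] from the nucleotide table, which give AC[AT]T[AC]A[ACGT]; the helper name concrete_sequences is hypothetical.

```python
import itertools
import re

# Unambiguous expansions of the codes that occur in 'ACWTMAN' (nucleotide table).
EXPANSION = {"W": "AT", "M": "AC", "N": "ACGT"}


def concrete_sequences(seq: str) -> list[str]:
    """Enumerate every unambiguous sequence an ambiguous sequence stands for."""
    choices = [EXPANSION.get(ch, ch) for ch in seq]
    return ["".join(p) for p in itertools.product(*choices)]


# W -> [AT], M -> [AC], N -> [ACGT]
pattern = re.compile("AC[AT]T[AC]A[ACGT]")
expanded = concrete_sequences("ACWTMAN")

print(len(expanded))                                # 16 (2 * 2 * 4)
print(all(pattern.fullmatch(s) for s in expanded))  # True
print(pattern.fullmatch("ACGTCAA"))                 # None: G is not allowed for W
```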

Introduced before R2006a