Convert sequence with ambiguous characters to regular expression


RegExp = seq2regexp(Seq)

RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...)
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...)

Input Arguments


Seq

Amino acid or nucleotide sequence to convert, written with IUB/IUPAC codes.


AlphabetValue

String specifying the sequence alphabet. Choices are:

  • 'NT' (default) — Nucleotide

  • 'AA' — Amino acid


AmbiguousValue

Controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are:

  • true (default) — Include ambiguous characters in the return value

  • false — Return only unambiguous characters

Output Arguments


RegExp

Character string of codes specifying an amino acid or nucleotide sequence in regular expression format using IUB/IUPAC codes.


Description

RegExp = seq2regexp(Seq) converts ambiguous amino acid or nucleotide symbols in a sequence to a regular expression format using IUB/IUPAC codes.

RegExp = seq2regexp(Seq, ...'PropertyName', PropertyValue, ...) calls seq2regexp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...) specifies the sequence alphabet. AlphabetValue can be either 'NT' for nucleotide sequences or 'AA' for amino acid sequences. Default is 'NT'.

RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...) controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. For example:

  • If Seq = 'ACGTK' and AmbiguousValue is true, the MATLAB® software returns ACGT[GTK]: the ambiguous character K is expanded to the unambiguous characters G and T, and K itself is kept in the character class.

  • If Seq = 'ACGTK' and AmbiguousValue is false, the MATLAB software returns ACGT[GT], which contains only the unambiguous characters.
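The option handling described above can be illustrated with a minimal Python sketch. It is not the MATLAB implementation: the function name seq2regexp_sketch, the abbreviated code table (only A, C, G, T, and K), and the rule of appending just the ambiguous symbol itself when ambiguous output is requested are assumptions chosen to reproduce the 'ACGTK' example. The toolbox may list additional symbols for broader codes. Property names are matched case-insensitively, and the defaults follow the text above.

```python
# Hypothetical Python sketch of seq2regexp-style option handling (not the MATLAB code).
# Only a few IUPAC nucleotide codes are included to keep the example short.
NT_CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT"}


def seq2regexp_sketch(seq, *pairs):
    """Convert seq to a regex; pairs are name/value options with case-insensitive names."""
    if len(pairs) % 2 != 0:
        raise ValueError("options must be name/value pairs")
    opts = {"alphabet": "NT", "ambiguous": True}  # defaults stated in the documentation
    for name, value in zip(pairs[0::2], pairs[1::2]):
        key = name.lower()  # property names are case insensitive
        if key not in opts:
            raise ValueError(f"unknown property: {name}")
        opts[key] = value
    if opts["alphabet"].upper() != "NT":
        raise ValueError("this sketch only covers the 'NT' alphabet")

    parts = []
    for ch in seq.upper():
        bases = NT_CODES[ch]
        if len(bases) == 1:
            parts.append(bases)              # unambiguous symbol: copied as-is
        elif opts["ambiguous"]:
            parts.append(f"[{bases}{ch}]")   # assumption: keep only the symbol itself
        else:
            parts.append(f"[{bases}]")       # unambiguous bases only
    return "".join(parts)


print(seq2regexp_sketch("ACGTK"))                      # ACGT[GTK]
print(seq2regexp_sketch("ACGTK", "Ambiguous", False))  # ACGT[GT]
```

Matching property names with lower() mirrors the case-insensitive name/value convention described above; an unknown name raises an error instead of being silently ignored.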

Nucleotide Conversion

Nucleotide Code   Nucleotide                        Conversion
A                 Adenosine                         A
C                 Cytosine                          C
G                 Guanine                           G
T                 Thymidine                         T
U                 Uridine                           U
R                 Purine                            [AG]
Y                 Pyrimidine                        [CT]
K                 Keto                              [GT]
M                 Amino                             [AC]
S                 Strong interaction (3 H bonds)    [GC]
W                 Weak interaction (2 H bonds)      [AT]
B                 Not A                             [CGT]
D                 Not C                             [AGT]
H                 Not G                             [ACT]
V                 Not T or U                        [ACG]
N                 Any nucleotide                    [ACGT]
-                 Gap of indeterminate length       -
?                 Unknown                           ?
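The nucleotide table can be read as a lookup from each code to its unambiguous bases. The following Python sketch (the function name nt_to_regex is hypothetical) uses the conversions for A, C, G, T, U, R, S, W, B, D, H, V, N, -, and ? from the table above to build a pattern containing only unambiguous characters, analogous to AmbiguousValue = false, and then searches a target sequence with Python's re module. Because ? is a quantifier in regular-expression syntax, this sketch passes single-character conversions through re.escape so that - and ? match literally; that escaping is a choice made here, not documented toolbox behavior.

```python
import re

# Conversion column of the nucleotide table (unambiguous expansion only).
NT_TABLE = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "S": "GC", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "-": "-", "?": "?",
}


def nt_to_regex(seq):
    """Expand each code per the table; single characters are escaped to match literally."""
    parts = []
    for ch in seq.upper():
        bases = NT_TABLE[ch]
        if len(bases) > 1:
            parts.append("[" + bases + "]")  # ambiguous code becomes a character class
        else:
            parts.append(re.escape(bases))   # '?' and '-' would otherwise be special
    return "".join(parts)


pattern = nt_to_regex("GAWTN")
print(pattern)                                # GA[AT]T[ACGT]
match = re.search(pattern, "CCGATTCGG")
print(match.group(0) if match else None)      # GATTC
```

Each ambiguous code becomes a character class, so one pattern matches every concrete sequence the ambiguous one stands for.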

Amino Acid Conversion

Amino Acid Code   Amino Acid                                Conversion
B                 Asparagine or Aspartic acid (Aspartate)   [DN]
Z                 Glutamine or Glutamic acid (Glutamate)    [EQ]
X                 Any amino acid                            [A R N D C Q E G H I L K M F P S T W Y V]
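For amino acid sequences, only B, Z, and X expand into character classes. The following Python sketch uses a hypothetical function name, aa_to_regex. Copying every other residue letter unchanged is an assumption of the sketch. The spaces shown in the X row are omitted here, because inside a regular-expression character class a space matches a literal space.

```python
import re

# Amino acid ambiguity codes from the table; other letters are copied unchanged.
AA_AMBIGUOUS = {
    "B": "DN",
    "Z": "EQ",
    "X": "ARNDCQEGHILKMFPSTWYV",
}


def aa_to_regex(seq):
    """Replace B, Z, and X with character classes; keep every other residue as-is."""
    return "".join(
        "[" + AA_AMBIGUOUS[ch] + "]" if ch in AA_AMBIGUOUS else ch
        for ch in seq.upper()
    )


pattern = aa_to_regex("MBZ")
print(pattern)                              # M[DN][EQ]
print(bool(re.fullmatch(pattern, "MNE")))   # True
print(bool(re.fullmatch(pattern, "MNK")))   # False
```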


Examples

  1. Convert a nucleotide sequence to a regular expression.

    seq2regexp('ACWTMAN')
    ans =
  2. Convert the same nucleotide sequence, but remove ambiguous characters from the regular expression.

    seq2regexp('ACWTMAN', 'ambiguous', false)
    ans =