seq2regexp - Convert sequence with ambiguous characters to regular expression

Syntax

RegExp = seq2regexp(Seq)

RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...)
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...)

Input Arguments

Seq

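Amino acid or nucleotide sequence to convert. A character string of IUB/IUPAC codes, such as 'ACWTMAN', is one accepted form.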

AlphabetValue

String specifying the sequence alphabet. Choices are:

  • 'NT' (default) — Nucleotide

  • 'AA' — Amino acid

AmbiguousValue

Controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are:

  • true (default) — Include ambiguous characters in the return value

  • false — Return only unambiguous characters

Output Arguments

RegExp

Character string of codes specifying an amino acid or nucleotide sequence in regular expression format using IUB/IUPAC codes.

Description

RegExp = seq2regexp(Seq) converts ambiguous amino acid or nucleotide symbols in a sequence to a regular expression format using IUB/IUPAC codes.

RegExp = seq2regexp(Seq, ...'PropertyName', PropertyValue, ...) calls seq2regexp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...) specifies the sequence alphabet. AlphabetValue can be either 'NT' for nucleotide sequences or 'AA' for amino acid sequences. Default is 'NT'.
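
As a brief illustration of the Alphabet property, the call below converts a short amino acid sequence. The sequence 'MBZX' is made up for this example, and the call assumes the toolbox that provides seq2regexp is installed. According to the Amino Acid Conversion table, B, Z, and X are the codes that get expanded.

```matlab
% 'MBZX' is a made-up sequence used only for illustration.
% Property names are case insensitive, so 'alphabet' would work as well.
aaPattern = seq2regexp('MBZX', 'Alphabet', 'AA', 'Ambiguous', false);
disp(aaPattern)
```

Passing 'Ambiguous', false here keeps the result limited to unambiguous residues, which is usually what you want when the pattern is used to search real sequences.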

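RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...) controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. For example, the nucleotide code W converts to [ATW] when AmbiguousValue is true and to [AT] when it is false. The conversion tables below list the false form.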

Nucleotide Conversions

Nucleotide Code   Nucleotide                        Conversion
A                 Adenosine                         A
C                 Cytosine                          C
G                 Guanine                           G
T                 Thymidine                         T
U                 Uridine                           U
R                 Purine                            [AG]
Y                 Pyrimidine                        [TC]
K                 Keto                              [GT]
M                 Amino                             [AC]
S                 Strong interaction (3 H bonds)    [GC]
W                 Weak interaction (2 H bonds)      [AT]
B                 Not A                             [CGT]
D                 Not C                             [AGT]
H                 Not G                             [ACT]
V                 Not T or U                        [ACG]
N                 Any nucleotide                    [ACGT]
-                 Gap of indeterminate length       -
?                 Unknown                           ?
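
The table lookup above can be sketched in base MATLAB as follows. This is a minimal illustration of the mapping only, not the toolbox's implementation: the variable names are hypothetical, it covers only the nucleotide alphabet, and it reproduces only the unambiguous expansions listed in the table.

```matlab
% Illustrative sketch only; seq2regexp itself may work differently.
% Ambiguous nucleotide codes and their expansions, taken from the table.
codes      = {'R','Y','K','M','S','W','B','D','H','V','N'};
expansions = {'[AG]','[TC]','[GT]','[AC]','[GC]','[AT]', ...
              '[CGT]','[AGT]','[ACT]','[ACG]','[ACGT]'};
lookup = containers.Map(codes, expansions);

seq = upper('ACWTMAN');
pattern = '';
for k = 1:numel(seq)
    symbol = seq(k);
    if isKey(lookup, symbol)
        % Ambiguous code: substitute its bracketed character class.
        pattern = [pattern lookup(symbol)]; %#ok<AGROW>
    else
        % A, C, G, T, U, - and ? are copied unchanged, as in the table.
        pattern = [pattern symbol]; %#ok<AGROW>
    end
end
disp(pattern)   % AC[AT]T[AC]A[ACGT]
```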

Amino Acid Conversion

Amino Acid Code   Amino Acid                                    Conversion
B                 Asparagine or Aspartic acid (Aspartate)       [DN]
Z                 Glutamine or Glutamic acid (Glutamate)        [EQ]
X                 Any amino acid                                [ARNDCQEGHILKMFPSTWYV]

Examples

  1. Convert a nucleotide sequence to a regular expression.

    seq2regexp('ACWTMAN')
    
    ans =
    AC[ATW]T[ACM]A[ACGTRYKMSWBDHVN]

  2. Convert the same nucleotide sequence, but remove ambiguous characters from the regular expression.

    seq2regexp('ACWTMAN', 'ambiguous', false)
    
    ans =
    AC[AT]T[AC]A[ACGT]
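
The returned pattern is typically passed on to regexp, listed under See Also. The following sketch searches a made-up target sequence ('TTACATCAGGG' is an assumption for illustration, not data from this page) using the pattern from the second example.

```matlab
% Build the pattern from the second example: AC[AT]T[AC]A[ACGT]
pattern = seq2regexp('ACWTMAN', 'ambiguous', false);

% Made-up target sequence used only for illustration.
target = 'TTACATCAGGG';

% 'start' returns the index where each match begins;
% 'match' returns the matched text in a cell array.
[startIdx, matched] = regexp(target, pattern, 'start', 'match');
disp(startIdx)   % 3
disp(matched)    % cell array containing 'ACATCAG'
```

If the target sequence may contain lowercase letters, regexpi performs the same search without regard to case.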

See Also

regexp | regexpi | restrict | seqwordcount

© 1984-2012 The MathWorks, Inc.