Bioinformatics Toolbox™
RegExp = seq2regexp(Seq)
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...)
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...)
| Seq | Either of the following:
|
| AlphabetValue | String specifying the sequence alphabet. Choices are 'NT' (default) for nucleotide sequences or 'AA' for amino acid sequences. |
| AmbiguousValue | Controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. |
| RegExp | Character string of codes specifying an amino acid or nucleotide sequence in regular expression format using IUB/IUPAC codes. |
RegExp = seq2regexp(Seq) converts ambiguous amino acid or nucleotide symbols in a sequence to a regular expression format using IUB/IUPAC codes.
RegExp = seq2regexp(Seq, ...'PropertyName', PropertyValue, ...) calls seq2regexp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...) specifies the sequence alphabet. AlphabetValue can be either 'NT' for nucleotide sequences or 'AA' for amino acid sequences. Default is 'NT'.
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...) controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. For example:
If Seq = 'ACGTK' and AmbiguousValue is true, the MATLAB® software returns ACGT[GTK], which contains the unambiguous characters G and T and the ambiguous character K.
If Seq = 'ACGTK' and AmbiguousValue is false, the MATLAB software returns ACGT[GT], which contains only the unambiguous characters.
Nucleotide Conversions
| Nucleotide Code | Nucleotide | Conversion |
|---|---|---|
| A | Adenosine | A |
| C | Cytosine | C |
| G | Guanine | G |
| T | Thymidine | T |
| U | Uridine | U |
| R | Purine | [AG] |
| Y | Pyrimidine | [TC] |
| K | Keto | [GT] |
| M | Amino | [AC] |
| S | Strong interaction (3 H bonds) | [GC] |
| W | Weak interaction (2 H bonds) | [AT] |
| B | Not A | [CGT] |
| D | Not C | [AGT] |
| H | Not G | [ACT] |
| V | Not T or U | [ACG] |
| N | Any nucleotide | [ACGT] |
| - | Gap of indeterminate length | - |
| ? | Unknown | ? |
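The table above can be read as a simple lookup. The following is a minimal sketch in base MATLAB, assuming the behavior described for AmbiguousValue set to false; it is not the toolbox implementation, and the names iupacMap, seq, and pattern are hypothetical.

```matlab
% Hypothetical re-creation of the nucleotide table (not the toolbox source).
% Codes absent from the map (A, C, G, T, U, -, ?) pass through unchanged.
codes   = {'R','Y','K','M','S','W','B','D','H','V','N'};
classes = {'[AG]','[TC]','[GT]','[AC]','[GC]','[AT]', ...
           '[CGT]','[AGT]','[ACT]','[ACG]','[ACGT]'};
iupacMap = containers.Map(codes, classes);

seq = 'ACWTMAN';
pattern = '';
for k = 1:numel(seq)
    c = seq(k);
    if isKey(iupacMap, c)
        % Ambiguous code: substitute its character class
        pattern = [pattern iupacMap(c)];
    else
        % Unambiguous symbol: copy as-is
        pattern = [pattern c];
    end
end
disp(pattern)
```

For the sequence 'ACWTMAN', this sketch builds the string AC[AT]T[AC]A[ACGT].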
Amino Acid Conversion
| Amino Acid Code | Amino Acid | Conversion |
|---|---|---|
| B | Asparagine or Aspartic acid (Aspartate) | [DN] |
| Z | Glutamine or Glutamic acid (Glutamate) | [EQ] |
| X | Any amino acid | [ARNDCQEGHILKMFPSTWYV] |
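As a rough illustration of the amino acid table, the three ambiguous codes can be expanded with the base MATLAB function regexprep. This is a sketch under the assumption that ambiguous characters are left out of the result, not the toolbox's own code; aaSeq and aaPattern are hypothetical names.

```matlab
% Hypothetical illustration of the amino acid conversions (not the toolbox source).
aaSeq = 'MKBZX';

% Replace each ambiguous code with its character class.
% None of the classes contains B, Z, or X, so the replacements do not interfere.
aaPattern = regexprep(aaSeq, {'B','Z','X'}, ...
    {'[DN]','[EQ]','[ARNDCQEGHILKMFPSTWYV]'});
disp(aaPattern)
```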
Convert a nucleotide sequence into a regular expression.
seq2regexp('ACWTMAN')
ans =
AC[ATW]T[ACM]A[ACGTRYKMSWBDHVN]
Convert the same nucleotide sequence, but remove ambiguous characters from the regular expression.
seq2regexp('ACWTMAN', 'ambiguous', false)
ans =
AC[AT]T[AC]A[ACGT]
Bioinformatics Toolbox™ functions: restrict, seqwordcount
MATLAB functions: regexp, regexpi
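A regular expression such as the one returned in the second example can be passed to regexp or regexpi to locate matching subsequences. The sketch below hard-codes that pattern so it runs without the toolbox; the target sequence is made up for illustration.

```matlab
% Pattern copied from the example output of
% seq2regexp('ACWTMAN', 'ambiguous', false)
pattern = 'AC[AT]T[AC]A[ACGT]';

% Made-up target sequence (an assumption for this illustration)
target = 'GGACATCAGTTACTTAACG';

% Case-sensitive search: start indices and matched substrings
[startIdx, matches] = regexp(target, pattern, 'start', 'match');

% Case-insensitive search on a lowercase copy of the same sequence
lowerMatches = regexpi(lower(target), pattern, 'match');

disp(startIdx)
disp(matches)
```

Here startIdx is [3 12] and matches contains 'ACATCAG' and 'ACTTAAC'.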
© 1984-2008 The MathWorks, Inc.