nt2aa

Convert nucleotide sequence to amino acid sequence

Syntax

SeqAA = nt2aa(SeqNT)

SeqAA = nt2aa(SeqNT,Name=Value)

Description

SeqAA = nt2aa(SeqNT) converts a nucleotide sequence to an amino acid sequence using the standard genetic code.

SeqAA = nt2aa(SeqNT,Name=Value) uses additional options specified by one or more name-value arguments.

Examples

Convert Nucleotide Sequence to Amino Acid Sequence

Generate a random DNA sequence.

ntSeq = randseq(30)

ntSeq = 
'TTATGACGTTATTCTACTTTGATTGTGCGA'

Convert the DNA sequence to an amino acid sequence using the standard genetic code.

aaSeq = nt2aa(ntSeq)

aaSeq = 
'L*RYSTLIVR'

Generate amino acid sequences for all three reading frames using the yeast mitochondrial genetic code.

aaSeq = nt2aa(ntSeq,Frame="all",GeneticCode=3)

aaSeq = 3x1 cell
    {'LWRYSTLIVR'}
    {'YDVITTWLC' }
    {'MTLFYFDCA' }

Input Arguments

`SeqNT` — Nucleotide sequence
character vector | string scalar | row vector of integers | structure

Nucleotide sequence, specified as one of the following.

Character vector or string scalar consisting of the characters A, C, G, T, and U, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, and N.
Row vector of integers specifying a nucleotide sequence. For information on valid integers, see Mapping Nucleotide Integers to Letter Codes.
Structure that contains a nucleotide sequence in the Sequence field. The fastaread, fastqread, emblread, getembl, genbankread, and getgenbank functions return structures with a Sequence field.

Note

Hyphens are valid only when they form a complete gap codon, that is, a codon consisting entirely of hyphens. For example, ACT---TGA.
Do not use a sequence that contains hyphens if you specify "all" for Frame.

Example: SeqAA = nt2aa("CGACTT") converts the nucleotide sequence to the amino acid sequence 'RL'.

Data Types: double | char | string | struct
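
As a rough illustration of the input forms listed above, the following sketch passes the same six nucleotides as a character vector, a string scalar, and a structure with a Sequence field, and then translates the gapped sequence from the note. The variable names are illustrative, and the sketch assumes Bioinformatics Toolbox is installed.

```matlab
% Same sequence supplied in three of the accepted forms.
aaFromChar   = nt2aa('CGACTT');     % character vector
aaFromString = nt2aa("CGACTT");     % string scalar
s.Sequence   = 'CGACTT';            % structure with a Sequence field
aaFromStruct = nt2aa(s);

% All three calls are expected to agree with the documented result 'RL'.
sameResult = isequal(aaFromChar, aaFromString, aaFromStruct)

% A codon made entirely of hyphens is treated as a gap.
aaGapped = nt2aa('ACT---TGA')
```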

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: SeqAA = nt2aa("CGACTT",Frame=2)

`Frame` — Reading frame
`1` (default) | `2` | `3` | `"all"`

Reading frame, specified as 1, 2, 3, or "all". If you specify "all", the function outputs a 3-by-1 cell array containing the amino acid sequences for all three reading frames.

Example: SeqAA = nt2aa("AAGACT",Frame=3) converts the nucleotide sequence to an amino acid sequence using the third reading frame.

Data Types: double | char | string
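
The relationship between the single-frame values and "all" can be checked with a short sketch. It reuses the sequence from the Examples section and compares each single-frame translation with the matching cell of the "all" output. The variable names are illustrative.

```matlab
ntSeq = 'TTATGACGTTATTCTACTTTGATTGTGCGA';   % sequence from the Examples section

% Translate all three frames at once, then each frame on its own.
allFrames = nt2aa(ntSeq, Frame="all");
for k = 1:3
    oneFrame = nt2aa(ntSeq, Frame=k);
    % Each cell of the "all" output is expected to match the single-frame call.
    fprintf('Frame %d: %s (matches "all": %d)\n', k, oneFrame, ...
        isequal(oneFrame, allFrames{k}));
end
```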

`GeneticCode` — Genetic code number or name
`1` (default) | integer | character vector | string scalar

Genetic code number or name, specified as an integer, character vector, or string scalar. This table lists valid genetic code numbers and names.

Genetic Code Number	Genetic Code Name
`1`	`Standard`
`2`	`Vertebrate Mitochondrial`
`3`	`Yeast Mitochondrial`
`4`	`Mold`, `Protozoan`, `Coelenterate Mitochondrial`, and `Mycoplasma/Spiroplasma`
`5`	`Invertebrate Mitochondrial`
`6`	`Ciliate`, `Dasycladacean`, and `Hexamita Nuclear`
`9`	`Echinoderm Mitochondrial`
`10`	`Euplotid Nuclear`
`11`	`Bacterial` and `Plant Plastid`
`12`	`Alternative Yeast Nuclear`
`13`	`Ascidian Mitochondrial`
`14`	`Flatworm Mitochondrial`
`15`	`Blepharisma Nuclear`
`16`	`Chlorophycean Mitochondrial`
`21`	`Trematode Mitochondrial`
`22`	`Scenedesmus Obliquus Mitochondrial`
`23`	`Thraustochytrium Mitochondrial`

Tip

If you use a code name, you can truncate it to its first two letters.
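
As a small sketch of the three ways to select a genetic code, the following calls name the vertebrate mitochondrial code by number, by full name, and by the two-letter abbreviation described in the tip. The three results are expected to be identical. The variable names are illustrative.

```matlab
ntSeq = "ACGTTA";

byNumber    = nt2aa(ntSeq, GeneticCode=2);
byName      = nt2aa(ntSeq, GeneticCode="Vertebrate Mitochondrial");
byShortName = nt2aa(ntSeq, GeneticCode="Ve");   % first two letters, per the tip

codesAgree = isequal(byNumber, byName, byShortName)
```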

This table shows the nucleotide codon to amino acid mapping for the standard genetic code.

Amino Acid Name	Amino Acid Code	Nucleotide Codon
Alanine	`A`	`GCT GCC GCA GCG`
Arginine	`R`	`CGT CGC CGA CGG AGA AGG`
Asparagine	`N`	`AAT AAC`
Aspartic acid (Aspartate)	`D`	`GAT GAC`
Cysteine	`C`	`TGT TGC`
Glutamine	`Q`	`CAA CAG`
Glutamic acid (Glutamate)	`E`	`GAA GAG`
Glycine	`G`	`GGT GGC GGA GGG`
Histidine	`H`	`CAT CAC`
Isoleucine	`I`	`ATT ATC ATA`
Leucine	`L`	`TTA TTG† CTT CTC CTA CTG†`
Lysine	`K`	`AAA AAG`
Methionine	`M`	`ATG`
Phenylalanine	`F`	`TTT TTC`
Proline	`P`	`CCT CCC CCA CCG`
Serine	`S`	`TCT TCC TCA TCG AGT AGC`
Threonine	`T`	`ACT ACC ACA ACG`
Tryptophan	`W`	`TGG`
Tyrosine	`Y`	`TAT TAC`
Valine	`V`	`GTT GTC GTA GTG`
Asparagine or Aspartic acid (Aspartate)	`B`	Random codon from `D` and `N`
Glutamine or Glutamic acid (Glutamate)	`Z`	Random codon from `E` and `Q`
Unknown amino acid (any amino acid)	`X`	Random codon
Translation stop	`*`	`TAA TAG TGA`
Gap of indeterminate length	`-`	`---`
Unknown character (any character or symbol not in table)	`?`	`???`

† indicates an alternative start codon for the standard genetic code. By default, `nt2aa` converts an alternative start codon to methionine (M) when it is the first codon of a sequence. To change this behavior, set the `AlternativeStartCodons` name-value argument to `false`.
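
To make the codon-to-amino-acid lookup concrete, here is a minimal sketch that translates a sequence by hand using a small subset of the table above. It is not how nt2aa is implemented. The map, the variable names, and the choice of '?' for codons missing from the subset are all assumptions made for illustration. It uses only base MATLAB.

```matlab
% Hand-built lookup for a few codons from the standard code (illustrative subset).
codonTable = containers.Map( ...
    {'CGA', 'CTT', 'ACT', 'TGA', 'ATG'}, ...
    {'R',   'L',   'T',   '*',   'M'});

ntSeq   = 'CGACTTACTTGA';
nCodons = floor(numel(ntSeq)/3);      % ignore any trailing partial codon
aaSeq   = repmat('?', 1, nCodons);    % '?' marks codons missing from the subset

for k = 1:nCodons
    codon = ntSeq(3*k-2 : 3*k);       % k-th group of three nucleotides
    if isKey(codonTable, codon)
        aaSeq(k) = codonTable(codon);
    end
end

disp(aaSeq)   % RLT*
```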

Example: SeqAA = nt2aa("ACGTTA",GeneticCode=2) converts the nucleotide sequence using the vertebrate mitochondrial genetic code.

Data Types: double | char | string

`AlternativeStartCodons` — Flag to translate alternative start codons
`true` (default) | `false`

Flag to translate alternative start codons, specified as true or false. When true, if the first codon of a sequence is a known alternative start codon, the function translates the codon to methionine (M). When false, the function translates the alternative start codon to its corresponding amino acid.

For more information on alternative start codons, visit https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1.

Example: SeqAA = nt2aa("TTGATC",AlternativeStartCodons=true) converts the first codon to methionine (M) instead of leucine (L).

Data Types: logical
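
The effect of this flag can be seen by translating the same sequence with both settings. This is a minimal sketch with illustrative variable names. Both values are passed explicitly so that the comparison does not depend on the default.

```matlab
ntSeq = "TTGATC";   % TTG is an alternative start codon in the standard code

withStart    = nt2aa(ntSeq, AlternativeStartCodons=true)    % first codon read as M
withoutStart = nt2aa(ntSeq, AlternativeStartCodons=false)   % first codon read as L
```

Only the first codon is affected. The second codon, ATC, translates to isoleucine (I) in both calls.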

`ACGTOnly` — Flag to control the behavior of ambiguous nucleotides
`true` (default) | `false`

Flag to control the behavior of ambiguous nucleotides (R, Y, K, M, S, W, B, D, H, V, and N), specified as true or false. If you specify true, the function produces an error if any ambiguous nucleotides are present. If you specify false, the function tries to resolve any ambiguities. If it cannot, the function returns X for the affected codon.

Data Types: logical
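
The two behaviors can be compared on a short sequence containing ambiguous codes. This is a hedged sketch: the sequence and variable names are illustrative, and no output is shown because the exact error message and the way each codon resolves are not stated above.

```matlab
ambigSeq = "GCNATR";   % N and R are ambiguous nucleotide codes

% With ACGTOnly=true the call is expected to throw, so catch the error.
threwError = false;
try
    nt2aa(ambigSeq, ACGTOnly=true);
catch err
    threwError = true;
    disp(err.message)
end

% With ACGTOnly=false the function attempts to resolve each codon and
% returns X for any codon it cannot resolve.
aaSeq = nt2aa(ambigSeq, ACGTOnly=false)
```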

Output Arguments

`SeqAA` — Amino acid sequence
character vector | row vector of integers | cell array

Amino acid sequence, returned as one of the following.

If SeqNT is a character vector or string scalar, then the function returns a character vector.
If SeqNT is a row vector of integers, then the function returns a row vector of integers. For information on valid integers, see Mapping Amino Acid Letter Codes to Integers.
If SeqNT is a structure, then the function returns SeqAA with the same data type as the Sequence field, either a character vector or a row vector of integers.

Setting Frame to "all" directs the function to return a 3-by-1 cell array.
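
The output types described above can be checked with a short sketch. It assumes the Bioinformatics Toolbox functions nt2int and int2aa for converting between letters and integers. The cast to double follows the data types listed for SeqNT and may not be strictly required.

```matlab
% Integer input yields integer output; int2aa converts it back to letters.
ntInt     = double(nt2int('CGACTT'));
aaInt     = nt2aa(ntInt);
aaLetters = int2aa(aaInt)

% Character input yields a character vector, unless Frame="all".
isCharOut = ischar(nt2aa('CGACTT'))
allOut    = nt2aa('CGACTT', Frame="all");
isCellOut = iscell(allOut) && isequal(size(allOut), [3 1])
```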

Version History

Introduced before R2006a
