nt2aa

Convert nucleotide sequence to amino acid sequence

Syntax

SeqAA = nt2aa(SeqNT)
SeqAA = nt2aa(..., 'Frame', FrameValue, ...)
SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...)
SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...)
SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...)

Input Arguments

SeqNT

One of the following:

Note

A hyphen is valid only if the codon to which it belongs represents a gap, that is, the codon consists entirely of hyphens. Example: ACT---TGA

Tip

Do not use a sequence with hyphens if you specify 'all' for FrameValue.
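As a minimal sketch of the hyphen rule (assuming the Bioinformatics Toolbox is installed, and reusing the example sequence from the note above), a gap codon made entirely of hyphens can sit between two ordinary codons:

```matlab
% Sequence from the note: two ordinary codons around one all-hyphen gap codon.
seqWithGap = 'ACT---TGA';

% Translate in the default reading frame (Frame = 1), so the hyphens
% line up exactly with the second codon.
aaWithGap = nt2aa(seqWithGap);

% Three codons go in, so three output symbols are expected.
numSymbols = numel(aaWithGap);
```

Shifting the reading frame would split the hyphens across two codons, which is presumably why the tip advises against combining hyphens with 'all'.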

FrameValue

Integer or character vector specifying a reading frame in the nucleotide sequence. Choices are 1, 2, 3, or 'all'. Default is 1.

If FrameValue is 'all', then SeqAA is a 3-by-1 cell array.

GeneticCodeValue

Integer or character vector specifying a genetic code number or code name from the table Genetic Code. Default is 1 or 'Standard'.

Tip

If you use a code name, you can truncate the name to the first two letters of the name.

AlternativeStartCodonsValue

Controls the translation of alternative start codons. Choices are true (default) or false.

ACGTOnlyValue

Controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. ACGTOnlyValue can be true (default) or false.

  • If true, then the function errors if any of these characters are present.

  • If false, then the function tries to resolve ambiguities. If it cannot, it returns X for the affected codon.

Output Arguments

SeqAA

Amino acid sequence, returned as a character vector of single-letter codes.

Description

SeqAA = nt2aa(SeqNT) converts a nucleotide sequence, specified by SeqNT, to an amino acid sequence, returned in SeqAA, using the standard genetic code.

SeqAA = nt2aa(SeqNT, ...'PropertyName', PropertyValue, ...) calls nt2aa with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

SeqAA = nt2aa(..., 'Frame', FrameValue, ...) converts a nucleotide sequence for a specific reading frame to an amino acid sequence. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then output SeqAA is a 3-by-1 cell array.
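The following is a small sketch of the 'Frame' property, assuming the Bioinformatics Toolbox is installed. The sequence and variable names are made up for illustration and do not come from a real gene.

```matlab
% Hypothetical 11-nucleotide sequence used only for illustration.
seq = 'ATGGCCTAAGT';

% Default reading frame (1): translation starts at the first nucleotide.
aaFrame1 = nt2aa(seq);

% Reading frame 2: translation starts at the second nucleotide.
aaFrame2 = nt2aa(seq, 'Frame', 2);

% All three reading frames at once; the result is a 3-by-1 cell array
% with one translation per frame.
aaAll = nt2aa(seq, 'Frame', 'all');
frameCount = numel(aaAll);
```

Indexing the cell array, for example aaAll{2}, gives the translation for the corresponding single frame.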

SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...) specifies a genetic code to use when converting a nucleotide sequence to an amino acid sequence. GeneticCodeValue can be an integer or character vector specifying a code number or code name from the table Genetic Code. Default is 1 or 'Standard'. The amino acid to nucleotide codon mapping for the Standard genetic code is shown in the table Standard Genetic Code.

Tip

If you use a code name, you can truncate the name to the first two letters of the name.
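As a sketch of the three ways to name a genetic code (number, full name, or name truncated to two letters as the tip describes), assuming the Bioinformatics Toolbox is installed. The sequence is hypothetical, chosen because the codons AGA and TGA are read differently in NCBI translation tables 1 and 2.

```matlab
% Hypothetical sequence: ATG, then AGA and TGA, which differ between codes.
seq = 'ATGAGATGA';

% Standard code, selected by number and by name.
aaStdByNumber = nt2aa(seq, 'GeneticCode', 1);
aaStdByName   = nt2aa(seq, 'GeneticCode', 'Standard');

% Vertebrate Mitochondrial code, selected by number and by a name
% truncated to its first two letters.
aaMitoByNumber = nt2aa(seq, 'GeneticCode', 2);
aaMitoByShort  = nt2aa(seq, 'GeneticCode', 'Ve');

% The two codes should disagree on this sequence.
codesDiffer = ~isequal(aaStdByNumber, aaMitoByNumber);
```

Using the code number avoids any ambiguity between names that share leading letters.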

SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...) controls the translation of alternative start codons. By default, AlternativeStartCodonsValue is set to true, and if the first codon of a sequence is a known alternative start codon, the codon is translated to methionine.

If this option is set to false, then an alternative start codon at the start of a sequence is translated to its corresponding amino acid in the genetic code that you specify, which might not necessarily be methionine. For example, in the human mitochondrial genetic code, AUA and AUU are known to be alternative start codons. For more information on alternative start codons, visit https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1.
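A minimal sketch of the difference, assuming the Bioinformatics Toolbox is installed. The short sequence is hypothetical. It begins with ATT, which the ND2 example on this page describes as an alternative start codon under the Vertebrate Mitochondrial code.

```matlab
% Hypothetical sequence that starts and ends with ATT.
seq = 'ATTGCCATT';

% Default behavior: a leading alternative start codon becomes methionine.
aaAltStart = nt2aa(seq, 'GeneticCode', 2);

% With the option off, the leading ATT is translated like any other ATT.
aaPlain = nt2aa(seq, 'GeneticCode', 2, 'AlternativeStartCodons', false);

% Only the first residue is expected to differ between the two results.
firstWithAlt    = aaAltStart(1);
firstWithoutAlt = aaPlain(1);
```

Setting the option to false can be useful when translating an internal fragment, where the first codon is not a true start codon.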

Genetic Code

Code Number   Code Name
1             Standard
2             Vertebrate Mitochondrial
3             Yeast Mitochondrial
4             Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
5             Invertebrate Mitochondrial
6             Ciliate, Dasycladacean, and Hexamita Nuclear
9             Echinoderm Mitochondrial
10            Euplotid Nuclear
11            Bacterial and Plant Plastid
12            Alternative Yeast Nuclear
13            Ascidian Mitochondrial
14            Flatworm Mitochondrial
15            Blepharisma Nuclear
16            Chlorophycean Mitochondrial
21            Trematode Mitochondrial
22            Scenedesmus Obliquus Mitochondrial
23            Thraustochytrium Mitochondrial

Standard Genetic Code

Amino Acid Name                                           Amino Acid Code  Nucleotide Codon
Alanine                                                   A                GCT GCC GCA GCG
Arginine                                                  R                CGT CGC CGA CGG AGA AGG
Asparagine                                                N                AAT AAC
Aspartic acid (Aspartate)                                 D                GAT GAC
Cysteine                                                  C                TGT TGC
Glutamine                                                 Q                CAA CAG
Glutamic acid (Glutamate)                                 E                GAA GAG
Glycine                                                   G                GGT GGC GGA GGG
Histidine                                                 H                CAT CAC
Isoleucine                                                I                ATT ATC ATA
Leucine                                                   L                TTA TTG CTT CTC CTA CTG
Lysine                                                    K                AAA AAG
Methionine                                                M                ATG
Phenylalanine                                             F                TTT TTC
Proline                                                   P                CCT CCC CCA CCG
Serine                                                    S                TCT TCC TCA TCG AGT AGC
Threonine                                                 T                ACT ACC ACA ACG
Tryptophan                                                W                TGG
Tyrosine                                                  Y                TAT TAC
Valine                                                    V                GTT GTC GTA GTG
Asparagine or Aspartic acid (Aspartate)                   B                Random codon from D and N
Glutamine or Glutamic acid (Glutamate)                    Z                Random codon from E and Q
Unknown amino acid (any amino acid)                       X                Random codon
Translation stop                                          *                TAA TAG TGA
Gap of indeterminate length                               -                ---
Unknown character (any character or symbol not in table)  ?                ???

SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...) controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. ACGTOnlyValue can be true (default) or false. If true, then the function errors if any of these characters are present. If false, then the function tries to resolve ambiguities. If it cannot, it returns X for the affected codon.
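The sketch below contrasts the two settings, assuming the Bioinformatics Toolbox is installed. It reuses the ambiguous sequence from the examples on this page; the try/catch wrapper and variable names are illustrative additions.

```matlab
% Sequence containing the ambiguous characters n and r.
seq = 'agttgccgacgcgcncar';

% With the default ('ACGTOnly' true), ambiguous characters raise an error,
% so the call is wrapped in try/catch to record that outcome.
try
    nt2aa(seq);
    defaultErrored = false;
catch
    defaultErrored = true;
end

% With 'ACGTOnly' false, nt2aa tries to resolve each ambiguous codon
% and falls back to X where it cannot.
aaResolved = nt2aa(seq, 'ACGTOnly', false);
```

Here both ambiguous codons can be resolved (gcn is alanine whatever n is, and car is glutamine for either purine), so no X is expected in the result.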

Examples

Example 72. Converting the ND1 Gene
  1. Use the getgenbank function to retrieve genomic information for the human mitochondrion from the GenBank® database and store it in a MATLAB structure.

    mitochondria = getgenbank('NC_012920')
    
    mitochondria = 
    
                    LocusName: 'NC_012920'
          LocusSequenceLength: '16569'
         LocusNumberofStrands: ''
                LocusTopology: 'circular'
            LocusMoleculeType: 'DNA'
         LocusGenBankDivision: 'PRI'
        LocusModificationDate: '05-MAR-2010'
                   Definition: 'Homo sapiens mitochondrion, complete genome.'
                    Accession: 'NC_012920 AC_000021'
                      Version: 'NC_012920.1'
                           GI: '251831106'
                      Project: []
                       DBLink: 'Project:30353'
                     Keywords: []
                      Segment: []
                       Source: 'mitochondrion Homo sapiens (human)'
               SourceOrganism: [4x65 char]
                    Reference: {1x7 cell}
                      Comment: [24x67 char]
                     Features: [933x74 char]
                          CDS: [1x13 struct]
                     Sequence: [1x16569 char]
                    SearchURL: [1x70 char]
                  RetrieveURL: [1x104 char]
  2. Determine the name and location of the first gene in the human mitochondrion.

    mitochondria.CDS(1).gene
    
    ans =
    
    ND1
    
    mitochondria.CDS(1).location
    
    ans =
    
    3307..4262
  3. Extract the sequence for the ND1 gene from the nucleotide sequence.

    ND1gene = mitochondria.Sequence(3307:4262);
    
  4. Convert the ND1 gene from the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.

    protein1 = nt2aa(ND1gene,'GeneticCode', 2);
    
  5. Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.

    protein2 = getgenpept('YP_003024026', 'SequenceOnly', true);
    
  6. Use the isequal function to compare the two amino acid sequences.

    isequal(protein1, protein2)
    
    ans =
    
         1
Example 73. Converting the ND2 Gene
  1. Use the getgenbank function to retrieve the nucleotide sequence for the human mitochondrion from the GenBank database.

    mitochondria = getgenbank('NC_012920');
    
  2. Determine the name and location of the second gene in the human mitochondrion.

    mitochondria.CDS(2).gene
    
    ans =
    
    ND2
    
    mitochondria.CDS(2).location
    
    ans =
    
    4470..5511
  3. Extract the sequence for the ND2 gene from the nucleotide sequence.

    ND2gene = mitochondria.Sequence(4470:5511);
    
  4. Convert the ND2 gene from the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.

    protein1 = nt2aa(ND2gene,'GeneticCode', 2);
    

    Note

    In the ND2gene nucleotide sequence, the first codon is ATT, which is translated to M, while the subsequent ATT codons are translated to I. If you set 'AlternativeStartCodons' to false, then the first ATT codon is translated to I, the corresponding amino acid in the Vertebrate Mitochondrial genetic code.

  5. Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.

    protein2 = getgenpept('YP_003024027', 'SequenceOnly', true);
    
  6. Use the isequal function to compare the two amino acid sequences.

    isequal(protein1, protein2)
    
    ans =
    
         1
Example 74. Converting a Sequence with Ambiguous Characters

If you have a sequence with ambiguous or unknown nucleotide characters, you can set the 'ACGTOnly' property to false to have the nt2aa function try to resolve them:

nt2aa('agttgccgacgcgcncar','ACGTOnly', false)

ans =

SCRRAQ

Introduced before R2006a
