nt2aa

Convert nucleotide sequence to amino acid sequence

Syntax

SeqAA = nt2aa(SeqNT)

SeqAA = nt2aa(..., 'Frame', FrameValue, ...)
SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...)
SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...)
SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...)

Input Arguments

SeqNT

One of the following:

    Note:   Hyphens are valid only if the codon to which they belong represents a gap, that is, the codon consists entirely of hyphens. Example: ACT---TGA

    Tip   Do not use a sequence with hyphens if you specify 'all' for FrameValue.
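
As a minimal sketch of the gap rule in the note above, the following passes the note's own example sequence to nt2aa with default settings. It assumes Bioinformatics Toolbox is installed; the variable names are illustrative only.

```matlab
% Sequence taken from the note above: codons ACT, ---, TGA.
% The middle codon consists entirely of hyphens, so it is a valid gap codon.
seqWithGap = 'ACT---TGA';

% Default call: reading frame 1, standard genetic code.
aa = nt2aa(seqWithGap);

% One output character per complete codon.
disp(aa)
```

A sequence such as 'AC-TGA', where a hyphen shares a codon with nucleotides, does not satisfy the rule.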

FrameValue

Integer or string specifying a reading frame in the nucleotide sequence. Choices are 1, 2, 3, or 'all'. Default is 1.

If FrameValue is 'all', then SeqAA is a 3-by-1 cell array.

GeneticCodeValue

Integer or string specifying a genetic code number or code name from the table Genetic Code. Default is 1 or 'Standard'.

    Tip   If you use a code name, you can truncate it to its first two letters.

AlternativeStartCodonsValue

Controls the translation of alternative start codons. Choices are true (default) or false.

ACGTOnlyValue

Controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. ACGTOnlyValue can be true (default) or false.

  • If true, then the function errors if any of these characters are present.

  • If false, then the function tries to resolve ambiguities. If it cannot, it returns X for the affected codon.

Output Arguments

SeqAA

Amino acid sequence specified by a string of single-letter codes.

Description

SeqAA = nt2aa(SeqNT) converts a nucleotide sequence, specified by SeqNT, to an amino acid sequence, returned in SeqAA, using the standard genetic code.

SeqAA = nt2aa(SeqNT, ...'PropertyName', PropertyValue, ...) calls nt2aa with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


SeqAA = nt2aa(..., 'Frame', FrameValue, ...) converts a nucleotide sequence for a specific reading frame to an amino acid sequence. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then output SeqAA is a 3-by-1 cell array.
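
The 'Frame' property can be illustrated with a short sketch. The sequence below is a made-up test string, not data from this page, and Bioinformatics Toolbox is assumed to be installed.

```matlab
% Hypothetical 9-nucleotide test sequence: ATG GCT TAA in frame 1.
seq = 'ATGGCTTAA';

% Default call uses reading frame 1.
aaFrame1 = nt2aa(seq);

% 'all' translates frames 1, 2, and 3 and returns a 3-by-1 cell array,
% with one translated sequence per frame.
aaAll = nt2aa(seq, 'Frame', 'all');

disp(aaFrame1)      % frame 1 translation
disp(size(aaAll))   % dimensions of the cell array
disp(aaAll{1})      % same translation as aaFrame1
```

Indexing the cell array with aaAll{k} returns the translation for reading frame k.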

SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...) specifies a genetic code to use when converting a nucleotide sequence to an amino acid sequence. GeneticCodeValue can be an integer or string specifying a code number or code name from the table Genetic Code. Default is 1 or 'Standard'. The amino acid to nucleotide codon mapping for the Standard genetic code is shown in the table Standard Genetic Code.

    Tip   If you use a code name, you can truncate it to its first two letters.
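
To see the effect of 'GeneticCode', the sketch below translates the same made-up sequence with the Standard code and with the Vertebrate Mitochondrial code, where TGA codes for tryptophan instead of a stop. The sequence is hypothetical, and Bioinformatics Toolbox is assumed to be installed.

```matlab
% Hypothetical test sequence: ATG followed by TGA.
seq = 'ATGTGA';

% Code 1 ('Standard'): TGA is a translation stop.
aaStandard = nt2aa(seq);

% Code 2 ('Vertebrate Mitochondrial'): TGA is translated to tryptophan (W).
aaMitoByNumber = nt2aa(seq, 'GeneticCode', 2);

% The same code selected by name instead of by number.
aaMitoByName = nt2aa(seq, 'GeneticCode', 'Vertebrate Mitochondrial');

disp(aaStandard)       % M*
disp(aaMitoByNumber)   % MW
disp(aaMitoByName)     % MW
```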

SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...) controls the translation of alternative start codons. By default, AlternativeStartCodonsValue is set to true, and if the first codon of a sequence is a known alternative start codon, the codon is translated to methionine.

If this option is set to false, then an alternative start codon at the start of a sequence is translated to its corresponding amino acid in the genetic code that you specify, which is not necessarily methionine. For example, in the human mitochondrial genetic code, AUA and AUU are known to be alternative start codons.

For more information about alternative start codons, see:

www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1
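
The following sketch contrasts the two settings of 'AlternativeStartCodons' using the Vertebrate Mitochondrial code, in which ATT is an alternative start codon. The sequence is a made-up test string, and Bioinformatics Toolbox is assumed to be installed.

```matlab
% Hypothetical test sequence: ATT GCT ATT.
seq = 'ATTGCTATT';

% Default (true): the leading ATT is treated as a start codon and
% translated to methionine; the later ATT is translated to isoleucine.
withAltStart = nt2aa(seq, 'GeneticCode', 2);

% false: every ATT, including the first, is translated to isoleucine.
withoutAltStart = nt2aa(seq, 'GeneticCode', 2, ...
    'AlternativeStartCodons', false);

disp(withAltStart)      % MAI
disp(withoutAltStart)   % IAI
```

Only the first codon of the sequence is affected by this property.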

Genetic Code

Code Number  Code Name
1            Standard
2            Vertebrate Mitochondrial
3            Yeast Mitochondrial
4            Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
5            Invertebrate Mitochondrial
6            Ciliate, Dasycladacean, and Hexamita Nuclear
9            Echinoderm Mitochondrial
10           Euplotid Nuclear
11           Bacterial and Plant Plastid
12           Alternative Yeast Nuclear
13           Ascidian Mitochondrial
14           Flatworm Mitochondrial
15           Blepharisma Nuclear
16           Chlorophycean Mitochondrial
21           Trematode Mitochondrial
22           Scenedesmus Obliquus Mitochondrial
23           Thraustochytrium Mitochondrial

Standard Genetic Code

Amino Acid Name                                           Amino Acid Code  Nucleotide Codon
Alanine                                                   A                GCT GCC GCA GCG
Arginine                                                  R                CGT CGC CGA CGG AGA AGG
Asparagine                                                N                AAT AAC
Aspartic acid (Aspartate)                                 D                GAT GAC
Cysteine                                                  C                TGT TGC
Glutamine                                                 Q                CAA CAG
Glutamic acid (Glutamate)                                 E                GAA GAG
Glycine                                                   G                GGT GGC GGA GGG
Histidine                                                 H                CAT CAC
Isoleucine                                                I                ATT ATC ATA
Leucine                                                   L                TTA TTG CTT CTC CTA CTG
Lysine                                                    K                AAA AAG
Methionine                                                M                ATG
Phenylalanine                                             F                TTT TTC
Proline                                                   P                CCT CCC CCA CCG
Serine                                                    S                TCT TCC TCA TCG AGT AGC
Threonine                                                 T                ACT ACC ACA ACG
Tryptophan                                                W                TGG
Tyrosine                                                  Y                TAT TAC
Valine                                                    V                GTT GTC GTA GTG
Asparagine or Aspartic acid (Aspartate)                   B                Random codon from D and N
Glutamine or Glutamic acid (Glutamate)                    Z                Random codon from E and Q
Unknown amino acid (any amino acid)                       X                Random codon
Translation stop                                          *                TAA TAG TGA
Gap of indeterminate length                               -                ---
Unknown character (any character or symbol not in table)  ?                ???

SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...) controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. ACGTOnlyValue can be true (default) or false. If true, then the function errors if any of these characters are present. If false, then the function tries to resolve ambiguities. If it cannot, it returns X for the affected codon.
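
Both settings of 'ACGTOnly' can be sketched with a made-up sequence that contains the ambiguous code N. In GCN every possible nucleotide gives alanine, so the codon can be resolved, while NNN cannot. Bioinformatics Toolbox is assumed to be installed, and the variable names are illustrative.

```matlab
% Hypothetical test sequence: ATG GCN NNN.
seq = 'ATGGCNNNN';

% Default (true): ambiguous characters cause an error, caught here.
try
    nt2aa(seq);
    erroredByDefault = false;
catch err
    erroredByDefault = true;
    disp(err.message)
end

% false: GCN resolves to alanine (A); NNN cannot be resolved and becomes X.
aa = nt2aa(seq, 'ACGTOnly', false);
disp(aa)   % MAX
```

Wrapping the default call in try/catch is one way to detect ambiguous input before deciding whether to translate it with 'ACGTOnly' set to false.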

Examples

Converting the ND1 Gene

  1. Use the getgenbank function to retrieve genomic information for the human mitochondrion from the GenBank® database and store it in a MATLAB structure.

    mitochondria = getgenbank('NC_012920')
    
    mitochondria = 
    
                    LocusName: 'NC_012920'
          LocusSequenceLength: '16569'
         LocusNumberofStrands: ''
                LocusTopology: 'circular'
            LocusMoleculeType: 'DNA'
         LocusGenBankDivision: 'PRI'
        LocusModificationDate: '05-MAR-2010'
                   Definition: 'Homo sapiens mitochondrion, complete genome.'
                    Accession: 'NC_012920 AC_000021'
                      Version: 'NC_012920.1'
                           GI: '251831106'
                      Project: []
                       DBLink: 'Project:30353'
                     Keywords: []
                      Segment: []
                       Source: 'mitochondrion Homo sapiens (human)'
               SourceOrganism: [4x65 char]
                    Reference: {1x7 cell}
                      Comment: [24x67 char]
                     Features: [933x74 char]
                          CDS: [1x13 struct]
                     Sequence: [1x16569 char]
                    SearchURL: [1x70 char]
                  RetrieveURL: [1x104 char]
  2. Determine the name and location of the first gene in the human mitochondrion.

    mitochondria.CDS(1).gene

    ans =

    ND1

    mitochondria.CDS(1).location

    ans =

    3307..4262
  3. Extract the sequence for the ND1 gene from the nucleotide sequence.

    ND1gene = mitochondria.Sequence(3307:4262);
    
  4. Convert the ND1 gene in the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.

    protein1 = nt2aa(ND1gene,'GeneticCode', 2);
    
  5. Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.

    protein2 = getgenpept('YP_003024026', 'SequenceOnly', true);
    
  6. Use the isequal function to compare the two amino acid sequences.

    isequal(protein1, protein2)
    
    ans =
    
         1

Converting the ND2 Gene

  1. Use the getgenbank function to retrieve the nucleotide sequence for the human mitochondrion from the GenBank database.

    mitochondria = getgenbank('NC_012920');
    
  2. Determine the name and location of the second gene in the human mitochondrion.

    mitochondria.CDS(2).gene

    ans =

    ND2

    mitochondria.CDS(2).location

    ans =

    4470..5511
  3. Extract the sequence for the ND2 gene from the nucleotide sequence.

    ND2gene = mitochondria.Sequence(4470:5511);
    
  4. Convert the ND2 gene in the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.

    protein1 = nt2aa(ND2gene,'GeneticCode', 2);
    

      Note:   In the ND2gene nucleotide sequence, the first codon is ATT, which is translated to M, while the subsequent ATT codons are translated to I. If you set 'AlternativeStartCodons' to false, then the first ATT codon is translated to I, the corresponding amino acid in the Vertebrate Mitochondrial genetic code.

  5. Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.

    protein2 = getgenpept('YP_003024027', 'SequenceOnly', true);
    
  6. Use the isequal function to compare the two amino acid sequences.

    isequal(protein1, protein2)
    
    ans =
    
         1

Converting a Sequence with Ambiguous Characters

If you have a sequence with ambiguous or unknown nucleotide characters, you can set the 'ACGTOnly' property to false to have the nt2aa function try to resolve them:

    nt2aa('agttgccgacgcgcncar', 'ACGTOnly', false)

    ans =

    SCRRAQ