nt2int

Convert nucleotide sequence from letter to integer representation

Syntax

SeqInt = nt2int(SeqChar)
SeqInt = nt2int(SeqChar, ...'Unknown', UnknownValue, ...)
SeqInt = nt2int(SeqChar, ...'ACGTOnly', ACGTOnlyValue, ...)

Input Arguments

SeqChar

One of the following:

UnknownValue

Integer to represent unknown nucleotides. Choices are integers ≥ 0 and ≤ 255. Default is 0.

ACGTOnlyValue

Controls the prohibition of ambiguous nucleotides. Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U.

Output Arguments

SeqInt

Nucleotide sequence specified by a row vector of integers.

Description

SeqInt = nt2int(SeqChar) converts SeqChar, a character vector of codes specifying a nucleotide sequence, to SeqInt, a row vector of integers specifying the same nucleotide sequence. For valid codes, see the table Mapping Nucleotide Letter Codes to Integers. Unknown characters (characters not in the table) are mapped to 0. Gaps represented with hyphens are mapped to 16.

SeqInt = nt2int(SeqChar, ...'PropertyName', PropertyValue, ...) calls nt2int with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

SeqInt = nt2int(SeqChar, ...'Unknown', UnknownValue, ...) specifies an integer to represent unknown nucleotides. UnknownValue can be an integer ≥ 0 and ≤ 255. Default is 0.

SeqInt = nt2int(SeqChar, ...'ACGTOnly', ACGTOnlyValue, ...) controls the prohibition of ambiguous nucleotides (N, R, Y, K, M, S, W, B, D, H, and V). Choices are true or false (default). If ACGTOnlyValue is true, you can enter only the characters A, C, G, T, and U.
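To illustrate how the letter-to-integer mapping and the two properties interact, here is a minimal sketch in Python. It is not the MATLAB implementation of nt2int. The function name nt2int_sketch and its keyword arguments unknown and acgt_only are hypothetical stand-ins for SeqChar, 'Unknown', and 'ACGTOnly'. The sketch assumes that letter case is ignored and that an ambiguous code with acgt_only set raises an error; how nt2int itself reports that case is not described above.

```python
# Minimal sketch of an nt2int-style conversion (nt2int_sketch is a hypothetical name).
# The mapping follows the table "Mapping Nucleotide Letter Codes to Integers".
CODE_TO_INT = {
    "A": 1, "C": 2, "G": 3, "T": 4, "U": 4,
    "R": 5, "Y": 6, "K": 7, "M": 8, "S": 9, "W": 10,
    "B": 11, "D": 12, "H": 13, "V": 14, "N": 15,
    "-": 16,  # gap of indeterminate length
}

# Ambiguous codes that are prohibited when acgt_only is True.
AMBIGUOUS = set("NRYKMSWBDHV")


def nt2int_sketch(seq_char: str, unknown: int = 0, acgt_only: bool = False) -> list[int]:
    """Convert a nucleotide letter sequence to a list of integer codes."""
    if not (isinstance(unknown, int) and 0 <= unknown <= 255):
        raise ValueError("unknown must be an integer from 0 to 255")
    result = []
    for ch in seq_char.upper():  # assumption: letter case is ignored
        if acgt_only and ch in AMBIGUOUS:
            raise ValueError(f"ambiguous nucleotide {ch!r} not allowed")
        # Characters not in the table receive the unknown value.
        result.append(CODE_TO_INT.get(ch, unknown))
    return result


print(nt2int_sketch("ACTGCTAGC"))            # [1, 2, 4, 3, 2, 4, 1, 3, 2]
print(nt2int_sketch("ACGX-", unknown=255))  # [1, 2, 3, 255, 16]
```

A dictionary lookup with a default value mirrors the rule that any character not in the table is mapped to UnknownValue.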

Mapping Nucleotide Letter Codes to Integers

Nucleotide                                Code   Integer
Adenosine                                 A      1
Cytidine                                  C      2
Guanine                                   G      3
Thymidine                                 T      4
Uridine (if 'Alphabet' set to 'RNA')      U      4
Purine (A or G)                           R      5
Pyrimidine (T or C)                       Y      6
Keto (G or T)                             K      7
Amino (A or C)                            M      8
Strong interaction (3 H bonds) (G or C)   S      9
Weak interaction (2 H bonds) (A or T)     W      10
Not A (C or G or T)                       B      11
Not C (A or G or T)                       D      12
Not G (A or C or T)                       H      13
Not T or U (A or C or G)                  V      14
Any nucleotide (A or C or G or T or U)    N      15
Gap of indeterminate length               -      16
Unknown (any character not in table)      *      0 (default)
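Each ambiguity code in the table stands for a set of bases. The following Python sketch makes those sets explicit and checks whether a concrete base is compatible with an integer code. The names INT_TO_BASES and base_matches_code are hypothetical and are not part of nt2int, which returns only the integers. Following the table, T and U share code 4, so the sketch treats U as T.

```python
# Hypothetical helper: the bases covered by each integer code in the table.
# T and U share code 4, so "T" here stands for T or U.
INT_TO_BASES = {
    1: {"A"}, 2: {"C"}, 3: {"G"}, 4: {"T"},
    5: {"A", "G"},             # R: purine
    6: {"C", "T"},             # Y: pyrimidine
    7: {"G", "T"},             # K: keto
    8: {"A", "C"},             # M: amino
    9: {"C", "G"},             # S: strong interaction
    10: {"A", "T"},            # W: weak interaction
    11: {"C", "G", "T"},       # B: not A
    12: {"A", "G", "T"},       # D: not C
    13: {"A", "C", "T"},       # H: not G
    14: {"A", "C", "G"},       # V: not T or U
    15: {"A", "C", "G", "T"},  # N: any nucleotide
}


def base_matches_code(base: str, code: int) -> bool:
    """Return True if a concrete base is compatible with an integer code.

    Gap (16) and unknown (0) codes are not in INT_TO_BASES, so they match nothing.
    """
    base = "T" if base.upper() == "U" else base.upper()
    return base in INT_TO_BASES.get(code, set())


print(base_matches_code("G", 5))   # True: R covers A or G
print(base_matches_code("U", 6))   # True: Y covers C or T/U
print(base_matches_code("A", 11))  # False: B means not A
```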

Examples

Example 75. Converting a Simple Sequence

Convert a nucleotide sequence from letters to integers.

s = nt2int('ACTGCTAGC') 

s = 
     1    2    4    3    2    4    1    3    2

Example 76. Converting a Random Sequence
  1. Create a random character vector to represent a nucleotide sequence.

    SeqChar = randseq(20)
    
    SeqChar =
    
    TTATGACGTTATTCTACTTT
  2. Convert the nucleotide sequence from letter to integer representation.

    SeqInt = nt2int(SeqChar)
    
    SeqInt =
    
      Columns 1 through 13
         4    4    1    4    3    1    2    3    4    4    1    4    4
    
      Columns 14 through 20 
         2    4    1    2    4    4    4
    

Introduced before R2006a
