nwalign - Globally align two sequences using the Needleman-Wunsch algorithm

Syntax

Score = nwalign(Seq1,Seq2)
[Score, Alignment] = nwalign(Seq1,Seq2)
[Score, Alignment, Start] = nwalign(Seq1,Seq2)

... = nwalign(Seq1,Seq2, ...'Alphabet', AlphabetValue, ...)
... = nwalign(Seq1,Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...)
... = nwalign(Seq1,Seq2, ...'Scale', ScaleValue, ...)
... = nwalign(Seq1,Seq2, ...'GapOpen', GapOpenValue, ...)
... = nwalign(Seq1,Seq2, ...'ExtendGap', ExtendGapValue, ...)
... = nwalign(Seq1,Seq2, ...'Showscore', ShowscoreValue, ...)

Arguments

Seq1, Seq2

Amino acid or nucleotide sequences. Enter any of the following:
  • Character string of letters representing amino acids or nucleotides, such as returned by int2aa or int2nt

  • Vector of integers representing amino acids or nucleotides, such as returned by aa2int or nt2int

  • Structure containing a Sequence field

AlphabetValue

String specifying the type of sequence. Choices are 'AA' (default) or 'NT'.

ScoringMatrixValue

String specifying the scoring matrix to use for the global alignment. Choices for amino acid sequences are:

  • 'PAM40'

  • 'PAM250'

  • 'DAYHOFF'

  • 'GONNET'

  • 'BLOSUM30' increasing by 5 up to 'BLOSUM90'

  • 'BLOSUM62'

  • 'BLOSUM100'

Default is:

  • 'BLOSUM50' (when AlphabetValue equals 'AA')

  • 'NUC44' (when AlphabetValue equals 'NT')

    Note   All of the above scoring matrices have a built-in scale factor that returns Score in bits.

ScaleValue

Positive value that specifies the scale factor used to return Score in arbitrary units other than bits. For example, if you enter log(2) for ScaleValue, then nwalign returns Score in nats.

GapOpenValue

Positive integer specifying the penalty for opening a gap in the alignment. Default is 8.

ExtendGapValue

Positive integer specifying the penalty for extending a gap. Default is equal to GapOpenValue.

ShowscoreValue

Controls the display of the scoring space and the winning path of the alignment. Choices are true or false (default).
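
The three input forms listed for Seq1 and Seq2 can be sketched as follows. This is an illustrative snippet, not output from a real session: it assumes Bioinformatics Toolbox is installed, and the variable names are hypothetical.

```matlab
% Sketch only: requires Bioinformatics Toolbox (nwalign, aa2int).
seqA = 'VSPAGMASGYD';
seqB = 'IPGKASYD';

% 1. Character strings of amino acid letters.
scoreFromChars = nwalign(seqA, seqB);

% 2. Integer vectors, such as those returned by aa2int.
scoreFromInts = nwalign(aa2int(seqA), aa2int(seqB));

% 3. Structures containing a Sequence field.
structA = struct('Sequence', seqA);
structB = struct('Sequence', seqB);
scoreFromStructs = nwalign(structA, structB);
```

If the three forms are interchangeable, as the argument description suggests, all three calls should return the same score.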

Return Values

Score

Optimal global alignment score in bits.

Alignment

3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal global alignment for them in the second row.

Start

2-by-1 vector of indices indicating the starting point in each sequence for the alignment. Because this is a global alignment, Start is always [1;1].

Description

Score = nwalign(Seq1,Seq2) returns the optimal global alignment score in bits. The scale factor used to calculate the score is provided by the scoring matrix.

[Score, Alignment] = nwalign(Seq1,Seq2) returns a 3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal global alignment for them in the second row. The symbol | indicates amino acids or nucleotides that match exactly. The symbol : indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).
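
As a small illustration of how the Alignment array can be read, the following sketch counts the symbols in its second row. It uses the array shown in Example 1 on this page, typed in as a literal so that no toolbox call is needed; the variable names are hypothetical.

```matlab
% Alignment array copied from Example 1 on this page (3-by-N char array).
Alignment = ['VSPAGMASGYD'
             ': | | || ||'
             'I-P-GKAS-YD'];

symbols = Alignment(2, :);             % second row holds the match symbols

numIdentical = sum(symbols == '|');    % exact matches: 6
numRelated   = sum(symbols == ':');    % related, nonidentical pairs: 1

% Gap characters appear in the sequence rows, not in the symbol row.
numGaps = sum(Alignment(1, :) == '-') + sum(Alignment(3, :) == '-');   % 3
```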

[Score, Alignment, Start] = nwalign(Seq1,Seq2) returns a 2-by-1 vector of indices indicating the starting point in each sequence for the alignment. Because this is a global alignment, Start is always [1;1].

... = nwalign(Seq1,Seq2, ...'PropertyName', PropertyValue, ...) calls nwalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


... = nwalign(Seq1,Seq2, ...'Alphabet', AlphabetValue, ...) specifies the type of sequences. Choices are 'AA' (default) or 'NT'.

... = nwalign(Seq1,Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...) specifies the scoring matrix to use for the global alignment. Default is 'BLOSUM50' (when AlphabetValue equals 'AA') or 'NUC44' (when AlphabetValue equals 'NT').

... = nwalign(Seq1,Seq2, ...'Scale', ScaleValue, ...) specifies the scale factor used to return Score in arbitrary units other than bits. Choices are any positive value.
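
The unit conversion behind the Scale property can be checked by hand. The sketch below takes the score reported in Example 3 on this page, which was computed with a scale factor of log(2), and converts it back to bits. It assumes the scale factor simply multiplies the bit score, which is consistent with the example but is not spelled out on this page.

```matlab
% Example 3 on this page reports this score after passing 'Scale', log(2).
scoreNats = 0.2310;

% Assumption: the scale factor multiplies the bit score, so dividing by
% log(2) recovers the score in bits (1 bit = log(2) nats).
scoreBits = scoreNats / log(2);        % approximately 0.333

% Applying the same factor again returns the value in nats.
scoreNatsAgain = scoreBits * log(2);
```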

... = nwalign(Seq1,Seq2, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment. Choices are any positive integer. Default is 8.

... = nwalign(Seq1,Seq2, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap in the alignment. Choices are any positive integer. Default is equal to GapOpenValue.
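
To see how the two gap penalties interact, consider a common affine convention in which the first position of a gap pays the opening penalty and each further position pays the extension penalty. This page does not state which convention nwalign uses, so treat the helper below as an assumption; gapCost is a hypothetical name, not a toolbox function.

```matlab
% Hypothetical helper: cost of one gap spanning gapLength positions under an
% assumed affine scheme (first position pays gapOpen, the rest pay extendGap).
gapCost = @(gapLength, gapOpen, extendGap) gapOpen + (gapLength - 1) * extendGap;

% With ExtendGap equal to GapOpen (the default), the cost is linear in length.
linearCost = gapCost(3, 8, 8);    % 8 + 2*8 = 24

% With a smaller ExtendGap, one long gap is cheaper than several short ones.
affineCost = gapCost(3, 8, 1);    % 8 + 2*1 = 10
```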

... = nwalign(Seq1,Seq2, ...'Showscore', ShowscoreValue, ...) controls the display of the scoring space and winning path of the alignment. Choices are true or false (default).

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1,n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(1:n1) and Seq2(1:n2), where n1 is a position in Seq1 and n2 is a position in Seq2. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.

The winning path is represented by black dots in the scoring space and represents the pairing of positions in the optimal global alignment. The color of the last point (lower right) of the winning path represents the optimal global alignment score for the two sequences and is the Score output returned by nwalign.
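
The scoring space and winning path described above can be sketched in plain MATLAB. This is a minimal illustration, not the nwalign implementation: it substitutes a toy match/mismatch score for a scoring matrix and a single linear gap penalty for the GapOpen and ExtendGap properties, and the sequences and variable names are hypothetical.

```matlab
% Sketch only: toy scores and a linear gap penalty, not nwalign itself.
seq1 = 'GATTACA';
seq2 = 'GCATGCG';
matchScore    = 1;    % assumed score for identical symbols
mismatchScore = -1;   % assumed score for different symbols
gapPenalty    = 1;    % assumed penalty per gap position

% Substitution score for a pair of symbols.
pairScore = @(a, b) matchScore * (a == b) + mismatchScore * (a ~= b);

n1 = numel(seq1);
n2 = numel(seq2);

% F(i+1,j+1) is the best score for aligning seq1(1:i) with seq2(1:j).
F = zeros(n1 + 1, n2 + 1);
F(:, 1) = -gapPenalty * (0:n1)';   % seq1 prefix aligned against gaps only
F(1, :) = -gapPenalty * (0:n2);    % seq2 prefix aligned against gaps only

for i = 1:n1
    for j = 1:n2
        F(i+1, j+1) = max([F(i, j) + pairScore(seq1(i), seq2(j)), ... % pair
                           F(i, j+1) - gapPenalty, ...     % gap in seq2
                           F(i+1, j) - gapPenalty]);       % gap in seq1
    end
end

rawScore = F(end, end);   % lower-right cell: score of the full alignment (0)

% Trace back from the lower-right cell to recover one winning path.
i = n1;
j = n2;
winningPath = [i, j];
while i > 0 || j > 0
    if i > 0 && j > 0 && F(i+1, j+1) == F(i, j) + pairScore(seq1(i), seq2(j))
        i = i - 1;  j = j - 1;     % positions i and j were paired
    elseif i > 0 && F(i+1, j+1) == F(i, j+1) - gapPenalty
        i = i - 1;                 % seq1(i) was aligned to a gap
    else
        j = j - 1;                 % seq2(j) was aligned to a gap
    end
    winningPath(end + 1, :) = [i, j];
end
```

Each row of winningPath is an (n1,n2) coordinate, running from the lower-right corner back to (0,0). When several alignments tie for the best score, this traceback returns only one of them.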

Examples

  1. Globally align two amino acid sequences using the BLOSUM50 (default) scoring matrix and the default values for the GapOpen and ExtendGap properties. Return the optimal global alignment score in bits and the alignment character array.

    [Score, Alignment] = nwalign('VSPAGMASGYD','IPGKASYD')
    Score =
    
        7.3333
    
    Alignment =
    
    VSPAGMASGYD
    : | | || ||
    I-P-GKAS-YD

  2. Globally align two amino acid sequences specifying the PAM250 scoring matrix and a gap open penalty of 5.

    [Score, Alignment] = nwalign('IGRHRYHIGG','SRYIGRG',...
                                 'scoringmatrix','pam250',...
                                 'gapopen',5)
    Score =
    
        2.3333
    
    Alignment =
    
    IGRHRYHIG-G
     :  || || |
    -S--RY-IGRG
    
  3. Globally align two amino acid sequences returning the Score in nat units (nats) by specifying a scale factor of log(2).

    [Score, Alignment] = nwalign('HEAGAWGHEE','PAWHEAE','Scale',log(2))
                                 
    Score =
    
        0.2310
    
    Alignment =
    
    HEAGAWGHE-E
        || || |
    --P-AW-HEAE
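
The score in Example 1 can be reproduced by hand from its alignment. The sketch below sums substitution scores for the eight aligned pairs and subtracts the default gap penalty of 8 for each of the three single-position gaps. The BLOSUM50 entries are typed in by hand, and the built-in scale factor of 1/3 is inferred from the displayed output rather than stated on this page, so treat both as assumptions.

```matlab
% Aligned pairs in Example 1, column by column (gap columns omitted):
%   V/I  P/P  G/G  M/K  A/A  S/S  Y/Y  D/D
% Assumed BLOSUM50 entries for those pairs, entered by hand.
pairScores = [4 10 8 -2 5 5 8 8];

numGaps = 3;     % S, A, and G in Seq1 are each aligned to a single gap
gapOpen = 8;     % default GapOpen value

rawScore = sum(pairScores) - numGaps * gapOpen;   % 46 - 24 = 22

% Assumed built-in BLOSUM50 scale factor of 1/3 bit per score unit.
scoreBits = rawScore / 3;                         % 7.3333, as in Example 1
```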

References

[1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press).

See Also

Bioinformatics Toolbox™ functions: blosum, multialign, nt2aa, pam, profalign, seqdotplot, showalignment, swalign

  

