localalign

Return local optimal and suboptimal alignments between two sequences

Syntax


AlignStruct = localalign(Seq1, Seq2)
AlignStruct = localalign(Seq1, Seq2, ...'NumAln', NumAlnValue, ...)
AlignStruct = localalign(Seq1, Seq2, ...'MinScore', MinScoreValue, ...)
AlignStruct = localalign(Seq1, Seq2, ...'Percent', PercentValue, ...)
AlignStruct = localalign(Seq1, Seq2, ...'DoAlignment', DoAlignmentValue, ...)
AlignStruct = localalign(Seq1, Seq2, ...'Alphabet', AlphabetValue, ...)
AlignStruct = localalign(Seq1, Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...)
AlignStruct = localalign(Seq1, Seq2, ...'Scale', ScaleValue, ...)
AlignStruct = localalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...)

Description

AlignStruct = localalign(Seq1, Seq2) returns information about the first optimal (highest scoring) local alignment between two sequences in a MATLAB® structure.

AlignStruct = localalign(Seq1, Seq2, ...'PropertyName', PropertyValue, ...) calls localalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks. Each PropertyName is case insensitive. These property name/property value pairs are as follows:

AlignStruct = localalign(Seq1, Seq2, ...'NumAln', NumAlnValue, ...) returns information about one or more nonintersecting local alignments (optimal and suboptimal), limited to at most NumAlnValue alignments. It returns the alignments in decreasing order according to their score.

AlignStruct = localalign(Seq1, Seq2, ...'MinScore', MinScoreValue, ...) returns information about nonintersecting, local alignments (optimal and suboptimal), whose score is greater than MinScoreValue.

AlignStruct = localalign(Seq1, Seq2, ...'Percent', PercentValue, ...) returns information about one or more nonintersecting local alignments (optimal and suboptimal), whose scores are within PercentValue percent of the highest score. It returns the alignments in decreasing order according to their score.

AlignStruct = localalign(Seq1, Seq2, ...'DoAlignment', DoAlignmentValue, ...) specifies whether to include the pairwise alignments in the Alignment field of the output structure. Choices are true (default) or false.

AlignStruct = localalign(Seq1, Seq2, ...'Alphabet', AlphabetValue, ...) specifies the type of sequences. Choices are 'AA' (default) or 'NT'.

AlignStruct = localalign(Seq1, Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...) specifies the scoring matrix to use for the local alignment.

AlignStruct = localalign(Seq1, Seq2, ...'Scale', ScaleValue, ...) specifies a scale factor applied to the output scores, thereby controlling the units of the output scores. Choices are any positive value. Default is 1, which does not change the units of the output score.

AlignStruct = localalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment. Choices are any positive value. Default is 8.

Input Arguments

Seq1

First amino acid or nucleotide sequence. The examples on this page specify it as a string of single-letter codes.

Seq2

Second amino acid or nucleotide sequence, which localalign aligns with Seq1.

NumAlnValue

Positive scalar (less than or equal to 2^12) specifying the number of alignments to return. localalign returns the top NumAlnValue local, nonintersecting alignments (optimal and suboptimal). If the number of optimal alignments is greater than NumAlnValue, then localalign returns the first NumAlnValue alignments based on their order in the traceback matrix.

    Note:   If you specify a NumAlnValue, you cannot specify a MinScoreValue or PercentValue.

    Tip   Use NumAlnValue to return multiple alignments when you are aligning low complexity sequences and must consider several local alignments.

Default: 1

MinScoreValue

Positive scalar specifying the minimum score of local, nonintersecting alignments (optimal and suboptimal) to return.

    Note:   If you specify a MinScoreValue, you cannot specify a NumAlnValue or PercentValue.

    Tip   Use MinScoreValue to return suboptimal alignments, for example when you are interested in accounting for sequencing errors or imperfect scoring matrices.

PercentValue

Positive scalar between 0 and 100 that limits the return of local, nonintersecting alignments (optimal and suboptimal) to those alignments with a score within PercentValue percent of the highest score. For example, if the highest score is 10.5 and you specify 5 for PercentValue, then localalign determines a minimum score of 10.5 – (10.5 * 0.05) = 9.975. It returns all alignments with a score of 9.975 or higher.

    Note:   If you specify a PercentValue, you cannot specify a NumAlnValue or MinScoreValue.

    Tip   Use PercentValue to return optimal and suboptimal alignments when you do not know how similar the two sequences are or how well they score against a given scoring matrix.
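
The threshold arithmetic above can be reproduced directly. The following is a minimal sketch; the variable names highestScore, percentValue, and minScore are illustrative and are not part of localalign.

```matlab
% Illustrative values taken from the example in the text.
highestScore = 10.5;   % highest alignment score
percentValue = 5;      % value passed as PercentValue

% Alignments scoring within percentValue percent of the best score are kept.
minScore = highestScore - highestScore * (percentValue / 100);

fprintf('Minimum score to report: %.3f\n', minScore);
```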

DoAlignmentValue

Controls the inclusion of the pairwise alignments in the Alignment field of the output structure. Choices are true (default) or false.

AlphabetValue

String specifying the type of sequences. Choices are 'AA' (default) or 'NT'.

ScoringMatrixValue

Either of the following:

  • String specifying the scoring matrix to use for the local alignment. Choices for amino acid sequences are:

    • 'BLOSUM62'

    • 'BLOSUM30' increasing by 5 up to 'BLOSUM90'

    • 'BLOSUM100'

    • 'PAM10' increasing by 10 up to 'PAM500'

    • 'DAYHOFF'

    • 'GONNET'

    Default is:

    • 'BLOSUM50' — When AlphabetValue equals 'AA'

    • 'NUC44' — When AlphabetValue equals 'NT'

      Note:   The scoring matrices listed above, which are provided with the software, also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the 'Scale' property to specify an additional scale factor to convert the output score from bits to another unit.

  • Matrix representing the scoring matrix to use for the local alignment, such as returned by the blosum, pam, dayhoff, gonnet, or nuc44 function.

      Note:   If you use a scoring matrix that you created, or one that was created by one of the previous functions, the matrix does not include a scale factor. The output score is returned in the same units as the scoring matrix. You can use the 'Scale' property to specify a scale factor to convert the output score to another unit.

    Note:   If you need to compile localalign into a stand-alone application or software component using MATLAB Compiler™, use a matrix instead of a string for ScoringMatrixValue.
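
As a sketch of the matrix form, the following passes a matrix returned by blosum instead of a string. It assumes that Bioinformatics Toolbox is available; the sequences are the ones used in the Examples section, and the variable names sm and s are illustrative. Because the matrix carries no scale factor, the score is expected to come back in the units of the matrix itself.

```matlab
Seq1 = 'VSPAGMASGYDPGKA';
Seq2 = 'IPGKATREYDVSPAG';

% Build the scoring matrix as a numeric matrix instead of naming it.
sm = blosum(62);

% Pass the matrix through the ScoringMatrix property.
s = localalign(Seq1, Seq2, 'ScoringMatrix', sm);

% Show the score of the returned alignment (units of the matrix).
disp(s.Score)
```

Passing a matrix this way is also the form the note above recommends when compiling with MATLAB Compiler.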

ScaleValue

Positive value that specifies a scale factor that is applied to the output scores, thereby controlling the units of the output scores.

For example, if the output score is initially determined in bits, and you enter log(2) for ScaleValue, then localalign returns Score in nats.

Default is 1, which does not change the units of the output score.

    Note:   If the 'ScoringMatrix' property also specifies a scale factor, then localalign uses it first to scale the output score. It then applies the scale factor specified by ScaleValue to rescale the output score.

    Tip   Before comparing alignment scores from multiple alignments, ensure that the scores are in the same units. Use the 'Scale' property to control the units of the output scores.
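
The bits-to-nats conversion mentioned above can be sketched as plain arithmetic. This assumes, as the text describes, that the scale factor multiplies the score; the value of scoreBits is hypothetical and the variable names are illustrative.

```matlab
% Hypothetical score in bits, used only for illustration.
scoreBits = 11;

% One bit equals log(2) nats, so log(2) is the scale factor.
scaleValue = log(2);
scoreNats = scoreBits * scaleValue;

fprintf('%.4f bits = %.4f nats\n', scoreBits, scoreNats);
```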

GapOpenValue

Positive value specifying the penalty for opening a gap in the alignment.

Default: 8

Output Arguments

AlignStruct

MATLAB structure or array of structures containing information about the local optimal and suboptimal alignments between two sequences. Each structure represents an optimal or suboptimal alignment and contains the following fields.

Field        Description

Score        Score for the local optimal or suboptimal alignment.

Start        1-by-2 vector of indices indicating the starting point in each sequence for the alignment.

Stop         1-by-2 vector of indices indicating the stopping point in each sequence for the alignment.

Alignment    3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows. It also shows symbols representing the optimal or suboptimal local alignment between the two sequences in the second row.
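
As a sketch of how these fields fit together, the following uses Start and Stop to pull the aligned segment out of each input sequence. It assumes that Bioinformatics Toolbox is available and that the first element of Start and Stop indexes Seq1 and the second indexes Seq2, which the field descriptions imply but do not state outright. The names s, segment1, and segment2 are illustrative.

```matlab
Seq1 = 'VSPAGMASGYDPGKA';
Seq2 = 'IPGKATREYDVSPAG';

% Default call: information about the first optimal local alignment.
s = localalign(Seq1, Seq2);

% Assumption: column 1 of Start/Stop indexes Seq1, column 2 indexes Seq2.
segment1 = Seq1(s.Start(1,1):s.Stop(1,1));
segment2 = Seq2(s.Start(1,2):s.Stop(1,2));

fprintf('Score: %g\n', s.Score(1));
fprintf('Seq1 segment: %s\nSeq2 segment: %s\n', segment1, segment2);
```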

Examples

Limit the number of alignments to return between two sequences by specifying the number of alignments:

% Create variables containing two amino acid sequences.
Seq1 = 'VSPAGMASGYDPGKA';
Seq2 = 'IPGKATREYDVSPAG';

% Use the NumAln property to return information about the
% top three local alignments.
struct1 = localalign(Seq1, Seq2, 'numaln', 3)

struct1 = 

        Score: [3x1 double]
        Start: [3x2 double]
         Stop: [3x2 double]
    Alignment: {3x1 cell}

% View the scores of the first and second alignments.
struct1.Score(1:2)

ans =

   11.0000
    9.6667

% View the first alignment.
struct1.Alignment{1}

ans =

VSPAG
|||||
VSPAG

Limit the number of alignments to return between two sequences by specifying a minimum score:

% Create variables containing two amino acid sequences.
Seq1 = 'VSPAGMASGYDPGKA';
Seq2 = 'IPGKATREYDVSPAG';

% Use the MinScore property to return information about
% only local alignments with a score greater than 8.
% Use the DoAlignment property to exclude the actual alignments.
struct2 = localalign(Seq1,Seq2,'minscore',8,'doalignment',false)

struct2 = 

    Score: [2x1 double]
    Start: [2x2 double]
     Stop: [2x2 double]

Limit the number of alignments to return between two sequences by specifying a percentage from the maximum score:

% Create variables containing two amino acid sequences.
Seq1 = 'VSPAGMASGYDPGKA';
Seq2 = 'IPGKATREYDVSPAG';

% Use the Percent property to return information about only
% local alignments with a score within 15% of the maximum score.
struct3 = localalign(Seq1, Seq2, 'percent', 15) 

struct3 = 

        Score: [2x1 double]
        Start: [2x2 double]
         Stop: [2x2 double]
    Alignment: {2x1 cell}

Specify a scoring matrix and gap opening penalty when aligning two sequences:

% Create variables containing two nucleotide sequences.
Seq1 = 'CCAATCTACTACTGCTTGCAGTAC';
Seq2 = 'AGTCCGAGGGCTACTCTACTGAAC';

% Create a scoring matrix with a match score of 10 and a mismatch
% score of -9.
sm = [10 -9 -9 -9;
      -9 10 -9 -9;
      -9 -9 10 -9;
      -9 -9 -9 10];

% Use the ScoringMatrix and GapOpen properties when returning
% information about the top three local alignments.
struct4 = localalign(Seq1, Seq2, 'alphabet', 'nt', ...
         'scoringmatrix', sm, 'gapopen', 20, 'numaln', 3)

struct4 = 

        Score: [3x1 double]
        Start: [3x2 double]
         Stop: [3x2 double]
    Alignment: {3x1 cell} 

References

[1] Barton, G. (1993). An efficient algorithm to locate all locally optimal alignments between two sequences allowing for gaps. CABIOS 9, 729–734.
