multialign - Align multiple sequences using progressive method

Syntax

SeqsMultiAligned = multialign(Seqs)
SeqsMultiAligned = multialign(Seqs, Tree)

multialign(..., 'PropertyName', PropertyValue,...)
multialign(..., 'Weights', WeightsValue)
multialign(..., 'ScoringMatrix', ScoringMatrixValue)
multialign(..., 'SMInterp', SMInterpValue)
multialign(..., 'GapOpen', GapOpenValue)
multialign(..., 'ExtendGap', ExtendGapValue)
multialign(..., 'DelayCutoff', DelayCutoffValue)
multialign(..., 'JobManager', JobManagerValue)
multialign(..., 'WaitInQueue', WaitInQueueValue)
multialign(..., 'Verbose', VerboseValue)
multialign(..., 'ExistingGapAdjust', ExistingGapAdjustValue)
multialign(..., 'TerminalGapAdjust', TerminalGapAdjustValue)

Arguments

Seqs

Vector of structures with the fields 'Sequence' for the residues and 'Header' or 'Name' for the labels.

Seqs may also be a cell array of strings or a char array.

SeqsMultiAligned

Vector of structures (same as Seqs) but with the field 'Sequence' updated with the alignment.

When Seqs is a cell or char array, SeqsMultiAligned is a char array with the output alignment following the same order as the input.

Tree

Phylogenetic tree calculated with either of the functions seqlinkage or seqneighjoin.

WeightsValue

Property to select the sequence weighting method. Enter either 'THG' (default) or 'equal'.

ScoringMatrixValue

Property to select or specify the scoring matrix. Enter an [MxM] matrix or an [MxMxN] array of N user-defined scoring matrices. ScoringMatrixValue may also be a cell array of strings with matrix names.

The default is the BLOSUM80 to BLOSUM30 series for amino acids or the fixed matrix NUC44 for nucleotides. When passing your own series of scoring matrices, make sure all of them share the same scale.

SMInterpValue

Property to specify whether linear interpolation of the scoring matrices is on or off. When false, each scoring matrix is assigned to a fixed range of distances between the two profiles (or sequences) being aligned. Default is true.

GapOpenValue

Scalar or a function specified using @. If you enter a function, multialign passes four values to the function: the average score for two matched residues (sm), the average score for two mismatched residues (sx), and the lengths of both profiles or sequences (len1, len2). Default is @(sm,sx,len1,len2) 5*sm.

ExtendGapValue

Scalar or a function specified using @. If you enter a function, multialign passes four values to the function: the average score for two matched residues (sm), the average score for two mismatched residues (sx), and the lengths of both profiles or sequences (len1, len2). Default is @(sm,sx,len1,len2) sm/4.

DelayCutoffValue

Property to specify the threshold delay of divergent sequences. The default is unity, in which case sequences whose closest neighbor is farther than the median distance are delayed.

JobManagerValue

JobManager object representing an available distributed MATLAB® resource. Enter a jobmanager object returned by the Parallel Computing Toolbox™ function findResource.

WaitInQueueValue

Property to control waiting for a distributed MATLAB resource to be available. Enter either true or false. The default value is false.

VerboseValue

Property to control displaying the sequences with sequence information. Default value is false.

ExistingGapAdjustValue

Property to control automatic adjustment based on existing gaps. Default value is true.

TerminalGapAdjustValue

Property to adjust the penalty for opening a gap at the ends of the sequence. Default value is false.

Description

SeqsMultiAligned = multialign(Seqs) performs a progressive multiple alignment for a set of sequences (Seqs). Pairwise distances between sequences are computed by first aligning each pair with the Gonnet scoring matrix and then counting the proportion of sites at which the two sequences differ (ignoring gaps). The guide tree is calculated by the neighbor-joining method assuming equal variance and independence of evolutionary distance estimates.

SeqsMultiAligned = multialign(Seqs, Tree) uses a tree (Tree) as a guide for the progressive alignment. The sequences (Seqs) should have the same order as the leaves in the tree (Tree) or use a field ('Header' or 'Name') to identify the sequences.


multialign(..., 'PropertyName', PropertyValue,...) enters optional arguments as property name/value pairs.

multialign(..., 'Weights', WeightsValue) selects the sequence weighting method. Weights emphasize highly divergent sequences by scaling the scoring matrix and gap penalties. Closer sequences receive smaller weights.

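Values of the property Weights:

  'THG' (default): Thompson-Higgins-Gibson method, which uses the phylogenetic tree branch distances weighted by their thickness.
  'equal': Assigns the same weight to every sequence.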

multialign(..., 'ScoringMatrix', ScoringMatrixValue) selects the scoring matrix (ScoringMatrixValue) for the progressive alignment. Match and mismatch scores are interpolated from the series of scoring matrices by considering the distances between the two profiles or sequences being aligned. The first matrix corresponds to the smallest distance and the last matrix to the largest distance. Scores for intermediate distances are calculated using linear interpolation.

multialign(..., 'SMInterp', SMInterpValue), when SMInterpValue is false, turns off the linear interpolation of the scoring matrices. Instead, each supplied scoring matrix is assigned to a fixed range depending on the distances between the two profiles or sequences being aligned.
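The difference between the two modes can be sketched in plain MATLAB. This is only an illustration under stated assumptions: the 2-by-2 matrices are hypothetical, the distance d is assumed to be normalized to [0,1], and the nearest-matrix rule used for the non-interpolated case is a stand-in, since the exact distance ranges that multialign uses are not given here.

```matlab
% Hypothetical series of three 2-by-2 scoring matrices sharing one scale.
% The first matrix is for the smallest distance, the last for the largest.
S = cat(3, [4 -2; -2 4], [3 -1; -1 3], [2 0; 0 2]);
d = 0.25;                        % assumed distance, normalized to [0,1]

N   = size(S, 3);
pos = 1 + d*(N - 1);             % fractional position in the series
lo  = floor(pos);                % nearer matrix (smaller distance)
hi  = min(lo + 1, N);            % farther matrix (larger distance)
w   = pos - lo;                  % weight given to the farther matrix

% Interpolation on (as with 'SMInterp', true): blend the two neighbors.
Sinterp = (1 - w)*S(:,:,lo) + w*S(:,:,hi);

% Interpolation off (as with 'SMInterp', false): use a single matrix.
% Picking the nearest matrix is an assumption made for this sketch.
Sfixed = S(:,:,round(pos));
```

With these values, Sinterp lies halfway between the first and second matrices, while Sfixed is exactly one member of the series.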

multialign(..., 'GapOpen', GapOpenValue) specifies the initial penalty for opening a gap.

multialign(..., 'ExtendGap', ExtendGapValue) specifies the initial penalty for extending a gap.
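When the penalties are given as functions, they take the four inputs listed in the Arguments section. The sketch below writes out the documented defaults as explicit function handles and evaluates them; the values of sm, sx, len1, and len2 are hypothetical and chosen only for illustration. The final call requires the Bioinformatics Toolbox.

```matlab
% Documented defaults, written out as explicit function handles.
gapOpenFcn   = @(sm, sx, len1, len2) 5*sm;    % default for 'GapOpen'
extendGapFcn = @(sm, sx, len1, len2) sm/4;    % default for 'ExtendGap'

% Hypothetical average scores and profile lengths (illustration only).
sm = 4; sx = -2; len1 = 13; len2 = 17;
openPenalty   = gapOpenFcn(sm, sx, len1, len2);     % 5*4 = 20
extendPenalty = extendGapFcn(sm, sx, len1, len2);   % 4/4 = 1

% Pass the handles to multialign (sequences taken from Example 2).
seqs = {'CACGTAACATCTC','ACGACGTAACATCTTCT','AAACGTAACATCTCGC'};
ma = multialign(seqs, 'GapOpen', gapOpenFcn, 'ExtendGap', extendGapFcn);
```

A custom rule can use any of the four inputs, for example scaling the opening penalty by the shorter of the two lengths, as long as the handle accepts all four arguments.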

multialign(..., 'DelayCutoff', DelayCutoffValue) specifies a threshold to delay the alignment of divergent sequences whose closest neighbor is farther than

(DelayCutoffValue) * (median patristic distance between sequences)
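The threshold can be worked through by hand on a small case. The sketch below uses a hypothetical matrix of patristic distances between four sequences and flags the ones that would fall past the cutoff. It illustrates the formula only and is not the internal implementation of multialign.

```matlab
% Hypothetical symmetric patristic distances between four sequences.
D = [0    0.10 0.20 0.90;
     0.10 0    0.30 0.80;
     0.20 0.30 0    0.70;
     0.90 0.80 0.70 0   ];
delayCutoff = 1;                       % documented default (unity)

n = size(D, 1);
pairDist  = D(triu(true(n), 1));       % each pair counted once
threshold = delayCutoff * median(pairDist);

Dnn = D;
Dnn(logical(eye(n))) = Inf;            % ignore self-distances
closest = min(Dnn, [], 2);             % distance to the closest neighbor
delayed = closest > threshold;         % true for sequences past the cutoff
```

Here the median pairwise distance is 0.5, and only the fourth sequence, whose closest neighbor is 0.7 away, exceeds the threshold.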

multialign(..., 'JobManager', JobManagerValue) distributes pairwise alignments across a cluster of computers using the Parallel Computing Toolbox software.

multialign(..., 'WaitInQueue', WaitInQueueValue), when WaitInQueueValue is true, waits in the job manager queue for an available worker. When WaitInQueueValue is false (default) and there are no workers immediately available, multialign errors out. Use this property with the Parallel Computing Toolbox™ software and the multialign property JobManager.

multialign(..., 'Verbose', VerboseValue), when VerboseValue is true, turns on verbosity.

The remaining optional input arguments are analogous to those of the function profalign and are used at every step of the progressive alignment of profiles.

multialign(..., 'ExistingGapAdjust', ExistingGapAdjustValue), if ExistingGapAdjustValue is false, turns off the automatic adjustment, based on existing gaps, of the position-specific penalties for opening a gap.

When ExistingGapAdjustValue is true, for every profile position, profalign proportionally lowers the penalty for opening a gap toward the penalty of extending a gap based on the proportion of gaps found in the contiguous symbols and on the weight of the input profile.
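As a rough usage sketch, the effect of this option can be inspected by aligning the same input with the adjustment on and off. The sequences are the ones from Example 2; whether the two alignments differ for this particular input is not claimed here. The calls require the Bioinformatics Toolbox.

```matlab
% Sequences from Example 2, passed as a cell array of strings.
seqs = {'CACGTAACATCTC','ACGACGTAACATCTTCT','AAACGTAACATCTCGC'};

% Default behavior: gap-opening penalties are lowered near existing gaps.
maAdjusted = multialign(seqs);

% Same input with the position-specific adjustment switched off.
maPlain = multialign(seqs, 'ExistingGapAdjust', false);

% Both results are char arrays with one row per input sequence.
disp(maAdjusted)
disp(maPlain)
```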

multialign(..., 'TerminalGapAdjust', TerminalGapAdjustValue), when TerminalGapAdjustValue is true, adjusts the penalty for opening a gap at the ends of the sequence to be equal to the penalty for extending a gap.

Example 1

  1. Align seven cellular tumor antigen p53 sequences.

    p53 = fastaread('p53samples.txt')
    ma = multialign(p53,'verbose',true)
    showalignment(ma)

  2. Use a UPGMA phylogenetic tree as the guide tree instead.

    dist = seqpdist(p53,'ScoringMatrix',gonnet);
    tree = seqlinkage(dist,'UPGMA',p53)
    
    Phylogenetic tree object with 7 leaves (6 branches)

  3. Score the progressive alignment with the PAM family.

    ma = multialign(p53,tree,'ScoringMatrix',...
                    {'pam150','pam200','pam250'})
    showalignment(ma)

Example 2

  1. Enter an array of sequences.

    seqs = {'CACGTAACATCTC','ACGACGTAACATCTTCT','AAACGTAACATCTCGC'};
    
  2. Promote terminations with gaps in the alignment.

    multialign(seqs,'terminalGapAdjust',true)
    
    ans =
    --CACGTAACATCTC--
    ACGACGTAACATCTTCT
    -AAACGTAACATCTCGC

  3. Compare with the alignment produced without terminal gap adjustment.

    multialign(seqs)
    
    ans =
    CA--CGTAACATCT--C
    ACGACGTAACATCTTCT
    AA-ACGTAACATCTCGC

See Also

Bioinformatics Toolbox™ functions: hmmprofalign, multialignread, nwalign, profalign, seqprofile, seqconsensus, seqneighjoin, showalignment

  


© 1984-2008 The MathWorks, Inc.