Documentation Center

  • Trial Software
  • Product Updates

cigar2align

Convert unaligned sequences to aligned sequences using Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings

Syntax

Alignment = cigar2align(Seqs,Cigars)
[GapSeq, Indices] = cigar2align(Seqs,Cigars)
... = cigar2align(Seqs,Cigars,Name,Value)

Description

Alignment = cigar2align(Seqs,Cigars) converts unaligned sequences in Seqs, a cell array of strings, into Alignment, a matrix of aligned sequences, using the information stored in Cigars, a cell array of CIGAR strings.

[GapSeq, Indices] = cigar2align(Seqs,Cigars) converts unaligned sequences in Seqs, a cell array of strings, into GapSeq, a cell array of strings of aligned sequences, and also returns Indices, a vector of numeric indices, using the information stored in Cigars, a cell array of CIGAR strings. When an alignment has many columns, this syntax uses less memory and is faster.

... = cigar2align(Seqs,Cigars,Name,Value) converts unaligned sequences in Seqs, a cell array of strings, into Alignment, a matrix of aligned sequences, using the information stored in Cigars, a cell array of CIGAR strings, with additional options specified by one or more Name,Value pair arguments.

Input Arguments

Seqs

Cell array of strings containing unaligned sequences. Seqs must contain the same number of elements as Cigars.

Cigars

Cell array of valid CIGAR strings. Cigars must contain the same number of elements as Seqs.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Start'

Vector of positive integers specifying the reference sequence position at which each aligned sequence starts. By default, each aligned sequence starts at position 1 of the reference sequence.

'GapsInRef'

Logical specifying whether to display positions in the aligned sequences that correspond to gaps in the reference sequence. Choices are true (1) or false (0). If your reference sequence has gaps and you set GapsInRef to false (0), and then later use Alignment as input to align2cigar, the returned CIGAR strings will not match the original CIGAR strings.

Default: false (0)

'SoftClipping'

Logical specifying whether to include characters in the aligned read sequences corresponding to soft clipping ends. Choices are true (1) or false (0).

Default: false (0)

'OffsetPad'

Logical specifying whether to add padding blanks to the left of each aligned read sequence to represent the offset of the start position from the first position of the reference sequence. Choices are true (1) or false (0). When false, the matrix of aligned sequences starts at the start position of the leftmost aligned read sequence.

Default: false (0)

Output Arguments

Alignment

Matrix of aligned sequences, in which the number of rows equals the number of strings in Seqs.

GapSeq

Cell array of strings of aligned sequences, in which the number strings equals the number of strings in Seqs.

Indices

Vector of numeric indices indicating the starting column for each aligned sequence in Alignment. These indices are not necessarily the same as the start positions in the reference sequence for each aligned sequence. This is because either of the following:

  • The reference sequence can be extended to account for insertions.

  • An aligned sequence can have leading soft clippings, padding, or insertion characters.

Examples

Create a cell array of strings containing unaligned sequences, create a cell array of strings containing corresponding CIGAR strings associated with a reference sequence of ACGTATGC, and then reconstruct the alignment:

r = {'ACGACTGC', 'ACGTTGC', 'AGGTATC'}; % unaligned sequences
c = {'3M1D1M1I3M', '4M1D1P3M', '5M1P1M1D1M'}; % cigar strings
aln1 = cigar2align(r, c)
aln1 =

ACG-ATGC
ACGT-TGC
AGGTAT-C
 

Reconstruct the same alignment to display positions in the aligned sequences that correspond to gaps in the reference sequence:

aln2 = cigar2align(r, c,'GapsInRef',true)
aln2 =

ACG-ACTGC
ACGT--TGC
AGGTA-T-C
 

Reconstruct the alignment adding an offset padding of 5:

aln3 = cigar2align(r, c, 'start', [5 5 5], 'OffsetPad', true)
aln3 =

    ACG-ATGC
    ACGT-TGC
    AGGTAT-C

Alternatives

If your CIGAR information is captured in the Signature property of a BioMap object, you can use the getAlignment method to construct the alignment.

More About

expand all

Algorithms

When cigar2align reconstructs the alignment, it does not display hard clipped positions (H) or soft clipped positions (S). Also, it does not consider soft clipped positions as start positions for aligned sequences.

References

[1] Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Goncalo, A., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 16, 2078–2079.

See Also

| | | | |

Was this topic helpful?