bamread - Read data from Binary Sequence Alignment/Map (BAM) file

Syntax

BAMStruct = bamread(File,RefSeq,Range)
[BAMStruct,HeaderStruct] = bamread(File,RefSeq,Range)
... = bamread(File,RefSeq,Range,Name,Value)

Description

BAMStruct = bamread(File,RefSeq,Range) reads the alignment records in File, a BAM-formatted file, that align to RefSeq, a reference sequence, in the range specified by Range. It returns the alignment data in BAMStruct, a MATLAB array of structures.

[BAMStruct,HeaderStruct] = bamread(File,RefSeq,Range) also returns the header information in HeaderStruct, a MATLAB structure.

... = bamread(File,RefSeq,Range,Name,Value) reads the alignment records with additional options specified by one or more Name,Value pair arguments.

Input Arguments

File

String specifying a file name or path and file name of a BAM-formatted file. If you specify only a file name, that file must be on the MATLAB search path or in the Current Folder.

RefSeq

Either of the following:

  • String specifying the name of a reference sequence in the BAM file.

  • Positive integer specifying the index of a reference sequence in the BAM file. This number is also the index of the reference sequence in the Reference field of the InfoStruct structure returned by baminfo.

Range

Two-element vector specifying the begin and end positions of the range on the reference sequence, RefSeq. Both values must be positive and are one-based. The second value must be greater than or equal to the first value.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments, where Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Full'

Logical specifying whether bamread returns only the alignment records that are fully contained within the range specified by Range. Choices are true or false (default).

Default: false

'Tags'

Logical specifying whether bamread reads the optional tags in addition to the first 11 fields for each alignment in the BAM-formatted file. Choices are true (default) or false.

Default: true

'Index'

MATLAB array of structures that specifies the offsets into the compressed Binary Sequence Alignment/Map (BAM) file and decompressed data block for each reference sequence and range of positions (bins) on each reference sequence. The bamindexread function returns this structure. The bamread function uses this index structure to extract alignment records in a specified range of a specific reference sequence. Providing this index structure improves performance when reading from the same file multiple times. If you do not specify this index structure, bamread calls bamindexread to create it.

'ToFile'

String specifying a nonexisting file name or a path and file name for saving the alignment records in the specified range of a specific reference sequence. The ToFile name-value pair argument creates a SAM-formatted file. If you specify only a file name, the file is saved to the MATLAB Current Folder.

The SAM-formatted file is always one-based, even if you set the ZeroBased name-value pair argument to true. You can use the SAM-formatted file as input when creating a BioMap object.

'ZeroBased'

Logical specifying whether bamread uses zero-based indexing when reading a file. The logical controls the return of zero-based or one-based positions in the Position and MatePosition fields in BAMStruct. Choices are true or false (default), which returns one-based positions.

This name-value pair argument affects the Position and MatePosition fields of BAMStruct. It does not affect the Range input argument or the SAM file created when using the ToFile name-value pair argument. SAM files are always one-based.

    Caution   If you plan to use the BAMStruct output argument to construct a BioMap object, make sure the ZeroBased name-value pair argument is false.

Default: false

Output Arguments

BAMStruct

An N-by-1 array of structures containing sequence alignment and mapping information from a BAM-formatted file, where N is the number of alignment records stored in the specified range. Each structure contains the following fields.

Field: Description

QueryName

Name of the read sequence (if unpaired) or the name of the sequence pair (if paired).

Flag

Integer indicating the bit-wise information that specifies the status of each of 11 flags described by the SAM format specification.

    Tip   You can use the bitget function to determine the status of a specific SAM flag.

ReferenceIndex

Index of the reference sequence.

    Tip   To convert this index to a reference name, see the Reference field in the HeaderStruct output argument.

Position

Position of the forward reference sequence where the leftmost base of the alignment of the read sequence starts. This position is zero-based or one-based, depending on the ZeroBased name-value pair argument.

MappingQuality

Integer specifying the mapping quality score for the read sequence.

CigarString

CIGAR-formatted string representing how the read sequence aligns with the reference sequence.

MateReferenceIndex

Index of the reference sequence associated with the mate. If there is no mate, then this value is 0.

MatePosition

Position of the forward reference sequence where the leftmost base of the alignment of the mate of the read sequence starts. This position is zero-based or one-based, depending on the ZeroBased name-value pair argument.

InsertSize

The number of base positions between the read sequence and its mate, when both are mapped to the same reference sequence. Otherwise, this value is 0.

Sequence

String containing the letter representations of the read sequence. It is the reverse complement if the read sequence aligns to the reverse strand of the reference sequence.

Quality

String containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence.

Tags

List of applicable SAM tags and their values.
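
The Flag tip above can be illustrated with a short sketch. It assumes the bit layout given in the SAM format specification (bit 1 is the least significant bit) and uses a hypothetical flag value of 83 in place of a value read from BAMStruct. The variable names are illustrative only and are not part of the bamread interface.

```matlab
% Hypothetical flag value; in practice this would come from BAMStruct(k).Flag.
flag = uint16(83);

% Bit positions follow the SAM format specification (bit 1 = least significant).
isPaired     = logical(bitget(flag, 1));   % 0x1  read is paired
isProperPair = logical(bitget(flag, 2));   % 0x2  read is mapped in a proper pair
isUnmapped   = logical(bitget(flag, 3));   % 0x4  read is unmapped
isReverse    = logical(bitget(flag, 5));   % 0x10 read aligns to the reverse strand
isFirstMate  = logical(bitget(flag, 7));   % 0x40 read is the first in its pair
```

Because bitget also accepts arrays, an expression such as logical(bitget([BAMStruct.Flag], 5)) should mark all reverse-strand records at once, assuming each Flag field holds a scalar integer.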

HeaderStruct

MATLAB structure containing header information for the BAM-formatted file in the following fields.

Field: Description

NRefs

Number of reference sequences in the BAM-formatted file.

Reference

1-by-NRefs array of structures containing these fields:

  • Name — Name of the reference sequence.

  • Length — Length of the reference sequence.

Header*

Structure containing the file format version, sort order, and group order.

SequenceDictionary*

Structure containing the:

  • Sequence name

  • Sequence length

  • Genome assembly identifier

  • MD5 checksum of sequence

  • URI of sequence

  • Species

ReadGroup*

Structure containing the:

  • Read group identifier

  • Sample

  • Library

  • Description

  • Platform unit

  • Predicted median insert size

  • Sequencing center

  • Date

  • Platform

Program*

Structure containing the:

  • Program name

  • Version

  • Command line

* These structures and their fields appear in the output structure only if they are present in the BAM file. The information in these structures depends on the information present in the BAM file.
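
As a rough sketch of how the two outputs fit together, the following converts the ReferenceIndex of the first record to a reference name and checks for an optional header structure. It assumes that ReferenceIndex is a one-based index into HeaderStruct.Reference and that 0 means no reference (as documented for MateReferenceIndex). The names idx, refName, and hasReadGroup are illustrative.

```matlab
% Read records and header from ex1.bam, the sample file used in the Examples section.
[data, header] = bamread('ex1.bam', 'seq1', [100 200]);

% Assumption: ReferenceIndex is a one-based index into header.Reference,
% and a value of 0 means that no reference applies.
idx = double(data(1).ReferenceIndex);
if idx >= 1 && idx <= header.NRefs
    refName = header.Reference(idx).Name;
else
    refName = '';
end

% Optional structures (marked with * above) exist only if the BAM file contains them.
hasReadGroup = isfield(header, 'ReadGroup');
```

Testing for optional fields with isfield before reading them avoids an error when the BAM file lacks that header section.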

Examples

Read the header information and the alignment data from the ex1.bam file included with Bioinformatics Toolbox. Read only alignment records that align to the 100 to 200 bp range of the seq1 reference sequence. Return the information in two separate variables.

[data, header] = bamread('ex1.bam', 'seq1', [100 200]);
 

Read the BAM index file associated with the ex1.bam file, and then use the return structure to read multiple alignment records from the ex1.bam file that align to two different reference sequences:

ind = bamindexread('ex1.bam');
data1 = bamread('ex1.bam', 'seq1', [100 200], 'index', ind);
data2 = bamread('ex1.bam', 'seq2', [100 200], 'index', ind);
 

Read alignments from the ex1.bam file that are fully contained in the 100 to 200 bp range of the seq1 reference sequence:

data3 = bamread('ex1.bam', 'seq1', [100 200], 'full', true, 'index', ind);
 

Read alignments from the ex1.bam file that align to the 100 to 500 bp range of the seq1 reference sequence, excluding the tags from the BAM-formatted file. Also save the alignment records to a SAM-formatted file, named ex1_example.sam.

bamread('ex1.bam','seq1', [100 500], 'tags', false, 'tofile', 'ex1_example.sam', 'index', ind);
 

Read alignments from the ex1.bam file that align to the 100 to 300 bp range of the seq1 reference sequence. Read the same alignments using zero-based indexing. Compare the position of the 27th record in the two outputs.

data_one = bamread('ex1.bam','seq1', [100 300]);
data_zero = bamread('ex1.bam','seq1', [100 300], 'zerobased', true);
data_one(27).Position
ans =

         135
data_zero(27).Position
ans =

         134
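
The comparison above looks at a single record. A possible way to check every record in the range is sketched below. It assumes that both calls return the same records in the same order, so that only the Position values differ. The names posOne, posZero, and offsetsAgree are illustrative.

```matlab
% Read the same range twice, as in the example above.
data_one  = bamread('ex1.bam', 'seq1', [100 300]);
data_zero = bamread('ex1.bam', 'seq1', [100 300], 'zerobased', true);

% Convert to double before subtracting so that integer saturation
% cannot hide a difference.
posOne  = double([data_one.Position]);
posZero = double([data_zero.Position]);

% True if every one-based position is exactly one more than its zero-based counterpart.
offsetsAgree = isequal(posOne - 1, posZero);
```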

See Also

bamindexread | baminfo | BioIndexedFile | BioMap | bowtieread | fastainfo | fastaread | fastawrite | fastqinfo | fastqread | fastqwrite | saminfo | samread | sffinfo | sffread | soapread

© 1984-2012 The MathWorks, Inc.