samread - Read data from Sequence Alignment/Map (SAM) file

Syntax

SAMStruct = samread(File)
[SAMStruct, HeaderStruct] = samread(File)
... = samread(File,'ParameterName',ParameterValue)

Description

SAMStruct = samread(File) reads a SAM-formatted file and returns the data in a MATLAB array of structures.

[SAMStruct, HeaderStruct] = samread(File) returns the alignment and header data in two separate variables.

... = samread(File,'ParameterName',ParameterValue) accepts one or more comma-separated parameter name/value pairs. Specify ParameterName inside single quotes.

Input Arguments

File

Either of the following:

  • String specifying a file name or path and file name of a SAM-formatted file. If you specify only a file name, that file must be on the MATLAB search path or in the current folder.

  • MATLAB string containing the text of a SAM-formatted file.

Parameter Name/Value Pairs

Tags

Logical value that controls whether samread reads the optional tags in addition to the first 11 fields of each alignment in the SAM-formatted file. Choices are true (default) or false.

ReadGroup

String specifying the ID of the read group from which to read alignment records. By default, samread reads records from all groups.

    Tip   For a list of the read groups (if present), return the header information in the separate HeaderStruct output and view the ReadGroup field of that structure.
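
The tip can be sketched as follows. This is a minimal sketch, not a definitive recipe: it uses the ex1.sam file from the Examples section, and the group ID 'grp1' is a hypothetical placeholder that must be replaced with an ID actually listed in the header. If the file defines no read groups, the ReadGroup field is absent, so the sketch checks for it first.

```matlab
% Return the header in a second output; the alignment data is ignored here.
[~, header] = samread('ex1.sam');

if isfield(header, 'ReadGroup')
    % Show the read groups so that an ID can be chosen.
    disp(header.ReadGroup)

    % 'grp1' is a hypothetical ID (an assumption); substitute one shown above.
    data = samread('ex1.sam', 'ReadGroup', 'grp1');
else
    disp('No read groups are defined in this file.')
end
```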

BlockRead

Scalar or vector that controls the reading of a single sequence entry or a block of sequence entries from a SAM-formatted file containing multiple sequences. Enter a scalar N to read the Nth entry in the file. Enter a 1-by-2 vector [M1, M2] to read a block of entries starting at entry M1 and ending at entry M2. To read all remaining entries in the file starting at entry M1, enter a positive value for M1 and Inf for M2.
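
The three forms of BlockRead can be illustrated with a short sketch. It assumes the ex1.sam file used in the Examples section and assumes that the file holds at least 100 entries; the entry numbers are arbitrary choices for illustration.

```matlab
% Scalar N: read only the third entry.
third = samread('ex1.sam', 'BlockRead', 3);

% Vector [M1 M2]: read entries 5 through 10 (six entries).
block = samread('ex1.sam', 'BlockRead', [5 10]);

% Vector [M1 Inf]: read from entry 100 to the end of the file.
tail = samread('ex1.sam', 'BlockRead', [100 Inf]);

fprintf('%d, %d, and %d entries read\n', numel(third), numel(block), numel(tail));
```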

Output Arguments

SAMStruct

An N-by-1 array of structures containing sequence alignment and mapping information from a SAM-formatted file, where N is the number of alignment records stored in the SAM-formatted file. Each structure contains the following fields.

Field    Description

QueryName

Name of read sequence (if unpaired) or name of sequence pair (if paired).

    Tip   You can use this information to populate the Header property of the BioMap object.

Flag

Integer indicating the bit-wise information that specifies the status of each of 11 flags described by the SAM format specification.

    Tip   You can use the bitget function to determine the status of a specific SAM flag.

ReferenceName

Name of the reference sequence.

Position

Position (one-based offset) of the forward reference sequence where the left-most base of the alignment of the read sequence starts.

MappingQuality

Integer specifying the mapping quality score for the read sequence.

CigarString

CIGAR-formatted string representing how the read sequence aligns with the reference sequence.

MateReferenceName

Name of the reference sequence associated with the mate. If this name is the same as ReferenceName, then this value is =. If there is no mate, then this value is *.

MatePosition

Position (one-based offset) of the forward reference sequence where the left-most base of the alignment of the mate of the read sequence starts.

InsertSize

The number of base positions between the read sequence and its mate, when both are mapped to the same reference sequence. Otherwise, this value is 0.

Sequence

String containing the letter representations of the read sequence. It is the reverse-complement if the read sequence aligns to the reverse strand of the reference sequence.

Quality

String containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence.

Tags

List of applicable SAM tags and their values.
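
As a minimal sketch of how the Flag and Quality fields might be decoded, the following builds a hypothetical record by hand so that it runs without a SAM file; a real record would come from samread, for example data(1). The flag names in flagNames are illustrative labels chosen here (not toolbox identifiers), ordered by bit position as in the SAM format specification, and the quality conversion assumes the Phred+33 encoding that the specification describes.

```matlab
% Hypothetical record holding only the two fields used in this sketch.
rec = struct('Flag', 83, 'Quality', '<<<;:');

% Illustrative labels for the 11 flags, bit 1 (0x1) through bit 11 (0x400).
flagNames = {'paired', 'properPair', 'unmapped', 'mateUnmapped', ...
             'reverseStrand', 'mateReverseStrand', 'firstInPair', ...
             'secondInPair', 'notPrimary', 'failsQC', 'duplicate'};

% bitget returns 0 or 1 for each requested bit position.
bits = bitget(rec.Flag, 1:numel(flagNames));
setFlags = flagNames(logical(bits));

% Print one line per flag that is set.
fprintf('Flag set: %s\n', setFlags{:});

% Convert the ASCII quality string to numeric scores (assumes Phred+33).
phred = double(rec.Quality) - 33;
disp(phred)
```

To test a single flag, pass one bit position, for example bitget(rec.Flag, 5) for the reverse-strand flag.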

HeaderStruct

Structure containing header information for the SAM-formatted file in the following fields.

Field    Description

Header*

Structure containing the file format version, sort order, and group order.

SequenceDictionary*

Structure containing the:

  • Sequence name

  • Sequence length

  • Genome assembly identifier

  • MD5 checksum of sequence

  • URI of sequence

  • Species

ReadGroup*

Structure containing the:

  • Read group identifier

  • Sample

  • Library

  • Description

  • Platform unit

  • Predicted median insert size

  • Sequencing center

  • Date

  • Platform

Program*

Structure containing the:

  • Program name

  • Version

  • Command line

* — These structures and their fields appear in the output structure only if they are present in the SAM file. The information in these structures depends on the information present in the SAM file.
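
Because the starred structures are optional, code that uses them may need to check for them first. The following is a minimal sketch, assuming the ex1.sam file from the Examples section; the variable names expected and present are illustrative.

```matlab
% Return the header in a second output; the alignment data is ignored here.
[~, header] = samread('ex1.sam');

% The four top-level fields listed in the table above.
expected = {'Header', 'SequenceDictionary', 'ReadGroup', 'Program'};

% isfield accepts a cell array of names and returns one logical per name.
present = isfield(header, expected);

for k = 1:numel(expected)
    if present(k)
        fprintf('%s is present\n', expected{k});
    else
        fprintf('%s is absent\n', expected{k});
    end
end
```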

Examples

Read the header information and the alignment data from the ex1.sam file included with Bioinformatics Toolbox, and then return the information in two separate variables:

[data, header] = samread('ex1.sam');

Read a block of entries, excluding the tags, from the ex1.sam file, and then return the information in an array of structures:

% Read entries 5 through 10 and do not include the tags
data = samread('ex1.sam', 'BlockRead', [5 10], 'Tags', false);

References

[1] Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16), 2078–2079.

See Also

bamindexread | baminfo | bamread | BioIndexedFile | BioMap | bowtieread | fastainfo | fastaread | fastawrite | fastqinfo | fastqread | fastqwrite | saminfo | sffinfo | sffread | soapread

© 1984-2012 The MathWorks, Inc.