samread - Read data from Sequence Alignment/Map (SAM) file
Syntax
SAMStruct = samread(File)
[SAMStruct, HeaderStruct] = samread(File)
... = samread(File,'ParameterName',ParameterValue)
Description
SAMStruct = samread(File) reads
a SAM-formatted file and returns the data in a MATLAB array of
structures.
[SAMStruct, HeaderStruct] = samread(File) returns the alignment
and header data in two separate variables.
... = samread(File,'ParameterName',ParameterValue) accepts
one or more comma-separated parameter name/value pairs. Specify ParameterName inside
single quotes.
Tips
Use the saminfo function
to investigate the size and content of a SAM-formatted file before
using the samread function to read the file contents
into a MATLAB array of structures.
If your SAM-formatted file is too large to read using
available memory, try one of the following:
Use the BlockRead parameter with
the samread function to read a subset of entries.
Create a BioIndexedFile object from the SAM-formatted
file, then access the entries using methods of the BioIndexedFile class.
Use the SAMStruct output
argument that samread returns to create a BioMap object, which lets you
explore, access, filter, and manipulate all or a subset of the data,
before doing subsequent analyses or viewing the data.
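The workflow described in these tips can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes the ex1.sam example file from Bioinformatics Toolbox is on the search path, that saminfo accepts the NumOfReads option and returns the count in a NumReads field, and that BioMap can be constructed directly from the structure array. The block size of 100 entries is an arbitrary choice for illustration.

```matlab
% Inspect the file first (assumes ex1.sam from Bioinformatics Toolbox).
info = saminfo('ex1.sam', 'NumOfReads', true);
fprintf('Alignment records in file: %d\n', info.NumReads);

% Read only a subset of entries to limit memory use
% (100 is an arbitrary, illustrative block size).
samStruct = samread('ex1.sam', 'BlockRead', [1 100]);

% Wrap the structure array in a BioMap object for further exploration.
bm = BioMap(samStruct);

% Query the object, for example the start positions of the first five reads.
firstStarts = getStart(bm, 1:5);
```

Working through the BioMap object rather than the raw structure array keeps later filtering and subsetting steps in one place.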
Input Arguments
File |
Either of the following:
String specifying a file name or path and file name
of a SAM-formatted file. If you specify only a file name, that file
must be on the MATLAB search path or in the current folder.
MATLAB string containing the text of a SAM-formatted
file.
|
Parameter Name/Value Pairs
Tags |
Controls the reading of the optional tags in addition to the
first 11 fields for each alignment in the SAM-formatted file. Choices
are true (default) or false.
|
ReadGroup |
String specifying the ID of the read group whose alignment
records to read. By default, records from all read groups are read.
Tip
For a list of the read groups (if present), return the header
information in the separate HeaderStruct output
and view the ReadGroup field in this structure. |
|
BlockRead |
Scalar or vector that controls the reading of a single sequence
entry or a block of sequence entries from a SAM-formatted file containing
multiple sequences. Enter a scalar N to
read the Nth entry in the file. Enter a
1-by-2 vector [M1, M2] to read a block
of entries starting at entry M1 and
ending at entry M2. To read all remaining
entries in the file starting at entry M1,
enter a positive value for M1 and Inf for M2.
|
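The three forms of BlockRead can be combined to walk through a large file in pieces. The following is a minimal sketch under stated assumptions: it uses the ex1.sam example file, it assumes saminfo returns the record count in a NumReads field when called with the NumOfReads option, and blockSize is a hypothetical value to tune to the available memory.

```matlab
samFile   = 'ex1.sam';   % example file from Bioinformatics Toolbox
blockSize = 500;         % hypothetical block size, chosen for illustration

% Get the number of alignment records without loading them.
info     = saminfo(samFile, 'NumOfReads', true);
numReads = info.NumReads;

% Read the file block by block, skipping the optional tags.
totalRead = 0;
for first = 1:blockSize:numReads
    last  = min(first + blockSize - 1, numReads);
    block = samread(samFile, 'BlockRead', [first last], 'Tags', false);
    totalRead = totalRead + numel(block);   % process the block here
end

% Scalar form: read only the 10th entry.
tenth = samread(samFile, 'BlockRead', 10);

% Open-ended form: read from an entry through the end of the file.
tail = samread(samFile, 'BlockRead', [numReads - 9, Inf]);
```

Only one block is held in memory at a time, so peak memory use is governed by blockSize rather than by the size of the file.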
Output Arguments
SAMStruct |
An N-by-1 array of structures containing
sequence alignment and mapping information from a SAM-formatted file,
where N is the number of alignment records stored
in the SAM-formatted file. Each structure contains the following fields.
| Field | Description |
| QueryName | Name of read sequence (if unpaired) or name of sequence
pair (if paired).
Tip
You can use this information to populate the Header property
of the BioMap object. |
|
| Flag | Integer indicating the bit-wise information that specifies
the status of each of 11 flags described by the SAM format specification.
Tip
You can use the bitget function
to determine the status of a specific SAM flag. |
|
| ReferenceName | Name of the reference sequence. |
| Position | Position (one-based offset) of the forward reference sequence
where the left-most base of the alignment of the read sequence starts. |
| MappingQuality | Integer specifying the mapping quality score for the read sequence. |
| CigarString | CIGAR-formatted string representing how the read sequence aligns
with the reference sequence. |
| MateReferenceName | Name of the reference sequence associated with the mate. If
this name is the same as ReferenceName, then this
value is =. If there is no mate, then this value
is *. |
| MatePosition | Position (one-based offset) of the forward reference sequence
where the left-most base of the alignment of the mate of the read
sequence starts. |
| InsertSize | The number of base positions between the read sequence and
its mate, when both are mapped to the same reference sequence. Otherwise,
this value is 0. |
| Sequence | String containing the letter representations of the read sequence.
It is the reverse-complement if the read sequence aligns to the reverse
strand of the reference sequence. |
| Quality | String containing the ASCII representation of the per-base
quality score for the read sequence. The quality score is reversed
if the read sequence aligns to the reverse strand of the reference
sequence. |
| Tags | List of applicable SAM tags and their values. |
|
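The Flag and Quality fields are encoded values, so they usually need decoding before use. The sketch below uses hypothetical literal values in place of SAMStruct(k).Flag and SAMStruct(k).Quality so that it runs without a SAM file. It assumes the bit meanings defined by the SAM format specification and the Phred+33 quality encoding that the specification prescribes.

```matlab
% Hypothetical values standing in for SAMStruct(k).Flag and SAMStruct(k).Quality.
flag    = 99;        % 1 + 2 + 32 + 64
quality = 'II5+!';

% bitget numbers bits from 1 (least significant), so SAM flag 0x1 is bit 1,
% 0x4 is bit 3, 0x10 is bit 5, and 0x40 is bit 7.
isPaired      = bitget(flag, 1) == 1;   % 0x1  read is paired
isProperPair  = bitget(flag, 2) == 1;   % 0x2  read is mapped in a proper pair
isUnmapped    = bitget(flag, 3) == 1;   % 0x4  read is unmapped
isReverse     = bitget(flag, 5) == 1;   % 0x10 read aligns to the reverse strand
isFirstInPair = bitget(flag, 7) == 1;   % 0x40 read is the first in the pair

% Convert the ASCII quality string to numeric Phred scores (ASCII code minus 33).
phred = double(quality) - 33;
```

For a whole structure array, the same test can be vectorized, for example bitget(double([SAMStruct.Flag]), 3) to flag every unmapped record.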
HeaderStruct |
Structure containing header information for the SAM-formatted
file in the following fields.
| Field | Description |
| Header* | Structure containing the file format version, sort order, and
group order. |
| SequenceDictionary* | Structure containing the: |
| ReadGroup* | Structure containing the: |
| Program* | Structure containing the program name, version, and command line.
|
* — These structures and their fields
appear in the output structure only if they are present in the SAM
file. The information in these structures depends on the information
present in the SAM file.
|
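Because the header substructures are optional, code that uses HeaderStruct should check for each field before accessing it. The following is a minimal sketch, assuming the ex1.sam example file. The read group ID in the trailing comment is hypothetical and must be replaced by an ID that actually appears in the ReadGroup field of your file.

```matlab
% Request the header as the second output; the alignments are ignored here.
[~, header] = samread('ex1.sam');

% These four fields appear only if the SAM file contains them.
optionalFields = {'Header', 'SequenceDictionary', 'ReadGroup', 'Program'};
present = isfield(header, optionalFields);

for k = 1:numel(optionalFields)
    if present(k)
        fprintf('--- %s ---\n', optionalFields{k});
        disp(header.(optionalFields{k}));
    else
        fprintf('%s is not present in this file.\n', optionalFields{k});
    end
end

% If ReadGroup is present, one of its IDs can be passed back to samread.
% 'myGroupID' is a hypothetical placeholder, not a value from ex1.sam:
%   data = samread('ex1.sam', 'ReadGroup', 'myGroupID');
```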
Examples
Read the header information and the alignment data from the ex1.sam file
included with Bioinformatics Toolbox, and then return the information
in two separate variables:
[data, header] = samread('ex1.sam');
Read a block of entries, excluding the tags, from the ex1.sam file,
and then return the information in an array of structures:
% Read entries 5 through 10 and do not include the tags
data = samread('ex1.sam','blockread', [5 10], 'tags', false);
References
[1] Li, H., Handsaker, B., Wysoker, A., Fennell,
T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009).
The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16),
2078–2079.
See Also
bamindexread | baminfo | bamread | BioIndexedFile | BioMap | bowtieread | fastainfo | fastaread | fastawrite | fastqinfo | fastqread | fastqwrite | saminfo | sffinfo | sffread | soapread