soapread - Read data from Short Oligonucleotide Analysis Package (SOAP) file
Syntax
SOAPStruct = soapread(File)
SOAPStruct = soapread(File,Name,Value)
Description
SOAPStruct = soapread(File) reads File,
a SOAP-formatted file (version 2.15), and returns the data in SOAPStruct,
a MATLAB array of structures.
SOAPStruct = soapread(File,Name,Value) reads
a SOAP-formatted file with additional options specified by one or
more Name,Value pair arguments.
Tips
If your SOAP-formatted file is too large to read using available
memory, try either of the following:
Use the BlockRead name-value pair
argument to read a subset of entries.
Create a BioIndexedFile object
from the SOAP-formatted file (using 'TABLE' for
the Format), and then access the entries
using methods of the BioIndexedFile class.
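The second tip can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes that sample01.soap (the file used in the Examples section) can be located with which, that the index file can be written to the current folder, and that the text returned by getEntryByIndex is accepted by soapread as SOAP-formatted text, as the File argument description suggests.

```matlab
% Locate the SOAP file; which returns the full path for a file on the
% search path or in the current folder.
soapFile = which('sample01.soap');

% Index the tab-delimited file once. The third argument ('.') asks for
% the index file to be stored in the current folder.
bif = BioIndexedFile('TABLE', soapFile, '.');

% Number of entries found by the indexer.
numEntries = bif.NumEntries;

% Retrieve the raw text of entries 5 through 10 without reading the
% rest of the file into memory.
rawText = getEntryByIndex(bif, 5:10);

% Parse only the retrieved text (assumption: soapread accepts this
% concatenated text in place of a file name).
subset = soapread(rawText);
```

Indexing costs one pass over the file, after which any range of entries can be fetched without reloading the whole file.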
Input Arguments
File |
Either of the following:
String specifying a file name or path and file name
of a SOAP-formatted file. If you specify only a file name, that file
must be on the MATLAB search path or in the Current Folder.
MATLAB string containing the text of a SOAP-formatted
file.
The soapread function reads SOAP-formatted
files (version 2.15). |
Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments,
where Name is the argument
name and Value is the corresponding
value. Name must appear
inside single quotes (' ').
You can specify several name and value pair
arguments in any order as Name1,Value1,...,NameN,ValueN.
'BlockRead' |
Scalar or vector that controls the reading of a single sequence
entry or block of sequence entries from a SOAP-formatted file containing
multiple sequences. Enter a scalar N to
read the Nth entry in the file. Enter a
1-by-2 vector [M1, M2] to read a block
of entries starting at entry M1 and
ending at entry M2. To read all remaining
entries in the file starting at entry M1,
enter a positive value for M1 and Inf for M2.
|
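The three forms of the 'BlockRead' value described above might be used as in the following sketch. It assumes the sample01.soap file from the Examples section, which holds 17 entries; the variable names are illustrative only.

```matlab
% Scalar N: read only the 3rd entry.
third = soapread('sample01.soap', 'BlockRead', 3);

% Vector [M1 M2]: read entries 5 through 10.
block = soapread('sample01.soap', 'BlockRead', [5 10]);

% Vector [M1 Inf]: read entry 12 through the last entry in the file.
tail = soapread('sample01.soap', 'BlockRead', [12 Inf]);
```

Reading a large file in consecutive blocks this way keeps only one block in memory at a time.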
'AlignDetails' |
Logical specifying whether or not to include the AlignDetails field
in the SOAPStruct output argument. The AlignDetails field
includes information on mismatches, insertions, and deletions in the
alignment. Choices are true (default) or false.
Default: true |
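As a brief sketch of this option, the following call omits the AlignDetails field from the output. It assumes the sample01.soap file from the Examples section; the isfield check is included only to show the effect described above.

```matlab
% Read all records but leave out the AlignDetails field.
data = soapread('sample01.soap', 'AlignDetails', false);

% Check whether the output structure array carries the field.
hasDetails = isfield(data, 'AlignDetails');
```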
Output Arguments
SOAPStruct |
An N-by-1 array of structures containing
sequence alignment and mapping information from a SOAP-formatted file,
where N is the number of alignment records stored
in the SOAP-formatted file. Each structure contains the following
fields.
| Field | Description |
| QueryName | Name of aligned read sequence. |
| Sequence | String containing the letter representations of the read sequence. It is the reverse-complement if the read sequence aligns to the reverse strand of the reference sequence. |
| Quality | String containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence. |
| NumHits | Total number of instances in which this read sequence aligned to an identical length of bases on another area of the reference sequence. |
| PairedEndSourceFile | Flag (a or b) specifying the source file to which the read sequence belongs. This field applies only to read sequences that are paired in the alignment. |
| Length | Scalar specifying the length of the read sequence. |
| Strand | + or − specifying the direction (forward or reverse) of the reference sequence to which the read sequence aligns. |
| ReferenceName | Name or numeric ID of the reference sequence to which the read sequence aligns. |
| Position | Position (one-based offset) on the forward reference sequence where the left-most base of the alignment of the read sequence starts. |
| AlignDetails | Information on mismatches, insertions, and deletions in the alignment. For SOAP-formatted files v2.15, this field includes CIGAR strings. |
|
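Because SOAPStruct is a structure array, a field can be gathered across all records with standard MATLAB indexing. The sketch below assumes the sample01.soap file from the Examples section and assumes that Length is stored as a numeric scalar, as the table describes; the variable names are illustrative.

```matlab
data = soapread('sample01.soap');

% Logical mask of records that align to the reverse strand.
isReverse = strcmp({data.Strand}, '-');

% Keep only the reverse-strand records.
reverseData = data(isReverse);

% Gather the read lengths of all records into one vector
% (assumes Length is numeric in every record).
readLengths = [data.Length];
```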
Examples
Read the alignment records (entries) from the sample01.soap file
into a MATLAB array of structures and access some of the data:
% Read the alignment records stored in the file sample01.soap
data = soapread('sample01.soap')
data =
17x1 struct array with fields:
QueryName
Sequence
Quality
NumHits
PairedEndSourceFile
Length
Strand
ReferenceName
Position
AlignDetails
% Access the quality score for the 6th entry
data(6).Quality
ans =
<>.>>>8>;:1>>>3>6>
% Determine the strand direction (forward or reverse) of the reference
% sequence to which the 12th entry aligns
data(12).Strand
ans =
-
Read a block of alignment records (entries) from the sample01.soap file
into a MATLAB array of structures:
% Read a block of six entries from a SOAP file
data_5_10 = soapread('sample01.soap','blockread', [5 10])
data_5_10 =
6x1 struct array with fields:
QueryName
Sequence
Quality
NumHits
PairedEndSourceFile
Length
Strand
ReferenceName
Position
AlignDetails
References
[1] Li, R., Yu, C., Li, Y., Lam, T., Yiu,
S., Kristiansen, K., and Wang, J. (2009). SOAP2: an improved ultrafast
tool for short read alignment. Bioinformatics 25(15),
1966–1967.
[2] Li, R., Li, Y., Kristiansen, K., and Wang,
J. (2008). SOAP: short oligonucleotide alignment program. Bioinformatics 24(5),
713–714.
See Also
bamread | bowtieread | fastqread | samread