Read data from Short Oligonucleotide Analysis Package (SOAP) file
Read and Examine a SOAP-Formatted File
Read the alignment records (entries) from the sample01.soap file.
data = soapread("sample01.soap")
data=17×1 struct array with fields:
    QueryName
    Sequence
    Quality
    NumHits
    PairedEndSourceFile
    Length
    Strand
    ReferenceName
    Position
    AlignDetails
View the quality score for the 6th entry.
data(6).Quality
ans = '<>.>>>8>;:1>>>3>6>'
Determine the strand direction (forward or reverse) of the reference sequence to which the 12th entry aligns.
data(12).Strand
ans = '-'
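The same field access extends to the whole struct array through MATLAB comma-separated lists. The following is a minimal sketch, assuming sample01.soap is on the search path and that Length holds a numeric scalar in every entry; the variable names are illustrative only.

```matlab
% Read all alignment records from the sample file used above.
data = soapread("sample01.soap");

% Collect the Strand field of every entry into a cell array.
strands = {data.Strand};

% Logical mask and count of the entries aligned to the reverse strand.
isReverse = strcmp(strands, '-');
numReverse = nnz(isReverse);

% Collect the read lengths into a numeric row vector
% (assumes Length is a numeric scalar in every entry).
readLengths = [data.Length];

fprintf("%d of %d entries align to the reverse strand.\n", numReverse, numel(data));
```

Gathering a field this way avoids an explicit loop over the entries.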
Modify SOAP-File Reading
Read a block of six alignment records (entries) from the sample01.soap file.
data_5_10 = soapread('sample01.soap',BlockRead=[5 10])
data_5_10=6×1 struct array with fields:
    QueryName
    Sequence
    Quality
    NumHits
    PairedEndSourceFile
    Length
    Strand
    ReferenceName
    Position
    AlignDetails
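One way to check that a block read returns the expected records is to compare it against the matching slice of a full read. This is a sketch under the assumption that sample01.soap is on the search path; the variable names are illustrative, and the comparison is limited to the QueryName field.

```matlab
% Read every entry, then read only entries 5 through 10.
data = soapread("sample01.soap");
data_5_10 = soapread("sample01.soap", BlockRead=[5 10]);

% Compare the read names of the block with entries 5 through 10
% of the full read.
namesFull = {data(5:10).QueryName};
namesBlock = {data_5_10.QueryName};
sameNames = isequal(namesFull, namesBlock);

disp(sameNames)
```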
File — File to read
file path | file name
File to read, specified as a path to a SOAP-formatted file (version 2.15) or as a file name. If you specify only a file name, that file must be on the MATLAB search path or in the current folder.
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: SOAPStruct = soapread(File,BlockRead=10)
The names are case-insensitive. For example, you can use aligndetails instead of AlignDetails.
AlignDetails — Indication to include the AlignDetails field in the SOAPStruct output argument
true (default) | false
Indication to include the AlignDetails field in the SOAPStruct output argument, specified as true (include the field) or false (do not include the field).
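As an illustration of this argument, the following sketch reads the same file with and without the AlignDetails field and checks for the field with isfield. It assumes sample01.soap is on the search path; the variable names are illustrative.

```matlab
% Default read: the AlignDetails field is included.
full = soapread("sample01.soap");

% Read again, omitting the AlignDetails field.
slim = soapread("sample01.soap", AlignDetails=false);

% Check for the field in each result.
hasDetailsFull = isfield(full, 'AlignDetails');
hasDetailsSlim = isfield(slim, 'AlignDetails');

fprintf("Default read has AlignDetails: %d\n", hasDetailsFull);
fprintf("AlignDetails=false read has AlignDetails: %d\n", hasDetailsSlim);
```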
BlockRead — Entries to read
[1 Inf] (default) | positive integer | two-element positive integer vector
Entries to read, specified as a positive integer or as a two-element positive integer vector.
To read entry N from File, specify a positive integer N.
To read the block of entries starting at N1 and ending at N2, specify a positive integer vector [N1 N2], where N1 < N2. To read all the entries starting at N1, specify [N1 Inf].
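The forms of BlockRead can be sketched as follows, assuming sample01.soap (17 entries, per the example above) is on the search path. The variable names are illustrative.

```matlab
% Read only the 6th entry.
entry6 = soapread("sample01.soap", BlockRead=6);

% Read the block of entries 5 through 10.
block = soapread("sample01.soap", BlockRead=[5 10]);

% Read from entry 10 through the end of the file.
tail = soapread("sample01.soap", BlockRead=[10 Inf]);

fprintf("Single: %d, block: %d, tail: %d entries\n", ...
    numel(entry6), numel(block), numel(tail));
```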
SOAPStruct — Sequence alignment and mapping information
array of structures
Sequence alignment and mapping information, returned as an N-by-1 array of structures, where N is the number of alignment records stored in File. Each structure contains the following fields.
|Field|Description|
|---|---|
|QueryName|Name of aligned read sequence.|
|Sequence|Character vector containing the letter representations of the read sequence. It is the reverse-complement if the read sequence aligns to the reverse strand of the reference sequence.|
|Quality|Character vector containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence.|
|NumHits|Total number of instances where this read sequence aligned to an identical length of bases on another area of the reference sequence.|
|PairedEndSourceFile|Flag (a or b) specifying the source file to which the read sequence belongs. This field applies only to read sequences that are paired in the alignment.|
|Length|Scalar specifying the length of the read sequence.|
|Strand|+ or − specifying the direction (forward or reverse) of the reference sequence to which the read sequence aligns.|
|ReferenceName|Name or numeric ID of the reference sequence to which the read sequence aligns.|
|Position|Position (one-based offset) of the forward reference sequence where the left-most base of the alignment of the read sequence starts.|
|AlignDetails|Information on mismatches, insertions, and deletions in the alignment. For SOAP-formatted files v2.15, this field includes CIGAR strings.|
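Because the read sequence and quality are stored in reference orientation for reverse-strand alignments, recovering the read as it was sequenced means undoing both transformations. The following is a minimal sketch, assuming sample01.soap is on the search path and using the 12th entry, which aligns to the reverse strand in the example above. seqrcomplement is the Bioinformatics Toolbox reverse-complement function; the variable names are illustrative.

```matlab
% Read the sample file and pick the 12th entry (reverse strand above).
data = soapread("sample01.soap");
entry = data(12);

if strcmp(entry.Strand, '-')
    % Stored sequence is the reverse complement of the read:
    % reverse-complement it again to recover the original read.
    originalSeq = seqrcomplement(entry.Sequence);
    % Stored quality is reversed: flip it back.
    originalQual = fliplr(entry.Quality);
else
    % Forward-strand alignments are stored as sequenced.
    originalSeq = entry.Sequence;
    originalQual = entry.Quality;
end

fprintf("Stored:   %s\nOriginal: %s\n", entry.Sequence, originalSeq);
```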
If your SOAP-formatted file is too large to read using available memory, try either of the following:
Use the BlockRead name-value argument to read a subset of entries.
Create a BioIndexedFile object from the SOAP-formatted file (using the Format argument to specify the file format), and then access the entries using methods of the BioIndexedFile object.
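The indexed-file approach might look like the following sketch. It rests on several assumptions: that 'TABLE' is a suitable Format value for a tab-delimited SOAP file, that sample01.soap is on the search path, and that writing the index file to the temporary folder returned by tempdir is acceptable. The variable names are illustrative.

```matlab
% Locate the sample file; which returns its full path when the file
% is on the MATLAB search path.
sourceFile = which('sample01.soap');

% Assumption: 'TABLE' treats each line of the tab-delimited file as
% one entry. The index file is written to the temporary folder.
bif = BioIndexedFile('TABLE', sourceFile, tempdir);

% Number of indexed entries, available without loading the file.
numEntries = bif.NumEntries;

% Retrieve the raw text of the 6th entry as a character vector.
rawEntry = getEntryByIndex(bif, 6);

disp(numEntries)
disp(rawEntry)
```

With this approach only the index is held in memory, and individual entries are read from the source file on request.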
Introduced in R2010b