Quality statistics from a short-read sequence file
The BioReadQualityStatistics class contains quality statistics from short-read sequence data stored in FASTQ, SAM, or BAM files, and provides a standard set of quality control plots for such data.
Perform data quality analyses using the object's methods to generate several quality control plots: average quality score for each base position, average quality score distribution, read count percentage for each base position, percentage of G and C nucleotides for each base position, G and C content distribution, and distribution of all nucleotides. The object lets you parse a sequence file without creating a BioRead object, and lets you interact with the quality data to compare different data sets or filtering options and to create customized plots.
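The object reads the sequence file directly instead of going through a BioRead object. The Python sketch below gives a rough idea of what parsing a FASTQ file involves: it streams four-line FASTQ records from a text handle. It is not the toolbox's MATLAB implementation. It assumes unwrapped records (one sequence line and one quality line per read) and does not handle SAM or BAM input. `read_fastq` and `FastqRecord` are hypothetical names.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # read identifier without the leading '@'
    sequence: str  # nucleotide string
    quality: str   # ASCII-encoded quality string, same length as sequence


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream (assumes 4 unwrapped lines per read)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


example = "@read1\nACGTN\n+\nIIII#\n@read2\nGGCC\n+\n!!!!\n"
for rec in read_fastq(io.StringIO(example)):
    print(rec.header, rec.sequence, rec.quality)
```

Because records are yielded one at a time, statistics can be accumulated without holding the whole file in memory.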
QSObj = BioReadQualityStatistics(File) returns a BioReadQualityStatistics object, QSObj, from the data in File, a FASTQ-, SAM-, or BAM-formatted file.
QSObj = BioReadQualityStatistics(___,Name,Value) returns a BioReadQualityStatistics object using options specified by one or more name-value pair arguments.
Once it is created, you cannot modify the properties of a BioReadQualityStatistics object.
File is a string specifying a FASTQ, SAM, or BAM file. The string can contain the path or folder location of the file.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'Encoding'— Encoding format
Encoding format, specified as a string such as 'Solexa'. This is the format used for the characters that encode sequence information and quality scores in a FASTQ file.
'FilterLength'— Number of characters
[] (default) | positive integer
Number of characters from the start of each read to use, specified as a positive integer. No filtering is applied if you use an empty array, which is the default value.
'QualityScoreThreshold'— Average quality threshold
-Inf (default) | real number
Average quality threshold, specified as a real number. Any read with an average score of less than the specified threshold is ignored.
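The Python sketch below shows how the two filtering options combine. It is an illustration, not the toolbox code. `filter_reads` and `phred_scores` are hypothetical helpers, and the sketch assumes a Phred+33 encoding (matching the CharOffset of 33 in the example below). It also assumes that trimming to 'FilterLength' happens before a read's average quality is compared with 'QualityScoreThreshold'.

```python
from statistics import mean

CHAR_OFFSET = 33  # assumption: Phred+33 quality encoding


def phred_scores(quality: str, offset: int = CHAR_OFFSET) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def filter_reads(reads, filter_length=None, threshold=float("-inf")):
    """Hypothetical helper mimicking 'FilterLength' and 'QualityScoreThreshold'.

    reads         -- iterable of (sequence, quality) string pairs
    filter_length -- keep only the first N characters; None means no trimming
                     (the counterpart of the [] default)
    threshold     -- drop reads whose average Phred score is below this value
                     (the default -inf keeps every read)
    """
    kept = []
    for seq, qual in reads:
        if filter_length is not None:
            # assumption: trimming happens before the average is computed
            seq, qual = seq[:filter_length], qual[:filter_length]
        if mean(phred_scores(qual)) < threshold:
            continue  # average below the threshold: the read is ignored
        kept.append((seq, qual))
    return kept


reads = [("ACGTACGT", "IIIIIIII"), ("ACGTACGT", "!!!!!!!!")]
print(filter_reads(reads, filter_length=4, threshold=5))  # [('ACGT', 'IIII')]
```

The second read has an average Phred score of 0, so a threshold of 5 drops it. With the defaults, both reads are kept untrimmed.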
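FileName: Name of the file used to create a BioReadQualityStatistics object.
FileType: Type of file from which a BioReadQualityStatistics object is created.
Encoding: String specifying the format of the characters that encode sequence information and quality scores in the file. For the supported formats, see the 'Encoding' name-value pair argument.
CharOffset: Integer specifying the ASCII code at which the quality score encoding begins for a sequence.
NumberOfReads: Integer representing the number of short-read sequences.
MaxReadLength: Integer representing the maximum length of a short-read sequence among all sequences of a BioReadQualityStatistics object.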
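MinEncodingPhred: Integer specifying the minimum Phred quality score among all short-read sequences of a BioReadQualityStatistics object.
MaxEncodingPhred: Integer specifying the maximum Phred quality score among all short-read sequences of a BioReadQualityStatistics object.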
SkipPhred: Integer specifying the number of Phred scores that are not considered in the quality score range.
PerSeqAverageQualityDist: Vector of integers representing the distribution of average quality scores per sequence.
PerPosQualities: s-by-p matrix of integers representing quality scores (s) per base position (p).
PerSeqGCDist: Vector of integers representing the distribution of G and C nucleotides per sequence.
PerPosBaseDist: n-by-p matrix of integers representing the distribution of all nucleotides (n = 5) per base position (p).
Name: String describing the user-defined name for the object.
MaxScore: Integer representing the maximum sequence quality score among all scores.
MinScore: Integer representing the minimum sequence quality score among all scores.
FilterLength: Positive integer specifying the length of each read used in quality analysis.
QualityScoreThreshold: Scalar value specifying the minimum average quality threshold for a read. Any read with an average score less than the specified threshold is ignored. The default value is -Inf.
Subset: Vector of integers specifying the indices of the subset of the original sequence data used in the analysis.
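The Python sketch below approximates the kind of summary that several of these properties hold. It covers a histogram of per-read average quality (compare PerSeqAverageQualityDist), a binned per-read GC histogram (compare PerSeqGCDist) and an n-by-p base-count matrix (compare PerPosBaseDist). It also decodes Phred scores with an offset of 33 (compare CharOffset) and applies the standard Phred definition Q = -10 log10(P). The sketch is not the toolbox's algorithm. Several details are assumptions: the rounding of averages, the 20 equal-width GC bins (chosen only because the example's PerSeqGCDist has 20 entries), and the five bases A, C, G, T, N as the rows.

```python
from collections import Counter

CHAR_OFFSET = 33   # Phred+33, matching CharOffset in the example output
BASES = "ACGTN"    # assumption: the n = 5 rows of the base-count matrix


def phred_to_error_probability(q: int) -> float:
    """Phred definition Q = -10 * log10(P), solved for P."""
    return 10 ** (-q / 10)


def average_quality_histogram(quals: list[str], offset: int = CHAR_OFFSET) -> Counter:
    """Count reads by average Phred score (rounding rule is an assumption)."""
    return Counter(round(sum(ord(c) - offset for c in q) / len(q)) for q in quals)


def gc_percent(seq: str) -> float:
    """Percentage of G and C characters in one read (N counts in the denominator)."""
    return 100.0 * sum(base in "GC" for base in seq.upper()) / len(seq)


def gc_histogram(seqs: list[str], n_bins: int = 20) -> list[int]:
    """Bin per-read GC percentages into equal-width bins (hypothetical binning)."""
    width = 100.0 / n_bins
    hist = [0] * n_bins
    for s in seqs:
        hist[min(int(gc_percent(s) / width), n_bins - 1)] += 1
    return hist


def per_position_base_counts(seqs: list[str]) -> list[list[int]]:
    """n-by-p count matrix: row = base in BASES, column = position in the read."""
    max_len = max(len(s) for s in seqs)
    counts = [[0] * max_len for _ in BASES]
    for s in seqs:
        for pos, base in enumerate(s.upper()):
            if base in BASES:
                counts[BASES.index(base)][pos] += 1
    return counts
```

For example, `gc_histogram(["ACGT", "GGCN"])` places one read in the 50% bin and one in the 75% bin.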
|Method|Description|
|---|---|
|plotPerPositionCountByQuality|Plot fractions of reads with Phred scores in ranges|
|plotPerPositionGC|Plot percentages of G or C nucleotides at each base position|
|plotPerPositionQuality|Plot Phred score distributions|
|plotPerSequenceGC|Plot G or C nucleotide distribution|
|plotPerSequenceQuality|Plot distribution of average quality scores|
|plotSummary|Plot summary statistics of a BioReadQualityStatistics object|
|plotTotalGC|Plot distribution of all nucleotides of short-read sequences|
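Two of these plots are per-position summaries. The Python sketch below computes comparable numbers. `per_position_gc_percent` gives the percentage of reads with G or C at each position (the kind of data plotPerPositionGC shows). `per_position_fraction_in_ranges` gives, for each position, the fraction of reads whose Phred score falls in each range (compare plotPerPositionCountByQuality). Both function names are hypothetical. The sketch assumes Phred+33 decoding, and the range edges are caller-supplied placeholders, not the toolbox's bins. Reads shorter than a given position do not count toward that position.

```python
def per_position_gc_percent(seqs: list[str]) -> list[float]:
    """Percentage of reads with G or C at each base position."""
    max_len = max(len(s) for s in seqs)
    result = []
    for pos in range(max_len):
        bases = [s[pos].upper() for s in seqs if len(s) > pos]
        result.append(100.0 * sum(b in "GC" for b in bases) / len(bases))
    return result


def per_position_fraction_in_ranges(quals: list[str], edges: list[int],
                                    offset: int = 33) -> list[list[float]]:
    """For each position, fraction of reads whose Phred score lies in each
    half-open range [edges[i], edges[i+1]); the edges are hypothetical bins."""
    max_len = max(len(q) for q in quals)
    table = []
    for pos in range(max_len):
        scores = [ord(q[pos]) - offset for q in quals if len(q) > pos]
        row = [sum(lo <= s < hi for s in scores) / len(scores)
               for lo, hi in zip(edges, edges[1:])]
        table.append(row)
    return table


print(per_position_gc_percent(["ACGT", "GGCN"]))  # [50.0, 100.0, 100.0, 0.0]
```

Each row of the second result sums to 1 only when the edges cover every observed score. With open-ended data, add a final edge above the encoding's maximum Phred score.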
This example shows how to create a BioReadQualityStatistics object and plot its summary statistics.
Create a BioReadQualityStatistics object from a FASTQ file, using only the first 40 characters of each read and ignoring reads whose average quality score is below 5.
QSObj = BioReadQualityStatistics('SRR005164_1_50.fastq', 'FilterLength', ...
    40, 'QualityScoreThreshold', 5)
QSObj = 

  BioReadQualityStatistics with properties:

    FileName: '/mathworks/devel/bat/Bdoc15a/build/matlab/t...'
    FileType: 'FASTQ'
    Encoding: 'Illumina18'
    CharOffset: 33
    NumberOfReads: 50
    MaxReadLength: 40
    MinEncodingPhred: 0
    MaxEncodingPhred: 62
    SkipPhred: 
    PerSeqAverageQualityDist: [1x62 double]
    PerPosQualities: [63x40 double]
    PerSeqGCDist: [0 0 0 0 3 3 8 5 9 7 6 5 2 2 0 0 0 0 0 0]
    PerPosBaseDist: [5x40 double]
    Name: ''
    MaxScore: 34
    MinScore: 1
    FilterLength: 40
    QualityScoreThreshold: 5
    Subset: NaN
Plot the summary statistics of the object.
plotSummary(QSObj)
ans =

    0.0084    1.0084    2.0084    3.0084    4.0084    5.0084