BioReadQualityStatistics class

Quality statistics from a short-read sequence file


The BioReadQualityStatistics class contains quality statistics data from short-read sequences and provides a standard set of quality control plots for such data.

Construct a BioReadQualityStatistics object from short-read sequence data stored in FASTQ, SAM, or BAM files. Use the object's methods to generate quality control plots of the average quality score at each base position, the distribution of average quality scores, the read count percentage at each base position, the percentage of GC nucleotides at each base position, the GC content distribution, and the distribution of all nucleotides. The object lets you parse a sequence file without creating a BioRead object, and lets you work with the quality data directly to compare data sets or filtering options and to create customized plots.


QSObj = BioReadQualityStatistics(File) constructs QSObj, a BioReadQualityStatistics object, from the data stored in File, a FASTQ-, SAM-, or BAM-formatted file.

QSObj = BioReadQualityStatistics(Obj) constructs QSObj, a BioReadQualityStatistics object, from the data stored in Obj, a BioRead or BioMap object.

QSObj = BioReadQualityStatistics(___,Name,Value) constructs a BioReadQualityStatistics object using options specified by one or more name-value pair arguments.

    Note:   QSObj is an immutable object. Once it is created, you cannot modify its properties.

Input Arguments



File

Character vector specifying a FASTQ-, SAM-, or BAM-formatted file. You can also include the path or folder location of the file.


Obj

A BioRead or BioMap object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.


'Encoding'

Encoding format, specified as 'Sanger', 'Illumina13', 'Illumina15', 'Illumina18', or 'Solexa'. This is the character encoding used for sequence information and quality scores in the file. The default is 'Illumina18'.

Example: 'Encoding','Sanger'

'FilterLength'

Number of characters from each read to use, specified as a positive integer. The default is an empty array, which applies no filtering.

Example: 'FilterLength',40

'QualityScoreThreshold'

Average quality threshold, specified as a real number. Any read whose average score is below the threshold is ignored. The default is -Inf, which keeps all reads.

Example: 'QualityScoreThreshold',10
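
The effect of the length and quality options can be illustrated with a minimal sketch in base MATLAB. This is not the class's implementation: the toy scores and the variable names are hypothetical, and the sketch assumes that the average is computed on the truncated read, which the text above does not state.

```matlab
% Hypothetical toy data: decoded Phred scores for three reads.
quals = {[30 32 28 10 5], [4 3 6 2 5], [35 36 34 33 30]};

filterLength = 3;    % plays the role of 'FilterLength'
threshold    = 10;   % plays the role of 'QualityScoreThreshold'

% Keep only the first filterLength scores of each read.
trimmed = cellfun(@(q) q(1:min(filterLength, numel(q))), quals, ...
                  'UniformOutput', false);

% Average score per truncated read (assumption: averaged after truncation).
avgQual = cellfun(@mean, trimmed);

% Ignore reads whose average score is below the threshold.
keep = avgQual >= threshold;
kept = trimmed(keep);
```

Here the second read has an average score below 10 and is dropped, so only the first and third reads would contribute to the statistics.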



Properties

FileName

Name of the file used to create the BioReadQualityStatistics object.

FileType

Type of file from which the BioReadQualityStatistics object is created. Supported file types are FASTQ, SAM, and BAM.

Encoding

Character vector specifying the character encoding used for sequence information and quality scores in the file. Supported formats are 'Sanger', 'Illumina13', 'Illumina15', 'Illumina18', and 'Solexa'. The default format is 'Illumina18'.

CharOffset

Integer specifying the ASCII code at which the quality score scale begins for a sequence.
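
As an illustration of how such an offset is typically used, the following sketch decodes a quality string into Phred scores by subtracting the offset from each character's ASCII code. The quality string is hypothetical, and the offset of 33 is the value shown for 'Illumina18' in the example output on this page; other encodings may use a different offset.

```matlab
qualLine   = 'II5+!';   % hypothetical FASTQ quality string
charOffset = 33;        % offset shown for 'Illumina18' in the example output

% Each character's ASCII code minus the offset gives its Phred score.
phred = double(qualLine) - charOffset;   % [40 40 20 10 0]

% Standard Phred definition: Q = -10*log10(P), so P = 10^(-Q/10).
errorProb = 10 .^ (-phred / 10);
```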


NumberOfReads

Integer representing the number of short-read sequences that the BioReadQualityStatistics object contains.

MaxReadLength

Integer representing the maximum length of a short-read sequence among all sequences of the BioReadQualityStatistics object.

MinEncodingPhred

Integer specifying the minimum Phred quality score [1] among all short-read sequences of the BioReadQualityStatistics object.

MaxEncodingPhred

Integer specifying the maximum Phred quality score among all short-read sequences of the BioReadQualityStatistics object.

SkipPhred

Integer specifying the number of Phred scores that are not considered in the quality score range.

PerSeqAverageQualityDist

Vector of integers representing the distribution of average quality scores per sequence.

PerPosQualities

s-by-p matrix of integers representing quality scores (s) per base position (p).
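
A score-by-position matrix of this shape can be built as follows. This is a sketch under stated assumptions, not the class's implementation: it assumes that each entry counts the reads with a given score at a given position and that row k corresponds to score minPhred + k - 1. The toy scores and all variable names are hypothetical.

```matlab
% Hypothetical decoded Phred scores: 3 reads (rows) by 3 positions (columns).
phred    = [30 20 10; 30 20 0; 40 20 10];
minPhred = 0;
maxPhred = 40;

nScores        = maxPhred - minPhred + 1;
[nReads, nPos] = size(phred);

% Row index = score bin, column index = base position.
rows      = phred(:) - minPhred + 1;
[~, cols] = ndgrid(1:nReads, 1:nPos);

% Count the reads for every (score, position) pair.
perPos = accumarray([rows, cols(:)], 1, [nScores, nPos]);

% Average quality at each position, recovered from the counts.
meanPerPos = ((minPhred:maxPhred) * perPos) ./ sum(perPos, 1);
```

Storing counts rather than raw scores keeps the matrix size independent of the number of reads, which is one plausible reason for a layout like this.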


PerSeqGCDist

Vector of integers representing the distribution of GC nucleotides per sequence.
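
One way to picture such a vector is as a histogram of per-read GC fractions. The sketch below assumes 20 equal-width bins over the range 0 to 1, chosen only because the vector in the example output on this page has 20 elements; the actual binning used by the class is not documented here. The reads and variable names are hypothetical.

```matlab
% Hypothetical reads.
reads = {'ACGTACGT', 'GGGCCCAT', 'ATATATAT', 'GCGCGCGC'};

% Fraction of G or C nucleotides in each read: [0.5 0.75 0 1].
gcFrac = cellfun(@(s) sum(s == 'G' | s == 'C') / numel(s), reads);

% Assign each read to one of nBins equal-width bins (assumed binning).
nBins  = 20;
bin    = min(floor(gcFrac * nBins) + 1, nBins);
gcDist = accumarray(bin(:), 1, [nBins 1])';
```

The counts in gcDist sum to the number of reads, which matches the example output, where the 20 entries sum to the 50 reads in the file.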


PerPosBaseDist

n-by-p matrix of integers representing the distribution of all nucleotides (n = 5) per base position (p).
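
The following sketch shows how a 5-by-p nucleotide count matrix can be computed from equal-length reads. The row order A, C, G, T, N is an assumption made for illustration, as is the use of raw counts rather than percentages; neither is stated in the description above. The reads are hypothetical.

```matlab
% Hypothetical reads stored as a char matrix: 3 reads by 5 positions.
reads    = ['ACGTN'; 'ACGGA'; 'TCGTA'];
alphabet = 'ACGTN';   % assumed row order

% counts(k, p) = number of reads with nucleotide alphabet(k) at position p.
counts = zeros(numel(alphabet), size(reads, 2));
for k = 1:numel(alphabet)
    counts(k, :) = sum(reads == alphabet(k), 1);
end
```

Each column of counts sums to the number of reads, so dividing by that sum gives the per-position nucleotide percentages.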


Name

Character vector specifying the user-defined name of the object.

MaxScore

Integer representing the maximum sequence quality score among all scores.

MinScore

Integer representing the minimum sequence quality score among all scores.

FilterLength

Positive integer specifying the length of each read used in the quality analysis.

QualityScoreThreshold

Scalar value specifying the minimum average quality threshold for a read. Any read whose average score is below the threshold is ignored. The default value is -Inf, which causes all reads to be considered.

Subset

Vector of integers specifying the indices of the subset of the original sequence data used in the analysis.


Methods

plotPerPositionCountByQuality    Plot fractions of reads with Phred scores in ranges
plotPerPositionGC                Plot percentages of G or C nucleotides at each base position
plotPerPositionQuality           Plot Phred score distributions
plotPerSequenceGC                Plot G or C nucleotide distribution
plotPerSequenceQuality           Plot distribution of average quality scores
plotSummary                      Plot summary statistics of a BioReadQualityStatistics object
plotTotalGC                      Plot distribution of all nucleotides of short-read sequences



This example shows how to create a BioReadQualityStatistics object and plot its summary statistics.

Create a BioReadQualityStatistics object from a FASTQ file using only the first 40 characters of each read with a minimum average quality score of 5.

QSObj = BioReadQualityStatistics('SRR005164_1_50.fastq', ...
                                 'FilterLength', 40, 'QualityScoreThreshold', 5)
QSObj = 

  BioReadQualityStatistics with properties:

                    FileName: '/mathworks/devel/bat/Bdoc16b/build/matlab/t...'
                    FileType: 'FASTQ'
                    Encoding: 'Illumina18'
                  CharOffset: 33
               NumberOfReads: 50
               MaxReadLength: 40
            MinEncodingPhred: 0
            MaxEncodingPhred: 62
                   SkipPhred: []
    PerSeqAverageQualityDist: [1×62 double]
             PerPosQualities: [63×40 double]
                PerSeqGCDist: [0 0 0 0 3 3 8 5 9 7 6 5 2 2 0 0 0 0 0 0]
              PerPosBaseDist: [5×40 double]
                        Name: ''
                    MaxScore: 34
                    MinScore: 1
                FilterLength: 40
       QualityScoreThreshold: 5
                      Subset: NaN

Plot the summary statistics of the object.

plotSummary(QSObj)


[1] Wikipedia. (2012). Phred quality score,
