getCounts

Class: BioMap

Return count of read sequences aligned to reference sequence in BioMap object

Syntax

Count = getCounts(BioObj, StartPos, EndPos)
GroupCount = getCounts(BioObj, StartPos, EndPos, Groups)
GroupCount = getCounts(BioObj, StartPos, EndPos, Groups, R)
___ = getCounts(___, Name,Value)

Description

Count = getCounts(BioObj, StartPos, EndPos) returns Count, a nonnegative integer specifying the number of read sequences in BioObj, a BioMap object, that align to a specific range or set of ranges in the reference sequence. StartPos and EndPos define the range or set of ranges. They can be two nonnegative integers such that StartPos is less than EndPos and both are smaller than the length of the reference sequence. They can also be two column vectors representing a set of ranges (overlapping or segmented).

By default, getCounts counts each read only once, even if the read spans multiple ranges. When StartPos and EndPos specify overlapping ranges, the overlapping ranges are treated as one range.
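The merging of overlapping ranges can be checked with a short sketch. It assumes the ex1.sam file used in the Examples section is on the path. The range values and the variable names c_overlapping and c_merged are chosen only for illustration.

```matlab
% Sketch: compare two overlapping ranges with the single range they cover.
obj = BioMap('ex1.sam');

% Ranges 1:50 and 40:100 overlap on positions 40:50.
c_overlapping = getCounts(obj, [1;40], [50;100]);

% The single range 1:100 covers the same positions.
c_merged = getCounts(obj, 1, 100);

% If overlapping ranges are treated as one range, as described above,
% the two counts are expected to agree.
sameCount = isequal(c_overlapping, c_merged)
```

Both calls use the default settings, so each read is counted at most once in either case.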

GroupCount = getCounts(BioObj, StartPos, EndPos, Groups) specifies Groups, a vector of integers or character vectors indicating the group to which each segmented range belongs. Each group of segmented ranges is treated independently.

GroupCount = getCounts(BioObj, StartPos, EndPos, Groups, R) specifies a reference for each of the segmented ranges defined by StartPos, EndPos, and Groups.

___ = getCounts(___, Name,Value) uses additional options specified by one or more Name,Value pair arguments.

Input Arguments

BioObj

Object of the BioMap class.

StartPos

Either of the following:

  • Nonnegative integer that defines the start of a range in the reference sequence. StartPos must be less than EndPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the start of a range in the reference sequence.

EndPos

Either of the following:

  • Nonnegative integer that defines the end of a range in the reference sequence. EndPos must be greater than StartPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the end of a range in the reference sequence.

Groups

Row vector of integers or character vectors, with the same number of elements as StartPos and EndPos. This vector indicates the group to which each range belongs.

R

Vector of positive integers indexing the SequenceDictionary property of BioObj, or a cell array of character vectors containing the reference names. R must be scalar or must have the same number of elements as Groups.

For a given value of Groups, all the corresponding elements in R must be the same.
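As a minimal sketch of the R argument, the following call passes a scalar index into the SequenceDictionary property, so every range is counted against the same reference. It assumes ex1.sam is on the path and that the first entry of SequenceDictionary is the reference of interest. The variable names are illustrative only.

```matlab
% Sketch: count reads per group against one reference chosen by index.
obj = BioMap('ex1.sam');

% Inspect the reference names available in the object.
refNames = obj.SequenceDictionary

% Three ranges in two groups. A scalar R applies the same reference
% (the first entry of SequenceDictionary) to every range.
startPos = [1; 30; 50];
endPos   = [10; 60; 60];
groups   = [2 1 2];
counts   = getCounts(obj, startPos, endPos, groups, 1)
```

Because a single reference is specified, the result is expected to be a vector with one element per unique group rather than a cell array.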

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Independent'

Logical that specifies whether to treat the ranges defined by StartPos and EndPos independently. If true, Count is a column vector containing the same number of elements as StartPos and EndPos. In this case, a read that spans multiple ranges is counted once in each range.

    Note:   This name-value pair argument is ignored when using the Groups input argument, because getCounts assumes that each group of ranges is independent.

Default: false

'Overlap'

Specifies the minimum number of base positions by which a read must overlap a range or set of ranges to be counted. This value can be any of the following:

  • Positive integer

  • 'full' — A read must be fully contained in a range or set of ranges to be counted.

  • 'start' — A read's start position must lie within a range or set of ranges to be counted.

Default: 1
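The three kinds of 'Overlap' value can be compared on one range. This sketch assumes ex1.sam is on the path. The range 1:50 and the threshold of 10 bases are arbitrary values picked for illustration.

```matlab
% Sketch: the same range counted under different overlap criteria.
obj = BioMap('ex1.sam');

c_default = getCounts(obj, 1, 50);                     % at least 1 base overlaps
c_min10   = getCounts(obj, 1, 50, 'Overlap', 10);      % at least 10 bases overlap
c_full    = getCounts(obj, 1, 50, 'Overlap', 'full');  % read lies entirely in 1:50
c_start   = getCounts(obj, 1, 50, 'Overlap', 'start'); % read starts within 1:50

% Each stricter criterion selects a subset of the reads counted by default,
% so none of these counts should exceed c_default.
[c_default c_min10 c_full c_start]
```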

'Spliced'

Logical specifying whether short reads are spliced during mapping (as in mRNA-to-genome mapping). N symbols in the Signature property of the object are not counted.

Default: false

'Method'

Character vector specifying the method to measure the abundance of reads. Choices are:

  • 'raw' — Raw counts

  • 'rpkm' — Counts of reads per kilobase pairs per million aligned reads

  • 'mean' — Average coverage depth computed base-by-base

  • 'max' — Maximum coverage depth computed base-by-base

  • 'min' — Minimum coverage depth computed base-by-base

  • 'sum' — Sum of all aligned bases in all the reads

Default: 'raw'
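The 'Method' options can be tried side by side on a single range, as in this sketch. It assumes ex1.sam is on the path. The hand-computed value at the end uses the conventional RPKM definition with the NSeqs property as the total number of aligned reads. That choice is an assumption made for illustration, not a statement of how getCounts normalizes internally, so the two values may differ.

```matlab
% Sketch: abundance of reads in range 1:100 under several methods.
obj = BioMap('ex1.sam');

raw       = getCounts(obj, 1, 100);                   % default, 'raw'
rpkm      = getCounts(obj, 1, 100, 'Method', 'rpkm');
depthMean = getCounts(obj, 1, 100, 'Method', 'mean'); % average depth per base
depthMax  = getCounts(obj, 1, 100, 'Method', 'max');  % peak depth in the range

% Conventional RPKM by hand (assumption: total aligned reads = obj.NSeqs).
rangeLength = 100;                                    % positions 1 through 100
rpkmByHand  = raw / (rangeLength / 1e3) / (double(obj.NSeqs) / 1e6)
```

With any method other than 'raw', the returned value is a measure of abundance and need not be an integer.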

Output Arguments

Count

Either of the following:

  • When Independent is false, this value is a nonnegative integer. The integer specifies the number of reads that align to a range or set of ranges (overlapping or segmented) of the reference sequence in BioObj, a BioMap object. Each read is counted only once, even if the read spans multiple ranges.

  • When Independent is true, this value is a vector of nonnegative integers. This vector indicates the number of reads that align to the independent ranges specified by StartPos and EndPos. This vector contains the same number of elements as StartPos and EndPos.

GroupCount

Either of the following:

  • If no reference or a single reference is specified, this value is a vector containing the number of reads for each unique group in Groups. The order of elements in GroupCount corresponds to the ascending order of unique elements in Groups.

  • If multiple references are specified, GroupCount is a cell array, where the ith element contains the number of reads for each unique group in the ith reference. The order of elements in GroupCount corresponds to the ascending order of unique elements in R.
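Because the elements of GroupCount follow the ascending order of the unique group values, not the order in which the groups appear in Groups, it can help to pair each count with its label. The following sketch assumes ex1.sam is on the path. The variable names are illustrative.

```matlab
% Sketch: label each element of GroupCount with the group it belongs to.
obj = BioMap('ex1.sam');

groups     = [2 1 2];                                  % group 2 is listed first
groupCount = getCounts(obj, [1;30;50], [10;60;60], groups);

% unique returns the group values sorted in ascending order: [1 2].
labels = unique(groups);

for k = 1:numel(labels)
    fprintf('group %d: %d reads\n', labels(k), groupCount(k));
end
```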

Examples

Create a BioMap object.

obj = BioMap('ex1.sam');

Return the number of reads that cover at least one base of the segmented ranges 1:50 and 71:100. By default, the ranges are not treated independently. That is, a read is counted once even if it maps to both segmented ranges.

counts_1 = getCounts(obj,[1;71],[50;100])
counts_1 =

    37

Compute the number of reads, treating the segmented ranges [1:50] and [71:100] independently. Observe that sum(counts_2) is greater than counts_1 because four reads span both segments and are counted twice in the second case.

counts_2 = getCounts(obj,[1;71],[50;100], 'Independent', true)
counts_2 =

    20
    21

Compute the number of reads that align to the segmented range 30:60 (associated with group 1) and the segmented range [1:10 50:60] (associated with group 2).

counts_3 = getCounts(obj,[1;30;50],[10;60;60],[2 1 2])
counts_3 =

    25
    22

Return the total number of reads aligned to the reference sequence.

getCounts(obj, min(getStart(obj)), max(getStop(obj)))
ans =

        1482
