Perform circular binary segmentation (CBS) on arraybased comparative genomic hybridization (aCGH) data
SegmentStruct
= cghcbs(CGHData
)SegmentStruct
= cghcbs(CGHData
,
...'Alpha', AlphaValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Permutations', PermutationsValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Method', MethodValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'StoppingRule', StoppingRuleValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Smooth', SmoothValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Prune', PruneValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Errsum', ErrsumValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'WindowSize', WindowSizeValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'SampleIndex', SampleIndexValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Chromosome', ChromosomeValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Showplot', ShowplotValue
, ...)SegmentStruct
= cghcbs(CGHData
,
...'Verbose', VerboseValue
, ...)
CGHData  Arraybased comparative genomic hybridization (aCGH) data in
either of the following forms:
 
AlphaValue  Scalar that specifies the significance level for the statistical
tests to accept change points. Default is 0.01 .  
PermutationsValue  Scalar that specifies the number of permutations used for pvalue
estimation. Default is 10,000 .  
MethodValue  Character vector that specifies the method to estimate the
pvalues. Choices are 'Perm' or 'Hybrid' (default). 'Perm' does
a full permutation, while 'Hybrid' uses a faster,
tail probabilitybased permutation. When using the 'Hybrid' method,
the 'Perm' method is applied automatically when
segment data length becomes less than 200.  
StoppingRuleValue  Controls the use of a heuristic stopping rule, based on the
method described by Venkatraman
and Olshen (2007), to declare a change without performing the
full number of permutations for the pvalue estimation, whenever it
becomes very likely that a change has been detected. Choices are true or false (default).
 
SmoothValue  Controls the smoothing of outliers before segmenting using
the procedure explained by Olshen
et al. (2004). Choices are true (default)
or false .  
PruneValue  Controls the elimination of change points identified due to
local trends in the data that are not indicative of real copy number
change, using the procedure explained by Olshen et al. (2004). Choices are true or false (default).  
ErrsumValue  Scalar that specifies the allowed proportional increase in
the error sum of squares when eliminating change points using the 'Prune' property.
Commonly used values are 0.05 and 0.1 .
Default is 0.05 .  
WindowSizeValue  Scalar that specifies the size of the window (in data points)
used to divide the data when using the 'Perm' method
on large data sets. Default is 200 .  
SampleIndexValue  A single sample index or a vector of sample indices that specify the sample(s) to analyze. Default is all sample indices.  
ChromosomeValue  A single chromosome number or a vector of chromosome numbers that specify the data to analyze. Default is all chromosome numbers.  
ShowplotValue  Controls the display of plots of the segment means over the original data. Choices are either:
Default is:
 
VerboseValue  Controls the display of a progress report of the analysis.
Choices are true (default) or false . 
SegmentStruct  Structure containing segmentation information in the following fields:

performs
circular binary segmentation (CBS) on arraybased comparative genomic
hybridization (aCGH) data to determine the copy number alteration
segments (neighboring regions of DNA that exhibit a statistical difference
in copy number) and change points. SegmentStruct
= cghcbs(CGHData
)
Note:
The CBS algorithm recursively splits chromosomes into segments
based on a maximum t statistic estimated by permutation. This computation
can be time consuming. If 
calls SegmentStruct
= cghcbs(CGHData
,
...'PropertyName
', PropertyValue
,
...)cghcbs
with optional properties
that use property name/property value pairs. You can specify one or
more properties in any order. Each PropertyName
must
be enclosed in single quotation marks and is case insensitive. These
property name/property value pairs are as follows:
specifies
the significance level for the statistical tests to accept change
points. Default is SegmentStruct
= cghcbs(CGHData
,
...'Alpha', AlphaValue
, ...)0.01
.
specifies
the number of permutations used for pvalue estimation. Default is SegmentStruct
= cghcbs(CGHData
,
...'Permutations', PermutationsValue
, ...)10,000
.
specifies
the method to estimate the pvalues. Choices are SegmentStruct
= cghcbs(CGHData
,
...'Method', MethodValue
, ...)'Perm'
or 'Hybrid'
(default). 'Perm'
does
a full permutation, while 'Hybrid'
uses a faster,
tail probabilitybased permutation. When using the 'Hybrid'
method,
the 'Perm'
method is applied automatically when
segment data length becomes less than 200.
controls
the use of a heuristic stopping rule, based on the method described
by Venkatraman and Olshen
(2007), to declare a change without performing the full number
of permutations for the pvalue estimation, whenever it becomes very
likely that a change has been detected. Choices are SegmentStruct
= cghcbs(CGHData
,
...'StoppingRule', StoppingRuleValue
, ...)true
or false
(default).
controls
the smoothing of outliers before segmenting, using the procedure explained
by Olshen et al. (2004).
Choices are SegmentStruct
= cghcbs(CGHData
,
...'Smooth', SmoothValue
, ...)true
(default) or false
.
controls
the elimination of change points identified due to local trends in
the data that are not indicative of real copy number change, using
the procedure explained by Olshen
et al. (2004). Choices are SegmentStruct
= cghcbs(CGHData
,
...'Prune', PruneValue
, ...)true
or false
(default).
specifies
the allowed proportional increase in the error sum of squares when
eliminating change points using the SegmentStruct
= cghcbs(CGHData
,
...'Errsum', ErrsumValue
, ...)'Prune'
property.
Commonly used values are 0.05
and 0.1
.
Default is 0.05
.
specifies
the size of the window (in data points) used to divide the data when
using the SegmentStruct
= cghcbs(CGHData
,
...'WindowSize', WindowSizeValue
, ...)'Perm'
method on large data sets. Default
is 200
.
analyzes
only the sample(s) specified by SegmentStruct
= cghcbs(CGHData
,
...'SampleIndex', SampleIndexValue
, ...)SampleIndexValue
,
which can be a single sample index or a vector of sample indices.
Default is all sample indices.
analyzes
only the data on the chromosomes specified by SegmentStruct
= cghcbs(CGHData
,
...'Chromosome', ChromosomeValue
, ...)ChromosomeValue
,
which can be a single chromosome number or a vector of chromosome
numbers. Default is all chromosome numbers.
controls
the display of plots of the segment means over the original data.
Choices are SegmentStruct
= cghcbs(CGHData
,
...'Showplot', ShowplotValue
, ...)true
, false
, W
, S
,
or I
, an integer specifying one of the
chromosomes in CGHData
. When ShowplotValue
is true
,
all chromosomes in all samples are plotted. If there are multiple
samples in CGHData
, then each sample is
plotted in a separate Figure window. When ShowplotValue
is W
,
the layout displays all chromosomes in one plot in the Figure window.
When ShowplotValue
is S
,
the layout displays each chromosome in a subplot in the Figure window.
When ShowplotValue
is I
,
only the specified chromosome is plotted. Default is either:
false
— When return values
are specified.
true
and W
—
When return values are not specified.
controls
the display of a progress report of the analysis. Choices are SegmentStruct
= cghcbs(CGHData
,
...'Verbose', VerboseValue
, ...)true
(default)
or false
.
Analyzing Data from the Coriell Cell Line Study
Load a MATfile, included with the Bioinformatics Toolbox™ software,
which contains coriell_data
, a structure of arraybased
CGH data.
load coriell_baccgh
Analyze all chromosomes of sample 3 (GM05296) of the
aCGH data and return segmentation data in a structure, S
.
Plot the segment means over the original data for all chromosomes
of this sample.
S = cghcbs(coriell_data,'sampleindex',3,'showplot',true);
Chromosome 10 shows a gain, while chromosome 11 shows a loss.
The coriell_baccgh.mat
file used in this
example contains data from Snijders et al., 2001.
Analyzing Data from a Pancreatic Cancer Study
Load a MATfile, included with the Bioinformatics Toolbox software,
which contains pancrea_data
, a structure of arraybased
CGH data from a pancreatic cancer study.
load pancrea_oligocgh
Analyze only chromosome 9 in sample 32 of the CGH
data and return the segmentation data in a structure, PS
.
Plot the segment means over the original data for chromosome 9 in
this sample.
PS = cghcbs(pancrea_data,'sampleindex',32,'chromosome',9,... 'showplot',9);
Chromosome 9 contains two segments that indicate losses. For more detailed information on interpreting the data, see Aguirre et al. (2004).
Use the chromosomeplot
function
with the 'addtoplot'
property to add the ideogram
of chromosome 9 for Homo sapiens to the plot
of the segmentation data.
chromosomeplot('hs_cytoBand.txt', 9, 'addtoplot', gca)
The pancrea_oligocgh.mat
file used in this
example contains data from Aguirre et al., 2004.
Displaying Copy Number Alteration Regions Aligned to a Chromosome Ideogram
Create a structure containing segment gain and loss
information for chromosomes 10 and 11 from sample 3 from the Coriell
cell line study, making sure the segment data is in bp units. (You
can determine copy number variance (CNV) information by exploring S
,
the structure of segments returned by the cghcbs
function
in Analyzing Data from the Coriell Cell Line Study.)
For the 'CNVType'
field, use 1
to
indicate a loss and 2
to indicate a gain.
cnvStruct = struct('Chromosome', [10 11],... 'CNVType', [2 1],... 'Start', [S.SegmentData(10).Start(2),... S.SegmentData(11).Start(2)]*1000,... 'End', [S.SegmentData(10).End(2),... S.SegmentData(11).End(2)]*1000) cnvStruct = Chromosome: [10 11] CNVType: [2 1] Start: [66905000 35416000] End: [110412000 43357000]
Pass the structure to the chromosomeplot
function
using the 'CNV'
property to display the copy number
gains (green) and losses (red) aligned to the human chromosome ideogram.
Specify kb units for the display of segment information in the data
tip.
chromosomeplot('hs_cytoBand.txt', 'cnv', cnvStruct, 'unit', 2)
The coriell_baccgh.mat
file used in this
example contains data from Snijders et al., 2001.
[1] Olshen, A.B., Venkatraman, E.S., Lucito, R., and Wigler, M. (2004). Circular binary segmentation for the analysis of arraybased DNA copy number data. Biostatistics 5, 4, 557–572.
[2] Venkatraman, E.S., and Olshen, A.B. (2007). A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data. Bioinformatics 23(6), 657–663.
[3] Venkatraman, E.S., and Olshen, A.B. (2006).
DNAcopy: A Package for Analyzing DNA Copy Data. http://www.bioconductor.org/packages/2.1/bioc/html/DNAcopy.html
[4] Snijders, A.M., Nowak, N., Segraves, R., Blackwood, S., Brown, N., Conroy, J., Hamilton, G., Hindle, A.K., Huey, B., Kimura, K., Law, S., Myambo, K., Palmer, J., Ylstra, B., Yue, J.P., Gray, J.W., Jain, A.N., Pinkel, D., and Albertson, D.G. (2001). Assembly of microarrays for genomewide measurement of DNA copy number. Nature Genetics 29, 263–264.
[5] Aguirre, A.J., Brennan, C., Bailey, G., Sinha, R., Feng, B., Leo, C., Zhang, Y., Zhang, J., Gans, J.D., Bardeesy, N., Cauwels, C., CordonCardo, C., Redston, M.S., DePinho, R.A., and Chin, L. (2004). Highresolution characterization of the pancreatic adenocarcinoma genome. PNAS 101, 24, 9067–9072.