genevarfilter

Filter genes with small profile variance

Syntax

Mask = genevarfilter(Data)
[Mask, FData] = genevarfilter(Data)
[Mask, FData, FNames] = genevarfilter(Data, Names)
genevarfilter(..., 'Percentile', PercentileValue, ...)
genevarfilter(..., 'AbsValue', AbsValueValue, ...)

Arguments

Data

DataMatrix object or numeric matrix where each row corresponds to a gene and each column contains the results from one experiment. A numeric matrix cannot hold gene names, so supply them separately through Names.

Names

Cell array of character vectors or string vector where each element is the name or ID of the gene in the corresponding row of Data. Names has the same number of rows as Data.

PercentileValue

Specifies a percentile below which gene expression profiles are removed. Choices are integers from 0 to 100. Default is 10.

AbsValueValue

Specifies an absolute variance value below which gene expression profiles are removed.

Description

Gene profiling experiments typically include genes that exhibit little variation in their profile and are generally not of interest. These genes are commonly removed from the data.

Mask = genevarfilter(Data) calculates the variance for each gene expression profile in Data and returns Mask, which identifies the gene expression profiles with a variance less than the 10th percentile. Mask is a logical vector with one element for each row in Data. The elements of Mask corresponding to rows with a variance greater than the threshold have a value of 1, and those with a variance less than the threshold are 0.
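The masking logic described above can be sketched with general-purpose functions. This is a minimal illustration under stated assumptions, not the toolbox's implementation: the toy matrix is made up, prctile (Statistics and Machine Learning Toolbox) stands in for whatever percentile computation genevarfilter uses internally, and the strict > comparison for ties is an assumption.

```matlab
% Toy data (made up): 5 genes (rows) x 3 experiments (columns).
% Row 1 is a constant profile, so its variance is 0.
Data = [5 5 5; 1 2 3; 0 10 20; 2 2 3; 4 6 9];

% Sample variance of each row (each gene expression profile).
v = var(Data, 0, 2);

% Threshold at the 10th percentile of the variances (assumed method).
threshold = prctile(v, 10);

% true (1) = keep the gene, false (0) = low-variance gene.
Mask = v > threshold;
```

The constant profile in row 1 has the smallest variance, so it is the kind of gene this filter is meant to flag.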

[Mask, FData] = genevarfilter(Data) calculates the variance for each gene expression profile in Data and returns FData, a filtered data matrix, in which the low-variation gene expression profiles are removed. You can also create FData using FData = Data(Mask,:).

[Mask, FData, FNames] = genevarfilter(Data, Names) returns FNames, a filtered names array, in which the names associated with low-variation gene expression profiles are removed. Names is a cell array of character vectors or string vector of the names of the genes corresponding to each row in Data. You can also create FNames using FNames = Names(Mask).
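The two indexing equivalences, FData = Data(Mask,:) and FNames = Names(Mask), can be checked with a hand-built mask. The data, names, and mask below are hypothetical; in practice Mask comes from genevarfilter.

```matlab
% Hypothetical inputs: 4 genes x 3 experiments, with one name per row.
Data  = [5 5 5; 1 2 3; 0 10 20; 2 2 3];
Names = {'geneA'; 'geneB'; 'geneC'; 'geneD'};

% Hand-built mask for illustration: keep rows 2 and 3.
Mask = logical([0; 1; 1; 0]);

% Logical indexing keeps only the rows where Mask is true.
FData  = Data(Mask, :);
FNames = Names(Mask);
```

Because Mask is a logical vector, the same mask can be reused to filter any other per-gene array that has one entry per row of Data.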

Note

If Data is a DataMatrix object with specified row names, you do not need to provide the second input Names to return the third output FNames.
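A brief sketch of this case follows. The data, row names, and column names are made up for illustration, and the DataMatrix is built with the bioma.data.DataMatrix constructor from the Bioinformatics Toolbox.

```matlab
% Toy data and row names (made up for illustration).
Data  = [5 5 5; 1 2 3; 0 10 20; 2 2 3; 4 6 9];
Names = {'geneA', 'geneB', 'geneC', 'geneD', 'geneE'};

% Wrap the matrix in a DataMatrix object that carries its row names.
DM = bioma.data.DataMatrix(Data, Names, {'t1', 't2', 't3'});

% No Names input: the third output is taken from the row names of DM.
[Mask, FData, FNames] = genevarfilter(DM);
```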

genevarfilter(..., 'PropertyName', PropertyValue, ...) calls genevarfilter with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

genevarfilter(..., 'Percentile', PercentileValue, ...) removes from the experimental data in Data the gene expression profiles with a variance less than the percentile specified by PercentileValue. Choices are integers from 0 to 100. Default is 10.

genevarfilter(..., 'AbsValue', AbsValueValue, ...) removes from the experimental data in Data the gene expression profiles with a variance less than AbsValueValue.
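As a usage sketch, the two properties can be passed as follows. The toy matrix and the threshold values (30 and 0.5) are arbitrary choices for illustration, not recommended settings.

```matlab
% Toy data (made up): 5 genes x 3 experiments.
Data = [5 5 5; 1 2 3; 0 10 20; 2 2 3; 4 6 9];

% Remove profiles whose variance falls below the 30th percentile.
maskPct = genevarfilter(Data, 'Percentile', 30);

% Remove profiles whose variance is below an absolute value of 0.5.
[maskAbs, fdataAbs] = genevarfilter(Data, 'AbsValue', 0.5);
```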

Examples

  1. Load the MAT-file, provided with the Bioinformatics Toolbox™ software, that contains yeast data. This MAT-file includes three variables: yeastvalues, a matrix of gene expression data; genes, a cell array of GenBank® accession numbers for labeling the rows in yeastvalues; and times, a vector of time values for labeling the columns in yeastvalues.

    load yeastdata

  2. Filter genes with a small profile variance. The first output is the logical mask; the filtered data and the filtered gene names are the second and third outputs.

    [mask, fyeastvalues, fgenes] = genevarfilter(yeastvalues,genes);
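A quick way to confirm the result is to compare the filtered outputs against the mask. The sketch below repeats the load and filter steps so that it runs on its own; the variable name mask is a choice made here, and no particular gene count is assumed.

```matlab
% Load the example data shipped with the toolbox and filter it.
load yeastdata
[mask, fyeastvalues, fgenes] = genevarfilter(yeastvalues, genes);

% Report how many genes passed the filter.
fprintf('%d of %d genes retained\n', nnz(mask), numel(mask));

% The filtered outputs should have one row (or name) per true element in mask.
assert(size(fyeastvalues, 1) == nnz(mask));
assert(numel(fgenes) == nnz(mask));
```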

References

[1] Kohane, I.S., Kho, A.T., and Butte, A.J. (2003). Microarrays for an Integrative Genomics. Cambridge, MA: MIT Press.

Version History

Introduced before R2006a