Skip to Main Content Skip to Search
Product Documentation

affyinvarsetnorm - Perform rank invariant set normalization on probe intensities from multiple Affymetrix CEL or DAT files

Syntax

NormData = affyinvarsetnorm(Data)
[NormData, MedStructure] = affyinvarsetnorm(Data)

... affyinvarsetnorm(..., 'Baseline', BaselineValue, ...)
... affyinvarsetnorm(..., 'Thresholds', ThresholdsValue, ...)
... affyinvarsetnorm(..., 'StopPercentile', StopPercentileValue, ...)
... affyinvarsetnorm(..., 'RayPercentile', RayPercentileValue, ...)
... affyinvarsetnorm(..., 'Method', MethodValue, ...)
... affyinvarsetnorm(..., 'Showplot', ShowplotValue, ...)

Arguments

Data

Matrix of intensity values where each row corresponds to a perfect match (PM) probe and each column corresponds to an Affymetrix CEL or DAT file. (Each CEL or DAT file is generated from a separate chip. All chips should be of the same type.)

MedStructure

Structure of each column's intensity median before and after normalization, and the index of the column chosen as the baseline.

BaselineValue

Property to control the selection of the column index N from Data to be used as the baseline column. Default is the column index whose median intensity is the median of all the columns.

ThresholdsValue

Property to set the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship.

ThresholdsValue is a 1-by-2 vector [LT, HT] where LT is the threshold for the lowest average rank and HT is threshold for the highest average rank. Values must be between 0 and 1. Default is [0.05, 0.005].

StopPercentileValue

Property to stop the iteration process when the number of data points in the invariant set reaches N percent of the total number of data points. Default is 1.

    Note   If you do not use this property, the iteration process continues until no more data points are eliminated.

RayPercentileValue

Property to select the N percentage of the highest ranked invariant set of data points to fit a straight line through, while the remaining data points are fitted to a running median curve. The final running median curve is a piecewise linear curve. Default is 1.5.

MethodValue

Property to select the smoothing method used to normalize the data. Enter 'lowess' or 'runmedian'. Default is 'lowess'.

ShowplotValue

Property to control the plotting of two pairs of scatter plots (before and after normalization). The first pair plots baseline data versus data from a specified column (chip) from the matrix Data. The second is a pair of M-A scatter plots, which plots M (ratio between baseline and sample) versus A (the average of the baseline and sample). Enter either 'all' (plot a pair of scatter plots for each column or chip) or specify a subset of columns (chips) by entering the column number(s) or a range of numbers.

For example:

  • ..., 'Showplot', 3, ...) plots data from column 3.

  • ..., 'Showplot', [3,5,7], ...) plots data from columns 3, 5, and 7.

  • ..., 'Showplot', 3:9, ...) plots data from columns 3 to 9.

Description

NormData = affyinvarsetnorm(Data) normalizes the values in each column (chip) of probe intensities in Data to a baseline reference, using the invariant set method. NormData is a matrix of normalized probe intensities from Data.

Specifically, affyinvarsetnorm:

[NormData, MedStructure] = affyinvarsetnorm(Data) also returns a structure of the index of the column chosen as the baseline and each column's intensity median before and after normalization.

... affyinvarsetnorm(..., 'PropertyName', PropertyValue, ...) calls affyinvarsetnorm with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


... affyinvarsetnorm(..., 'Baseline', BaselineValue, ...)
lets you select the column index N from Data to be the baseline column. Default is the index of the column whose median intensity is the median of all the columns.

... affyinvarsetnorm(..., 'Thresholds', ThresholdsValue, ...) sets the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship.

ThresholdsValue is a 1-by-2 vector [LT, HT], where LT is the threshold for the lowest average rank and HT is threshold for the highest average rank. Values must be between 0 and 1. Default is [0.05, 0.005].

... affyinvarsetnorm(..., 'StopPercentile', StopPercentileValue, ...) stops the iteration process when the number of data points in the invariant set reaches N percent of the total number of data points. Default is 1.

... affyinvarsetnorm(..., 'RayPercentile', RayPercentileValue, ...) selects the N percentage of the highest ranked invariant set of data points to fit a straight line through, while the remaining data points are fitted to a running median curve. The final running median curve is a piecewise linear curve. Default is 1.5.

... affyinvarsetnorm(..., 'Method', MethodValue, ...) selects the smoothing method for normalizing the data. When MethodValue is 'lowess', affyinvarsetnorm uses the lowess method. When MethodValue is 'runmedian', affyinvarsetnorm uses the running median method. Default is 'lowess'.

... affyinvarsetnorm(..., 'Showplot', ShowplotValue, ...) plots two pairs of scatter plots (before and after normalization). The first pair plots baseline data versus data from a specified column (chip) from the matrix Data. The second is a pair of M-A scatter plots, which plots M (ratio between baseline and sample) versus A (the average of the baseline and sample). When ShowplotValue is 'all', affyinvarsetnorm plots a pair of scatter plots for each column or chip. When ShowplotValue is a number(s) or range of numbers, affyinvarsetnorm plots a pair of scatter plots for the indicated column numbers (chips).

For example:

Examples

  1. Load a MAT-file, included with the Bioinformatics Toolbox software, which contains Affymetrix data variables, including pmMatrix, a matrix of PM probe intensity values from multiple CEL files.

    load prostatecancerrawdata
    
  2. Normalize the data in pmMatrix, using the affyinvarsetnorm function.

    NormMatrix = affyinvarsetnorm(pmMatrix);
    

The prostatecancerrawdata.mat file used in the previous example contains data from Best et al., 2005.

References

[1] Li, C., and Wong, W.H. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2(8): research0032.1-0032.11.

[2] http://biosun1.harvard.edu/complab/dchip/normalizing%20arrays.htm#isn

[3] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823–6834.

See Also

affyread | celintensityread | mainvarsetnorm | malowess | manorm | quantilenorm | rmabackadj | rmasummary

  


Free Computational Biology Interactive Kit

See how to analyze, visualize, and model biological data and systems using MathWorks products.

Get free kit

Trials Available

Try the latest computational biology products.

Get trial software
 © 1984-2012- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS