affyread - Read microarray data from Affymetrix® GeneChip® file

Syntax

AffyStruct = affyread(File)
AffyStruct = affyread(File, LibraryPath)

Arguments

File

String specifying a file name or a path and file name of one of the following Affymetrix file types:

  • EXP — Data file containing information about experimental conditions and protocols.

  • DAT — Data file containing raw image data (pixel intensity values).

  • CEL — Data file containing information about the intensity values of the individual probes.

  • CHP — Data file containing summary information of the probe sets, including intensity values.

  • CDF — Library file containing information about which probes belong to which probe set.

  • GIN — Library file containing information about the probe sets, such as the gene name with which the probe set is associated.

If you specify only a file name, that file must be on the MATLAB® search path or in the current directory. If you specify only a file name of a CDF or GIN library file, you can specify the path and directory in the LibraryPath input argument.

LibraryPath

String specifying the path and directory of a:

  • CDF library file associated with File when File is a CHP file.

  • CDF library file when File is a CDF file.

  • GIN library file when File is a GIN file.

    Note   If you do not specify LibraryPath when reading a CHP file, affyread looks in the current directory for the CDF file. If it does not find the CDF file, it still reads the CHP file, but the probe set names and types will be omitted from the return value, AffyStruct.
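
The note above suggests checking for the CDF library file before reading a CHP file. The following is a minimal sketch of such a check, not part of affyread itself. It reuses the file and directory names from the Examples section; the warning identifier affyreadSketch:missingCDF is hypothetical. The affyread call runs only when the CHP file is actually found.

```matlab
% File and directory names are taken from the Examples section of this page.
chpFile = 'Ecoli-antisense-121502.chp';
libDir  = 'D:\Affymetrix\LibFiles\Ecoli';
cdfFile = fullfile(libDir, 'Ecoli_ASv2.CDF');

% exist returns 2 when the name refers to a file.
haveChp = exist(chpFile, 'file') == 2;
haveCdf = exist(cdfFile, 'file') == 2;

if ~haveCdf
    % Hypothetical warning identifier, used only in this sketch.
    warning('affyreadSketch:missingCDF', ...
        'CDF file not found at %s; probe set names and types may be omitted.', ...
        cdfFile);
end

if haveChp
    chpStruct = affyread(chpFile, libDir);
end
```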

Return Values

AffyStruct

MATLAB structure containing information from an Affymetrix data or library file, for expression, genotyping (SNP), or resequencing assay types.

Description

AffyStruct = affyread(File) reads File, an Affymetrix file, and creates AffyStruct, a MATLAB structure. The affyread function can read Affymetrix EXP, DAT, CEL, CHP, CDF, and GIN files created from Affymetrix GeneChip arrays for expression, genotyping (SNP), or resequencing assays.

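AffyStruct = affyread(File, LibraryPath) specifies the path and directory of a:

  • CDF library file associated with File when File is a CHP file.

  • CDF library file when File is a CDF file.

  • GIN library file when File is a GIN file.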

You can learn more about the Affymetrix GeneChip files and download sample files from:

http://www.affymetrix.com/support/technical/sample_data/demo_data.affx

The following tables describe the fields in AffyStruct for the different Affymetrix file types.

File Types EXP, DAT, CEL, CHP, CDF, and GIN

Field          Description
Name           File name.
DataPath       Path and directory of the file.
LibPath        Path and directory of the CDF and GIN library files associated with the file being read.
FullPathName   Path and directory of the file.
ChipType       Name of the Affymetrix GeneChip array (for example, DrosGenome1 or HG-Focus).
Date           Date the file was created.

EXP File

Field
ChipLot
Operator
SampleType
SampleDesc
Project
Comments
Reagents
ReagentLot
Protocol
Station
Module
HydridizeDate
ScanPixelSize
ScanFilter
ScanDate
ScannerID
NumberOfScans
ScannerType
NumProtocolSteps
ProtocolSteps

Description (applies to all of the fields above): Information about experimental conditions and protocols captured by the Affymetrix software.

DAT File

Field             Description
NumPixelsPerRow   Number of pixels per row in the image created from the GeneChip array (number of columns).
NumRows           Number of rows in the image created from the GeneChip array.
MinData           Minimum intensity value in the image created from the GeneChip array.
MaxData           Maximum intensity value in the image created from the GeneChip array.
PixelSize         Size of one pixel in the image created from the GeneChip array.
CellMargin        Size of gaps between cells in the image created from the GeneChip array.
ScanSpeed         Speed of the scanner used to create the image.
ScanDate          Date the scan was performed.
ScannerID         Name of the scanning device used.
UpperLeftX,       Pixel coordinates of the scanned image.
UpperLeftY,
UpperRightX,
UpperRightY,
LowerLeftX,
LowerLeftY,
LowerRightX,
LowerRightY
ServerName        Not used.
Image             A NumRows-by-NumPixelsPerRow image of the scanned GeneChip array.

CEL File

Field              Description
FileVersion        Version of the CEL file format.
Algorithm          Algorithm used in the image processing step that converts from DAT format to CEL format.
AlgParams          String containing parameters used by the algorithm in the image processing step.
NumAlgParams       Number of parameters in AlgParams.
CellMargin         Size of gaps between cells in the image created from the GeneChip array, used for computing the intensity values of the cells.
Rows               Number of rows of probes.
Cols               Number of columns of probes.
NumMasked          Number of probes that are masked and not used in subsequent processing.
NumOutliers        Number of cells identified as outliers (very high or very low intensity) by the image processing step.
NumProbes          Number of probes (Rows * Cols) on the GeneChip array.
UpperLeftX,        Pixel coordinates of the scanned image.
UpperLeftY,
UpperRightX,
UpperRightY,
LowerLeftX,
LowerLeftY,
LowerRightX,
LowerRightY
ProbeColumnNames   Cell array containing the eight column names in the Probes field:
  • PosX — x-coordinate of the cell

  • PosY — y-coordinate of the cell

  • Intensity — Intensity value of the cell

  • StdDev — Standard deviation of intensity value

  • Pixels — Number of pixels in the cell

  • Outlier — True/false flag indicating if the cell was marked as an outlier

  • Masked — True/false flag indicating if the cell was masked

  • ProbeType — Integer indicating the probe type (for example, 1 = expression)

Probes             NumProbes-by-8 array of information about the individual probes, including intensity values. The columns of this array are contained in the ProbeColumnNames field.
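
Because the columns of Probes are named in ProbeColumnNames, you can select a column by name instead of by position. The following is a minimal sketch that uses a small mock structure in place of the output of affyread, so the numeric values are invented. It assumes the Outlier and Masked flags are stored as 0 or 1 in the numeric Probes array.

```matlab
% Mock of the two CEL fields used here; values are invented for illustration.
celStruct.ProbeColumnNames = {'PosX', 'PosY', 'Intensity', 'StdDev', ...
                              'Pixels', 'Outlier', 'Masked', 'ProbeType'};
celStruct.Probes = [0 0 120.5 10.2 16 0 0 1; ...
                    1 0 980.0 55.1 16 0 0 1; ...
                    2 0  45.3  5.0 16 1 0 1];

% Find each column by name instead of hard-coding its position.
colIdx = @(name) find(strcmp(celStruct.ProbeColumnNames, name));
intensity = celStruct.Probes(:, colIdx('Intensity'));
isOutlier = celStruct.Probes(:, colIdx('Outlier')) ~= 0;
isMasked  = celStruct.Probes(:, colIdx('Masked'))  ~= 0;

% Keep only the cells that are neither outliers nor masked.
goodIntensity = intensity(~isOutlier & ~isMasked);
```

With a real CEL file, replace the mock assignments with a call such as celStruct = affyread(File).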

CHP File

Field             Description
AssayType         Type of assay that the GeneChip array contained (for example, Expression, Genotyping, or Resequencing).
CellFile          File name of the CEL file from which the CHP file was created.
Algorithm         Algorithm used to convert from CEL format to CHP format.
AlgVersion        Version of the algorithm used to create the CHP file.
NumAlgParams      Number of parameters in AlgParams.
AlgParams         String containing parameters used in steps needed to create the CHP file (for example, background correction).
NumChipSummary    Number of entries in ChipSummary.
ChipSummary       Summary information for the GeneChip array, including background average, standard deviation, max, and min.
BackgroundZones   Structure containing information about the zones used in the background adjustment step.
Rows              Number of rows of probes.
Cols              Number of columns of probes.
NumProbeSets      Number of probe sets on the GeneChip array.
NumQCProbeSets    Number of QC probe sets on the GeneChip array.

ProbeSets

(Expression GeneChip array)

A NumProbeSets-by-1 structure array containing information for each expression probe set, including the following fields:
  • Name — Name of the probe set.

  • ProbeSetType — Type of the probe set.

  • CompDataExists — True/false flag indicating if the probe set has additional computed information.

  • NumPairs — Number of probe pairs in the probe set.

  • NumPairsUsed — Number of probe pairs in the probe set used for calculating the probe set signal (not masked).

  • Signal — Summary intensity value for the probe set.

  • Detection — Indicator of statistically significant difference between the intensity value of the PM probes and the intensity value of the MM probes in a single probe set (Present, Absent, or Marginal).

  • DetectionPValue — P value for the Detection indicator.

  • CommonPairs — When CompDataExists is true, contains the number of common pairs between the experiment and the baseline after outliers and masked probes have been removed.

  • SignalLogRatio — When CompDataExists is true, contains the change in signal between the experiment and baseline.

  • SignalLogRatioLow — When CompDataExists is true, contains the lowest ratios of probes between the experiment and the baseline.

  • SignalLogRatioHigh — When CompDataExists is true, contains the highest ratios of probes between the experiment and the baseline.

  • Change — When CompDataExists is true, describes how the probe is changed versus a baseline experiment. Choices are Increase, Marginal Increase, No Change, Decrease, or Marginal Decrease.

  • ChangePValue — When CompDataExists is true, contains the p-value associated with Change.

ProbeSets

(Genotyping GeneChip array)

A NumProbeSets-by-1 structure array containing information for each genotyping probe set, including the following fields:
  • Name — Name of the probe set.

  • AlleleCall — Allele that is present for the probe set. Possibilities are AA (homozygous for the major allele), AB (heterozygous for the major and minor allele), BB (homozygous for the minor allele), or NoCall (unable to determine allele).

  • Confidence — A measure of the accuracy of the allele call.

  • RAS1 — Relative Allele Signal 1 for the SNP site, which is calculated using sense probes.

  • RAS2 — Relative Allele Signal 2 for the SNP site, which is calculated using antisense probes.

  • PValueAA — p-value for an AA call.

  • PValueAB — p-value for an AB call.

  • PValueBB — p-value for a BB call.

  • PValueNoCall — p-value for a NoCall call.

ProbeSets

(Resequencing GeneChip array)

A NumProbeSets-by-1 structure array containing information for each resequencing probe set, including the following fields:
  • CalledBases — A 1-by-NumProbeSets character array containing the bases called by the resequencing algorithm. Possible values are a, c, g, t, and n.

  • Scores — A 1-by-NumProbeSets array containing the score associated with each base call.
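
ProbeSets is a structure array, so you can gather one field across all probe sets in a single expression. The following is a minimal sketch for an expression array. It uses a mock structure with invented probe set names and values, and it assumes that the Detection field holds the string 'Present' for a Present call.

```matlab
% Mock of an expression CHP structure; names and values are invented.
chpStruct.ProbeSets = struct( ...
    'Name',      {'psA_at'; 'psB_at'; 'psC_at'}, ...
    'Signal',    {812.4; 35.1; 240.0}, ...
    'Detection', {'Present'; 'Absent'; 'Marginal'});

% Collect one field across the whole structure array.
names   = {chpStruct.ProbeSets.Name}';
signals = [chpStruct.ProbeSets.Signal]';

% Logical mask of probe sets called Present (exact string is an assumption).
isPresent = strcmp({chpStruct.ProbeSets.Detection}', 'Present');
presentNames   = names(isPresent);
presentSignals = signals(isPresent);
```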

CDF File

Field                 Description
Rows                  Number of rows of probes.
Cols                  Number of columns of probes.
NumProbeSets          Number of probe sets on the GeneChip array.
NumQCProbeSets        Number of QC probe sets on the GeneChip array.
ProbeSetColumnNames   Cell array containing the six column names in the ProbePairs field in the ProbeSets array:
  • GroupNumber — Number identifying the group to which the probe pair belongs. For expression arrays, this is always 1. For genotyping arrays, this is typically 1 (allele A, sense), 2 (allele B, sense), 3 (allele A, antisense), or 4 (allele B, antisense).

  • Direction — Number identifying the direction of the probe pair. 1 = sense and 2 = antisense.

  • PMPosX — x-coordinate of the perfect match probe.

  • PMPosY — y-coordinate of the perfect match probe.

  • MMPosX — x-coordinate of the mismatch probe.

  • MMPosY — y-coordinate of the mismatch probe.

ProbeSets             A NumProbeSets-by-1 structure array containing information for each probe set, including the following fields:
  • Name — Name of the probe set.

  • ProbeSetType — Type of the probe set.

  • CompDataExists — True/false flag indicating if the probe set has additional computed information.

  • NumPairs — Number of probe pairs in the probe set.

  • NumQCProbes — Number of QC probes in the probe set.

  • QCType — Type of QC probes.

  • GroupNames — Name of the group to which the probe set belongs. For expression arrays, this is the name of the probe set. For genotyping arrays, this is the name of the alleles, for example {'A' 'C' 'A' 'C'}'.

  • ProbePairs — NumPairs-by-6 array of information about the probe pairs. The column names of this array are contained in the ProbeSetColumnNames field.
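
The ProbePairs array can be read the same way as any named-column array: look up the column positions in ProbeSetColumnNames, then index. The following is a minimal sketch that pulls the perfect match coordinates for one probe set. The structure is a mock with invented probe set names and coordinates, standing in for the output of affyread on a CDF file.

```matlab
% Mock CDF structure with two probe sets; all values are invented.
cdfStruct.ProbeSetColumnNames = {'GroupNumber', 'Direction', ...
                                 'PMPosX', 'PMPosY', 'MMPosX', 'MMPosY'};
cdfStruct.ProbeSets = struct( ...
    'Name',       {'psA_at'; 'psB_at'}, ...
    'NumPairs',   {2; 1}, ...
    'ProbePairs', {[1 1 10 20 10 21; 1 1 30 40 30 41]; [1 1 5 6 5 7]});

% Locate the probe set by name.
setIdx = find(strcmp({cdfStruct.ProbeSets.Name}, 'psA_at'));

% Locate the perfect match coordinate columns by name.
[~, pmCols] = ismember({'PMPosX', 'PMPosY'}, cdfStruct.ProbeSetColumnNames);

% One row per probe pair: [PMPosX PMPosY].
pmXY = cdfStruct.ProbeSets(setIdx).ProbePairs(:, pmCols);
```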

GIN File

Field          Description
Version        GIN file format version.
ProbeSetName   Probe set ID/name.
ID             Identifier for the probe set (gene ID).
Description    Description of the probe set.
SourceNames    Source(s) of the probe sets.
SourceURL      Source URL(s) for the probe sets.
SourceID       Vector of numbers specifying which SourceNames or SourceURL each probe set is associated with.
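
SourceID links each probe set to an entry in SourceNames and SourceURL. The following is a minimal sketch of that lookup, under the assumption that SourceID holds 1-based indices into SourceNames and that SourceNames is a cell array of strings. The table above does not state either detail, so verify them against a real GIN structure. All names and values in the mock are invented.

```matlab
% Mock GIN structure; all names and values are invented.
ginStruct.ProbeSetName = {'psA_at'; 'psB_at'; 'psC_at'};
ginStruct.SourceNames  = {'SourceOne'; 'SourceTwo'};
ginStruct.SourceID     = [1; 2; 1];

% Assumption: SourceID is a 1-based index into SourceNames.
sourcePerProbeSet = ginStruct.SourceNames(ginStruct.SourceID);
```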

Examples

The following example uses the demo data and CDF library file from the E. coli Antisense Genome array, which you can download from:

http://www.affymetrix.com/support/technical/sample_data/demo_data.affx

After you download the demo data, you will need the Affymetrix Data Transfer Tool to extract the CEL, DAT, and CHP files from a DTT file. You can download the Affymetrix Data Transfer Tool from:

http://www.affymetrix.com/products/software/specific/dtt.affx

The following example assumes that files Ecoli-antisense-121502.CEL, Ecoli-antisense-121502.dat, and Ecoli-antisense-121502.chp are stored on the MATLAB search path or in the current directory. It also assumes that the associated CDF library file, Ecoli_ASv2.CDF, is stored at D:\Affymetrix\LibFiles\Ecoli.

  1. Read the contents of a CEL file into a MATLAB structure.

    celStruct = affyread('Ecoli-antisense-121502.CEL');

  2. Display a spatial plot of the probe intensities.

    maimage(celStruct, 'Intensity')

  3. Zoom in on a specific region of the plot.

    axis([200 340 0 70])

  4. Read the contents of a DAT file into a MATLAB structure, display the raw image data, and then use the axis image function to set the correct aspect ratio.

    datStruct = affyread('Ecoli-antisense-121502.dat');
    imagesc(datStruct.Image)
    axis image

  5. Zoom in on a specific region of the plot.

    axis([1900 2800 160 650])

  6. Read the contents of a CHP file into a MATLAB structure, specifying the location of the associated CDF library file. Then extract information for probe set 3315278.

    chpStruct = affyread('Ecoli-antisense-121502.chp',...
                'D:\Affymetrix\LibFiles\Ecoli');
    geneName = probesetlookup(chpStruct,'3315278')
    
    geneName = 
    
          Identifier: '3315278'
        ProbeSetName: 'argG_b3172_at'
            CDFIndex: 5213
            GINIndex: 3074
         Description: [1x82 char]
              Source: 'NCBI EColi Genome'
           SourceURL: [1x74 char]
    

See Also

Bioinformatics Toolbox™ functions: agferead, celintensityread, gprread, ilmnbsread, probelibraryinfo, probesetlink, probesetlookup, probesetplot, probesetvalues, sptread

  


© 1984-2008 The MathWorks, Inc.