AffyStruct |
MATLAB structure containing information from an Affymetrix data
or library file, for expression, genotyping (SNP), or resequencing
assay types.
The following tables describe the fields in AffyStruct for
the different Affymetrix file types.
EXP, DAT, CEL, CHP, CLF, BGP, CDF, and GIN Files
| Field | Description |
| Name | File name. |
| DataPath | Path and directory of the file. |
| LibPath | Path and directory of the CDF and GIN library files associated
with the file you are reading. |
| FullPathName | Path and directory of the file. |
| ChipType | Name of the Affymetrix GeneChip array (for example,
DrosGenome1 or HG-Focus). |
| Date or CreateDate | File creation date. |
EXP File
| Field | Description |
| ChipLot, Operator, SampleType, SampleDesc, Project, Comments, Reagents, ReagentLot, Protocol, Station, Module, HybridizeDate, ScanPixelSize, ScanFilter, ScanDate, ScannerID, NumberOfScans, ScannerType, NumProtocolSteps, ProtocolSteps | Information about experimental conditions and protocols captured
by the Affymetrix software. |
DAT File
| Field | Description |
| NumPixelsPerRow | Number of pixels per row in the image created from the GeneChip array
(number of columns). |
| NumRows | Number of rows in the image created from the GeneChip array. |
| MinData | Minimum intensity value in the image created from the GeneChip array. |
| MaxData | Maximum intensity value in the image created from the GeneChip array. |
| PixelSize | Size of one pixel in the image created from the GeneChip array. |
| CellMargin | Size of gaps between cells in the image created from the GeneChip array. |
| ScanSpeed | Speed of the scanner used to create the image. |
| ScanDate | Date the scan was performed. |
| ScannerID | Name of the scanning device used. |
| UpperLeftX, UpperLeftY, UpperRightX, UpperRightY, LowerLeftX, LowerLeftY, LowerRightX, LowerRightY | Pixel coordinates of the scanned image. |
| ServerName | Not used. |
| Image | A NumRows-by-NumPixelsPerRow image
of the scanned GeneChip array. |
CEL File
| Field | Description |
| FileVersion | Version of the CEL file format. |
| Algorithm | Algorithm used in the image-processing step that converts from
DAT format to CEL format. |
| AlgParams | String containing parameters used by the algorithm in the image-processing
step. |
| NumAlgParams | Number of parameters in AlgParams. |
| CellMargin | Size of gaps between cells in the image created from the GeneChip array,
used for computing the intensity values of the cells. |
| Rows | Number of rows of probes. |
| Cols | Number of columns of probes. |
| NumMasked | Number of masked probes, which are not used in subsequent processing. |
| NumOutliers | Number of cells identified as outliers (extremely high or extremely
low intensity) by the image-processing step. |
| NumProbes | Number of probes (Rows * Cols)
on the GeneChip array. |
| UpperLeftX, UpperLeftY, UpperRightX, UpperRightY, LowerLeftX, LowerLeftY, LowerRightX, LowerRightY | Pixel coordinates of the scanned image. |
| ProbeColumnNames | Cell array containing the eight column names in the Probes field:
PosX: x-coordinate of the cell.
PosY: y-coordinate of the cell.
Intensity: Intensity value of the cell.
StdDev: Standard deviation of the intensity value.
Pixels: Number of pixels in the cell.
Outlier: True/false flag indicating if the cell was marked as an outlier.
Masked: True/false flag indicating if the cell was masked.
ProbeType: Integer indicating the probe type (for example, 1 = expression).
|
| Probes | NumProbes-by-8 array of information about
the individual probes, including intensity values. The ProbeColumnNames field
contains the column names of this array. |
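The relationship between Probes and ProbeColumnNames can be sketched as follows. This is a minimal illustration that builds a small stand-in structure by hand; the name celStruct and all numeric values are made up for the example, and a real structure would come from reading a CEL file.

```matlab
% Hypothetical stand-in for a CEL AffyStruct (all values are invented).
celStruct.Rows = 2;
celStruct.Cols = 3;
celStruct.NumProbes = celStruct.Rows * celStruct.Cols;
celStruct.ProbeColumnNames = {'PosX'; 'PosY'; 'Intensity'; 'StdDev'; ...
                              'Pixels'; 'Outlier'; 'Masked'; 'ProbeType'};
% One row per probe, one column per entry in ProbeColumnNames.
celStruct.Probes = [ ...
    0 0 120.5 10.2 16 0 0 1; ...
    1 0 980.0 35.1 16 0 0 1; ...
    2 0  45.3  5.7 16 1 0 1; ...
    0 1 310.8 12.9 16 0 1 1; ...
    1 1 502.1 20.4 16 0 0 1; ...
    2 1  77.7  8.8 16 0 0 1];

% Look up column positions by name instead of hard-coding indices.
intensityCol = find(strcmp(celStruct.ProbeColumnNames, 'Intensity'));
outlierCol   = find(strcmp(celStruct.ProbeColumnNames, 'Outlier'));
maskedCol    = find(strcmp(celStruct.ProbeColumnNames, 'Masked'));

% Keep probes that are neither flagged as outliers nor masked.
keep = ~celStruct.Probes(:, outlierCol) & ~celStruct.Probes(:, maskedCol);
usableIntensities = celStruct.Probes(keep, intensityCol);
meanIntensity = mean(usableIntensities);
```

Looking columns up by name keeps the code readable and avoids relying on a fixed column order.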
CHP File
| Field | Description |
| AssayType | Type of assay associated with the GeneChip array (for
example, Expression, Genotyping, or Resequencing). |
| CellFile | File name of the CEL file from which the CHP file was created. |
| Algorithm | Algorithm used to convert from CEL format to CHP format. |
| AlgVersion | Version of the algorithm used to create the CHP file. |
| NumAlgParams | Number of parameters in AlgParams. |
| AlgParams | String containing parameters used in steps required to create
the CHP file (for example, background correction). |
| NumChipSummary | Number of entries in ChipSummary. |
| ChipSummary | Summary information for the GeneChip array, including
background average, standard deviation, max, and min. |
| BackgroundZones | Structure containing information about the zones used in the
background adjustment step. |
| Rows | Number of rows of probes. |
| Cols | Number of columns of probes. |
| NumProbeSets | Number of probe sets on the GeneChip array. |
| NumQCProbeSets | Number of QC probe sets on the GeneChip array. |
| ProbeSets (Expression GeneChip array) | NumProbeSets-by-1 structure array containing
information for each expression probe set, including the following fields:
Name: Name of the probe set.
ProbeSetType: Type of the probe set.
CompDataExists: True/false flag indicating if the probe set has additional computed information.
NumPairs: Number of probe pairs in the probe set.
NumPairsUsed: Number of probe pairs in the probe set used for calculating the probe set signal (not masked).
Signal: Summary intensity value for the probe set.
Detection: Indicator of statistically significant difference between the intensity value of the PM probes and the intensity value of the MM probes in a single probe set (Present, Absent, or Marginal).
DetectionPValue: P-value for the Detection indicator.
CommonPairs: When CompDataExists is true, contains the number of common pairs between the experiment and the baseline after the removal of outliers and masked probes.
SignalLogRatio: When CompDataExists is true, contains the change in signal between the experiment and baseline.
SignalLogRatioLow: When CompDataExists is true, contains the lowest ratios of probes between the experiment and the baseline.
SignalLogRatioHigh: When CompDataExists is true, contains the highest ratios of probes between the experiment and the baseline.
Change: When CompDataExists is true, describes how the probe changes versus a baseline experiment. Choices are Increase, Marginal Increase, No Change, Decrease, or Marginal Decrease.
ChangePValue: When CompDataExists is true, contains the p-value associated with Change.
|
| ProbeSets (Genotyping GeneChip array) | NumProbeSets-by-1 structure array containing
information for each genotyping probe set, including the following fields:
Name: Name of the probe set.
AlleleCall: Allele that is present for the probe set. Possibilities are AA (homozygous for the major allele), AB (heterozygous for the major and minor allele), BB (homozygous for the minor allele), or NoCall (unable to determine allele).
Confidence: Measure of the accuracy of the allele call.
RAS1: Relative Allele Signal 1 for the SNP site, which is calculated using sense probes.
RAS2: Relative Allele Signal 2 for the SNP site, which is calculated using antisense probes.
PValueAA: p-value for an AA call.
PValueAB: p-value for an AB call.
PValueBB: p-value for a BB call.
PValueNoCall: p-value for a NoCall call.
|
| ProbeSets (Resequencing GeneChip array) | NumProbeSets-by-1 structure array containing
information for each resequencing probe set, including the following fields:
CalledBases: 1-by-NumProbeSets character array containing the bases called by the resequencing algorithm. Possible values are a, c, g, t, and n.
Scores: 1-by-NumProbeSets array containing the score associated with each base call.
|
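As an illustration of the expression ProbeSets structure array described above, the following sketch selects probe sets by their Detection call. The structure chpStruct, the probe set names, and all values are hypothetical; only the field names come from the table.

```matlab
% Hypothetical stand-in for the ProbeSets field of an expression CHP
% AffyStruct. Names and numbers are invented for illustration.
chpStruct.ProbeSets = struct( ...
    'Name',            {'probeset_a'; 'probeset_b'; 'probeset_c'}, ...
    'Signal',          {812.4; 35.9; 140.2}, ...
    'Detection',       {'Present'; 'Absent'; 'Marginal'}, ...
    'DetectionPValue', {0.001; 0.42; 0.055});
chpStruct.NumProbeSets = numel(chpStruct.ProbeSets);

% Collect the Detection call of every probe set and keep the Present ones.
isPresent = strcmp({chpStruct.ProbeSets.Detection}, 'Present');
presentNames   = {chpStruct.ProbeSets(isPresent).Name};
presentSignals = [chpStruct.ProbeSets(isPresent).Signal];
```

The comma-separated list syntax ({s.Field} and [s.Field]) gathers one field across all elements of the structure array without an explicit loop.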
CLF File
| Field | Description |
| LibSetName | Name of a collection of related library files for a given chip.
There is only one LibSetName for a CLF file. For
example, PGF and CLF files intended for use together must have the
same LibSetName. |
| LibSetVersion | Version of a collection of related library files for a given
chip. There is only one LibSetVersion for a CLF
file. For example, PGF and CLF files intended for use together must
have the same LibSetVersion. |
| GUID | Unique identifier for the CLF file. |
| CLFFormatVersion | Version of the CLF file format. |
| Rows | Number of rows in the CEL file. Note: The CLF file is 1-based, which means the first row and column
are designated 1,1, not 0,0. |
| Cols | Number of columns in the CEL file. Note: The CLF file is 1-based, which means the first row and column
are designated 1,1, not 0,0. |
| StartID | Starting number for the numbering of elements in the
CLF file. Tip: This information is useful when numbering does not start with
1. |
| EndID | Ending number for the numbering of elements in the CLF
file. Tip: This information is useful when numbering does not start with
1 and/or there are gaps in the numbering. |
| Order | Order in which the probe IDs are numbered in the CEL file,
either 'row_major' or 'col_major'. |
| DataColNames | Names of the columns in the CEL file that contain data. |
| Data | If the numbering of elements in the CLF file is sequential,
this field contains a function handle that calculates the x-
and y-coordinates of each element in the file
from the probe ID. If the numbering of elements in the
CLF file is not sequential, this field contains a matrix indicating
the number value of each element in the file. |
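To make the sequential case of the Data field more concrete, here is a sketch of what a probe-ID-to-coordinate mapping could look like. The handle idToXY is hypothetical and is not the handle stored in a real structure. It assumes 'row_major' order means IDs advance along each row first, that IDs start at StartID with no gaps, and that coordinates are 1-based as the note above states.

```matlab
% Hypothetical CLF-like fields for a 3-row by 4-column layout.
clfStruct.Rows = 3;
clfStruct.Cols = 4;
clfStruct.StartID = 1;
clfStruct.EndID = clfStruct.Rows * clfStruct.Cols;
clfStruct.Order = 'row_major';

% Assumed mapping: IDs run along each row first (row_major) and
% coordinates are 1-based, so StartID maps to (x, y) = (1, 1).
idToXY = @(id) [mod(id - clfStruct.StartID, clfStruct.Cols) + 1, ...
                floor((id - clfStruct.StartID) / clfStruct.Cols) + 1];

firstXY = idToXY(clfStruct.StartID);  % first element of the first row
fifthXY = idToXY(5);                  % first element of the second row
lastXY  = idToXY(clfStruct.EndID);    % last element of the last row
```

For 'col_major' order, the roles of Rows and Cols in this mapping would be swapped under the same assumptions.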
BGP File
| Field | Description |
| LibSetName | Name of a collection of related library files for a given chip.
There is only one LibSetName for a BGP file. |
| LibSetVersion | Version of a collection of related library files for a given
chip. There is only one LibSetVersion for a BGP
file. |
| GUID | Unique identifier for a BGP file. |
| ExecGUID, ExecVersion, Cmd | Information about the algorithm used
to generate the BGP file. |
| Data | Structure containing the following fields:
probe_id: ID of the probe to use for background correction.
probeset_id: ID of the probe set in the PGF file to which the probe belongs.
type: Classification information for the probe.
gc_count: Combined number of G and C bases in the probe.
probe_length: Length of the probe in base pairs.
interrogation_position: Interrogation position of the probe. It is typically 13 for 25-mer PM/MM probes.
probe_sequence: Sequence of the probe on the array, going in the direction from array surface to solution. For most standard Affymetrix arrays, this direction is from 3' to 5'. For example, for a sense target (st) probe (see the probe_type field), complement the sequence in this field before looking for matches to transcript sequences. For an antisense target (at), reverse this sequence.
atom_id: ID of the atom to which the probe belongs.
x: Column coordinate of the probe in the CEL file.
y: Row coordinate of the probe in the CEL file.
probeset_type: Classification information for the probe set, such as control, affx, or spike. This type information can include multiple classifications and can also be nested.
probe_type: Classification information for the probe, such as pm (perfect match), mm (mismatch), st (sense target), or at (antisense target). This type information can include multiple classifications and can also be nested.
|
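The handling of probe_sequence described above (complement for a sense target probe, reverse for an antisense target probe) and the meaning of gc_count can be sketched in base MATLAB. The sequence below is made up, and the variable names are chosen only for this illustration.

```matlab
% Made-up probe sequence; a real one would come from Data.probe_sequence.
probeSequence = 'ATGCCGTT';

% Complement each base (A<->T, C<->G), as described for sense target probes.
bases       = 'ACGT';
complements = 'TGCA';
[~, idx] = ismember(probeSequence, bases);
complemented = complements(idx);

% Reverse the sequence, as described for antisense target probes.
reversed = fliplr(probeSequence);

% Combined number of G and C bases, matching the gc_count description.
gcCount = sum(probeSequence == 'G' | probeSequence == 'C');
```

This sketch assumes the sequence contains only the uppercase letters A, C, G, and T; any other character would need explicit handling.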
CDF File
| Field | Description |
| Rows | Number of rows of probes. |
| Cols | Number of columns of probes. |
| NumProbeSets | Number of probe sets on the GeneChip array. |
| NumQCProbeSets | Number of QC probe sets on the GeneChip array. |
| ProbeSetColumnNames | Cell array containing the six column names in the ProbePairs field
in the ProbeSets array:
GroupNumber: Number identifying the group to which the probe pair belongs. For expression arrays, this value is always 1. For genotyping arrays, this value is typically 1 (allele A, sense), 2 (allele B, sense), 3 (allele A, antisense), or 4 (allele B, antisense).
Direction: Number identifying the direction of the probe pair. 1 = sense and 2 = antisense.
PMPosX: x-coordinate of the perfect match probe.
PMPosY: y-coordinate of the perfect match probe.
MMPosX: x-coordinate of the mismatch probe.
MMPosY: y-coordinate of the mismatch probe.
|
| ProbeSets | NumProbeSets-by-1 structure array containing
information for each probe set, including the following fields:
Name: Name of the probe set.
ProbeSetType: Type of the probe set.
CompDataExists: True/false flag indicating if the probe set has additional computed information.
NumPairs: Number of probe pairs in the probe set.
NumQCProbes: Number of QC probes in the probe set.
QCType: Type of QC probes.
GroupNames: Name of the group to which the probe set belongs. For expression arrays, this field contains the name of the probe set. For genotyping arrays, this field contains the name of the alleles, for example {'A' 'C' 'A' 'C'}'.
ProbePairs: NumPairs-by-6 array of information about the probe pairs. The column names of this array are contained in the ProbeSetColumnNames field.
|
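The nesting of ProbePairs inside ProbeSets, with column names held in ProbeSetColumnNames, can be sketched as follows. The structure cdfStruct, the probe set names, and the coordinates are invented for the example.

```matlab
% Hypothetical stand-in for a CDF AffyStruct with two probe sets.
cdfStruct.ProbeSetColumnNames = {'GroupNumber'; 'Direction'; 'PMPosX'; ...
                                 'PMPosY'; 'MMPosX'; 'MMPosY'};
cdfStruct.ProbeSets = struct( ...
    'Name',       {'probeset_a'; 'probeset_b'}, ...
    'NumPairs',   {2; 3}, ...
    'ProbePairs', {[1 1 10 4 10 5; 1 1 22 8 22 9]; ...
                   [1 1 3 6 3 7; 1 1 15 0 15 1; 1 1 30 2 30 3]});
cdfStruct.NumProbeSets = numel(cdfStruct.ProbeSets);

% Find one probe set by name and pull out its NumPairs-by-6 array.
target = strcmp({cdfStruct.ProbeSets.Name}, 'probeset_b');
pairs  = cdfStruct.ProbeSets(target).ProbePairs;

% Select the perfect match coordinate columns by name.
pmCols = ismember(cdfStruct.ProbeSetColumnNames, {'PMPosX', 'PMPosY'});
pmXY   = pairs(:, pmCols);   % one [x y] row per probe pair
```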
GIN File
| Field | Description |
| Version | GIN file format version. |
| ProbeSetName | Probe set ID/name. |
| ID | Identifier for the probe set (gene ID). |
| Description | Description of the probe set. |
| SourceNames | Source or sources of the probe sets. |
| SourceURL | Source URL or URLs for the probe sets. |
| SourceID | Vector of numbers specifying which SourceNames or SourceURL each
probe set is associated with. |
|
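The role of SourceID in the GIN fields can be illustrated with a small sketch. It assumes that SourceID holds 1-based indices into SourceNames (and SourceURL), which the table implies but does not state outright. The structure ginStruct and its contents are hypothetical.

```matlab
% Hypothetical stand-in for a GIN AffyStruct with three probe sets.
ginStruct.ProbeSetName = {'probeset_a'; 'probeset_b'; 'probeset_c'};
ginStruct.SourceNames  = {'SourceOne'; 'SourceTwo'};
% Assumption: each entry is a 1-based index into SourceNames.
ginStruct.SourceID     = [1; 2; 1];

% Expand the indices into one source name per probe set.
sourcePerProbeSet = ginStruct.SourceNames(ginStruct.SourceID);

% List the probe sets that come from the first source.
fromFirstSource = ginStruct.ProbeSetName(ginStruct.SourceID == 1);
```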