affyprobeseqread - Read data file containing probe sequence information for Affymetrix® GeneChip® array

Syntax

Struct = affyprobeseqread(SeqFile, CDFFile)

Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqPath', SeqPathValue, ...)
Struct = affyprobeseqread(SeqFile, CDFFile, ...'CDFPath', CDFPathValue, ...)
Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqOnly', SeqOnlyValue, ...)

Arguments

SeqFile

String specifying a file name of a sequence file (tab-separated or FASTA) that contains the following information for a specific type of Affymetrix GeneChip array:

  • Probe set IDs

  • Probe x-coordinates

  • Probe y-coordinates

  • Probe sequences in each probe set

  • Affymetrix GeneChip array type (FASTA file only)

The sequence file (tab-separated or FASTA) must be on the MATLAB® search path or in the Current Directory (unless you use the SeqPath property). In a tab-separated file, each row represents a probe; in a FASTA file, each header represents a probe.

CDFFile

Either of the following:

  • String specifying a file name of an Affymetrix CDF library file, which contains information that specifies which probe set each probe belongs to on a specific type of Affymetrix GeneChip array. The CDF library file must be on the MATLAB search path or in the MATLAB Current Directory (unless you use the CDFPath property).

  • CDF structure, such as returned by the affyread function, which contains information that specifies which probe set each probe belongs to on a specific type of Affymetrix GeneChip array.

    Caution   Make sure that SeqFile and CDFFile contain information for the same type of Affymetrix GeneChip array.

SeqPathValue

String specifying a directory, or path and directory, where SeqFile is stored.

CDFPathValue

String specifying a directory, or path and directory, where CDFFile is stored.

SeqOnlyValue

Controls whether the returned structure, Struct, contains only one field, SequenceMatrix. Choices are true or false (default).

Return Values

Struct

MATLAB structure containing the following fields:

  • ProbeSetIDs

  • ProbeIndices

  • SequenceMatrix

Description

Struct = affyprobeseqread(SeqFile, CDFFile) reads the data from files SeqFile and CDFFile, and stores the data in the MATLAB structure Struct, which contains the following fields.

ProbeSetIDs

Cell array containing the probe set IDs from the Affymetrix CDF library file.

ProbeIndices

Column vector containing probe indexing information. Probes within a probe set are numbered 0 through N - 1, where N is the number of probes in the probe set.

SequenceMatrix

An N-by-25 matrix of sequence information for the perfect match (PM) probes on the Affymetrix GeneChip array, where N is the number of probes on the array. Each row corresponds to a probe, and each column corresponds to one of the 25 sequence positions. Nucleotides in the sequences are represented by one of the following integers:

  • 0 — None

  • 1 — A

  • 2 — C

  • 3 — G

  • 4 — T

    Note   Probes without sequence information are represented in SequenceMatrix as a row containing all 0s.

    Tip   You can use the int2nt function to convert the nucleotide sequences in SequenceMatrix to letter representation.
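
To make the integer encoding concrete, the following sketch decodes a small, made-up matrix by plain indexing. It does not call affyprobeseqread or int2nt; the matrix seqMat and the '-' placeholder for code 0 are assumptions chosen only for illustration.

```matlab
% Hypothetical 3-by-25 matrix standing in for Struct.SequenceMatrix.
seqMat = [ones(1, 25);                  % probe 1: all A
          repmat([1 2 3 4 1], 1, 5);    % probe 2: ACGTA repeated five times
          zeros(1, 25)];                % probe 3: no sequence information

% Lookup table indexed by code + 1 (0 = none, 1 = A, 2 = C, 3 = G, 4 = T).
% The '-' used for code 0 is an illustrative choice, not part of the
% function's documented output.
alphabet = '-ACGT';
letters = alphabet(seqMat + 1);         % 3-by-25 char array, one row per probe

% Rows that carry sequence information have at least one nonzero code.
hasSeq = any(seqMat ~= 0, 2);

disp(letters(2, :))
disp(hasSeq')
```

For real data, int2nt remains the more convenient route; the lookup table only shows what the integers stand for.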

Struct = affyprobeseqread(SeqFile, CDFFile, ...'PropertyName', PropertyValue, ...) calls affyprobeseqread with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqPath', SeqPathValue, ...) lets you specify a directory, or path and directory, where SeqFile is stored.

Struct = affyprobeseqread(SeqFile, CDFFile, ...'CDFPath', CDFPathValue, ...) lets you specify a directory, or path and directory, where CDFFile is stored.

Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqOnly', SeqOnlyValue, ...) controls whether the returned structure, Struct, contains only one field, SequenceMatrix. Choices are true or false (default).
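
As a rough sketch of how these properties might be combined, the call below passes 'CDFPath' and 'SeqOnly' together. The file names are the ones used in the Examples section; the directory C:\Affymetrix\LibFiles is hypothetical, and the guard simply skips the call when the function or the files are not available.

```matlab
seqFile = 'HG-U95A_probe_fasta';        % expected on the MATLAB search path
cdfFile = 'HG_U95A.CDF';
cdfDir  = 'C:\Affymetrix\LibFiles';     % hypothetical directory, adjust as needed

% Only attempt the call when the function and both files can be found.
haveFunction = exist('affyprobeseqread', 'file') == 2;
haveFiles = exist(seqFile, 'file') == 2 && ...
            exist(fullfile(cdfDir, cdfFile), 'file') == 2;

if haveFunction && haveFiles
    S = affyprobeseqread(seqFile, cdfFile, ...
                         'CDFPath', cdfDir, 'SeqOnly', true);
    disp(fieldnames(S))                 % with SeqOnly true, only SequenceMatrix is expected
else
    disp('affyprobeseqread or the input files were not found; call skipped.')
end
```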

Examples

  1. Read the data from a FASTA file and associated CDF library file, assuming both are located on the MATLAB search path or in the Current Directory.

    S1 = affyprobeseqread('HG-U95A_probe_fasta', 'HG_U95A.CDF');
    
  2. Read the data from a tab-separated file and associated CDF structure, assuming the tab-separated file is located in the specified directory and the CDF structure is in your MATLAB Workspace.

    S2 = affyprobeseqread('HG-U95A_probe_tab',hgu95aCDFStruct,...
         'seqpath','C:\Affymetrix\SequenceFiles\HGGenome');
    
  3. Access the nucleotide sequences of the first probe set (rows 1 through 20) in the SequenceMatrix field of the S2 structure.

    seq = int2nt(S2.SequenceMatrix(1:20,:))
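
Example 3 hard-codes the row range 1:20. The sketch below shows one way the rows of a given probe set might be located from ProbeIndices instead. It uses a made-up index vector and rests on an assumption this page does not state explicitly: that the probes of each probe set occupy consecutive rows, beginning with index 0.

```matlab
% Hypothetical stand-in for Struct.ProbeIndices: three probe sets with
% 3, 2, and 4 probes. Real arrays have many more probes per set.
probeIndices = [0 1 2 0 1 0 1 2 3]';

% ASSUMPTION: a 0 marks the first probe of each probe set, and the probes
% of one set occupy consecutive rows.
startRows = find(probeIndices == 0);
endRows = [startRows(2:end) - 1; numel(probeIndices)];

k = 2;                                  % probe set of interest
rows = startRows(k):endRows(k);         % rows belonging to probe set k
disp(rows)
```

Under the same assumption, the resulting rows could be used in place of 1:20, for example int2nt(S2.SequenceMatrix(rows,:)).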

See Also

Bioinformatics Toolbox™ functions: affyinvarsetnorm, affyread, celintensityread, int2nt, probelibraryinfo, probesetlink, probesetlookup, probesetplot, probesetvalues

  


 © 1984-2008 The MathWorks, Inc.