Bioinformatics Toolbox 3.4
Working with Objects for Microarray Experiment Data
Microarray experimental data are complex, usually consisting of data and information from a number of different sources. Storing and managing these large, complex data sets in a coherent manner has always been a challenge. Bioinformatics Toolbox™ provides a set of objects to represent the different pieces of data from a microarray experiment.
It is easier to manage all the data from a microarray experiment if the different pieces can be organized and stored into a single data structure. The ExpressionSet class is a single convenient data structure for storing and coordinating the different data objects from a microarray gene expression experiment.
An ExpressionSet object consists of these four components that are common to all microarray gene expression experiments:
Experiment data: Expression values from microarray experiments. These data are stored as an instance of the ExptData class.
Sample information: The metadata describing the samples in the experiment. The sample metadata are stored as an instance of the MetaData class.
Array feature annotations: The annotations about the features or probes on the array used in the experiment. The annotations can be stored as an instance of the MetaData class.
Experiment descriptions: Information to describe the experiment methods and conditions. The information can be stored as an instance of the MIAME class.
The ExpressionSet class coordinates and validates these data components. The class provides methods for retrieving and setting the data stored in an ExpressionSet object. An ExpressionSet object also behaves like many other MATLAB® data structures that can be subsetted and copied. In this demonstration, you will create and manipulate the data objects designed for storing data from a microarray experiment.
Experiment Data
In a microarray gene expression experiment, the measured expression values for each feature per sample can be represented as a two-dimensional (2D) matrix. The matrix has F rows and S columns, where F is the number of features on the array, and S is the number of samples on which the expression values were measured. A DataMatrix object is designed to contain this type of data. It is a 2D matrix with row and column names. A DataMatrix object can be indexed not only by row and column numbers or logical vectors, but also by row and column names. Linear indexing is not supported.
For example, create a DataMatrix object with row and column names:
dm = bioma.data.DataMatrix(rand(5,4), 'RowNames', 'Feature', 'ColNames', 'Sample')
dm =
Sample1 Sample2 Sample3 Sample4
Feature1 0.48374 0.82328 0.51091 0.61947
Feature2 0.026944 0.54336 0.13792 0.63465
Feature3 0.30483 0.18523 0.20027 0.72983
Feature4 0.545 0.88924 0.28461 0.31455
Feature5 0.12275 0.14999 0.086404 0.14322
The function size returns the number of rows and columns in a DataMatrix object.
size(dm)
ans =
5 4
You can index into a DataMatrix object like other MATLAB numeric arrays by using row and column numbers. For example, access the elements at rows 1 and 2, column 3 of dm:
dm(1:2, 3)
ans =
Sample3
Feature1 0.51091
Feature2 0.13792
You can also index into a DataMatrix object by using its row and column names. Reassign the elements in rows 2 and 3, columns 1 and 4 to different values:
dm({'Feature2', 'Feature3'}, {'Sample1', 'Sample4'}) = [2, 3; 4, 5]
dm =
Sample1 Sample2 Sample3 Sample4
Feature1 0.48374 0.82328 0.51091 0.61947
Feature2 2 0.54336 0.13792 3
Feature3 4 0.18523 0.20027 5
Feature4 0.545 0.88924 0.28461 0.31455
Feature5 0.12275 0.14999 0.086404 0.14322
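Logical vectors, mentioned earlier as a third way to index, can be combined with names. The following is a minimal sketch, assuming Bioinformatics Toolbox is installed; the matrix values (from magic(4)) and the variable names dmDemo, keepRows, and sub are chosen only for illustration.

```matlab
% Illustrative 4-by-4 DataMatrix with generated names
% Feature1..Feature4 and Sample1..Sample4.
dmDemo = bioma.data.DataMatrix(magic(4), 'RowNames', 'Feature', 'ColNames', 'Sample');

% Logical vector selecting rows 1 and 3; columns are selected by name.
keepRows = [true false true false];
sub = dmDemo(keepRows, {'Sample1', 'Sample2'});

% Convert the subset back to a plain numeric matrix for inspection.
subValues = double(sub);
```

The logical vector has one entry per row, so the subset keeps the row names of the selected rows.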
The example gene expression data used in this demonstration is a small set of data from a microarray experiment profiling adult mouse gene expression patterns in common strains on the Affymetrix® MG-U74Av2 array [1]. The file mouseExprsData.txt contains the small set of expression values in a table format.
Read the expression values from the file mouseExprsData.txt into MATLAB Workspace as a DataMatrix object:
exprsData = bioma.data.DataMatrix('file', 'mouseExprsData.txt');
class(exprsData)
ans = bioma.data.DataMatrix
Get the properties of the DataMatrix object, exprsData.
get(exprsData)
Name: 'mouseExprsData'
RowNames: {500x1 cell}
ColNames: {1x26 cell}
NRows: 500
NCols: 26
NDims: 2
ElementClass: 'double'
Check the sample names:
colnames(exprsData)
ans =
Columns 1 through 18
'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R'
Columns 19 through 26
'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z'
View the first 10 rows and 5 columns:
exprsData(1:10, 1:5)
ans =
A B C D E
100001_at 2.26 20.14 31.66 14.58 16.04
100002_at 158.86 236.25 206.27 388.71 388.09
100003_at 68.11 105.45 82.92 82.9 60.38
100004_at 74.32 96.68 84.87 72.26 98.38
100005_at 75.05 53.17 57.94 60.06 63.91
100006_at 80.36 42.89 77.21 77.24 40.31
100007_at 216.64 191.32 219.48 237.28 298.18
100009_r_at 3806.7 1425 2468.5 2172.7 2237.2
100010_at NaN NaN NaN 7.18 22.37
100011_at 81.72 72.27 127.61 91.01 98.13
Many of the basic MATLAB array operations also work with a DataMatrix object. For example, you can log2 transform the expression values:
exprsData_log2 = log2(exprsData);
View the first 10 rows and 5 columns of the transformed data:
exprsData_log2(1:10, 1:5)
ans =
A B C D E
100001_at 1.1763 4.332 4.9846 3.8659 4.0036
100002_at 7.3116 7.8842 7.6884 8.6026 8.6002
100003_at 6.0898 6.7204 6.3736 6.3733 5.916
100004_at 6.2157 6.5951 6.4072 6.1751 6.6203
100005_at 6.2298 5.7325 5.8565 5.9083 5.998
100006_at 6.3284 5.4226 6.2707 6.2713 5.3331
100007_at 7.7592 7.5798 7.7779 7.8904 8.22
100009_r_at 11.894 10.477 11.269 11.085 11.127
100010_at NaN NaN NaN 2.844 4.4835
100011_at 6.3526 6.1753 6.9956 6.508 6.6166
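Missing values, such as the NaN entries for probe set 100010_at, pass through the transformation unchanged. As a small sketch using a plain numeric matrix (not a DataMatrix object) with a few values copied from the output above, you could count the missing entries after the transform. The variable names are illustrative.

```matlab
% A few natural-scale values from the example output, including a missing entry.
vals = [2.26 20.14; NaN 7.18];

% log2 is applied element by element; NaN inputs stay NaN.
logVals = log2(vals);

% Count the missing values, then average the second row without them.
nMissing = sum(isnan(logVals(:)));
validSecondRow = logVals(2, ~isnan(logVals(2, :)));
meanSecondRow = mean(validSecondRow);
```

Filtering with isnan before averaging keeps a single missing measurement from turning the whole row mean into NaN.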
Change the Name property to be descriptive about exprsData_log2:
exprsData_log2 = set(exprsData_log2, 'Name', 'Log2 Based mouseExprsData');
get(exprsData_log2)
Name: 'Log2 Based mouseExprsData'
RowNames: {500x1 cell}
ColNames: {1x26 cell}
NRows: 500
NCols: 26
NDims: 2
ElementClass: 'double'
In a microarray experiment, the data set often contains one or more matrices that have the same number of rows and columns and identical row and column names, as in two-color microarray experiments. The ExptData class is designed to contain and coordinate one or more data matrices with the same dimensional properties, that is, the same dimension sizes and identical row and column names. The data values are stored in an ExptData object as DataMatrix objects. Each DataMatrix object is considered an element of the ExptData object. The ExptData class is responsible for data validation and coordination between these DataMatrix objects. For demonstration purposes, store the natural-scale and log2-transformed expression values as separate elements in an instance of the ExptData class.
mouseExptData = bioma.data.ExptData(exprsData, exprsData_log2,...
                    'ElementNames', {'naturalExprs', 'log2Exprs'})
mouseExptData =
Experiment Data:
500 features, 26 samples
2 elements
Element names: naturalExprs, log2Exprs
Access a DataMatrix element in mouseExptData using the element name.
exprsData2 = mouseExptData('log2Exprs');
get(exprsData2)
Name: 'Log2 Based mouseExprsData'
RowNames: {500x1 cell}
ColNames: {1x26 cell}
NRows: 500
NCols: 26
NDims: 2
ElementClass: 'double'
ExptData does not allow input matrices of different sizes or DataMatrix objects with different row or column names. The following call produces an error because dm and exprsData have different sizes.
try
    mouseExptData = bioma.data.ExptData(exprsData, dm,...
                        'ElementNames', {'naturalExprs', 'log2Exprs'})
catch ME
    disp(ME.message)
end
The input variables have mismatching size.
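The kind of check that ExptData performs can be approximated before construction. The sketch below is only an illustration with plain cell arrays of names; sameLayout is a hypothetical helper, not part of the toolbox, and the toolbox's own validation may differ in detail.

```matlab
% Hypothetical helper: two matrices share a layout when their row names
% and their column names match exactly, in the same order.
sameLayout = @(rowsA, colsA, rowsB, colsB) ...
    isequal(rowsA(:), rowsB(:)) && isequal(colsA(:), colsB(:));

rowsX = {'100001_at', '100002_at'};
colsX = {'A', 'B', 'C'};
rowsY = {'Feature1', 'Feature2'};
colsY = {'Sample1', 'Sample2', 'Sample3'};

okSame = sameLayout(rowsX, colsX, rowsX, colsX);   % identical names
okDiff = sameLayout(rowsX, colsX, rowsY, colsY);   % same size, different names
```

With DataMatrix objects, the rownames and colnames methods used elsewhere in this demonstration would supply the name arguments.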
Sample Metadata
The metadata about the samples in a microarray experiment can be represented as a table with S rows and V columns, where S is the number of samples, and V is the number of variables. The contents of the table are the values for each sample per variable. For example, the file mouseSampleData.txt contains such a table. Alternatively, this table of variable values can be stored in a dataset array.
Users often find that simple column names do not provide enough information about the variables. What is the name supposed to represent? What units are the variables measured in? Another table can contain such descriptive metadata about the variables. In this table of metadata, rows represent variables and at least one column contains a description of each variable. For example, the file mouseSampleData.txt also contains descriptions of the sample variables (each of these lines is prefaced with a # symbol). The metadata about the variables can also be stored in a dataset array.
The MetaData class is designed for storing and manipulating variable values and their metadata in a coordinated fashion. You can read the mouseSampleData.txt file into MATLAB as a MetaData object.
sData = bioma.data.MetaData('file', 'mouseSampleData.txt', 'vardescchar', '#')
sData =
Sample Names:
A, B, ...,Z (26 total)
Variable Names and Meta Information:
VariableDescription
Gender ' Gender of the mouse in study'
Age ' The number of weeks since mouse birth'
Type ' Genetic characters'
Strain ' The mouse strain'
Source ' The tissue source for RNA collection'
The properties of the MetaData class provide information about the size and dimension labels. There are 26 rows of samples and 5 columns of variables in the example sample data file.
sData.NSamples
ans =
26
sData.NVariables
ans =
5
The variable values and the variable descriptions for the samples are stored as two dataset arrays in a MetaData object. The MetaData class provides access methods to the variable values and the meta information describing the variables. Access the sample metadata using the variableValues method.
sData.variableValues
ans =
Gender Age Type Strain Source
A 'Male' 8 'Wild type' '129S6/SvEvTac' 'amygdala'
B 'Male' 8 'Wild type' '129S6/SvEvTac' 'amygdala'
C 'Male' 8 'Wild type' '129S6/SvEvTac' 'amygdala'
D 'Male' 8 'Wild type' 'A/J ' 'amygdala'
E 'Male' 8 'Wild type' 'A/J ' 'amygdala'
F 'Male' 8 'Wild type' 'C57BL/6J ' 'amygdala'
G 'Male' 8 'Wild type' 'C57BL/6J' 'amygdala'
H 'Male' 8 'Wild type' '129S6/SvEvTac' 'cingulate cortex'
I 'Male' 8 'Wild type' '129S6/SvEvTac' 'cingulate cortex'
J 'Male' 8 'Wild type' 'A/J' 'cingulate cortex'
K 'Male' 8 'Wild type' 'A/J' 'cingulate cortex'
L 'Male' 8 'Wild type' 'A/J' 'cingulate cortex'
M 'Male' 8 'Wild type' 'C57BL/6J' 'cingulate cortex'
N 'Male' 8 'Wild type' 'C57BL/6J' 'cingulate cortex'
O 'Male' 8 'Wild type' '129S6/SvEvTac' 'hippocampus'
P 'Male' 8 'Wild type' '129S6/SvEvTac' 'hippocampus'
Q 'Male' 8 'Wild type' 'A/J' 'hippocampus'
R 'Male' 8 'Wild type' 'A/J' 'hippocampus'
S 'Male' 8 'Wild type' 'C57BL/6J' 'hippocampus'
T 'Male' 8 'Wild type' 'C57BL/6J4' 'hippocampus'
U 'Male' 8 'Wild type' '129S6/SvEvTac' 'hypothalamus'
V 'Male' 8 'Wild type' '129S6/SvEvTac' 'hypothalamus'
W 'Male' 8 'Wild type' 'A/J' 'hypothalamus'
X 'Male' 8 'Wild type' 'A/J' 'hypothalamus'
Y 'Male' 8 'Wild type' 'C57BL/6J' 'hypothalamus'
Z 'Male' 8 'Wild type' 'C57BL/6J' 'hypothalamus'
View a summary of the sample metadata and the variables it contains.
summary(sData.variableValues)
Gender: [26x1 cell string]
Age: [26x1 double]
min 1st Q median 3rd Q max
8 8 8 8 8
Type: [26x1 cell string]
Strain: [26x1 cell string]
Source: [26x1 cell string]
The sampleNames and variableNames methods are convenient ways to access the names of samples and variables. Retrieve the variable names of the sData object.
variableNames(sData)
ans =
'Gender' 'Age' 'Type' 'Strain' 'Source'
You can retrieve the meta information about the variables describing the samples using the variableDesc method. In this example, it contains only the descriptions about the variables.
variableDesc(sData)
ans =
VariableDescription
Gender ' Gender of the mouse in study'
Age ' The number of weeks since mouse birth'
Type ' Genetic characters'
Strain ' The mouse strain'
Source ' The tissue source for RNA collection'
You can subset the sample data sData object the same way as a dataset array.
sData(3:6, :)
ans =
Sample Names:
C, D, ...,F (4 total)
Variable Names and Meta Information:
VariableDescription
Gender ' Gender of the mouse in study'
Age ' The number of weeks since mouse birth'
Type ' Genetic characters'
Strain ' The mouse strain'
Source ' The tissue source for RNA collection'
View the mouse strain of the 2nd and 14th samples.
sData.Strain([2 14])
ans =
'129S6/SvEvTac'
'C57BL/6J'
Note that the row names in sData and the column names in exprsData are the same. This is an important feature of the relationship between the expression data and the sample data in the same experiment.
all(ismember(sampleNames(sData), colnames(exprsData)))
ans =
1
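Note that all(ismember(...)) confirms only that every sample name appears among the column names; it does not confirm the order. A small sketch with made-up name lists (plain MATLAB, no toolbox objects) shows the difference, and how the second output of ismember can realign one list to the other.

```matlab
sNames = {'A', 'B', 'C'};      % illustrative sample names
cNames = {'C', 'A', 'B'};      % same names, different order

sameMembers = all(ismember(sNames, cNames));     % membership only
sameOrder   = isequal(sNames(:), cNames(:));     % membership and order

% loc(i) is the position of sNames{i} within cNames.
[~, loc] = ismember(sNames, cNames);
realigned = cNames(loc);
```

Here sameMembers is true while sameOrder is false, and indexing cNames with loc restores the order of sNames.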
Feature Annotation Metadata
The gene expression data in the example were obtained using an Affymetrix MG-U74Av2 array. The metadata about the features, or probe sets, on an array can be very large and diverse, and are important for the experiment. Chip makers usually provide a specific annotation file about the features of each type of array. The metadata can be stored as a MetaData object for a specific experiment. In this example, the annotation file for the MG-U74Av2 array can be downloaded from the Affymetrix web site. Download the file and read it into MATLAB as a dataset array.
mgU74Av2 = dataset('xlsfile', 'MG_U74Av2_annot.csv');
Warning: Variable names were modified to make them valid MATLAB identifiers.
Inspect the properties of this dataset array.
get(mgU74Av2)
Description: ''
VarDescription: {}
Units: {}
DimNames: {'Observations' 'Variables'}
UserData: []
ObsNames: {}
VarNames: {1x43 cell}
Retrieve the names of variables describing the features on the MG-U74Av2 array.
fDataVariables = get(mgU74Av2, 'VarNames');
View the first 20 variable names.
fDataVariables(1:20)'
ans =
'ProbeSetID'
'GeneChipArray'
'SpeciesScientificName'
'AnnotationDate'
'SequenceType'
'SequenceSource'
'TranscriptID0x28ArrayDesign0x29'
'TargetDescription'
'RepresentativePublicID'
'ArchivalUniGeneCluster'
'UniGeneID'
'GenomeVersion'
'Alignments'
'GeneTitle'
'GeneSymbol'
'ChromosomalLocation'
'UnigeneClusterType'
'Ensembl'
'EntrezGene'
'SwissProt'
Check the number of probe set IDs in the annotation file.
numel(mgU74Av2.ProbeSetID)
ans =
12488
Because the expression data in this example is only a small set of the full expression values, you will work with only the features in the exprsData DataMatrix object. Find the matching features in exprsData.
[fTF, fLoc] = ismember(rownames(exprsData), mgU74Av2.ProbeSetID);
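The second output of ismember gives, for each row name, its position in the annotation list, or 0 when there is no match. A zero would make later row indexing fail, so it can be worth checking the first output before indexing. The sketch below uses short made-up ID lists; '999999_at' is a deliberately unmatched, hypothetical ID.

```matlab
exprIDs  = {'100003_at', '100001_at', '999999_at'};  % IDs in the expression data
annotIDs = {'100001_at', '100002_at', '100003_at'};  % IDs in the annotation file

[tf, loc] = ismember(exprIDs, annotIDs);

% Keep only the matched positions before using them as row indices.
matchedLoc = loc(tf);
unmatched  = exprIDs(~tf);
```

If unmatched is empty, as is expected for the example data in this demonstration, loc can be used directly as a row index.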
In many cases, not all of the information read from the array annotation file is useful. It is better to store only the annotation information applicable to the experiment. In this example, extract the annotations GeneTitle, GeneSymbol, ChromosomalLocation, and Pathway for the features present in exprsData.
fIdx = ismember(fDataVariables, {'GeneTitle',...
'GeneSymbol',...
'ChromosomalLocation',...
'Pathway'});
features = mgU74Av2(fLoc, fDataVariables(fIdx));
features = set(features, 'ObsNames', mgU74Av2.ProbeSetID(fLoc));
View the annotation information about probeset 100709_at.
features('100709_at', :)
ans =
GeneTitle GeneSymbol ChromosomalLocation Pathway
100709_at 'trophoblast specific protein alpha' 'Tpbpa' 'chr13 B2|13 36.0 cM' '---'
You can store the feature annotation dataset array as an instance of the MetaData class.
fData = bioma.data.MetaData(features)
fData =
Sample Names:
100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
VariableDescription
GeneTitle 'NA'
GeneSymbol 'NA'
ChromosomalLocation 'NA'
Pathway 'NA'
Notice that there are no descriptions for the feature variables in the fData MetaData object. You can add descriptions about the variables in fData using the variableDesc method.
fData = variableDesc(fData, {'Gene title of a probe set',...
'Probe set gene symbol',...
'Probe set chromosomal locations',...
'The pathway the genes involved in'})
fData =
Sample Names:
100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
VariableDescription
GeneTitle 'Gene title of a probe set'
GeneSymbol 'Probe set gene symbol'
ChromosomalLocation 'Probe set chromosomal locations'
Pathway 'The pathway the genes involved in'
Experiment Information
The MIAME class is a flexible data container designed for a collection of basic descriptions about a microarray experiment, such as the investigators, the laboratory where the experiment was done, and descriptions of the array designs. The MIAME class is designed to be lightweight and loosely follows the Minimum Information About a Microarray Experiment (MIAME) specification [2]. The information can be accessed through the 14 properties of the MIAME class.
Create a MIAME object by providing some basic information.
expDesc = bioma.data.MIAME('investigator', 'Jane OneName',...
                           'lab', 'Bioinformatics Laboratory',...
                           'title', 'Example Gene Expression Experiment',...
                           'abstract', 'An example of using microarray objects.',...
                           'other', {'Notes: Created from a text files.'})
expDesc =
Experiment Description:
Author name: Jane OneName
Laboratory: Bioinformatics Laboratory
Contact information:
URL:
PubMedIDs:
Abstract: A 5 word abstract is available. Use the Abstract property.
No experiment design summary available.
Other notes:
'Notes: Created from a text files.'
Another way to create a MIAME object is from GEO series data. The MIAME class populates the corresponding properties from the data structure returned by the getgeodata function. Create a MIAME object for the experiment information about the mouse gene profile experiment in the example. The data set is available in the GEO database with the series accession number GSE3327 [1]. Note: The GSE3327 data set is quite large, so it takes some time to download.
geoSeries = getgeodata('GSE3327')
geoSeries =
Header: [1x1 struct]
Data: [12488x87 bioma.data.DataMatrix]
exptGSE3327 = bioma.data.MIAME(geoSeries)
exptGSE3327 =
Experiment Description:
Author name: Iiris,,Hovatta
David,J,Lockhart
Carrolee,,Barlow
Laboratory: The Salk Institute for Biological Studies
Contact information: Carrolee,,Barlow
URL: http://www.teragenomics.com
PubMedIDs: 16244648
Abstract: A 14 word abstract is available. Use the Abstract property.
Experiment Design: A 8 word summary is available. Use the ExptDesign property.
Other notes:
[1x80 char]
View the abstract of the experiment and its PubMed IDs.
exptGSE3327.Abstract
ans = Adult mouse gene expression patterns in common strains Keywords: mouse strain and brain region comparison
exptGSE3327.PubMedID
ans = 16244648
Assembling an ExpressionSet Object
The ExpressionSet class is designed specifically for microarray gene expression experiment data. Assemble an ExpressionSet object for the example mouse gene expression experiment from the different data objects you just created.
exptSet = bioma.ExpressionSet(exprsData, 'SData', sData,...
                                         'FData', fData,...
                                         'Einfo', exptGSE3327)
exptSet =
ExpressionSet
Experiment Data: 500 features, 26 samples
Element names: Expressions
Sample Data:
Sample names: A, B, ...,Z (26 total)
Sample variable names and meta information:
Gender: Gender of the mouse in study
Age: The number of weeks since mouse birth
Type: Genetic characters
Strain: The mouse strain
Source: The tissue source for RNA collection
Feature Data:
Feature names: 100001_at, 100002_at, ...,100717_at (500 total)
Feature variable names and meta information:
GeneTitle: Gene title of a probe set
GeneSymbol: Probe set gene symbol
ChromosomalLocation: Probe set chromosomal locations
Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'
You can also create an ExpressionSet object with only the expression values in a DataMatrix or a numeric matrix.
miniExprSet = bioma.ExpressionSet(exprsData)
miniExprSet =
ExpressionSet
Experiment Data: 500 features, 26 samples
Element names: Expressions
Sample Data: none
Feature Data: none
Experiment Information: none
Saving and Loading an ExpressionSet Object
The data objects for a microarray experiment can be saved as MAT files. Save the ExpressionSet object exptSet to a MAT file named mouseExpressionSet.mat.
save mouseExpressionSet exptSet
Clear all the variables from the MATLAB Workspace.
clear all
Load the MAT file mouseExpressionSet into the MATLAB Workspace.
load mouseExpressionSet
Inspect the loaded ExpressionSet object.
exptSet.elementNames
ans =
'Expressions'
exptSet.NSamples
ans =
26
exptSet.NFeatures
ans = 500
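The same round trip can be written with the functional forms of save and load, which avoids clearing the whole workspace. This is a sketch that uses an ordinary struct as a stand-in for the ExpressionSet object; the variable names and the temporary file are illustrative.

```matlab
% Stand-in for a data object; any MATLAB variable can be saved this way.
original = struct('name', 'demo', 'values', [1 2 3]);

% Save to a temporary MAT file, then remove only this variable.
matFile = [tempname '.mat'];
save(matFile, 'original');
clear original

% Loading into a struct keeps the loaded variables out of the workspace
% until you assign them explicitly.
loaded = load(matFile);
restored = loaded.original;
delete(matFile);
```

Clearing a single variable by name leaves the rest of the session intact, which is often preferable to clear all in longer analyses.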
Accessing Data Components of an ExpressionSet Object
A number of methods are available to access and update data stored in an ExpressionSet object. In this example, you will explore some of the data access methods and basic operations of the ExpressionSet class.
You can access the columns of the sample data using dot notation.
exptSet.Strain(1:5)
ans =
'129S6/SvEvTac'
'129S6/SvEvTac'
'129S6/SvEvTac'
'A/J '
'A/J '
Retrieve the feature names using the featureNames method. In this example, the feature names are the probe set identifiers on the array.
featureNames(exptSet, 1:5)
ans =
'100001_at'
'100002_at'
'100003_at'
'100004_at'
'100005_at'
The unique identifier of the samples can be accessed via the sampleNames method.
exptSet.sampleNames(1:5)
ans =
'A' 'B' 'C' 'D' 'E'
The sampleVarNames method lists the variable names in the sample data.
exptSet.sampleVarNames
ans =
'Gender' 'Age' 'Type' 'Strain' 'Source'
Extract the dataset array containing sample information.
sDataset = sampleVarValues(exptSet)
sDataset =
Gender Age Type Strain Source
A 'Male' 8 'Wild type' '129S6/SvEvTac' 'amygdala'
B 'Male' 8 'Wild type' '129S6/SvEvTac' 'amygdala'
C 'Male' 8 'Wild type' '129S6/SvEvTac' 'amygdala'
D 'Male' 8 'Wild type' 'A/J ' 'amygdala'
E 'Male' 8 'Wild type' 'A/J ' 'amygdala'
F 'Male' 8 'Wild type' 'C57BL/6J ' 'amygdala'
G 'Male' 8 'Wild type' 'C57BL/6J' 'amygdala'
H 'Male' 8 'Wild type' '129S6/SvEvTac' 'cingulate cortex'
I 'Male' 8 'Wild type' '129S6/SvEvTac' 'cingulate cortex'
J 'Male' 8 'Wild type' 'A/J' 'cingulate cortex'
K 'Male' 8 'Wild type' 'A/J' 'cingulate cortex'
L 'Male' 8 'Wild type' 'A/J' 'cingulate cortex'
M 'Male' 8 'Wild type' 'C57BL/6J' 'cingulate cortex'
N 'Male' 8 'Wild type' 'C57BL/6J' 'cingulate cortex'
O 'Male' 8 'Wild type' '129S6/SvEvTac' 'hippocampus'
P 'Male' 8 'Wild type' '129S6/SvEvTac' 'hippocampus'
Q 'Male' 8 'Wild type' 'A/J' 'hippocampus'
R 'Male' 8 'Wild type' 'A/J' 'hippocampus'
S 'Male' 8 'Wild type' 'C57BL/6J' 'hippocampus'
T 'Male' 8 'Wild type' 'C57BL/6J4' 'hippocampus'
U 'Male' 8 'Wild type' '129S6/SvEvTac' 'hypothalamus'
V 'Male' 8 'Wild type' '129S6/SvEvTac' 'hypothalamus'
W 'Male' 8 'Wild type' 'A/J' 'hypothalamus'
X 'Male' 8 'Wild type' 'A/J' 'hypothalamus'
Y 'Male' 8 'Wild type' 'C57BL/6J' 'hypothalamus'
Z 'Male' 8 'Wild type' 'C57BL/6J' 'hypothalamus'
Retrieve the ExptData object containing the expression values. An ExptData object can contain more than one DataMatrix object with identical dimensions. In an ExpressionSet object, however, there is always an element DataMatrix object named Expressions that contains the expression matrix.
exptDS = exptData(exptSet)
exptDS =
Experiment Data:
500 features, 26 samples
1 elements
Element names: Expressions
Extract only the expression DataMatrix instance.
dMatrix = expressions(exptSet);
The returned expression DataMatrix should be identical to the exprsData DataMatrix object that you created earlier.
get(dMatrix)
Name: 'mouseExprsData'
RowNames: {500x1 cell}
ColNames: {1x26 cell}
NRows: 500
NCols: 26
NDims: 2
ElementClass: 'double'
Get PubMed IDs for the experiment stored in exptSet.
exptSet.pubMedID
ans = 16244648
Subsetting an ExpressionSet Object
Subsetting is a very useful operation. Subsetting an ExpressionSet object is very similar to subsetting a DataMatrix object or a dataset array. The first indexing argument subsets the features and the second argument subsets the samples. For example, you can create a new ExpressionSet object consisting of the first five features and the samples named A, B, and C.
mySet = exptSet(1:5, {'A', 'B', 'C'})
mySet =
ExpressionSet
Experiment Data: 5 features, 3 samples
Element names: Expressions
Sample Data:
Sample names: A, B, C
Sample variable names and meta information:
Gender: Gender of the mouse in study
Age: The number of weeks since mouse birth
Type: Genetic characters
Strain: The mouse strain
Source: The tissue source for RNA collection
Feature Data:
Feature names: 100001_at, 100002_at, ...,100005_at (5 total)
Feature variable names and meta information:
GeneTitle: Gene title of a probe set
GeneSymbol: Probe set gene symbol
ChromosomalLocation: Probe set chromosomal locations
Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'
Inspect the subset mySet.
size(mySet)
ans =
5 3
featureNames(mySet)
ans =
'100001_at'
'100002_at'
'100003_at'
'100004_at'
'100005_at'
sampleNames(mySet)
ans =
'A' 'B' 'C'
Another example is to create a subset consisting of only the samples from hippocampus tissues.
hippocampusSet = exptSet(:, nominal(exptSet.Source) == 'hippocampus')
hippocampusSet =
ExpressionSet
Experiment Data: 500 features, 6 samples
Element names: Expressions
Sample Data:
Sample names: O, P, ...,T (6 total)
Sample variable names and meta information:
Gender: Gender of the mouse in study
Age: The number of weeks since mouse birth
Type: Genetic characters
Strain: The mouse strain
Source: The tissue source for RNA collection
Feature Data:
Feature names: 100001_at, 100002_at, ...,100717_at (500 total)
Feature variable names and meta information:
GeneTitle: Gene title of a probe set
GeneSymbol: Probe set gene symbol
ChromosomalLocation: Probe set chromosomal locations
Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'
hippocampusSet.Source
ans =
'hippocampus'
'hippocampus'
'hippocampus'
'hippocampus'
'hippocampus'
'hippocampus'
hippocampusExprs = expressions(hippocampusSet);
get(hippocampusExprs)
Name: 'mouseExprsData'
RowNames: {500x1 cell}
ColNames: {'O' 'P' 'Q' 'R' 'S' 'T'}
NRows: 500
NCols: 6
NDims: 2
ElementClass: 'double'
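The logical mask used to select the hippocampus samples can be built in more than one way. As a sketch with a short made-up list of tissue sources (plain cell arrays rather than an ExpressionSet object), strcmp produces the same kind of logical vector as the nominal comparison.

```matlab
sources = {'amygdala', 'hippocampus', 'hippocampus', 'hypothalamus'};
names   = {'A', 'O', 'P', 'U'};

% Logical vector, true where the tissue source matches exactly.
isHippo = strcmp(sources, 'hippocampus');

% Apply the mask to select the matching sample names.
hippoNames = names(isHippo);
```

Because strcmp compares exactly, values with stray whitespace (such as 'A/J ' in the Strain column) would not match unless trimmed first, for example with strtrim.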
You can find more details about the basic operations and available methods for the microarray experiment data objects in the help and reference pages.
References
[1] Hovatta, I., Tennant, R. S., Helton, R., et al. (2005). Glyoxalase 1 and glutathione reductase 1 regulate anxiety in mice. Nature, 438, 662-666.
[2] Brazma, A., Hingamp, P., Quackenbush, J., et al. (2001). Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genetics, 29, 365-371.