Bioinformatics Toolbox

Working with Objects for Microarray Experiment Data

This example shows how to create and manipulate MATLAB® containers designed for storing data from a microarray experiment.

Containers for Gene Expression Experiment Data

Microarray experimental data are very complex, usually consisting of data and information from a number of different sources. Storing and managing these large and complex data sets in a coherent manner has always been a challenge. Bioinformatics Toolbox™ provides a set of objects to represent the different pieces of data from a microarray experiment.

It is easier to manage all the data from a microarray experiment if the different pieces can be organized and stored into a single data structure. The ExpressionSet class is a single convenient data structure for storing and coordinating the different data objects from a microarray gene expression experiment.

An ExpressionSet object consists of these four components that are common to all microarray gene expression experiments:

Experiment data: Expression values from microarray experiments. These data are stored as an instance of the ExptData class.

Sample information: The metadata describing the samples in the experiment. The sample metadata are stored as an instance of the MetaData class.

Array feature annotations: The annotations about the features or probes on the array used in the experiment. The annotations can be stored as an instance of the MetaData class.

Experiment descriptions: Information to describe the experiment methods and conditions. The information can be stored as an instance of the MIAME class.

The ExpressionSet class coordinates and validates these data components. The class provides methods for retrieving and setting the data stored in an ExpressionSet object. An ExpressionSet object also behaves like many other MATLAB data structures that can be subsetted and copied.

Experiment Data

In a microarray gene expression experiment, the measured expression values for each feature per sample can be represented as a two-dimensional (2D) matrix. The matrix has F rows and S columns, where F is the number of features on the array, and S is the number of samples on which the expression values were measured. A DataMatrix object is designed to contain this type of data. It is a 2D matrix with row and column names. A DataMatrix object can be indexed by its row and column numbers, by logical vectors, and by its row and column names. Linear indexing is not supported.

For example, create a DataMatrix object with row and column names:

dm = bioma.data.DataMatrix(rand(5,4), 'RowNames','Feature', 'ColNames', 'Sample')
dm = 

                Sample1    Sample2    Sample3    Sample4
    Feature1    0.75127    0.95929    0.84072    0.34998
    Feature2     0.2551    0.54722    0.25428     0.1966
    Feature3    0.50596    0.13862    0.81428    0.25108
    Feature4    0.69908    0.14929    0.24352    0.61604
    Feature5     0.8909    0.25751    0.92926    0.47329

The function size returns the number of rows and columns in a DataMatrix object.

size(dm)
ans =

     5     4

You can index into a DataMatrix object like other MATLAB numeric arrays by using row and column numbers. For example, access the elements at rows 1 and 2, column 3 of dm:

dm(1:2, 3)
ans = 

                Sample3
    Feature1    0.84072
    Feature2    0.25428

You can also index into a DataMatrix object by using its row and column names. Reassign the elements in rows 2 and 3, columns 1 and 4 to different values:

dm({'Feature2', 'Feature3'}, {'Sample1', 'Sample4'}) = [2, 3; 4, 5]
dm = 

                Sample1    Sample2    Sample3    Sample4
    Feature1    0.75127    0.95929    0.84072    0.34998
    Feature2          2    0.54722    0.25428          3
    Feature3          4    0.13862    0.81428          5
    Feature4    0.69908    0.14929    0.24352    0.61604
    Feature5     0.8909    0.25751    0.92926    0.47329
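
Logical vectors can be combined with names or numbers in the same indexing expression. The following is a minimal sketch; the matrix from magic(5) and the variable names dmDemo, keepRows, and subDemo are illustrative assumptions, not part of the example data. Because linear indexing is not supported on the object itself, the sketch converts the result to a numeric array with double before taking a single linear index.

```matlab
% Illustrative 5-by-5 DataMatrix; magic(5) stands in for real expression values.
dmDemo = bioma.data.DataMatrix(magic(5), 'RowNames', 'Feature', 'ColNames', 'Sample');

% keepRows is a hypothetical logical mask selecting rows 1, 3, and 5.
keepRows = [true false true false true];

% Combine a logical row index with column names.
subDemo = dmDemo(keepRows, {'Sample1', 'Sample2'});

% Convert to a plain numeric array when linear indexing is needed.
vals = double(subDemo);
firstValue = vals(1);
```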

The gene expression data used in this example are a small subset of the data from a microarray experiment profiling adult mouse gene expression patterns in common strains on the Affymetrix® MG-U74Av2 array [1]. The file mouseExprsData.txt contains this subset of expression values in a table format.

Read the expression values from the file mouseExprsData.txt into the MATLAB workspace as a DataMatrix object:

exprsData = bioma.data.DataMatrix('file', 'mouseExprsData.txt');
class(exprsData)
ans =

bioma.data.DataMatrix

Get the properties of the DataMatrix object, exprsData.

get(exprsData)
            Name: 'mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'

Check the sample names:

colnames(exprsData)
ans = 

  Columns 1 through 11

    'A'    'B'    'C'    'D'    'E'    'F'    'G'    'H'    'I'    'J'    'K'

  Columns 12 through 22

    'L'    'M'    'N'    'O'    'P'    'Q'    'R'    'S'    'T'    'U'    'V'

  Columns 23 through 26

    'W'    'X'    'Y'    'Z'

View the first 10 rows and 5 columns:

exprsData(1:10, 1:5)
ans = 

                   A         B         C         D         E     
    100001_at        2.26     20.14     31.66     14.58     16.04
    100002_at      158.86    236.25    206.27    388.71    388.09
    100003_at       68.11    105.45     82.92      82.9     60.38
    100004_at       74.32     96.68     84.87     72.26     98.38
    100005_at       75.05     53.17     57.94     60.06     63.91
    100006_at       80.36     42.89     77.21     77.24     40.31
    100007_at      216.64    191.32    219.48    237.28    298.18
    100009_r_at    3806.7      1425    2468.5    2172.7    2237.2
    100010_at         NaN       NaN       NaN      7.18     22.37
    100011_at       81.72     72.27    127.61     91.01     98.13

Many of the basic MATLAB array operations also work with a DataMatrix object. For example, you can log2 transform the expression values:

exprsData_log2 = log2(exprsData);

View the first 10 rows and 5 columns:

exprsData_log2(1:10, 1:5)
ans = 

                   A         B         C         D         E     
    100001_at      1.1763     4.332    4.9846    3.8659    4.0036
    100002_at      7.3116    7.8842    7.6884    8.6026    8.6002
    100003_at      6.0898    6.7204    6.3736    6.3733     5.916
    100004_at      6.2157    6.5951    6.4072    6.1751    6.6203
    100005_at      6.2298    5.7325    5.8565    5.9083     5.998
    100006_at      6.3284    5.4226    6.2707    6.2713    5.3331
    100007_at      7.7592    7.5798    7.7779    7.8904      8.22
    100009_r_at    11.894    10.477    11.269    11.085    11.127
    100010_at         NaN       NaN       NaN     2.844    4.4835
    100011_at      6.3526    6.1753    6.9956     6.508    6.6166

Change the Name property of exprsData_log2 to a more descriptive name:

exprsData_log2 = set(exprsData_log2, 'Name', 'Log2 Based mouseExprsData');
get(exprsData_log2)
            Name: 'Log2 Based mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'
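
The displays above show that some probe sets, such as 100010_at, contain NaN values, and these carry through the log2 transformation. One possible way to locate and drop such rows is sketched below. The small matrix reuses a few values from columns A and D of the display, and the names rawDemo, logDemo, hasMissing, and completeDemo are hypothetical. Whether removing incomplete rows is appropriate depends on the downstream analysis.

```matlab
% Small illustrative DataMatrix; values copied from columns A and D shown above.
rawDemo = bioma.data.DataMatrix([2.26 14.58; NaN 7.18; 81.72 91.01], ...
    'RowNames', {'100001_at', '100010_at', '100011_at'}, ...
    'ColNames', {'A', 'D'});
logDemo = log2(rawDemo);

% Work on a plain numeric copy to locate rows with missing values.
hasMissing = any(isnan(double(logDemo)), 2);

% Keep only the complete rows; logical indexing returns a DataMatrix.
completeDemo = logDemo(~hasMissing, :);
rownames(completeDemo)
```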

In a microarray experiment, the data set often contains one or more matrices that have the same number of rows and columns and identical row and column names, as in two-color microarray experiments. The ExptData class is designed to contain and coordinate one or more data matrices with the same dimensional properties, that is, the same size and identical row and column names. The data values are stored in an ExptData object as DataMatrix objects. Each DataMatrix object is considered an element of the ExptData object. The ExptData class is responsible for data validation and coordination between these DataMatrix objects. For the purposes of this example, store the natural-scale and log2-transformed expression values as separate elements in an instance of the ExptData class.

mouseExptData = bioma.data.ExptData(exprsData, exprsData_log2,...
                    'ElementNames', {'naturalExprs', 'log2Exprs'})
mouseExptData = 

Experiment Data:
  500 features,  26 samples
  2 elements
  Element names: naturalExprs, log2Exprs

Access a DataMatrix element in mouseExptData using the element name.

exprsData2 = mouseExptData('log2Exprs');
get(exprsData2)
            Name: 'Log2 Based mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'

ExptData does not allow input matrices of different sizes or DataMatrix objects with different row or column names. The following call results in an error:

try
    mouseExptData = bioma.data.ExptData(exprsData, dm,...
        'ElementNames', {'naturalExprs', 'log2Exprs'})
catch ME
    disp(ME.message)
end
The input variables have mismatching size.
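
If you prefer to detect a mismatch before calling the constructor, you can compare sizes and names yourself. The following is only a sketch of such a pre-check; aDemo, bDemo, isCompatible, and the element names are hypothetical, and the ExptData constructor performs its own validation in any case.

```matlab
% Two illustrative DataMatrix objects with different sizes.
aDemo = bioma.data.DataMatrix(rand(3,2), 'RowNames', 'Feature', 'ColNames', 'Sample');
bDemo = bioma.data.DataMatrix(rand(5,4), 'RowNames', 'Feature', 'ColNames', 'Sample');

% Hypothetical pre-check: same size and identical row and column names.
isCompatible = isequal(size(aDemo), size(bDemo)) && ...
               isequal(rownames(aDemo), rownames(bDemo)) && ...
               isequal(colnames(aDemo), colnames(bDemo));

if isCompatible
    edDemo = bioma.data.ExptData(aDemo, bDemo, 'ElementNames', {'first', 'second'});
else
    disp('Matrices differ in size or names; not combining them.')
end
```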

Sample Metadata

The metadata about the samples in a microarray experiment can be represented as a table with S rows and V columns, where S is the number of samples, and V is the number of variables. The contents of the table are the values for each sample per variable. For example, the file mouseSampleData.txt contains such a table. Alternately, this table of variable values can be stored in a dataset array.

Users often find that simple column names do not provide enough information about the variables. What is the name supposed to represent? What units are the variables measured in? Another table can contain such descriptive metadata about the variables. In this table of metadata, rows represent variables and at least one column contains a description of each variable. For example, the file mouseSampleData.txt contains descriptions of the sample variables (each of these lines is prefaced with a # symbol). The metadata about the variables can also be stored in a dataset array.

The MetaData class is designed for storing and manipulating variable values and their metadata in a coordinated fashion. You can read the mouseSampleData.txt file into MATLAB as a MetaData object.

sData = bioma.data.MetaData('file', 'mouseSampleData.txt', 'vardescchar', '#')
sData = 

Sample Names:
    A, B, ...,Z (26 total)
Variable Names and Meta Information:
              VariableDescription                         
    Gender    ' Gender of the mouse in study'             
    Age       ' The number of weeks since mouse birth'    
    Type      ' Genetic characters'                       
    Strain    ' The mouse strain'                         
    Source    ' The tissue source for RNA collection'     

The properties of the MetaData class provide information about the size and dimension labels. There are 26 rows of samples and 5 columns of variables in the example sample data file.

sData.NSamples
ans =

    26

sData.NVariables
ans =

     5

The variable values and the variable descriptions for the samples are stored as two dataset arrays in a MetaData object. The MetaData class provides access methods for the variable values and for the meta information describing the variables. Access the sample metadata using the variableValues method.

sData.variableValues
ans = 

         Gender        Age    Type               Strain             
    A    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    B    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    C    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    D    'Male'        8      'Wild type'        'A/J '             
    E    'Male'        8      'Wild type'        'A/J '             
    F    'Male'        8      'Wild type'        'C57BL/6J '        
    G    'Male'        8      'Wild type'        'C57BL/6J'         
    H    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    I    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    J    'Male'        8      'Wild type'        'A/J'              
    K    'Male'        8      'Wild type'        'A/J'              
    L    'Male'        8      'Wild type'        'A/J'              
    M    'Male'        8      'Wild type'        'C57BL/6J'         
    N    'Male'        8      'Wild type'        'C57BL/6J'         
    O    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    P    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    Q    'Male'        8      'Wild type'        'A/J'              
    R    'Male'        8      'Wild type'        'A/J'              
    S    'Male'        8      'Wild type'        'C57BL/6J'         
    T    'Male'        8      'Wild type'        'C57BL/6J4'        
    U    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    V    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    W    'Male'        8      'Wild type'        'A/J'              
    X    'Male'        8      'Wild type'        'A/J'              
    Y    'Male'        8      'Wild type'        'C57BL/6J'         
    Z    'Male'        8      'Wild type'        'C57BL/6J'         


         Source                
    A    'amygdala'            
    B    'amygdala'            
    C    'amygdala'            
    D    'amygdala'            
    E    'amygdala'            
    F    'amygdala'            
    G    'amygdala'            
    H    'cingulate cortex'    
    I    'cingulate cortex'    
    J    'cingulate cortex'    
    K    'cingulate cortex'    
    L    'cingulate cortex'    
    M    'cingulate cortex'    
    N    'cingulate cortex'    
    O    'hippocampus'         
    P    'hippocampus'         
    Q    'hippocampus'         
    R    'hippocampus'         
    S    'hippocampus'         
    T    'hippocampus'         
    U    'hypothalamus'        
    V    'hypothalamus'        
    W    'hypothalamus'        
    X    'hypothalamus'        
    Y    'hypothalamus'        
    Z    'hypothalamus'        

View a summary of the sample metadata and the variables it contains.

summary(sData.variableValues)
Gender: [26x1 cell string]

Age: [26x1 double]

    min    1st quartile    median    3rd quartile    max
    8      8               8         8               8  

Type: [26x1 cell string]

Strain: [26x1 cell string]

Source: [26x1 cell string]

The sampleNames and variableNames methods are convenient ways to access the names of samples and variables. Retrieve the variable names of the sData object.

variableNames(sData)
ans = 

    'Gender'    'Age'    'Type'    'Strain'    'Source'

You can retrieve the meta information about the variables describing the samples using the variableDesc method. In this example, it contains only the descriptions about the variables.

variableDesc(sData)
ans = 

              VariableDescription                         
    Gender    ' Gender of the mouse in study'             
    Age       ' The number of weeks since mouse birth'    
    Type      ' Genetic characters'                       
    Strain    ' The mouse strain'                         
    Source    ' The tissue source for RNA collection'     

You can subset the sData object in the same way as a dataset array.

sData(3:6, :)
ans = 

Sample Names:
    C, D, ...,F (4 total)
Variable Names and Meta Information:
              VariableDescription                         
    Gender    ' Gender of the mouse in study'             
    Age       ' The number of weeks since mouse birth'    
    Type      ' Genetic characters'                       
    Strain    ' The mouse strain'                         
    Source    ' The tissue source for RNA collection'     

View the mouse strain of the 2nd and 14th samples.

sData.Strain([2 14])
ans = 

    '129S6/SvEvTac'
    'C57BL/6J'
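
The variable values displayed earlier show that some strain labels carry a trailing space (for example, 'A/J ' next to 'A/J'), so exact string comparisons can miss samples. A minimal sketch of trimming the labels before comparing them follows; the short list strainDemo is copied from that display and the variable names are hypothetical.

```matlab
% Strain labels copied from the display above, including the trailing spaces.
strainDemo = {'129S6/SvEvTac'; 'A/J '; 'A/J'; 'C57BL/6J '; 'C57BL/6J'};

% An exact comparison misses the label with a trailing space.
rawMatch = strcmp(strainDemo, 'A/J');

% Trimming whitespace first matches both spellings.
cleanMatch = strcmp(strtrim(strainDemo), 'A/J');

fprintf('%d match before trimming, %d after.\n', nnz(rawMatch), nnz(cleanMatch));
```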

Note that the row names in sData and the column names in exprsData are the same. This correspondence is an important feature of the relationship between the expression data and the sample data in the same experiment.

all(ismember(sampleNames(sData), colnames(exprsData)))
ans =

     1
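
The ismember check confirms that every sample name appears among the column names, but it does not compare their order. If order matters for your workflow, a stricter comparison along the following lines may help. This sketch uses plain cell arrays with hypothetical names (dataCols, sampleRows) rather than the example objects.

```matlab
% Hypothetical name lists: same names, different order.
dataCols   = {'A', 'B', 'C', 'D'};
sampleRows = {'C', 'A', 'D', 'B'};

% Same set of names, ignoring order.
sameSet = all(ismember(sampleRows, dataCols)) && numel(sampleRows) == numel(dataCols);

% Same names in the same order.
sameOrder = isequal(sampleRows(:), dataCols(:));

% loc(i) is the position of dataCols{i} within sampleRows.
[~, loc] = ismember(dataCols, sampleRows);
reordered = sampleRows(loc);
```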

Feature Annotation Metadata

The gene expression data in this example were obtained using an Affymetrix MG-U74Av2 array. The metadata about the features, or probe sets, on an array can be large and diverse, and are important for the experiment. Chip manufacturers usually provide an annotation file describing the features of each type of array. The metadata can be stored as a MetaData object for a specific experiment. In this example, the annotation file for the MG-U74Av2 array can be downloaded from the Affymetrix web site. Download the file and read it into MATLAB as a dataset array. The header lines in the annotation file have been removed manually; alternatively, you can use the Range option of the dataset constructor. MATLAB removes blank spaces from the variable (column) names so that they are valid MATLAB identifiers, and displays a warning at the command prompt when it does so.

mgU74Av2 = dataset('xlsfile', 'MG_U74Av2_annot.csv');
Warning: Variable names were modified to make them valid MATLAB identifiers. 

Inspect the properties of this dataset array.

get(mgU74Av2)
       Description: ''
    VarDescription: {}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {1x41 cell}

Check the number of probe set IDs in the annotation file.

numel(mgU74Av2.ProbeSetID)
ans =

       12488

Retrieve the names of variables describing the features on the MG-U74Av2 array and view the first 20 variable names.

fDataVariables = get(mgU74Av2, 'VarNames');
fDataVariables(1:20)'
ans = 

    'ProbeSetID'
    'GeneChipArray'
    'SpeciesScientificName'
    'AnnotationDate'
    'SequenceType'
    'SequenceSource'
    'TranscriptID0x28ArrayDesign0x29'
    'TargetDescription'
    'RepresentativePublicID'
    'ArchivalUniGeneCluster'
    'UniGeneID'
    'GenomeVersion'
    'Alignments'
    'GeneTitle'
    'GeneSymbol'
    'ChromosomalLocation'
    'UnigeneClusterType'
    'Ensembl'
    'EntrezGene'
    'SwissProt'

Set the ObsNames property to the probe set IDs. This allows you to access individual gene annotations by indexing the dataset array with probe set IDs.

mgU74Av2 = set(mgU74Av2,'ObsNames',mgU74Av2.ProbeSetID);
mgU74Av2('100709_at',{'GeneSymbol','ChromosomalLocation'})
ans = 

                 GeneSymbol     ChromosomalLocation      
    100709_at    'Tpbpa'        'chr13 B2|13 36.0 cM'    

In many cases, not all the information read from the array annotation file is useful, so it is better to store only the annotation information applicable to the experiment. In this example, keep only the GeneTitle, GeneSymbol, ChromosomalLocation, and Pathway annotations.

mgU74Av2 = mgU74Av2(:,{'GeneTitle',...
                       'GeneSymbol',...
                       'ChromosomalLocation',...
                       'Pathway'});

Because the expression data in this example is only a small set of the full expression values, you will work with only the features in the exprsData DataMatrix object. Find the matching features in exprsData.

mgU74Av2 = mgU74Av2(rownames(exprsData),:);
get(mgU74Av2)
       Description: ''
    VarDescription: {}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {500x1 cell}
          VarNames: {1x4 cell}
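
Indexing by name assumes that every probe set ID in the expression data also exists in the annotation. A sketch of checking this beforehand is shown below; exprIDs and annotIDs are hypothetical lists standing in for rownames(exprsData) and the ObsNames of the annotation dataset array.

```matlab
% Hypothetical ID lists; '999999_at' is deliberately absent from the annotation.
exprIDs  = {'100001_at'; '100002_at'; '999999_at'};
annotIDs = {'100001_at'; '100002_at'; '100003_at'};

% IDs present in the expression data but absent from the annotation.
missingIDs = setdiff(exprIDs, annotIDs);

% Restrict to the IDs both sources share, preserving the expression data order.
sharedIDs = exprIDs(ismember(exprIDs, annotIDs));
```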

You can store the feature annotation dataset array as an instance of the MetaData class.

fData = bioma.data.MetaData(mgU74Av2)
fData = 

Sample Names:
    100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
                           VariableDescription
    GeneTitle              'NA'               
    GeneSymbol             'NA'               
    ChromosomalLocation    'NA'               
    Pathway                'NA'               

Notice that there are no descriptions for the feature variables in the fData MetaData object. You can add descriptions of the variables in fData using the variableDesc method.

fData = variableDesc(fData, {'Gene title of a probe set',...
                             'Probe set gene symbol',...
                             'Probe set chromosomal locations',...
                             'The pathway the genes involved in'})
fData = 

Sample Names:
    100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
                           VariableDescription                    
    GeneTitle              'Gene title of a probe set'            
    GeneSymbol             'Probe set gene symbol'                
    ChromosomalLocation    'Probe set chromosomal locations'      
    Pathway                'The pathway the genes involved in'    

Experiment Information

The MIAME class is a flexible data container designed for a collection of basic descriptions of a microarray experiment, for instance, the investigators or laboratory where the experiment was done, and a description of the array designs. The MIAME class is designed to be lightweight and loosely follows the Minimum Information About a Microarray Experiment (MIAME) specification [2]. The information can be accessed through the 14 properties of the MIAME class.

Create a MIAME object by providing some basic information.

expDesc = bioma.data.MIAME('investigator', 'Jane OneName',...
                           'lab',          'Bioinformatics Laboratory',...
                           'title',        'Example Gene Expression Experiment',...
                           'abstract',     'An example of using microarray objects.',...
                           'other',        {'Notes: Created from a text files.'})
expDesc = 

Experiment Description:
  Author name: Jane OneName
  Laboratory: Bioinformatics Laboratory
  Contact information: 
  URL: 
  PubMedIDs: 
  Abstract: A 5 word abstract is available. Use the Abstract property.
  No experiment design summary available.
  Other notes: 
    'Notes: Created from a text files.'

Another way to create a MIAME object is from GEO series data. The MIAME class populates the corresponding properties from the data structure returned by the getgeodata function. Create a MIAME object for the experiment information about the mouse gene profile experiment in this example. The data set is available in the GEO database with the series accession number GSE3327 [1]. Note: The GSE3327 data set is quite large, so it takes some time to download.

geoSeries = getgeodata('GSE3327')
geoSeries = 

    Header: [1x1 struct]
      Data: [12488x87 bioma.data.DataMatrix]

exptGSE3327 = bioma.data.MIAME(geoSeries)
exptGSE3327 = 

Experiment Description:
  Author name: Iiris,,Hovatta
David,J,Lockhart
Carrolee,,Barlow
  Laboratory: The Salk Institute for Biological Studies
  Contact information: Carrolee,,Barlow
  URL: http://www.teragenomics.com
  PubMedIDs: 16244648
  Abstract: A 14 word abstract is available. Use the Abstract property.
  Experiment Design: A 8 word summary is available. Use the ExptDesign property.
  Other notes: 
    [1x80 char]

View the abstract of the experiment and its PubMed IDs.

exptGSE3327.Abstract
ans =

Adult mouse gene expression patterns in common strains
Keywords: mouse strain and brain region comparison

exptGSE3327.PubMedID
ans =

16244648

Assembling an ExpressionSet Object

The ExpressionSet class is designed specifically for microarray gene expression experiment data. Assemble an ExpressionSet object for the example mouse gene expression experiment from the different data objects you just created.

exptSet = bioma.ExpressionSet(exprsData, 'SData', sData,...
                                         'FData', fData,...
                                         'Einfo', exptGSE3327)
exptSet = 

ExpressionSet
Experiment Data: 500 features, 26 samples
  Element names: Expressions
Sample Data:
    Sample names:     A, B, ...,Z (26 total)
    Sample variable names and meta information: 
        Gender:  Gender of the mouse in study
        Age:  The number of weeks since mouse birth
        Type:  Genetic characters
        Strain:  The mouse strain
        Source:  The tissue source for RNA collection
Feature Data:
    Feature names:     100001_at, 100002_at, ...,100717_at (500 total)
    Feature variable names and meta information: 
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'

You can also create an ExpressionSet object with only the expression values in a DataMatrix or a numeric matrix.

miniExprSet = bioma.ExpressionSet(exprsData)
miniExprSet = 

ExpressionSet
Experiment Data: 500 features, 26 samples
  Element names: Expressions
Sample Data: none
Feature Data: none
Experiment Information: none

Saving and Loading an ExpressionSet Object

The data objects for a microarray experiment can be saved as MAT files. Save the ExpressionSet object exptSet to a MAT file named mouseExpressionSet.mat.

save mouseExpressionSet exptSet

Clear all the variables from the MATLAB Workspace.

clear all

Load the MAT file mouseExpressionSet into the MATLAB Workspace.

load mouseExpressionSet

Inspect the loaded ExpressionSet object.

exptSet.elementNames
ans = 

    'Expressions'

exptSet.NSamples
ans =

    26

exptSet.NFeatures
ans =

   500
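
As an alternative to clearing the workspace, you can load a MAT file into a structure so that existing variables are not overwritten. The following sketch uses a small illustrative ExpressionSet and a temporary file; the names dmDemo, setDemo, matFile, and demoExpressionSet.mat are hypothetical.

```matlab
% Illustrative object; rand(4,3) stands in for real expression values.
dmDemo  = bioma.data.DataMatrix(rand(4,3), 'RowNames', 'Feature', 'ColNames', 'Sample');
setDemo = bioma.ExpressionSet(dmDemo);

% Save to a temporary file (hypothetical file name).
matFile = fullfile(tempdir, 'demoExpressionSet.mat');
save(matFile, 'setDemo');

% Load into a structure so existing workspace variables are not overwritten.
loaded = load(matFile);
size(loaded.setDemo)

% Remove the temporary file.
delete(matFile);
```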

Accessing Data Components of an ExpressionSet Object

A number of methods are available to access and update data stored in an ExpressionSet object. In this example, you will explore some of the data access methods and basic operations of the ExpressionSet class.

You can access the columns of the sample data using dot notation.

exptSet.Strain(1:5)
ans = 

    '129S6/SvEvTac'
    '129S6/SvEvTac'
    '129S6/SvEvTac'
    'A/J '
    'A/J '

Retrieve the feature names using the featureNames method. In this example, the feature names are the probe set identifiers on the array.

featureNames(exptSet, 1:5)
ans = 

    '100001_at'
    '100002_at'
    '100003_at'
    '100004_at'
    '100005_at'

The unique identifiers of the samples can be accessed using the sampleNames method.

exptSet.sampleNames(1:5)
ans = 

    'A'    'B'    'C'    'D'    'E'

The sampleVarNames method lists the variable names in the sample data.

exptSet.sampleVarNames
ans = 

    'Gender'    'Age'    'Type'    'Strain'    'Source'

Extract the dataset array containing sample information.

sDataset = sampleVarValues(exptSet)
sDataset = 

         Gender        Age    Type               Strain             
    A    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    B    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    C    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    D    'Male'        8      'Wild type'        'A/J '             
    E    'Male'        8      'Wild type'        'A/J '             
    F    'Male'        8      'Wild type'        'C57BL/6J '        
    G    'Male'        8      'Wild type'        'C57BL/6J'         
    H    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    I    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    J    'Male'        8      'Wild type'        'A/J'              
    K    'Male'        8      'Wild type'        'A/J'              
    L    'Male'        8      'Wild type'        'A/J'              
    M    'Male'        8      'Wild type'        'C57BL/6J'         
    N    'Male'        8      'Wild type'        'C57BL/6J'         
    O    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    P    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    Q    'Male'        8      'Wild type'        'A/J'              
    R    'Male'        8      'Wild type'        'A/J'              
    S    'Male'        8      'Wild type'        'C57BL/6J'         
    T    'Male'        8      'Wild type'        'C57BL/6J4'        
    U    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    V    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    W    'Male'        8      'Wild type'        'A/J'              
    X    'Male'        8      'Wild type'        'A/J'              
    Y    'Male'        8      'Wild type'        'C57BL/6J'         
    Z    'Male'        8      'Wild type'        'C57BL/6J'         


         Source                
    A    'amygdala'            
    B    'amygdala'            
    C    'amygdala'            
    D    'amygdala'            
    E    'amygdala'            
    F    'amygdala'            
    G    'amygdala'            
    H    'cingulate cortex'    
    I    'cingulate cortex'    
    J    'cingulate cortex'    
    K    'cingulate cortex'    
    L    'cingulate cortex'    
    M    'cingulate cortex'    
    N    'cingulate cortex'    
    O    'hippocampus'         
    P    'hippocampus'         
    Q    'hippocampus'         
    R    'hippocampus'         
    S    'hippocampus'         
    T    'hippocampus'         
    U    'hypothalamus'        
    V    'hypothalamus'        
    W    'hypothalamus'        
    X    'hypothalamus'        
    Y    'hypothalamus'        
    Z    'hypothalamus'        

Retrieve the ExptData object containing the expression values. An ExptData object can contain more than one DataMatrix object with identical dimensions. In an ExpressionSet object, however, there is always one DataMatrix element named Expressions that contains the expression matrix.

exptDS = exptData(exptSet)
exptDS = 

Experiment Data:
  500 features,  26 samples
  1 elements
  Element names: Expressions

Extract only the expression DataMatrix instance.

dMatrix = expressions(exptSet);

The returned expression DataMatrix should be identical to the exprsData DataMatrix object that you created earlier.

get(dMatrix)
            Name: 'mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'

Get PubMed IDs for the experiment stored in exptSet.

exptSet.pubMedID
ans =

16244648

Subsetting an ExpressionSet Object

Subsetting is a very useful operation. Subsetting an ExpressionSet object is very similar to subsetting a DataMatrix object or a dataset array. The first indexing argument subsets the features and the second argument subsets the samples. For example, you can create a new ExpressionSet object consisting of the first five features and the samples named A, B, and C.

mySet = exptSet(1:5, {'A', 'B', 'C'})
mySet = 

ExpressionSet
Experiment Data: 5 features, 3 samples
  Element names: Expressions
Sample Data:
    Sample names:     A, B, C
    Sample variable names and meta information: 
        Gender:  Gender of the mouse in study
        Age:  The number of weeks since mouse birth
        Type:  Genetic characters
        Strain:  The mouse strain
        Source:  The tissue source for RNA collection
Feature Data:
    Feature names:     100001_at, 100002_at, ...,100005_at (5 total)
    Feature variable names and meta information: 
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'

Inspect the subset mySet:

size(mySet)
ans =

     5     3

featureNames(mySet)
ans = 

    '100001_at'
    '100002_at'
    '100003_at'
    '100004_at'
    '100005_at'

sampleNames(mySet)
ans = 

    'A'    'B'    'C'

Another example is to create a subset consisting of only the samples from hippocampus tissues.

hippocampusSet = exptSet(:, nominal(exptSet.Source)== 'hippocampus')
hippocampusSet = 

ExpressionSet
Experiment Data: 500 features, 6 samples
  Element names: Expressions
Sample Data:
    Sample names:     O, P, ...,T (6 total)
    Sample variable names and meta information: 
        Gender:  Gender of the mouse in study
        Age:  The number of weeks since mouse birth
        Type:  Genetic characters
        Strain:  The mouse strain
        Source:  The tissue source for RNA collection
Feature Data:
    Feature names:     100001_at, 100002_at, ...,100717_at (500 total)
    Feature variable names and meta information: 
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'

hippocampusSet.Source
ans = 

    'hippocampus'
    'hippocampus'
    'hippocampus'
    'hippocampus'
    'hippocampus'
    'hippocampus'

hippocampusExprs = expressions(hippocampusSet);
get(hippocampusExprs)
            Name: 'mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {'O'  'P'  'Q'  'R'  'S'  'T'}
           NRows: 500
           NCols: 6
           NDims: 2
    ElementClass: 'double'
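
A logical index for the samples can also be built with strcmp instead of converting the variable to a nominal array. The sketch below builds a small illustrative ExpressionSet to show the idea; all names and values (dmDemo, dsDemo, sDemo, setDemo, and the three samples) are hypothetical, and it assumes that the sample names in the MetaData object match the column names of the DataMatrix object.

```matlab
% Illustrative expression values for four features and three samples.
dmDemo = bioma.data.DataMatrix(rand(4,3), 'RowNames', 'Feature', ...
                               'ColNames', {'A', 'B', 'C'});

% Sample metadata with one variable, Source, stored in a dataset array.
dsDemo = dataset({{'amygdala'; 'hippocampus'; 'hippocampus'}, 'Source'}, ...
                 'ObsNames', {'A'; 'B'; 'C'});
sDemo  = bioma.data.MetaData(dsDemo);

setDemo = bioma.ExpressionSet(dmDemo, 'SData', sDemo);

% Logical sample index built with strcmp on the Source variable.
isHippo   = strcmp(setDemo.Source, 'hippocampus');
hippoDemo = setDemo(:, isHippo);
size(hippoDemo)
```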

You can find more details about the basic operations and available methods for the microarray experiment data objects in the help and reference pages.

References

[1] Hovatta, I., et al., "Glyoxalase 1 and glutathione reductase 1 regulate anxiety in mice", Nature, 438(7068):662-6, 2005.

[2] http://www.mged.org/Workgroups/MIAME/miame_1.1.html.