Working with Objects for Microarray Experiment Data

This example shows how to create and manipulate MATLAB® containers designed for storing data from a microarray experiment.

Containers for Gene Expression Experiment Data

Microarray experimental data are complex, usually combining data and information from several different sources. Storing and managing these large data sets coherently is a challenge. Bioinformatics Toolbox™ provides a set of objects to represent the different pieces of data from a microarray experiment.

The ExpressionSet class is a single, convenient data structure for storing and managing different types of data from a microarray gene expression experiment.

An ExpressionSet object consists of these four components that are common to all microarray gene expression experiments:

Experiment data: Expression values from microarray experiments. These data are stored as an instance of the ExptData class.

Sample information: The metadata describing the samples in the experiment. The sample metadata are stored as an instance of the MetaData class.

Array feature annotations: The annotations about the features or probes on the array used in the experiment. The annotations can be stored as an instance of the MetaData class.

Experiment descriptions: Information to describe the experiment methods and conditions. The information can be stored as an instance of the MIAME class.

The ExpressionSet class coordinates and validates these data components. The class provides methods for retrieving and setting the data stored in an ExpressionSet object. An ExpressionSet object also behaves like many other MATLAB data structures that can be subsetted and copied.

Experiment Data

In a microarray gene expression experiment, the measured expression values for each feature per sample can be represented as a two-dimensional matrix. The matrix has F rows and S columns, where F is the number of features on the array, and S is the number of samples on which the expression values were measured. A DataMatrix object is a two-dimensional matrix that you can index by row and column numbers, logical vectors, or row and column names.

Create a DataMatrix object with row and column names.

dm = bioma.data.DataMatrix(rand(5,4), 'RowNames','Feature', 'ColNames', 'Sample')
dm = 

                Sample1    Sample2    Sample3    Sample4
    Feature1    0.81472    0.09754    0.15761    0.14189
    Feature2    0.90579     0.2785    0.97059    0.42176
    Feature3    0.12699    0.54688    0.95717    0.91574
    Feature4    0.91338    0.95751    0.48538    0.79221
    Feature5    0.63236    0.96489    0.80028    0.95949

The function size returns the number of rows and columns in a DataMatrix object.

size(dm)
ans =

     5     4

You can index into a DataMatrix object like other MATLAB numeric arrays by using row and column numbers. For example, you can access the elements at rows 1 and 2, column 3.

dm(1:2, 3)
ans = 

                Sample3
    Feature1    0.15761
    Feature2    0.97059

You can also index into a DataMatrix object by using its row and column names. Reassign the elements in rows 2 and 3, columns 1 and 4 to different values.

dm({'Feature2', 'Feature3'}, {'Sample1', 'Sample4'}) = [2, 3; 4, 5]
dm = 

                Sample1    Sample2    Sample3    Sample4
    Feature1    0.81472    0.09754    0.15761    0.14189
    Feature2          2     0.2785    0.97059          3
    Feature3          4    0.54688    0.95717          5
    Feature4    0.91338    0.95751    0.48538    0.79221
    Feature5    0.63236    0.96489    0.80028    0.95949
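
Logical vectors, the third indexing style mentioned above, can be sketched in the same way. The following is a minimal illustration rather than part of the original example: the matrix values and the names F1 to F4 and S1 to S4 are made up, and it assumes Bioinformatics Toolbox is installed.

```matlab
% Hypothetical 4-by-4 DataMatrix; the names F1..F4 and S1..S4 are made up.
dmSketch = bioma.data.DataMatrix(magic(4), ...
    'RowNames', {'F1', 'F2', 'F3', 'F4'}, ...
    'ColNames', {'S1', 'S2', 'S3', 'S4'});

% Logical row mask: keep features 1 and 3, and every sample.
keepRows = [true false true false];
subDm = dmSketch(keepRows, :);

% Row names travel with the selected rows.
keptNames = rownames(subDm);

% Convert to a plain numeric array for downstream numeric code.
subValues = double(subDm);
```

A logical mask is convenient when the rows to keep come from a computed condition rather than from known names or positions.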

The gene expression data used in this example is a small set of data from a microarray experiment profiling adult mouse gene expression patterns in common strains on the Affymetrix® MG-U74Av2 array [1].

Read the expression values from the tab-formatted file mouseExprsData.txt into the MATLAB Workspace as a DataMatrix object.

exprsData = bioma.data.DataMatrix('file', 'mouseExprsData.txt');
class(exprsData)
ans =

    'bioma.data.DataMatrix'

Get the properties of the DataMatrix object, exprsData.

get(exprsData)
            Name: 'mouseExprsData'
        RowNames: {500×1 cell}
        ColNames: {1×26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'

Check the sample names.

colnames(exprsData)
ans =

  1×26 cell array

  Columns 1 through 11

    'A'    'B'    'C'    'D'    'E'    'F'    'G'    'H'    'I'    'J'    'K'

  Columns 12 through 22

    'L'    'M'    'N'    'O'    'P'    'Q'    'R'    'S'    'T'    'U'    'V'

  Columns 23 through 26

    'W'    'X'    'Y'    'Z'

View the first 10 rows and 5 columns.

exprsData(1:10, 1:5)
ans = 

                   A         B         C         D         E     
    100001_at        2.26     20.14     31.66     14.58     16.04
    100002_at      158.86    236.25    206.27    388.71    388.09
    100003_at       68.11    105.45     82.92      82.9     60.38
    100004_at       74.32     96.68     84.87     72.26     98.38
    100005_at       75.05     53.17     57.94     60.06     63.91
    100006_at       80.36     42.89     77.21     77.24     40.31
    100007_at      216.64    191.32    219.48    237.28    298.18
    100009_r_at    3806.7      1425    2468.5    2172.7    2237.2
    100010_at         NaN       NaN       NaN      7.18     22.37
    100011_at       81.72     72.27    127.61     91.01     98.13

Perform a log2 transformation of the expression values.

exprsData_log2 = log2(exprsData);
exprsData_log2(1:10, 1:5)
ans = 

                   A         B         C         D         E     
    100001_at      1.1763     4.332    4.9846    3.8659    4.0036
    100002_at      7.3116    7.8842    7.6884    8.6026    8.6002
    100003_at      6.0898    6.7204    6.3736    6.3733     5.916
    100004_at      6.2157    6.5951    6.4072    6.1751    6.6203
    100005_at      6.2298    5.7325    5.8565    5.9083     5.998
    100006_at      6.3284    5.4226    6.2707    6.2713    5.3331
    100007_at      7.7592    7.5798    7.7779    7.8904      8.22
    100009_r_at    11.894    10.477    11.269    11.085    11.127
    100010_at         NaN       NaN       NaN     2.844    4.4835
    100011_at      6.3526    6.1753    6.9956     6.508    6.6166
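
The NaN entries for 100010_at stay NaN after the transformation. As a rough sketch of how log2 treats values that are missing or zero, the following uses a plain numeric vector with made-up values; the DataMatrix output above suggests the same elementwise behavior for NaN, but the zero case is shown only for ordinary numeric arrays.

```matlab
% Made-up values: two positive intensities, one missing value, one zero.
vals = [2 8 NaN 0];
logVals = log2(vals);          % elementwise base-2 logarithm

% Flag entries that are not finite after the transform (NaN or -Inf).
notFinite = ~isfinite(logVals);
```

A mask like notFinite can help decide which features to filter out or impute before further analysis.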

Change the Name property to be more descriptive.

exprsData_log2 = set(exprsData_log2, 'Name', 'Log2 Based mouseExprsData');
get(exprsData_log2)
            Name: 'Log2 Based mouseExprsData'
        RowNames: {500×1 cell}
        ColNames: {1×26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'

In a microarray experiment, the data set often contains one or more matrices that have the same number of rows and columns and identical row and column names. The ExptData class is designed to contain and coordinate one or more such data matrices. The data values are stored as DataMatrix objects, and each DataMatrix object is an element of an ExptData object. The ExptData class is responsible for data validation and coordination between these DataMatrix objects.

Store the natural-scale and log2-transformed expression values as separate elements of an ExptData object.

mouseExptData = bioma.data.ExptData(exprsData, exprsData_log2,...
                    'ElementNames', {'naturalExprs', 'log2Exprs'})
mouseExptData = 

Experiment Data:
  500 features,  26 samples
  2 elements
  Element names: naturalExprs, log2Exprs

Access a DataMatrix element in mouseExptData using the element name.

exprsData2 = mouseExptData('log2Exprs');
get(exprsData2)
            Name: 'Log2 Based mouseExprsData'
        RowNames: {500×1 cell}
        ColNames: {1×26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'
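
The same pattern can be tried on a toy data set. This is a minimal sketch, assuming Bioinformatics Toolbox is available; the values, the feature and sample names, and the element names 'raw' and 'log2' are all made up for illustration.

```matlab
% Hypothetical 3-feature, 2-sample matrix with made-up names and values.
rawDm = bioma.data.DataMatrix([1 2; 4 8; 16 32], ...
    'RowNames', {'f1', 'f2', 'f3'}, ...
    'ColNames', {'s1', 's2'});

% The log2 version keeps the same row and column names.
logDm = log2(rawDm);

% Both matrices become named elements of one ExptData object.
sketchExpt = bioma.data.ExptData(rawDm, logDm, ...
    'ElementNames', {'raw', 'log2'});

% Retrieve one element by name and convert it to a numeric array.
logBack = sketchExpt('log2');
logValues = double(logBack);
```

Because both elements share one set of row and column names, code that looks up a feature or sample by name works the same way on either element.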

Sample Metadata

The metadata about the samples in a microarray experiment can be represented as a table with S rows and V columns, where S is the number of samples, and V is the number of variables. The contents of the table are the values of each variable for each sample. For example, the file mouseSampleData.txt contains such a table. The description of each sample variable is marked by a # symbol.

The MetaData class is designed for storing and manipulating variable values and their metadata in a coordinated fashion. You can read the mouseSampleData.txt file into MATLAB as a MetaData object.

sData = bioma.data.MetaData('file', 'mouseSampleData.txt', 'vardescchar', '#')
sData = 

Sample Names:
    A, B, ...,Z (26 total)
Variable Names and Meta Information:
              VariableDescription                         
    Gender    ' Gender of the mouse in study'             
    Age       ' The number of weeks since mouse birth'    
    Type      ' Genetic characters'                       
    Strain    ' The mouse strain'                         
    Source    ' The tissue source for RNA collection'     

The properties of the MetaData class provide information about the samples and variables.

numSamples = sData.NSamples
numVariables = sData.NVariables
numSamples =

    26


numVariables =

     5

The variable values and the variable descriptions for the samples are stored as two dataset arrays in a MetaData object. The MetaData class provides methods to access the variable values and the meta information describing the variables.

Access the sample metadata using the variableValues method.

sData.variableValues
ans = 

         Gender        Age    Type               Strain             
    A    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    B    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    C    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    D    'Male'        8      'Wild type'        'A/J '             
    E    'Male'        8      'Wild type'        'A/J '             
    F    'Male'        8      'Wild type'        'C57BL/6J '        
    G    'Male'        8      'Wild type'        'C57BL/6J'         
    H    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    I    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    J    'Male'        8      'Wild type'        'A/J'              
    K    'Male'        8      'Wild type'        'A/J'              
    L    'Male'        8      'Wild type'        'A/J'              
    M    'Male'        8      'Wild type'        'C57BL/6J'         
    N    'Male'        8      'Wild type'        'C57BL/6J'         
    O    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    P    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    Q    'Male'        8      'Wild type'        'A/J'              
    R    'Male'        8      'Wild type'        'A/J'              
    S    'Male'        8      'Wild type'        'C57BL/6J'         
    T    'Male'        8      'Wild type'        'C57BL/6J4'        
    U    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    V    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    W    'Male'        8      'Wild type'        'A/J'              
    X    'Male'        8      'Wild type'        'A/J'              
    Y    'Male'        8      'Wild type'        'C57BL/6J'         
    Z    'Male'        8      'Wild type'        'C57BL/6J'         


         Source                
    A    'amygdala'            
    B    'amygdala'            
    C    'amygdala'            
    D    'amygdala'            
    E    'amygdala'            
    F    'amygdala'            
    G    'amygdala'            
    H    'cingulate cortex'    
    I    'cingulate cortex'    
    J    'cingulate cortex'    
    K    'cingulate cortex'    
    L    'cingulate cortex'    
    M    'cingulate cortex'    
    N    'cingulate cortex'    
    O    'hippocampus'         
    P    'hippocampus'         
    Q    'hippocampus'         
    R    'hippocampus'         
    S    'hippocampus'         
    T    'hippocampus'         
    U    'hypothalamus'        
    V    'hypothalamus'        
    W    'hypothalamus'        
    X    'hypothalamus'        
    Y    'hypothalamus'        
    Z    'hypothalamus'        

View a summary of the sample metadata.

summary(sData.variableValues)
Gender: [26x1 cell array of character vectors]

Age: [26x1 double]

    min    1st quartile    median    3rd quartile    max
    8      8               8         8               8  

Type: [26x1 cell array of character vectors]

Strain: [26x1 cell array of character vectors]

Source: [26x1 cell array of character vectors]

The sampleNames and variableNames methods are convenient ways to access the names of samples and variables. Retrieve the variable names of the sData object.

variableNames(sData)
ans =

  1×5 cell array

    'Gender'    'Age'    'Type'    'Strain'    'Source'

You can retrieve the meta information about the variables describing the samples using the variableDesc method. In this example, it contains only the descriptions about the variables.

variableDesc(sData)
ans = 

              VariableDescription                         
    Gender    ' Gender of the mouse in study'             
    Age       ' The number of weeks since mouse birth'    
    Type      ' Genetic characters'                       
    Strain    ' The mouse strain'                         
    Source    ' The tissue source for RNA collection'     

You can subset the sData object using numeric indexing.

sData(3:6, :)
ans = 

Sample Names:
    C, D, ...,F (4 total)
Variable Names and Meta Information:
              VariableDescription                         
    Gender    ' Gender of the mouse in study'             
    Age       ' The number of weeks since mouse birth'    
    Type      ' Genetic characters'                       
    Strain    ' The mouse strain'                         
    Source    ' The tissue source for RNA collection'     

You can display the mouse strain of specific samples by using numerical indexing.

sData.Strain([2 14])
ans =

  2×1 cell array

    '129S6/SvEvTac'
    'C57BL/6J'

Note that the row names in sData and the column names in exprsData are the same. This is an important relationship between the expression data and the sample data in the same experiment.

all(ismember(sampleNames(sData), colnames(exprsData)))
ans =

  logical

   1
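
The ismember test confirms that every sample name appears among the column names, but not that the two lists are in the same order. A stricter check can be sketched with plain cell arrays; the names below are made up for illustration and stand in for the outputs of sampleNames and colnames.

```matlab
% Made-up name lists standing in for sample names and column names.
sampleIDs = {'A'; 'B'; 'C'};      % column cell array
columnIDs = {'A', 'C', 'B'};      % row cell array, different order

% Same membership: every sample name is a column name, and counts match.
sameSet = all(ismember(sampleIDs, columnIDs)) && ...
          numel(sampleIDs) == numel(columnIDs);

% Same order: compare element by element after forcing both to columns.
sameOrder = isequal(sampleIDs(:), columnIDs(:));
```

Here sameSet is true while sameOrder is false, which is the situation the membership test alone would not reveal.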

Feature Annotation Metadata

The metadata about the features or probe sets on an array can be very large and diverse. Chip manufacturers usually provide a specific annotation file for the features of each type of array. The metadata can be stored as a MetaData object for a specific experiment. In this example, the annotation file for the MG-U74Av2 array can be downloaded from the Affymetrix web site. You need to convert the file from CSV to XLSX format using a spreadsheet application.

Read the entire file into MATLAB as a dataset array. Alternatively, you can use the Range option of the dataset constructor to read only part of the spreadsheet. Blank spaces in the variable names are removed to make them valid MATLAB variable names, and a warning is displayed when this happens.

mgU74Av2 = dataset('xlsfile', 'MG_U74Av2_annot.xlsx');
Warning: Variable names were modified to make them valid MATLAB identifiers. 

Inspect the properties of this dataset array.

get(mgU74Av2)
       Description: ''
    VarDescription: {1×43 cell}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {1×43 cell}

Determine the number of probe set IDs in the annotation file.

numel(mgU74Av2.ProbeSetID)
ans =

       12488

Retrieve the names of variables describing the features on the array and view the first 20 variable names.

fDataVariables = get(mgU74Av2, 'VarNames');
fDataVariables(1:20)'
ans =

  20×1 cell array

    'ProbeSetID'
    'GeneChipArray'
    'SpeciesScientificName'
    'AnnotationDate'
    'SequenceType'
    'SequenceSource'
    'TranscriptID_ArrayDesign_'
    'TargetDescription'
    'RepresentativePublicID'
    'ArchivalUniGeneCluster'
    'UniGeneID'
    'GenomeVersion'
    'Alignments'
    'GeneTitle'
    'GeneSymbol'
    'ChromosomalLocation'
    'UnigeneClusterType'
    'Ensembl'
    'EntrezGene'
    'SwissProt'

Set the ObsNames property to the probe set IDs, so that you can access individual gene annotations by indexing with probe set IDs.

mgU74Av2 = set(mgU74Av2,'ObsNames',mgU74Av2.ProbeSetID);
mgU74Av2('100709_at',{'GeneSymbol','ChromosomalLocation'})
ans = 

                 GeneSymbol     ChromosomalLocation      
    100709_at    'Tpbpa'        'chr13 B2|13 36.0 cM'    

In some cases, it is useful to extract only the annotations that are relevant to the analysis. Extract the GeneTitle, GeneSymbol, ChromosomalLocation, and Pathway annotations for the features in exprsData.

mgU74Av2 = mgU74Av2(:,{'GeneTitle',...
                       'GeneSymbol',...
                       'ChromosomalLocation',...
                       'Pathway'});

mgU74Av2 = mgU74Av2(rownames(exprsData),:);
get(mgU74Av2)
       Description: ''
    VarDescription: {1×4 cell}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {500×1 cell}
          VarNames: {1×4 cell}
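
Indexing the annotation array by the expression row names relies on every probe set ID being present in the annotation file; a missing ID is expected to cause an error. A guard can be sketched with ismember on plain cell arrays. The ID lists below are made up for illustration and stand in for rownames(exprsData) and the annotation probe set IDs.

```matlab
% Made-up ID lists: one expression ID has no annotation entry.
exprIDs  = {'100001_at', '100002_at', 'unknown_at'};
annotIDs = {'100002_at', '100001_at', '100003_at'};

% tf marks expression IDs found in the annotation list;
% loc gives the matching position in annotIDs (0 when not found).
[tf, loc] = ismember(exprIDs, annotIDs);

missingIDs = exprIDs(~tf);   % IDs that would fail a name-based lookup
rowsToTake = loc(tf);        % annotation rows, in expression order
```

Checking missingIDs first makes it possible to report or drop unmatched features before subsetting.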

You can store the feature annotation dataset array as an instance of the MetaData class.

fData = bioma.data.MetaData(mgU74Av2)
fData = 

Sample Names:
    100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
                           VariableDescription
    GeneTitle              'NA'               
    GeneSymbol             'NA'               
    ChromosomalLocation    'NA'               
    Pathway                'NA'               

Notice that there are no descriptions for the feature variables in the fData MetaData object. You can add descriptions about the variables in fData using the variableDesc method.

fData = variableDesc(fData, {'Gene title of a probe set',...
                             'Probe set gene symbol',...
                             'Probe set chromosomal locations',...
                             'The pathway the genes involved in'})
fData = 

Sample Names:
    100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
                           VariableDescription                    
    GeneTitle              'Gene title of a probe set'            
    GeneSymbol             'Probe set gene symbol'                
    ChromosomalLocation    'Probe set chromosomal locations'      
    Pathway                'The pathway the genes involved in'    

Experiment Information

The MIAME class is a flexible data container designed for a collection of basic descriptions about a microarray experiment, such as investigators, laboratories, and array designs. The MIAME class loosely follows the Minimum Information About a Microarray Experiment (MIAME) specification [2].

Create a MIAME object by providing some basic information.

expDesc = bioma.data.MIAME('investigator', 'Jane OneName',...
                           'lab',          'Bioinformatics Laboratory',...
                           'title',        'Example Gene Expression Experiment',...
                           'abstract',     'An example of using microarray objects.',...
                           'other',        {'Notes: Created from a text files.'})
expDesc = 

Experiment Description:
  Author name: Jane OneName
  Laboratory: Bioinformatics Laboratory
  Contact information: 
  URL: 
  PubMedIDs: 
  Abstract: A 5 word abstract is available. Use the Abstract property.
  No experiment design summary available.
  Other notes: 
    'Notes: Created from a text files.'

Another way to create a MIAME object is from GEO series data. The MIAME class populates the corresponding properties from the GEO series structure. The information associated with the gene expression profiling experiment in this example is available from the GEO database under the accession number GSE3327 [1]. Retrieve the GEO Series data using the getgeodata function.

getgeodata('GSE3327', 'ToFile', 'GSE3327.txt');

Read the data into a structure.

geoSeries = geoseriesread('GSE3327.txt')
geoSeries = 

  struct with fields:

    Header: [1×1 struct]
      Data: [12488×87 bioma.data.DataMatrix]

Create a MIAME object.

exptGSE3327 = bioma.data.MIAME(geoSeries)
exptGSE3327 = 

Experiment Description:
  Author name: Iiris,,Hovatta
David,J,Lockhart
Carrolee,,Barlow
  Laboratory: The Salk Institute for Biological Studies
  Contact information: Carrolee,,Barlow
  URL: 
  PubMedIDs: 16244648
  Abstract: A 14 word abstract is available. Use the Abstract property.
  Experiment Design: A 8 word summary is available. Use the ExptDesign property.
  Other notes: 
    'ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE3327/GSE3327_RAW.tar'

View the abstract of the experiment and its PubMed IDs.

abstract = exptGSE3327.Abstract
pubmedID = exptGSE3327.PubMedID
abstract =

    'Adult mouse gene expression patterns in common strains
     Keywords: mouse strain and brain region comparison'


pubmedID =

    '16244648'

Creating an ExpressionSet Object

The ExpressionSet class is designed specifically for microarray gene expression experiment data. Assemble an ExpressionSet object for the example mouse gene expression experiment from the different data objects you just created.

exptSet = bioma.ExpressionSet(exprsData, 'SData', sData,...
                                         'FData', fData,...
                                         'Einfo', exptGSE3327)
exptSet = 

ExpressionSet
Experiment Data: 500 features, 26 samples
  Element names: Expressions
Sample Data:
    Sample names:     A, B, ...,Z (26 total)
    Sample variable names and meta information: 
        Gender:  Gender of the mouse in study
        Age:  The number of weeks since mouse birth
        Type:  Genetic characters
        Strain:  The mouse strain
        Source:  The tissue source for RNA collection
Feature Data:
    Feature names:     100001_at, 100002_at, ...,100717_at (500 total)
    Feature variable names and meta information: 
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'

You can also create an ExpressionSet object with only the expression values in a DataMatrix or a numeric matrix.

miniExprSet = bioma.ExpressionSet(exprsData)
miniExprSet = 

ExpressionSet
Experiment Data: 500 features, 26 samples
  Element names: Expressions
Sample Data: none
Feature Data: none
Experiment Information: none
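
The numeric-matrix form can be sketched as follows. This is a minimal illustration with made-up values, assuming Bioinformatics Toolbox is available; it relies only on the constructor and on the size and expressions methods used elsewhere in this example.

```matlab
% Made-up 4-feature, 3-sample expression matrix.
rawValues = [1 2 3; 4 5 6; 7 8 9; 10 11 12];

% Build an ExpressionSet directly from the numeric matrix.
numSet = bioma.ExpressionSet(rawValues);

% Features are rows and samples are columns.
setSize = size(numSet);

% Recover the stored values as a plain numeric array.
storedValues = double(expressions(numSet));
```

Sample data, feature data, and experiment information are not set in this form, as in the miniExprSet display above.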

Saving and Loading an ExpressionSet Object

The data objects for a microarray experiment can be saved as MAT files. Save the ExpressionSet object exptSet to a MAT file named mouseExpressionSet.mat.

save mouseExpressionSet exptSet

Clear variables from the MATLAB Workspace.

clear dm exprs* mouseExptData ME sData

Load the MAT file mouseExpressionSet into the MATLAB Workspace.

load mouseExpressionSet

Inspect the loaded ExpressionSet object.

exptSet.elementNames
ans =

  cell

    'Expressions'

exptSet.NSamples
ans =

    26

exptSet.NFeatures
ans =

   500

Accessing Data Components of an ExpressionSet Object

A number of methods are available to access and update data stored in an ExpressionSet object.

You can access the columns of the sample data using dot notation.

exptSet.Strain(1:5)
ans =

  5×1 cell array

    '129S6/SvEvTac'
    '129S6/SvEvTac'
    '129S6/SvEvTac'
    'A/J '
    'A/J '

Retrieve the feature names using the featureNames method. In this example, the feature names are the probe set identifiers on the array.

featureNames(exptSet, 1:5)
ans =

  5×1 cell array

    '100001_at'
    '100002_at'
    '100003_at'
    '100004_at'
    '100005_at'

The unique identifiers of the samples can be accessed via the sampleNames method.

exptSet.sampleNames(1:5)
ans =

  1×5 cell array

    'A'    'B'    'C'    'D'    'E'

The sampleVarNames method lists the variable names in the sample data.

exptSet.sampleVarNames
ans =

  1×5 cell array

    'Gender'    'Age'    'Type'    'Strain'    'Source'

Extract the dataset array containing sample information.

sDataset = sampleVarValues(exptSet)
sDataset = 

         Gender        Age    Type               Strain             
    A    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    B    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    C    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    D    'Male'        8      'Wild type'        'A/J '             
    E    'Male'        8      'Wild type'        'A/J '             
    F    'Male'        8      'Wild type'        'C57BL/6J '        
    G    'Male'        8      'Wild type'        'C57BL/6J'         
    H    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    I    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    J    'Male'        8      'Wild type'        'A/J'              
    K    'Male'        8      'Wild type'        'A/J'              
    L    'Male'        8      'Wild type'        'A/J'              
    M    'Male'        8      'Wild type'        'C57BL/6J'         
    N    'Male'        8      'Wild type'        'C57BL/6J'         
    O    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    P    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    Q    'Male'        8      'Wild type'        'A/J'              
    R    'Male'        8      'Wild type'        'A/J'              
    S    'Male'        8      'Wild type'        'C57BL/6J'         
    T    'Male'        8      'Wild type'        'C57BL/6J4'        
    U    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    V    'Male'        8      'Wild type'        '129S6/SvEvTac'    
    W    'Male'        8      'Wild type'        'A/J'              
    X    'Male'        8      'Wild type'        'A/J'              
    Y    'Male'        8      'Wild type'        'C57BL/6J'         
    Z    'Male'        8      'Wild type'        'C57BL/6J'         


         Source                
    A    'amygdala'            
    B    'amygdala'            
    C    'amygdala'            
    D    'amygdala'            
    E    'amygdala'            
    F    'amygdala'            
    G    'amygdala'            
    H    'cingulate cortex'    
    I    'cingulate cortex'    
    J    'cingulate cortex'    
    K    'cingulate cortex'    
    L    'cingulate cortex'    
    M    'cingulate cortex'    
    N    'cingulate cortex'    
    O    'hippocampus'         
    P    'hippocampus'         
    Q    'hippocampus'         
    R    'hippocampus'         
    S    'hippocampus'         
    T    'hippocampus'         
    U    'hypothalamus'        
    V    'hypothalamus'        
    W    'hypothalamus'        
    X    'hypothalamus'        
    Y    'hypothalamus'        
    Z    'hypothalamus'        

Retrieve the ExptData object containing expression values. An ExptData object can contain more than one DataMatrix object, all with identical dimensions. In an ExpressionSet object, there is always an element DataMatrix object named Expressions that contains the expression matrix.

exptDS = exptData(exptSet)
exptDS = 

Experiment Data:
  500 features,  26 samples
  1 elements
  Element names: Expressions

Extract only the expression DataMatrix instance.

dMatrix = expressions(exptSet);

The returned expression DataMatrix should be identical to the exprsData DataMatrix object that you created earlier.

get(dMatrix)
            Name: 'mouseExprsData'
        RowNames: {500×1 cell}
        ColNames: {1×26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'

Get PubMed IDs for the experiment stored in exptSet.

exptSet.pubMedID
ans =

    '16244648'

Subsetting an ExpressionSet Object

You can subset an ExpressionSet object so that you can focus on the samples and features of interest. The first indexing argument subsets the features and the second argument subsets the samples.

Create a new ExpressionSet object consisting of the first five features and the samples named A, B, and C.

mySet = exptSet(1:5, {'A', 'B', 'C'})
mySet = 

ExpressionSet
Experiment Data: 5 features, 3 samples
  Element names: Expressions
Sample Data:
    Sample names:     A, B, C
    Sample variable names and meta information: 
        Gender:  Gender of the mouse in study
        Age:  The number of weeks since mouse birth
        Type:  Genetic characters
        Strain:  The mouse strain
        Source:  The tissue source for RNA collection
Feature Data:
    Feature names:     100001_at, 100002_at, ...,100005_at (5 total)
    Feature variable names and meta information: 
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'

size(mySet)
ans =

     5     3

featureNames(mySet)
ans =

  5×1 cell array

    '100001_at'
    '100002_at'
    '100003_at'
    '100004_at'
    '100005_at'

sampleNames(mySet)
ans =

  1×3 cell array

    'A'    'B'    'C'

You can also create a subset consisting of only the samples from hippocampus tissues.

hippocampusSet = exptSet(:, nominal(exptSet.Source)== 'hippocampus')
hippocampusSet = 

ExpressionSet
Experiment Data: 500 features, 6 samples
  Element names: Expressions
Sample Data:
    Sample names:     O, P, ...,T (6 total)
    Sample variable names and meta information: 
        Gender:  Gender of the mouse in study
        Age:  The number of weeks since mouse birth
        Type:  Genetic characters
        Strain:  The mouse strain
        Source:  The tissue source for RNA collection
Feature Data:
    Feature names:     100001_at, 100002_at, ...,100717_at (500 total)
    Feature variable names and meta information: 
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'

hippocampusSet.Source
ans =

  6×1 cell array

    'hippocampus'
    'hippocampus'
    'hippocampus'
    'hippocampus'
    'hippocampus'
    'hippocampus'

hippocampusExprs = expressions(hippocampusSet);
get(hippocampusExprs)
            Name: 'mouseExprsData'
        RowNames: {500×1 cell}
        ColNames: {'O'  'P'  'Q'  'R'  'S'  'T'}
           NRows: 500
           NCols: 6
           NDims: 2
    ElementClass: 'double'
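
Selecting samples by strain needs a little more care, because some Strain values in the sample data shown earlier carry trailing spaces (for example 'A/J ' next to 'A/J'). The following sketch uses a short made-up cell array standing in for the Strain variable and shows how trimming the values before comparing changes the resulting mask.

```matlab
% Made-up strain labels; two of them have trailing spaces.
strain = {'A/J '; 'A/J'; 'C57BL/6J '; 'C57BL/6J'; '129S6/SvEvTac'};

% An exact comparison misses the label with the trailing space.
naiveMask = strcmp(strain, 'A/J');

% Trimming whitespace first matches both spellings.
cleanMask = strcmp(strtrim(strain), 'A/J');
```

A logical mask built this way could then be used as the second (sample) index, in the same manner as the tissue mask above.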

References

[1] Hovatta, I., et al., "Glyoxalase 1 and glutathione reductase 1 regulate anxiety in mice", Nature, 438(7068):662-6, 2005.

[2] Brazma, A., et al., "Minimum information about a microarray experiment (MIAME) - toward standards for microarray data", Nat. Genet. 29(4):365-371, 2001.
