## Exploring the Microarray Data Set

This example looks at the various ways to visualize microarray data. The data comes from a pharmacological model of Parkinson's disease (PD) using a mouse brain. The microarray data for this example is from Brown, V.M., Ossadtchi, A., Khan, A.H., Yee, S., Lacan, G., Melega, W.P., Cherry, S.R., Leahy, R.M., and Smith, D.J.; "Multiplex three dimensional brain gene expression mapping in a mouse model of Parkinson's disease"; Genome Research 12(6): 868-884 (2002).

The microarray data used in this example is available in a Web
supplement to the paper by Brown et al. and in the file `mouse_a1pd.gpr` included
with the Bioinformatics Toolbox™ software.

http://labs.pharmacology.ucla.edu/smithlab/genome_multiplex/

The microarray data is also available on the Gene Expression Omnibus Web site at

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30

The GenePix® GPR-formatted file `mouse_a1pd.gpr` contains
the data for one of the microarrays used in the study. This is data
from voxel A1 of the brain of a mouse in which a pharmacological model
of Parkinson's disease (PD) was induced using methamphetamine. The
voxel sample was labeled with Cy3 (green) and the control, RNA from
a total (not voxelated) normal mouse brain, was labeled with Cy5 (red).
GPR-formatted files provide a large amount of information about the
array, including the mean, median, and standard deviation of the foreground
and background intensities of each spot at the 635 nm wavelength (the
red, Cy5 channel) and the 532 nm wavelength (the green, Cy3 channel).

This procedure illustrates how to import data from the Web into
the MATLAB® environment, using data from a study about gene expression
in mouse brains as an example. See Overview of the Mouse Example.

Read data from a file into a MATLAB structure. For example, in the MATLAB Command Window, type

pd = gprread('mouse_a1pd.gpr')

Information about the structure displays in the MATLAB Command Window:

pd =
         Header: [1x1 struct]
           Data: [9504x38 double]
         Blocks: [9504x1 double]
        Columns: [9504x1 double]
           Rows: [9504x1 double]
          Names: {9504x1 cell}
            IDs: {9504x1 cell}
    ColumnNames: {38x1 cell}
        Indices: [132x72 double]
          Shape: [1x1 struct]

Access the fields of a structure using `StructureName.FieldName`. For example, you can access the field `ColumnNames` of the structure `pd` by typing

pd.ColumnNames

The column names are shown below.

ans =
    'X'
    'Y'
    'Dia.'
    'F635 Median'
    'F635 Mean'
    'F635 SD'
    'B635 Median'
    'B635 Mean'
    'B635 SD'
    '% > B635+1SD'
    '% > B635+2SD'
    'F635 % Sat.'
    'F532 Median'
    'F532 Mean'
    'F532 SD'
    'B532 Median'
    'B532 Mean'
    'B532 SD'
    '% > B532+1SD'
    '% > B532+2SD'
    'F532 % Sat.'
    'Ratio of Medians'
    'Ratio of Means'
    'Median of Ratios'
    'Mean of Ratios'
    'Ratios SD'
    'Rgn Ratio'
    'Rgn R²'
    'F Pixels'
    'B Pixels'
    'Sum of Medians'
    'Sum of Means'
    'Log Ratio'
    'F635 Median - B635'
    'F532 Median - B532'
    'F635 Mean - B635'
    'F532 Mean - B532'
    'Flags'

Access the names of the genes. For example, to list the first 20 gene names, type

pd.Names(1:20)

A list of the first 20 gene names is displayed:

ans =
    'AA467053'
    'AA388323'
    'AA387625'
    'AA474342'
    'Myo1b'
    'AA473123'
    'AA387579'
    'AA387314'
    'AA467571'
    ''
    'Spop'
    'AA547022'
    'AI508784'
    'AA413555'
    'AA414733'
    ''
    'Snta1'
    'AI414419'
    'W14393'
    'W10596'
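Two of the 20 names listed above are empty strings. A minimal sketch in base MATLAB for locating such unnamed spots is shown here; it retypes the 20 names by hand so that it runs without the toolbox, and the variable names (`names`, `isBlank`, and so on) are illustrative only.

```matlab
% First 20 gene names as displayed above, retyped so the sketch is self-contained.
names = {'AA467053'; 'AA388323'; 'AA387625'; 'AA474342'; 'Myo1b'; ...
         'AA473123'; 'AA387579'; 'AA387314'; 'AA467571'; ''; ...
         'Spop'; 'AA547022'; 'AI508784'; 'AA413555'; 'AA414733'; ...
         ''; 'Snta1'; 'AI414419'; 'W14393'; 'W10596'};

isBlank    = cellfun(@isempty, names);  % logical mask, true where the name is ''
numBlank   = nnz(isBlank);              % number of unnamed spots
blankRows  = find(isBlank);             % row numbers of the unnamed spots
namedGenes = names(~isBlank);           % names with the blanks removed

fprintf('%d of %d spots have no gene name.\n', numBlank, numel(names));
```

With the real data, the same mask could be built from `pd.Names` instead of the retyped list.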

This procedure illustrates how to visualize microarray data
by plotting image maps. The function `maimage` can
take a microarray data structure and create a pseudocolor image of
the data arranged in the same order as the spots on the array. In
other words, `maimage` draws a spatial plot of the
microarray.

This procedure uses data from a study of gene expression in
mouse brains. For a list of field names in the MATLAB structure `pd`,
see Exploring the Microarray Data Set.

Plot the median values for the red channel. For example, to plot data from the field `F635 Median`, type

figure
maimage(pd,'F635 Median')

The MATLAB software plots an image showing the median pixel values for the foreground of the red (Cy5) channel.

Plot the median values for the green channel. For example, to plot data from the field `F532 Median`, type

figure
maimage(pd,'F532 Median')

The MATLAB software plots an image showing the median pixel values of the foreground of the green (Cy3) channel.

Plot the median values for the red background. The field `B635 Median` shows the median values for the background of the red channel.

figure
maimage(pd,'B635 Median')

The MATLAB software plots an image for the background of the red channel. Notice the very high background levels down the right side of the array.

Plot the median values for the green background. The field `B532 Median` shows the median values for the background of the green channel.

figure
maimage(pd,'B532 Median')

The MATLAB software plots an image for the background of the green channel.

The first array was for the Parkinson's disease model mouse. Now read in the data for the same brain voxel but for the untreated control mouse. In this case, the voxel sample was labeled with Cy3 and the control, total brain (not voxelated), was labeled with Cy5.

wt = gprread('mouse_a1wt.gpr')

The MATLAB software creates a structure and displays information about the structure.

wt =
         Header: [1x1 struct]
           Data: [9504x38 double]
         Blocks: [9504x1 double]
        Columns: [9504x1 double]
           Rows: [9504x1 double]
          Names: {9504x1 cell}
            IDs: {9504x1 cell}
    ColumnNames: {38x1 cell}
        Indices: [132x72 double]
          Shape: [1x1 struct]

Use the function `maimage` to show pseudocolor images of the foreground and background. You can use the function `subplot` to put all the plots onto one figure.

figure
subplot(2,2,1);
maimage(wt,'F635 Median')
subplot(2,2,2);
maimage(wt,'F532 Median')
subplot(2,2,3);
maimage(wt,'B635 Median')
subplot(2,2,4);
maimage(wt,'B532 Median')

The MATLAB software plots the images.

If you look at the scale for the background images, you will notice that the background levels are much higher than those for the PD mouse, and there appears to be something nonrandom affecting the background of the Cy3 channel of this slide. Changing the colormap can sometimes provide more insight into what is going on in pseudocolor plots. For more control over the color, try the `colormapeditor` function.

colormap hot

The MATLAB software plots the images.

The function `maimage` is a simple way to quickly create pseudocolor images of microarray data. However, if you want more control over plotting, it is easy to create your own plots using the function `imagesc`.

First find the column number for the field of interest.

b532MedCol = find(strcmp(wt.ColumnNames,'B532 Median'))

The MATLAB software displays:

b532MedCol =
    16
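The lookup above relies on `strcmp` returning a logical mask over the cell array and `find` turning that mask into a column number. The sketch below shows the same pattern on a shortened, made-up list of column names (so the resulting index is not 16) and adds a guard for a name that is not found; the guard is an addition for illustration, not part of the original procedure.

```matlab
% Shortened, hypothetical column list; the real one is wt.ColumnNames.
columnNames = {'X'; 'Y'; 'Dia.'; 'F635 Median'; 'B532 Median'};

target = 'B532 Median';
mask   = strcmp(columnNames, target);   % logical vector, true at exact matches
col    = find(mask);                    % numeric position of the match

% strcmp is case sensitive, so a misspelled name yields an empty result.
missing = find(strcmp(columnNames, 'b532 median'));

if isempty(col)
    error('Column "%s" was not found.', target);
end
```

Checking for an empty result before indexing avoids silently extracting an empty column when a name is mistyped.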

Extract that column from the field `Data`.

b532Data = wt.Data(:,b532MedCol);

Use the field `Indices` to index into the data.

figure
subplot(1,2,1);
imagesc(b532Data(wt.Indices))
axis image
colorbar
title('B532 Median')

The MATLAB software plots the image.
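The indexing step works because indexing a vector with a matrix of indices returns an array shaped like the index matrix. A toy sketch with made-up values follows; `spotValues` stands in for a data column such as `b532Data` and `layout` for `wt.Indices`.

```matlab
% Six made-up spot values stored as a column, in file order.
spotValues = [10; 20; 30; 40; 50; 60];

% Hypothetical 2-by-3 layout: entry (r,c) holds the row of spotValues
% that belongs at position (r,c) on the array.
layout = [1 2 3;
          4 5 6];

% Indexing with the layout matrix rearranges the column into a 2-by-3 image.
img = spotValues(layout);

disp(size(img))   % 2 3
```

The resulting matrix can be passed directly to `imagesc`, which is what the call with `wt.Indices` does for the full 132-by-72 array.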

Bound the intensities of the background plot to give more contrast in the image.

maskedData = b532Data;
maskedData(b532Data<500) = 500;
maskedData(b532Data>2000) = 2000;
subplot(1,2,2);
imagesc(maskedData(wt.Indices))
axis image
colorbar
title('Enhanced B532 Median')

The MATLAB software plots the images.
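The two masked assignments clip the data to the range 500 to 2000. As a small sketch on made-up values, the same clipping can be written as a single `min(max(...))` expression; the variable names here are illustrative.

```matlab
% Made-up background intensities, some outside the 500..2000 range.
b = [120; 500; 750; 2000; 3100];
lowerBound = 500;
upperBound = 2000;

% Approach used in the text: overwrite values outside the bounds.
masked = b;
masked(b < lowerBound) = lowerBound;
masked(b > upperBound) = upperBound;

% Equivalent one-line form: raise to the lower bound, then cap at the upper bound.
clipped = min(max(b, lowerBound), upperBound);

disp(isequal(masked, clipped))   % 1
```

Either form leaves values inside the range untouched, so the color scale is spent on the 500 to 2000 interval rather than on a few extreme spots.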

This procedure illustrates how to visualize distributions in
microarray data. You can use the function `maboxplot` to
look at the distribution of data in each of the blocks.

In the MATLAB Command Window, type

figure
subplot(2,1,1)
maboxplot(pd,'F532 Median','title','Parkinson''s Disease Model Mouse')
subplot(2,1,2)
maboxplot(pd,'B532 Median','title','Parkinson''s Disease Model Mouse')
figure
subplot(2,1,1)
maboxplot(wt,'F532 Median','title','Untreated Mouse')
subplot(2,1,2)
maboxplot(wt,'B532 Median','title','Untreated Mouse')

The MATLAB software plots the images.

Compare the plots.

From the box plots you can clearly see the spatial effects in the background intensities. Block numbers `1`, `3`, `5`, and `7` are on the left side of the arrays, and numbers `2`, `4`, `6`, and `8` are on the right side. The data must be normalized to remove this spatial bias.
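To put numbers next to the box plots, a per-block summary can be computed in base MATLAB with `accumarray`. The sketch below uses made-up data; `blocks` stands in for a block-number vector such as `pd.Blocks` and `values` for one column of `pd.Data`.

```matlab
% Made-up example: six spots spread over three blocks.
blocks = [1; 1; 2; 2; 2; 3];          % block number of each spot
values = [10; 20; 5; 7; 9; 100];      % intensity of each spot

% Group the values by block number and take the median of each group.
blockMedians = accumarray(blocks, values, [], @median);

disp(blockMedians')   % 15 7 100
```

Comparing the medians of odd-numbered and even-numbered blocks is one simple way to quantify the left-to-right difference described above.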

This procedure illustrates how to visualize expression levels
in microarray data. There are two columns in the microarray data structure
labeled `'F635 Median - B635'` and `'F532 Median - B532'`. These columns are the differences between
the median foreground and the median background for the 635 nm channel
and the 532 nm channel, respectively.
These give a measure of the actual expression levels, although since
the data must first be normalized to remove spatial bias in the background,
you should be careful about using these values without further normalization.
However, in this example no normalization is performed.

Rather than working with data in a larger structure, it is often easier to extract the column numbers and data into separate variables.

cy5DataCol = find(strcmp(pd.ColumnNames,'F635 Median - B635'))
cy3DataCol = find(strcmp(pd.ColumnNames,'F532 Median - B532'))
cy5Data = pd.Data(:,cy5DataCol);
cy3Data = pd.Data(:,cy3DataCol);

The MATLAB software displays:

cy5DataCol =
    34

cy3DataCol =
    35

A simple way to compare the two channels is with a loglog plot. The function `maloglog` is used to do this. Points that are above the diagonal in this plot correspond to genes that have higher expression levels in the A1 voxel than in the brain as a whole.

figure
maloglog(cy5Data,cy3Data)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');

The MATLAB software displays the following messages and plots the images.

Warning: Zero values are ignored
(Type "warning off Bioinfo:MaloglogZeroValues" to suppress this warning.)
Warning: Negative values are ignored.
(Type "warning off Bioinfo:MaloglogNegativeValues" to suppress this warning.)

Notice that this function gives some warnings about negative and zero elements. This is because some of the values in the `'F635 Median - B635'` and `'F532 Median - B532'` columns are zero or even less than zero. Spots where this happened might be bad spots or spots that failed to hybridize. Points with positive, but very small, differences between foreground and background should also be considered to be bad spots.

Disable the display of warnings by using the `warning` command. Although warnings can be distracting, it is good practice to investigate why the warnings occurred rather than simply to ignore them. There might be some systematic reason why they are bad.

warnState = warning; % First save the current warning state.
% Now turn off the two warnings.
warning('off','Bioinfo:MaloglogZeroValues');
warning('off','Bioinfo:MaloglogNegativeValues');
figure
maloglog(cy5Data,cy3Data) % Create the loglog plot
warning(warnState); % Reset the warning state.
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');

The MATLAB software plots the image.
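The save, disable, and restore pattern can be tried in isolation with base MATLAB. The sketch below uses a made-up warning identifier, `Demo:ZeroValues`, in place of the toolbox identifiers, and relies on `warning('off',id)` returning the previous state of that identifier.

```matlab
warnId = 'Demo:ZeroValues';           % hypothetical identifier, for illustration only

% Turn the warning off; the output records the state it had before.
previous = warning('off', warnId);

% While disabled, this call prints nothing in the Command Window.
warning(warnId, 'Zero values are ignored.');
during = warning('query', warnId);    % state is 'off' at this point

% Restore the earlier state of this one identifier.
warning(previous);
after = warning('query', warnId);

fprintf('State while disabled: %s, after restore: %s\n', during.state, after.state);
```

Restoring only the identifiers that were changed keeps any other warning settings in the session untouched.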

An alternative to simply ignoring or disabling the warnings is to remove the bad spots from the data set. You can do this by finding points where either the red or green channel has values less than or equal to a threshold value. For example, use a threshold value of `10`.

threshold = 10;
badPoints = (cy5Data <= threshold) | (cy3Data <= threshold);


You can then remove these points and redraw the loglog plot.

cy5Data(badPoints) = [];
cy3Data(badPoints) = [];
figure
maloglog(cy5Data,cy3Data)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');

The MATLAB software plots the image.
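The thresholding and removal steps can be checked on a handful of made-up values. This sketch also filters a list of identifiers with the same logical mask, so that data and labels stay aligned; `ids` is a hypothetical stand-in for the `IDs` field.

```matlab
% Made-up background-subtracted intensities for five spots.
cy5 = [120;  0; 45; -3; 800];
cy3 = [ 90; 15;  8; 60; 950];
ids = {'g1'; 'g2'; 'g3'; 'g4'; 'g5'};   % hypothetical spot identifiers

threshold = 10;

% A spot is bad if either channel is at or below the threshold.
bad = (cy5 <= threshold) | (cy3 <= threshold);

% Apply the same mask to every per-spot variable.
cy5Good = cy5(~bad);
cy3Good = cy3(~bad);
idsGood = ids(~bad);

fprintf('Removed %d of %d spots.\n', nnz(bad), numel(bad));
```

Keeping the mask in a variable, rather than deleting rows first and recomputing later, makes it straightforward to filter any additional per-spot field consistently.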

This plot shows the distribution of points but does not give any indication about which genes correspond to which points.

Add gene labels to the plot. Because some of the data points have been removed, the corresponding gene IDs must also be removed from the data set before you can use them. Because the intensity data was extracted from `pd`, the simplest way to do that is `pd.IDs(~badPoints)`.

maloglog(cy5Data,cy3Data,'labels',pd.IDs(~badPoints),...
'factorlines',2)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');

The MATLAB software plots the image.

Try using the mouse to click some of the outlier points.

You will see the gene ID associated with the point. Most of the outliers are below the `y = x` line. In fact, most of the points are below this line. Ideally the points should be evenly distributed on either side of this line.

Normalize the points to evenly distribute them on either side of the line. Use the function `mameannorm` to perform global mean normalization.

normcy5 = mameannorm(cy5Data);
normcy3 = mameannorm(cy3Data);

If you plot the normalized data you will see that the points are more evenly distributed about the `y = x` line.

figure
maloglog(normcy5,normcy3,'labels',pd.IDs(~badPoints),...
'factorlines',2)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');

The MATLAB software plots the image.
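As a rough illustration of what global mean normalization does, the sketch below divides each channel by its own mean in base MATLAB, so that both channels end up with a mean of 1. It is written under the assumption that this is the scaling involved; it is not the toolbox implementation of `mameannorm`, and the values are made up.

```matlab
% Made-up intensities in which the second channel is systematically brighter.
channelA = [2; 4; 6];
channelB = [20; 40; 60];

% Divide each channel by its own mean intensity.
normA = channelA ./ mean(channelA);
normB = channelB ./ mean(channelB);

% After scaling, both channels have mean 1 and are directly comparable.
disp([mean(normA) mean(normB)])
```

In this toy case the two channels differ only by a constant factor, so they coincide after scaling, which is the situation in which points fall on the `y = x` line.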

The function `mairplot` is used to create an Intensity vs. Ratio plot for the normalized data. This function works in the same way as the function `maloglog`.

figure
mairplot(normcy5,normcy3,'labels',pd.IDs(~badPoints),...
'factorlines',2)

The MATLAB software plots the image.
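For intuition about what an Intensity vs. Ratio plot displays, the sketch below computes the two coordinates by hand in base MATLAB. It assumes one common convention, log10 of the product of the two channels for intensity and log2 of their ratio; the exact transformation and ratio direction used by `mairplot` should be checked against its reference page. The values are made up.

```matlab
% Made-up normalized intensities for two spots.
x = [100; 400];
y = [100; 100];

% Assumed convention: overall brightness on one axis, fold change on the other.
intensity = log10(x .* y);   % log10 of the product of the two channels
ratio     = log2(x ./ y);    % log2 ratio; 0 means equal expression

% The first spot has equal channels (ratio 0); the second is 4-fold
% higher in x, which is 2 on a log2 scale.
disp([intensity ratio])
```

On such a plot, a factor-of-2 difference corresponds to a log2 ratio of plus or minus 1, which is why lines at a factor of 2 appear as horizontal lines.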

You can click the points in this plot to see the name of the gene associated with each point.
