Exploring the Microarray Data Set
This example looks at the various ways to visualize microarray data. The data comes from a pharmacological model of Parkinson's disease (PD) using a mouse brain. The microarray data for this example is from Brown, V.M., Ossadtchi, A., Khan, A.H., Yee, S., Lacan, G., Melega, W.P., Cherry, S.R., Leahy, R.M., and Smith, D.J.; "Multiplex three dimensional brain gene expression mapping in a mouse model of Parkinson's disease"; Genome Research 12(6): 868-884 (2002).
The microarray data used in this example is available in a Web supplement to the paper by Brown et al. and in the file mouse_a1pd.gpr included with the Bioinformatics Toolbox™ software.
http://labs.pharmacology.ucla.edu/smithlab/genome_multiplex/
The microarray data is also available on the Gene Expression Omnibus Web site at
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30
The GenePix® GPR-formatted file mouse_a1pd.gpr contains the data for one of the microarrays used in the study. This is data from voxel A1 of the brain of a mouse in which a pharmacological model of Parkinson's disease (PD) was induced using methamphetamine. The voxel sample was labeled with Cy3 (green) and the control, RNA from a total (not voxelated) normal mouse brain, was labeled with Cy5 (red). GPR-formatted files provide a large amount of information about the array, including the mean, median, and standard deviation of the foreground and background intensities of each spot at the 635 nm wavelength (the red, Cy5 channel) and the 532 nm wavelength (the green, Cy3 channel).
This procedure illustrates how to import data from the Web into the MATLAB® environment, using data from a study about gene expression in mouse brains as an example. See Overview of the Mouse Example.
Read data from a file into a MATLAB structure. For example, in the MATLAB Command Window, type
pd = gprread('mouse_a1pd.gpr')
Information about the structure displays in the MATLAB Command Window:
pd =
         Header: [1x1 struct]
           Data: [9504x38 double]
         Blocks: [9504x1 double]
        Columns: [9504x1 double]
           Rows: [9504x1 double]
          Names: {9504x1 cell}
            IDs: {9504x1 cell}
    ColumnNames: {38x1 cell}
        Indices: [132x72 double]
          Shape: [1x1 struct]
Access the fields of a structure using StructureName.FieldName. For example, you can access the field ColumnNames of the structure pd by typing
pd.ColumnNames
The column names are shown below.
ans =
    'X'
    'Y'
    'Dia.'
    'F635 Median'
    'F635 Mean'
    'F635 SD'
    'B635 Median'
    'B635 Mean'
    'B635 SD'
    '% > B635+1SD'
    '% > B635+2SD'
    'F635 % Sat.'
    'F532 Median'
    'F532 Mean'
    'F532 SD'
    'B532 Median'
    'B532 Mean'
    'B532 SD'
    '% > B532+1SD'
    '% > B532+2SD'
    'F532 % Sat.'
    'Ratio of Medians'
    'Ratio of Means'
    'Median of Ratios'
    'Mean of Ratios'
    'Ratios SD'
    'Rgn Ratio'
    'Rgn R²'
    'F Pixels'
    'B Pixels'
    'Sum of Medians'
    'Sum of Means'
    'Log Ratio'
    'F635 Median - B635'
    'F532 Median - B532'
    'F635 Mean - B635'
    'F532 Mean - B532'
    'Flags'
Access the names of the genes. For example, to list the first 20 gene names, type
pd.Names(1:20)
A list of the first 20 gene names is displayed:
ans =
    'AA467053'
    'AA388323'
    'AA387625'
    'AA474342'
    'Myo1b'
    'AA473123'
    'AA387579'
    'AA387314'
    'AA467571'
    ''
    'Spop'
    'AA547022'
    'AI508784'
    'AA413555'
    'AA414733'
    ''
    'Snta1'
    'AI414419'
    'W14393'
    'W10596'
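Two of the 20 names listed above are empty strings. As a minimal sketch of how such unnamed spots could be located, the snippet below copies the displayed names into a variable called names (a stand-in chosen for illustration; with the real data you would use pd.Names) and finds the empty entries with the base MATLAB function cellfun.

```matlab
% First 20 gene names, copied from the output shown above.
names = {'AA467053'; 'AA388323'; 'AA387625'; 'AA474342'; 'Myo1b'; ...
         'AA473123'; 'AA387579'; 'AA387314'; 'AA467571'; ''; ...
         'Spop'; 'AA547022'; 'AI508784'; 'AA413555'; 'AA414733'; ...
         ''; 'Snta1'; 'AI414419'; 'W14393'; 'W10596'};

% Logical mask that is true where a spot has no name.
isUnnamed = cellfun(@isempty, names);

% Positions and count of the unnamed spots.
unnamedIdx = find(isUnnamed);
numUnnamed = numel(unnamedIdx);
```

The same mask can be used to skip unnamed spots when labeling plots, for example names(~isUnnamed).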
This procedure illustrates how to visualize microarray data by plotting image maps. The function maimage can take a microarray data structure and create a pseudocolor image of the data arranged in the same order as the spots on the array. In other words, maimage plots a spatial plot of the microarray.
This procedure uses data from a study of gene expression in mouse brains. For a list of field names in the MATLAB structure pd, see Exploring the Microarray Data Set.
Plot the median values for the red channel. For example, to plot data from the field F635 Median, type
figure
maimage(pd,'F635 Median')
The MATLAB software plots an image showing the median pixel values for the foreground of the red (Cy5) channel.
Plot the median values for the green channel. For example, to plot data from the field F532 Median, type
figure
maimage(pd,'F532 Median')
The MATLAB software plots an image showing the median pixel values of the foreground of the green (Cy3) channel.
Plot the median values for the red background. The field B635 Median shows the median values for the background of the red channel.
figure
maimage(pd,'B635 Median')
The MATLAB software plots an image for the background of the red channel. Notice the very high background levels down the right side of the array.
Plot the median values for the green background. The field B532 Median shows the median values for the background of the green channel.
figure
maimage(pd,'B532 Median')
The MATLAB software plots an image for the background of the green channel.
The first array was for the Parkinson's disease model mouse. Now read in the data for the same brain voxel but for the untreated control mouse. In this case, the voxel sample was labeled with Cy3 and the control, total brain (not voxelated), was labeled with Cy5.
wt = gprread('mouse_a1wt.gpr')
The MATLAB software creates a structure and displays information about the structure.
wt =
         Header: [1x1 struct]
           Data: [9504x38 double]
         Blocks: [9504x1 double]
        Columns: [9504x1 double]
           Rows: [9504x1 double]
          Names: {9504x1 cell}
            IDs: {9504x1 cell}
    ColumnNames: {38x1 cell}
        Indices: [132x72 double]
          Shape: [1x1 struct]
Use the function maimage to show pseudocolor images of the foreground and background. You can use the function subplot to put all the plots onto one figure.
figure
subplot(2,2,1); maimage(wt,'F635 Median')
subplot(2,2,2); maimage(wt,'F532 Median')
subplot(2,2,3); maimage(wt,'B635 Median')
subplot(2,2,4); maimage(wt,'B532 Median')
The MATLAB software plots the images.
If you look at the scale for the background images, you will notice that the background levels are much higher than those for the PD mouse and there appears to be something nonrandom affecting the background of the Cy3 channel of this slide. Changing the colormap can sometimes provide more insight into what is going on in pseudocolor plots. For more control over the color, try the colormapeditor function.
colormap hot
The MATLAB software plots the images.
The function maimage is a quick way to create pseudocolor images of microarray data. However, if you want more control over plotting, you can create your own plots using the function imagesc.
First find the column number for the field of interest.
b532MedCol = find(strcmp(wt.ColumnNames,'B532 Median'))
The MATLAB software displays:
b532MedCol =
    16
Extract that column from the field Data.
b532Data = wt.Data(:,b532MedCol);
Use the field Indices to index into the Data.
figure
subplot(1,2,1);
imagesc(b532Data(wt.Indices))
axis image
colorbar
title('B532 Median')
The MATLAB software plots the image.
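The expression b532Data(wt.Indices) works because indexing a vector with a matrix returns a matrix of the same shape as the index. The toy sketch below illustrates this with made-up values; spotValues and layout are hypothetical stand-ins for a column of wt.Data and for wt.Indices, which is 132-by-72 in the real structure.

```matlab
% Toy array: 6 spots stored in file order (hypothetical values).
spotValues = [10; 20; 30; 40; 50; 60];

% Toy layout: a 2-by-3 grid whose entries are row numbers into spotValues.
% This plays the role of the Indices field.
layout = [1 2 3;
          4 5 6];

% Indexing a vector with a matrix returns a matrix shaped like the index,
% so each spot value lands at its position on the grid.
spatial = spotValues(layout);
```

The resulting matrix can be passed directly to imagesc, which is what the plotting code above does with the real data.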
Bound the intensities of the background plot to give more contrast in the image.
maskedData = b532Data;
maskedData(b532Data<500) = 500;
maskedData(b532Data>2000) = 2000;
subplot(1,2,2);
imagesc(maskedData(wt.Indices))
axis image
colorbar
title('Enhanced B532 Median')
The MATLAB software plots the images.
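The same bounding step can be written as a single expression with the base MATLAB functions min and max. This is an equivalent sketch on made-up intensities, not a change to the procedure above; values, lowerBound, and upperBound are names chosen for illustration.

```matlab
% Hypothetical background intensities.
values = [120; 500; 1350; 2000; 4800];

% Bounds matching the ones used in the text.
lowerBound = 500;
upperBound = 2000;

% Raise everything below the lower bound, then cap everything above
% the upper bound. Values already inside the range are unchanged.
clipped = min(max(values, lowerBound), upperBound);
```

Clipping changes only the displayed contrast; the underlying data in the structure is not modified.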
This procedure illustrates how to visualize distributions in microarray data. You can use the function maboxplot to look at the distribution of data in each of the blocks.
In the MATLAB Command Window, type
figure
subplot(2,1,1)
maboxplot(pd,'F532 Median','title','Parkinson''s Disease Model Mouse')
subplot(2,1,2)
maboxplot(pd,'B532 Median','title','Parkinson''s Disease Model Mouse')
figure
subplot(2,1,1)
maboxplot(wt,'F532 Median','title','Untreated Mouse')
subplot(2,1,2)
maboxplot(wt,'B532 Median','title','Untreated Mouse')
The MATLAB software plots the images.
Compare the plots.
From the box plots you can clearly see the spatial effects in the background intensities. Block numbers 1, 3, 5, and 7 are on the left side of the arrays, and block numbers 2, 4, 6, and 8 are on the right side. The data must be normalized to remove this spatial bias.
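The block-to-block differences seen in the box plots can also be summarized numerically. The sketch below uses the base MATLAB function accumarray to compute one median per block on made-up values; blocks and background are hypothetical stand-ins for pd.Blocks and a background column of pd.Data.

```matlab
% Hypothetical block number for each of six spots.
blocks = [1; 1; 1; 2; 2; 2];

% Hypothetical background intensities for the same spots.
background = [100; 110; 120; 400; 420; 500];

% Group the intensities by block number and take the median of each group.
% Element k of the result is the median background of block k.
blockMedian = accumarray(blocks, background, [], @median);
```

A large gap between the medians of neighboring blocks is the kind of spatial bias described above.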
This procedure illustrates how to visualize expression levels in microarray data. There are two columns in the microarray data structure labeled 'F635 Median - B635' and 'F532 Median - B532'. These columns are the differences between the median foreground and the median background for the 635 nm channel and 532 nm channel respectively. These give a measure of the actual expression levels, although since the data must first be normalized to remove spatial bias in the background, you should be careful about using these values without further normalization. However, in this example no normalization is performed.
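To make the relationship between these columns concrete, the sketch below builds a small made-up structure with the same kind of ColumnNames and Data fields and checks that foreground minus background reproduces the difference column. The structure s and its values are hypothetical; only the column names come from the data set.

```matlab
% Toy structure mimicking three columns of a GPR data structure.
s.ColumnNames = {'F635 Median'; 'B635 Median'; 'F635 Median - B635'};
s.Data = [900 100 800;
          250 260 -10;
          400 400   0];

% Look up each column by name.
fgCol   = find(strcmp(s.ColumnNames, 'F635 Median'));
bgCol   = find(strcmp(s.ColumnNames, 'B635 Median'));
diffCol = find(strcmp(s.ColumnNames, 'F635 Median - B635'));

% Median foreground minus median background.
computed = s.Data(:, fgCol) - s.Data(:, bgCol);

% True when the stored difference column agrees with the computed one.
matches = isequal(computed, s.Data(:, diffCol));
```

Note that the toy values include a negative and a zero difference, which is what happens when a spot's background is as bright as or brighter than its foreground.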
Rather than working with data in a larger structure, it is often easier to extract the column numbers and data into separate variables.
cy5DataCol = find(strcmp(wt.ColumnNames,'F635 Median - B635'))
cy3DataCol = find(strcmp(wt.ColumnNames,'F532 Median - B532'))
cy5Data = pd.Data(:,cy5DataCol);
cy3Data = pd.Data(:,cy3DataCol);
The MATLAB software displays:
cy5DataCol =
    34
cy3DataCol =
    35
A simple way to compare the two channels is with a loglog plot. The function maloglog is used to do this. Points that are above the diagonal in this plot correspond to genes that have higher expression levels in the A1 voxel than in the brain as a whole.
figure
maloglog(cy5Data,cy3Data)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');
The MATLAB software displays the following messages and plots the images.
Warning: Zero values are ignored
(Type "warning off Bioinfo:MaloglogZeroValues" to suppress this warning.)
Warning: Negative values are ignored.
(Type "warning off Bioinfo:MaloglogNegativeValues" to suppress this warning.)
Notice that this function gives some warnings about negative and zero elements. This is because some of the values in the 'F635 Median - B635' and 'F532 Median - B532' columns are zero or even less than zero. Spots where this happened might be bad spots or spots that failed to hybridize. Points with positive, but very small, differences between foreground and background should also be considered to be bad spots.
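Before suppressing the warnings, it can help to count how many spots trigger them. The following sketch does this on made-up values; cy5 and cy3 are hypothetical stand-ins for cy5Data and cy3Data.

```matlab
% Hypothetical background-subtracted intensities for five spots.
cy5 = [120;  0; 35; -4; 8];
cy3 = [ 90; 15;  0; 22; 3];

% Spots where either channel is exactly zero.
isZero = (cy5 == 0) | (cy3 == 0);

% Spots where either channel is negative.
isNegative = (cy5 < 0) | (cy3 < 0);

% Number of spots in each category.
numZero = nnz(isZero);
numNegative = nnz(isNegative);
```

If these counts are a large share of the array, that points to a systematic problem rather than a few failed spots.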
Disable the display of warnings by using the warning command. Although warnings can be distracting, it is good practice to investigate why the warnings occurred rather than simply to ignore them. There might be some systematic reason why they are bad.
warnState = warning; % First save the current warning state.
% Now turn off the two warnings.
warning('off','Bioinfo:MaloglogZeroValues');
warning('off','Bioinfo:MaloglogNegativeValues');
figure
maloglog(cy5Data,cy3Data) % Create the loglog plot
warning(warnState); % Reset the warning state.
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');
The MATLAB software plots the image.
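A variation on this pattern saves and restores the state of a single warning identifier instead of the whole warning state. The sketch below relies on the base MATLAB behavior that warning returns the previous state when called with an output argument; the variable names are chosen for illustration, and the plotting call is left as a comment so the snippet runs without the data.

```matlab
% Identifier of the warning to silence (taken from the message above).
warnId = 'Bioinfo:MaloglogZeroValues';

% Record the state before any change, for comparison.
before = warning('query', warnId);

% Turn the warning off; the output holds the state prior to the change.
previous = warning('off', warnId);
during = warning('query', warnId);

% ... call maloglog here ...

% Restore only this identifier to its earlier state.
warning(previous);
after = warning('query', warnId);
```

Restoring a single identifier leaves any other warning settings changed in the meantime untouched.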
An alternative to simply ignoring or disabling the warnings is to remove the bad spots from the data set. You can do this by finding points where either the red or green channel has values less than or equal to a threshold value. For example, use a threshold value of 10.
threshold = 10;
badPoints = (cy5Data <= threshold) | (cy3Data <= threshold);
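These statements end with semicolons, so nothing is displayed. The variable badPoints is a logical vector that is true for every spot at or below the threshold in either channel.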
You can then remove these points and redraw the loglog plot.
cy5Data(badPoints) = [];
cy3Data(badPoints) = [];
figure
maloglog(cy5Data,cy3Data)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');
The MATLAB software plots the image.
This plot shows the distribution of points but does not give any indication about which genes correspond to which points.
Add gene labels to the plot. Because some of the data points have been removed, the corresponding gene IDs must also be removed from the data set before you can use them. The simplest way to do that is wt.IDs(~badPoints).
maloglog(cy5Data,cy3Data,'labels',wt.IDs(~badPoints),...
    'factorlines',2)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');
The MATLAB software plots the image.
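The bookkeeping behind wt.IDs(~badPoints) can be checked on a toy data set. In the sketch below, all values and the IDs g1 through g7 are made up; the point is that deleting rows with a mask and selecting labels with the negated mask leave data and labels aligned.

```matlab
threshold = 10;

% Hypothetical intensities and IDs for seven spots.
cy5 = [120;  0; 35; -4; 8; 10; 64];
cy3 = [ 90; 15;  0; 22; 3; 50; 11];
ids = {'g1'; 'g2'; 'g3'; 'g4'; 'g5'; 'g6'; 'g7'};

% A spot is bad if either channel is at or below the threshold.
bad = (cy5 <= threshold) | (cy3 <= threshold);

% Delete the bad rows from both channels.
cy5(bad) = [];
cy3(bad) = [];

% Keep the IDs of the surviving rows, in the same order.
keptIds = ids(~bad);
```

Spot g6 is removed even though only one channel sits exactly at the threshold, because the comparison uses less than or equal to.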
Try using the mouse to click some of the outlier points.
You will see the gene ID associated with the point. Most of the outliers are below the y = x line. In fact, most of the points are below this line. Ideally the points should be evenly distributed on either side of this line.
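The imbalance about the y = x line can be quantified as the fraction of spots whose voxel intensity is lower than the control intensity. This is a minimal sketch on made-up values; control and voxel are hypothetical stand-ins for cy5Data and cy3Data.

```matlab
% Hypothetical intensities for five spots.
control = [100; 200; 50; 400; 80];   % x-axis values
voxel   = [ 60; 210; 20; 150; 80];   % y-axis values

% A point lies below the y = x line when its y value is smaller
% than its x value. Points exactly on the line are not counted.
below = voxel < control;

% Fraction of points below the line.
fractionBelow = nnz(below) / numel(below);
```

A fraction well above one half suggests a global offset between the channels of the kind that normalization is meant to reduce.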
Normalize the points to evenly distribute them on either side of the line. Use the function manorm to perform global mean normalization.
normcy5 = manorm(cy5Data);
normcy3 = manorm(cy3Data);
If you plot the normalized data you will see that the points are more evenly distributed about the y = x line.
figure
maloglog(normcy5,normcy3,'labels',wt.IDs(~badPoints),...
    'factorlines',2)
xlabel('F635 Median - B635 (Control)');
ylabel('F532 Median - B532 (Voxel A1)');
The MATLAB software plots the image.
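As a rough illustration of the idea behind global mean normalization, the sketch below divides each channel by its own mean so that both channels end up with a mean of 1. This is an assumption about what the term means, written with base MATLAB only on made-up values; it is not a reimplementation of manorm, which offers further options.

```matlab
% Hypothetical intensities for three spots in each channel.
cy5 = [ 2;  4;  6];
cy3 = [10; 30; 20];

% Scale each channel by its own mean intensity.
normCy5 = cy5 ./ mean(cy5);
normCy3 = cy3 ./ mean(cy3);

% After scaling, both channels have the same mean.
meanCy5 = mean(normCy5);
meanCy3 = mean(normCy3);
```

Putting both channels on a common scale is what moves the point cloud toward the y = x line in the loglog plot.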
The function mairplot is used to create an Intensity vs. Ratio plot for the normalized data. This function works in the same way as the function maloglog.
figure
mairplot(normcy5,normcy3,'labels',wt.IDs(~badPoints),...
    'factorlines',2)
The MATLAB software plots the image.
You can click the points in this plot to see the name of the gene associated with each point.
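The quantities on the two axes of an intensity versus ratio plot can be computed by hand. The sketch below uses one common convention, the log2 ratio and the average log2 intensity; this convention is an assumption for illustration, and mairplot may scale its axes differently. The values of x and y are made up.

```matlab
% Hypothetical normalized intensities for three spots.
x = [1; 4; 16];   % control channel
y = [4; 4;  4];   % voxel channel

% Log ratio: positive when the voxel channel is brighter.
logRatio = log2(y ./ x);

% Average log intensity of the two channels.
logIntensity = 0.5 * log2(x .* y);
```

On this scale a two-fold change in either direction corresponds to a log ratio of +1 or -1, which makes up- and down-regulation symmetric about zero.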