MATLAB Answers


Can I color my PCA data by column, and if so, how?

Asked by Katharine Dickson on 14 Jun 2018
Latest activity: commented on by Image Analyst on 15 Jun 2018
I have a data set in a table object, with 11 columns. The first column is a string identifying a group of orthologous genes (basically, genes that do the same thing in different species). The other ten columns are numbers that describe how strongly that gene is expressed in species 1 or 2 at a given point in time - let's say we've got columns for species 1 timepoints 1-5, and species 2 timepoints 1-5.
I want to perform a PCA on it, but I want to color the data on it by column, such that I can figure out which species/timepoint a given datapoint belongs to.
Is this possible, and if so, how can I do it?



1 Answer

Answer by Image Analyst
on 15 Jun 2018

I'm not sure this makes sense, at least to me it doesn't. So you run PCA on your data and, for each observation, you'll get 10 principal component scores. Now if you want to do a scatterplot where you color each point with a color representing a certain range of a certain PC, you can use gscatter() to do that. But I don't know what it means to "color the data on it by column". My guess is that you want Machine Learning, perhaps discriminant classification or KNN, rather than PCA. See the chart on


After looking at my data, I think I understand a bit better now exactly what I need. There are 3676 components for each of these columns, corresponding to each entry in column 1 - each 'component' is a gene group. I need to condense them into two principal components for each column.
Still not sure I'm visualizing it correctly. So you have 10 columns of data, and 1 column that defines which of two possible gene groups (species) each row of the table belongs to? And you want to get, for each column, the 2 PCs for that column? Attach your .mat file if you want more help.
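
For what it's worth, one possible reading of "two principal components for each column" is to treat each column (species/timepoint) as a single observation and the 3676 gene groups as the variables, i.e. run pca() on the transposed matrix. Below is a rough sketch of that idea, not a confirmed solution to the question. It assumes the Statistics and Machine Learning Toolbox (for pca and gscatter), uses random numbers in place of the real data, and the names X, T, species, timepoint and labels are all made up for illustration.

```matlab
% Sketch only: random numbers stand in for the real expression table.
% With a real table T (hypothetical name), X = T{:, 2:11} would pull out
% the ten numeric columns instead.
rng(0);                          % reproducible synthetic data
nGenes = 3676;                   % number of gene groups (rows)
X = randn(nGenes, 10);           % columns 1-5: species 1, columns 6-10: species 2

% Transpose so each column (species/timepoint) becomes one observation and
% the gene groups become the variables.
[~, score, latent] = pca(X');
explained = 100 * latent / sum(latent);   % percent variance per component

% One label per original column (assumed layout: 5 timepoints per species).
species   = [repmat({'Species 1'}, 5, 1); repmat({'Species 2'}, 5, 1)];
timepoint = repmat((1:5)', 2, 1);
labels    = arrayfun(@(t) sprintf('  t%d', t), timepoint, 'UniformOutput', false);

% Plot the first two PC scores, colored by species, with timepoint labels.
figure;
gscatter(score(:,1), score(:,2), species);
text(score(:,1), score(:,2), labels);
xlabel(sprintf('PC1 (%.1f%% of variance)', explained(1)));
ylabel(sprintf('PC2 (%.1f%% of variance)', explained(2)));
title('PCA of columns (species/timepoint)');
```

Each of the ten points in the plot is then one species/timepoint column, so the color tells you the species and the text label tells you the timepoint. If the intent is instead to keep genes as the observations, drop the transpose and pass a per-row grouping variable to gscatter().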
