PCA of dataset that contains 4 samples, how to compare principal component results between samples and use biplot to visually compare?

I am performing a PCA on dataset A (163x5). A contains 4 samples that I would eventually like to compare, with row indices:
A1 = A(1:45,:); A2 = A(46:83,:); A3 = A(84:125,:); A4 = A(126:163,:);
I ran pca on all of A:
[coeff,score,latent,tsquared,explained] = pca(zscore(A));
I am now using biplot to illustrate the results and want to differentiate between samples A1 through A4 in the plot. In order to compare all of the samples, biplot must be applied to the entire dataset at one time. In other words, calling it once per sample will not work:
biplot(coeff(:,1:2),'Scores',score(1:45,1:2),'VarLabels',{'DO' 'K' 'TOM' 'dh' 'N'},'Marker','s','MarkerEdgeColor','g','MarkerSize',9);
hold on
biplot(coeff(:,1:2),'Scores',score(46:83,1:2),'VarLabels',{'DO' 'K' 'TOM' 'dh' 'N'},'Marker','d','MarkerEdgeColor','b','MarkerSize',9);
% etc.
I can go to the plot editor, count in the plot browser the plot objects that belong to A1, change them to a different marker, and in that way differentiate between samples A1, A2, ... But this is not a great way to do it! Any ideas?
Also, as part of this I would like to find the centroid of each sample. Again, this is not as simple as averaging x and averaging y for each sample, because of the rescaling biplot performs. Any help is greatly appreciated. Thanks! George

Answers (1)

George McFadden on 19 Nov 2012
I figured out a workaround for using biplot to plot the pca results of a dataset that contains 4 samples, and for plotting the sample centroids with error bars. First I transformed the pca results (coeff and score) with the same transformation biplot applies, creating a new matrix. The centroids (the mean of each sample) and the associated error bars were then calculated from this new matrix.
The original pca data were plotted with the biplot function, then the centroids and error bars were added with the plot and ploterr functions. I used code found on Stack Overflow (URL referenced) and the ploterr function from the File Exchange.
I have double-checked a lot of things here and believe the figure represents the data accurately.
xxx = coeff(:,1:2);
yyy= score(:,1:2);
% Taken from biplot.m: this rescales the data the same way biplot does,
% so that the data fit on grid axes no larger than 1.
[n,d2] = size(yyy);
[p,d] = size(xxx); % 5 by 2 here (5 variables, 2 components)
[dum,maxind] = max(abs(xxx),[],1);
colsign = sign(xxx(maxind + (0:p:(d-1)*p)));
xxx = xxx .* repmat(colsign, p, 1);
yyy = (yyy ./ max(abs(yyy(:)))) .* repmat(colsign, n, 1); % n = 163 observations
nans = NaN(n,1);
ptx = [yyy(:,1) nans]';
pty = [yyy(:,2) nans]';
% ptz = [yyy(:,3) nans]';
% I grouped the pt matrices for my benefit
plotdataholder(:,1) = ptx(1,:);
plotdataholder(:,2) = pty(1,:);
% plotdataholder(:,3) = ptz(1,:);
% My score matrix is 163x2; each of the 4 samples (rows 1:45, 46:83, 84:125, 126:163) gets a different marker
scatter(plotdataholder(1:45,1),plotdataholder(1:45,2),'marker', 'o');
hold on;
scatter(mean(plotdataholder(1:45,1)),mean(plotdataholder(1:45,2)),'marker', '.');
scatter(plotdataholder(46:83,1),plotdataholder(46:83,2),'marker', 'd') ;
scatter(mean(plotdataholder(46:83,1)),mean(plotdataholder(46:83,2)),'marker', 'd') ;
scatter(plotdataholder(84:125,1),plotdataholder(84:125,2),'marker', '^') ;
scatter(mean(plotdataholder(84:125,1)),mean(plotdataholder(84:125,2)),'marker', '^') ;
scatter(plotdataholder(126:163,1),plotdataholder(126:163,2),'marker', 's') ;
scatter(mean(plotdataholder(126:163,1)),mean(plotdataholder(126:163,2)),'marker', 's') ;
xlabel('Principal Component 1');
ylabel('Principal Component 2');
% zlabel('Principal Component 3');
Now for the final part:
a1=plotdataholder(1:45,1); b1=plotdataholder(1:45,2);
a2=plotdataholder(46:83,1); b2=plotdataholder(46:83,2);
a3=plotdataholder(84:125,1); b3=plotdataholder(84:125,2);
a4=plotdataholder(126:163,1); b4=plotdataholder(126:163,2);
n1=length(a1); n2=length(a2); n3=length(a3); n4=length(a4);
% Centroid of each sample
x1=mean(a1); y1=mean(b1);
x2=mean(a2); y2=mean(b2);
x3=mean(a3); y3=mean(b3);
x4=mean(a4); y4=mean(b4);
% Standard error of the mean is std/sqrt(n), not std/n
sema1=std(a1)/sqrt(n1); semb1=std(b1)/sqrt(n1);
sema2=std(a2)/sqrt(n2); semb2=std(b2)/sqrt(n2);
sema3=std(a3)/sqrt(n3); semb3=std(b3)/sqrt(n3);
sema4=std(a4)/sqrt(n4); semb4=std(b4)/sqrt(n4);
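The rescaling and the per-sample statistics can also be written with a grouping vector and a loop instead of one variable per sample. The following is only a minimal sketch under stated assumptions: the matrix A is synthetic stand-in data, the SVD is a toolbox-free stand-in for pca(zscore(A)), and the names groupId, scaled, centroids and sem are made up for illustration. The standard error of the mean is computed as std/sqrt(n).

```matlab
% Hypothetical stand-in data; replace with your own 163x5 matrix A.
A = sin((1:163)' * [0.11 0.23 0.37 0.51 0.73]);

% Standardize the columns and run PCA via SVD (stand-in for pca(zscore(A))).
Z = bsxfun(@minus, A, mean(A, 1));
Z = bsxfun(@rdivide, Z, std(Z, 0, 1));
[~, ~, V] = svd(Z, 'econ');
coeff = V;          % loadings, one column per component
score = Z * V;      % scores, one row per observation

% Same rescaling as in the code above (taken there from biplot.m).
cf = coeff(:, 1:2);
xy = score(:, 1:2);
[p, d] = size(cf);
[~, maxind] = max(abs(cf), [], 1);
colsign = sign(cf(maxind + (0:p:(d-1)*p)));   % sign of the largest loading per component
scaled = bsxfun(@times, xy ./ max(abs(xy(:))), colsign);

% One label per row: 1..4 for samples A1..A4 (row ranges from the question).
groupId = [ones(45,1); 2*ones(38,1); 3*ones(42,1); 4*ones(38,1)];

nGroups = max(groupId);
centroids = zeros(nGroups, 2);   % mean position of each sample
sem = zeros(nGroups, 2);         % standard error of the mean, std/sqrt(n)
for g = 1:nGroups
    rows = (groupId == g);
    centroids(g, :) = mean(scaled(rows, :), 1);
    sem(g, :) = std(scaled(rows, :), 0, 1) / sqrt(nnz(rows));
end
```

With real data, centroids(g,1), centroids(g,2), sem(g,1) and sem(g,2) would take the place of x1, y1, sema1, semb1 and so on in the calls to plot and ploterr.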
%% This is the PCA all-stream code for creating a biplot figure
% load pcasolution.mat
figure
hp=biplot(coeff(:,1:2),'Scores',score(:,1:2),'VarLabels',{'DO' 'K' 'TOM' 'dh' 'N'},...
'Marker','s','MarkerEdgeColor','g','MarkerSize', 9);
legend([hp(16,:) hp(61,:) hp(99,:) hp(141,:)], 'Coal Kiln','Machipongo',...
'Parkers','Phillips','Location','NorthWest');
hold on
plot([x1 x2 x3 x4],[y1 y2 y3 y4],'or') % pass vectors so 'or' applies to all four centroids
ploterr(x1,y1,sema1,semb1)
ploterr(x2,y2,sema2,semb2)
ploterr(x3,y3,sema3,semb3)
ploterr(x4,y4,sema4,semb4)
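To give each sample its own marker without the plot editor, the handles returned by biplot can be changed after the single biplot call. This is a hedged sketch, not a tested recipe for every MATLAB release: it assumes hp holds the variable handles first (p lines, p markers, p text labels) followed by one marker handle per observation, which is what the indices hp(16), hp(61), hp(99) and hp(141) above rely on for p = 5. The data matrix is a synthetic stand-in, and the marker and color choices are arbitrary.

```matlab
% Hypothetical stand-in data; replace with your own 163x5 matrix A.
A = sin((1:163)' * [0.11 0.23 0.37 0.51 0.73]);
[coeff, score] = pca(zscore(A));

figure
hp = biplot(coeff(:,1:2), 'Scores', score(:,1:2), ...
    'VarLabels', {'DO' 'K' 'TOM' 'dh' 'N'});

% Assumption: 3*p variable handles come before the observation markers.
p = size(coeff, 1);
obsOffset = 3 * p;

groups  = {1:45, 46:83, 84:125, 126:163};   % row ranges from the question
markers = {'s', 'd', '^', 'o'};
colors  = {'g', 'b', 'r', 'k'};
for g = 1:numel(groups)
    % Restyle every observation marker that belongs to sample g.
    set(hp(obsOffset + groups{g}), ...
        'Marker', markers{g}, 'MarkerEdgeColor', colors{g});
end

% Build the legend from the first observation of each sample.
firstIdx = obsOffset + cellfun(@(r) r(1), groups);
legend(hp(firstIdx), 'Coal Kiln', 'Machipongo', 'Parkers', 'Phillips', ...
    'Location', 'NorthWest');
```

Because biplot is still called once on the full score matrix, all four samples share the same scaling, and the centroids and error bars computed above can be overlaid with hold on as before.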
