Visualize High-Dimensional Data Using t-SNE
This example shows how to visualize the humanactivity data, which consists of acceleration data collected from smartphones during various activities. tsne reduces the dimension of the data from 60 original dimensions to two or three. tsne creates a nonlinear transformation whose purpose is to enable grouping of points with similar characteristics. Ideally, the tsne result shows clean separation of the 60-dimensional data points into groups.
Load and Examine Data
Load the humanactivity data, which is available when you run this example.
load humanactivity
View a description of the data.
Description = 29×1 string
    " === Human Activity Data ==="
    ""
    " The humanactivity data set contains 24,075 observations of five different"
    " physical human activities: Sitting, Standing, Walking, Running, and"
    " Dancing. Each observation has 60 features extracted from acceleration"
    " data measured by smartphone accelerometer sensors. The data set contains"
    " the following variables:"
    ""
    " * actid - Response vector containing the activity IDs in integers: 1, 2,"
    " 3, 4, and 5 representing Sitting, Standing, Walking, Running, and"
    " Dancing, respectively"
    " * actnames - Activity names corresponding to the integer activity IDs"
    " * feat - Feature matrix of 60 features for 24,075 observations"
    " * featlabels - Labels of the 60 features"
    ""
    " The Sensor HAR (human activity recognition) App [1] was used to create"
    " the humanactivity data set. When measuring the raw acceleration data with"
    " this app, a person placed a smartphone in a pocket so that the smartphone"
    " was upside down and the screen faced toward the person. The software then"
    " calibrated the measured raw data accordingly and extracted the 60"
    " features from the calibrated data. For details about the calibration and"
    " feature extraction, see [2] and [3], respectively."
    ""
    " [1] El Helou, A. Sensor HAR recognition App. MathWorks File Exchange"
    " http://www.mathworks.com/matlabcentral/fileexchange/54138-sensor-har-recognition-app"
    " [2] STMicroelectronics, AN4508 Application note. “Parameters and"
    " calibration of a low-g 3-axis accelerometer.” 2014."
    " [3] El Helou, A. Sensor Data Analytics. MathWorks File Exchange"
    " https://www.mathworks.com/matlabcentral/fileexchange/54139-sensor-data-analytics--french-webinar-code-"
The data set is organized by activity type. To better represent a random set of data, shuffle the rows.
n = numel(actid); % Number of data points
rng default % For reproducibility
idx = randsample(n,n); % Shuffle
X = feat(idx,:); % Shuffled data
actid = actid(idx); % Shuffled labels
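When shuffling, the feature rows and the labels must be permuted with the same index vector, or the colors in later plots will no longer match the points. The following is a minimal sketch of that check on small stand-in data. The variables featDemo and actidDemo are hypothetical and are not part of the humanactivity data set. The sketch uses randperm, a base MATLAB function, in place of randsample.

```matlab
% Stand-in data (hypothetical): 6 observations, 2 features, one label per row
featDemo  = [1 10; 2 20; 3 30; 4 40; 5 50; 6 60];
actidDemo = [1; 1; 2; 2; 3; 3];

rng default                     % For reproducibility
n   = numel(actidDemo);         % Number of data points
idx = randperm(n).';            % Random permutation of 1:n as a column vector
Xdemo     = featDemo(idx,:);    % Shuffled rows
labelDemo = actidDemo(idx);     % Labels shuffled with the same index

% Undo the permutation and confirm that rows and labels return together
[~,inverseIdx]   = sort(idx);
isRowsRestored   = isequal(Xdemo(inverseIdx,:),featDemo);
isLabelsRestored = isequal(labelDemo(inverseIdx),actidDemo);
```

If either flag is false, the features and labels were shuffled with different indices.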
Associate the activities with the labels in actid.
activities = ["Sitting";"Standing";"Walking";"Running";"Dancing"];
activity = activities(actid);
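Before plotting, it can help to know how many observations each activity contributes, because a small class can look like noise inside a larger cluster. The following sketch counts observations per activity by using categorical and countcats. The labels in actidDemo are hypothetical stand-ins. With the example data, you would pass the actid variable instead.

```matlab
% Stand-in labels (hypothetical); the example itself uses actid from the data set
actidDemo  = [1; 1; 2; 3; 3; 3; 4; 5; 5];
activities = ["Sitting";"Standing";"Walking";"Running";"Dancing"];
activityDemo = activities(actidDemo);     % Map integer IDs to names by indexing

% Fix the category order so the counts line up with the activities vector
activityCat = categorical(activityDemo,activities);
counts = countcats(activityCat);          % One count per category, in order

classTable = table(activities,counts,VariableNames=["Activity","Count"]);
disp(classTable)
```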
Reduce Dimension of Data to Two
Obtain two-dimensional analogues of the data clusters using t-SNE. To save time on this relatively large data set, use the Barnes-Hut variant of the t-SNE algorithm.
rng default % For reproducibility
Y = tsne(X,Algorithm="barneshut");
Display the result, colored with the correct labels.
figure
numGroups = length(unique(actid));
clr = hsv(numGroups);
gscatter(Y(:,1),Y(:,2),activity,clr)
t-SNE creates clusters of points based solely on their relative similarities. The clusters are not very well separated in this view.
To obtain better separation between data clusters, try setting the Perplexity parameter to 300.
rng default % For reproducibility
Y = tsne(X,Algorithm="barneshut",Perplexity=300);
figure
gscatter(Y(:,1),Y(:,2),activity,clr)
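A suitable perplexity usually has to be found by trial. One way to explore it, sketched here under stated assumptions, is to embed the same data at several perplexity values and compare the plots side by side. The data in Xdemo is a hypothetical stand-in (three Gaussian blobs), and the perplexity values are arbitrary choices sized for 300 points. The second output of tsne is the Kullback-Leibler divergence of the final embedding.

```matlab
rng default
% Stand-in data (hypothetical): three Gaussian blobs in 10 dimensions
Xdemo     = [randn(100,10); randn(100,10)+4; randn(100,10)-4];
groupDemo = repelem((1:3).',100);

perplexities = [10 30 50];               % Arbitrary values to compare
losses = zeros(size(perplexities));

figure
tiledlayout(1,numel(perplexities))
for k = 1:numel(perplexities)
    rng default                          % Same random start for a fair comparison
    [Ydemo,losses(k)] = tsne(Xdemo,Algorithm="barneshut", ...
        Perplexity=perplexities(k));
    nexttile
    gscatter(Ydemo(:,1),Ydemo(:,2),groupDemo)
    title("Perplexity = " + perplexities(k))
end
```

Treat the loss values as a rough diagnostic only. The perplexity changes the similarities that the loss is measured against, so a lower loss at a different perplexity does not necessarily mean a better picture. Judge the plots as well.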
With the current settings, most of the clusters look better separated and structured. The sitting cluster comes in a few pieces, but these pieces are well-defined. The standing cluster is in two nearly circular pieces with very little data (colors) mixed in from other clusters. The walking cluster is one piece with a small admixture of colors from other activities. The running and dancing clusters are not separated from each other, but are mainly separated from the other data. This lack of separation means running and dancing are not easily distinguishable; perhaps this result is not surprising.
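The visual impression of mixed colors can be backed by a number. One possible measure, which is not part of the original example, is the fraction of each point's nearest neighbors in the embedding that share its label, averaged per activity. The sketch below computes it with knnsearch on a hypothetical stand-in embedding (Ydemo and labelsDemo). With the example data, you would use Y and actid. The choice k = 5 is arbitrary.

```matlab
rng default
% Stand-in embedding (hypothetical): two well-separated 2-D groups
Ydemo      = [randn(50,2); randn(50,2)+20];
labelsDemo = [ones(50,1); 2*ones(50,1)];

k = 5;                                        % Neighbors per point (arbitrary)
% Request k+1 neighbors because each point is its own nearest neighbor
nbrIdx = knnsearch(Ydemo,Ydemo,K=k+1);
nbrIdx = nbrIdx(:,2:end);                     % Drop the self-match column

sameLabel = labelsDemo(nbrIdx) == labelsDemo; % n-by-k logical matrix
agreement = mean(sameLabel,2);                % Per-point same-label fraction
perGroup  = splitapply(@mean,agreement,labelsDemo);  % Average per group
```

Values near 1 indicate a group whose points sit mostly among their own kind. Two groups that overlap in the embedding, such as running and dancing in the plot above, would be expected to score lower.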
Reduce Dimension of Data to Three
t-SNE can also reduce the data to three dimensions. Set the 'NumDimensions' argument to 3.
rng default % For fair comparison
Y3 = tsne(X,Algorithm="barneshut",Perplexity=300,NumDimensions=3);
figure
scatter3(Y3(:,1),Y3(:,2),Y3(:,3),15,clr(actid,:),'filled');
view(61,51)
The clusters seem pretty well separated, with the exception of running and dancing. By rotating the 3-D plot, you can see that running and dancing are more easily distinguished in 3-D than in 2-D.
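When interactive rotation is not available, for example in a static report, several fixed viewpoints can stand in for it. The following sketch draws the same 3-D scatter plot from four camera angles by using tiledlayout and view. The embedding Y3demo, the group labels, and the angles other than (61,51) are hypothetical. With the example data, you would plot Y3 colored by clr(actid,:).

```matlab
rng default
% Stand-in 3-D embedding (hypothetical): two groups of 40 points
Y3demo    = [randn(40,3); randn(40,3)+5];
groupDemo = [ones(40,1); 2*ones(40,1)];
clrDemo   = hsv(2);                         % One color per group

viewAngles = [61 51; 0 90; 0 0; 90 0];      % [azimuth elevation] per tile

figure
tiledlayout(2,2)
for k = 1:size(viewAngles,1)
    nexttile
    scatter3(Y3demo(:,1),Y3demo(:,2),Y3demo(:,3),15, ...
        clrDemo(groupDemo,:),'filled')
    view(viewAngles(k,1),viewAngles(k,2))   % Set the camera for this tile
    title(sprintf("az = %d, el = %d",viewAngles(k,1),viewAngles(k,2)))
end
```

In an interactive session, calling rotate3d on lets you drag the plot to any viewpoint.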