What Is Unsupervised Learning?

Unsupervised learning is a type of machine learning used to draw inferences from data sets that have no labeled responses, in contrast to supervised learning, where labels are provided along with the data.

The most common unsupervised learning method is cluster analysis, which explores data to find hidden patterns or groupings.

With MATLAB you can apply many popular clustering algorithms:

Hierarchical clustering: Builds a multilevel hierarchy of clusters by creating a cluster tree
k-Means and k-medoids clustering: Partitions data into k distinct clusters based on distance
Gaussian mixture models: Models clusters as a mixture of multivariate normal density components
Density-based spatial clustering of applications with noise (DBSCAN): Groups points that are close to each other in areas of high density and marks points in low-density regions as outliers
Self-organizing maps: Uses neural networks that learn the topology and distribution of the data
Spectral clustering: Graph-based clustering that can handle arbitrary non-convex shapes
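
To make the partitioning idea behind k-means concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration only, not the MATLAB kmeans function: the function name, the sample points, and the choice of the first k points as starting centroids are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Assumption: the first k points serve as the initial centroids. This is
    # deterministic but naive; real implementations usually seed more carefully.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
    return labels, centroids

# Hypothetical sample: two well-separated groups in the plane.
data = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centroids = kmeans(data, 2)
print(labels)
print(centroids)
```

Because every point is assigned to exactly one nearest centroid, the resulting clusters are convex regions, which is why the density-based and graph-based methods in the list are needed for other shapes.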

Other methods that apply unsupervised learning include semi-supervised learning and unsupervised feature ranking. Semi-supervised learning reduces the need for labeled data in supervised learning. Clustering applied to the whole data set establishes similarity between labeled and unlabeled data, and the known labels are then propagated to similar, previously unlabeled members of the same cluster.
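
The propagation step can be sketched as follows, assuming the clustering has already been done by any method. The function name, the majority-vote rule, and the sample data are assumptions for illustration; other propagation rules are possible.

```python
from collections import Counter

def propagate_labels(cluster_ids, known_labels):
    """Give each unlabeled point the majority label of its cluster.

    cluster_ids  -- one cluster index per point, from any clustering method
    known_labels -- one entry per point; None marks an unlabeled point
    """
    # Count the known labels inside each cluster.
    votes = {}
    for cid, lab in zip(cluster_ids, known_labels):
        if lab is not None:
            votes.setdefault(cid, Counter())[lab] += 1
    majority = {cid: c.most_common(1)[0][0] for cid, c in votes.items()}
    # Labeled points keep their label; a cluster with no labeled member
    # has nothing to propagate, so its points stay None.
    return [
        lab if lab is not None else majority.get(cid)
        for cid, lab in zip(cluster_ids, known_labels)
    ]

# Hypothetical example: seven points in three clusters, three labels known.
cluster_ids = [0, 0, 0, 1, 1, 1, 2]
known = ["cat", None, None, None, "dog", "dog", None]
filled = propagate_labels(cluster_ids, known)
print(filled)
```

The last point stays unlabeled because its cluster contains no labeled example, which shows that this approach still needs at least one label per group.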

Unsupervised feature ranking assigns scores to features without a given prediction target or response. MATLAB® and Statistics and Machine Learning Toolbox™ support unsupervised ranking using Laplacian scores.
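
As a rough illustration of how a Laplacian score can rank features without a response, the sketch below follows the commonly published formulation, in which a lower score marks a feature that varies little between similar observations. It is not the toolbox implementation: the fully connected heat-kernel graph, the bandwidth parameter, and the sample data are assumptions, and a given library may scale or orient its scores differently.

```python
import math

def laplacian_scores(rows, bandwidth=1.0):
    """Return one Laplacian score per feature (lower = better preserves locality)."""
    n = len(rows)
    # Assumption: heat-kernel similarity on a fully connected graph, no self-loops.
    S = [
        [0.0 if i == j else math.exp(-math.dist(rows[i], rows[j]) ** 2 / bandwidth)
         for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in S]
    total = sum(degree)
    scores = []
    for feature in zip(*rows):
        # Center the feature with the degree-weighted mean.
        mean = sum(d * v for d, v in zip(degree, feature)) / total
        centered = [v - mean for v in feature]
        # f' L f = 1/2 * sum_ij S_ij (f_i - f_j)^2, with L = D - S.
        smoothness = 0.5 * sum(
            S[i][j] * (centered[i] - centered[j]) ** 2
            for i in range(n) for j in range(n)
        )
        # f' D f: degree-weighted variance of the feature.
        variance = sum(d * c * c for d, c in zip(degree, centered))
        scores.append(smoothness / variance if variance > 0 else float("inf"))
    return scores

# Hypothetical data: feature 0 separates two groups, feature 1 is noise.
rows = [(0.0, 0.3), (0.1, 0.9), (0.2, 0.1), (10.0, 0.8), (10.1, 0.2), (10.2, 0.7)]
scores = laplacian_scores(rows)
print(scores)
```

In this sample the group-separating feature receives a much lower score than the noisy one, so it would be ranked first under this convention.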

Key Points

Unsupervised learning is typically applied before supervised learning to identify features in exploratory data analysis and to establish classes based on groupings.
k-means and hierarchical clustering remain popular. Only some clustering methods can handle arbitrary non-convex shapes; those supported in MATLAB include DBSCAN, hierarchical, and spectral clustering.
Unsupervised learning (clustering) can also be used to compress data.
Unsupervised feature ranking can make distance-based clustering of large data sets more efficient.
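
The compression point can be illustrated with a small vector-quantization sketch: each value is replaced by the index of its nearest cluster centroid, so only the short codebook and the indices need to be stored. The function names, the sample values, and the codebook (assumed to come from an earlier clustering step) are hypothetical.

```python
def quantize(values, codebook):
    """Map each value to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda j: abs(v - codebook[j]))
        for v in values
    ]

def reconstruct(indices, codebook):
    """Approximate the original values from the stored indices."""
    return [codebook[i] for i in indices]

# Hypothetical 1-D data, e.g. pixel intensities, and an assumed two-entry
# codebook of cluster centroids.
pixels = [12, 15, 200, 198, 14, 205]
codebook = [14.0, 201.0]
indices = quantize(pixels, codebook)
approx = reconstruct(indices, codebook)
print(indices)
print(approx)
```

The compression is lossy: the reconstruction keeps only the centroid values, and using more clusters trades storage for accuracy.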

Examples and How To

Easy k-Means Clustering with MATLAB (1:50) - Video
Discover Gene Expression profiles using k-Means Clustering - Example
Color-Based Segmentation Using k-Means - Example
Guidance for Choosing the Appropriate Clustering Method - Documentation
Machine Learning with MATLAB Overview (3:02) - Video
What Is Statistics and Machine Learning Toolbox? (2:14) - Video
Anomaly Detection - Example

Software Reference

Overview of Cluster Analysis in MATLAB - Documentation
Choosing the Appropriate Clustering Method - Documentation
Hierarchical Clustering - Documentation
kmeans: Applying k-Means Clustering - Function
Applying DBSCAN Clustering - Function
fsulaplacian: Unsupervised Feature Ranking - Function
rica: Unsupervised Dimensionality Reduction - Function
Using Hidden Markov Models - Documentation

Basics of Unsupervised Learning

Machine Learning with MATLAB - Ebook
Mastering Machine Learning: A Step-by-Step Guide with MATLAB - Ebook