Unsupervised Learning

What Is Unsupervised Learning?

Unsupervised learning is a type of machine learning used to draw inferences from data sets that have no labeled responses, in contrast to supervised learning, where labels are provided along with the data.

The most common unsupervised learning method is cluster analysis, which explores data to find hidden patterns or groupings.

With MATLAB you can apply many popular clustering algorithms:

  • Hierarchical clustering: Builds a multilevel hierarchy of clusters by creating a cluster tree
  • k-means and k-medoids clustering: Partitions data into k distinct clusters based on distance
  • Gaussian mixture models: Models clusters as a mixture of multivariate normal density components
  • Density-based spatial clustering (DBSCAN): Groups points that are close to each other in areas of high density, keeping track of outliers in low-density regions
  • Self-organizing maps: Uses neural networks that learn the topology and distribution of the data
  • Spectral clustering: Graph-based clustering that can handle arbitrary non-convex shapes
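
As an illustration of the distance-based partitioning that k-means performs, the following is a minimal sketch in plain Python rather than MATLAB. The function name kmeans, the seeding rule (the first k points), and the sample points are assumptions made for this example; it is not the toolbox implementation, which offers better initialization and more options.

```python
import math

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: seed the centroids with the first k points (simple and
    # deterministic; production code usually uses k-means++ or restarts).
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical sample data: two well-separated groups, interleaved.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because the result depends on the starting centroids, a practical implementation would normally run several restarts and keep the partition with the smallest total within-cluster distance.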

Other methods that apply unsupervised learning include semi-supervised learning and unsupervised feature ranking. Semi-supervised learning reduces the amount of labeled data that supervised learning needs: clustering applied to the whole data set establishes similarity between labeled and unlabeled data, and labels are then propagated to similar, previously unlabeled members of the same cluster.
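
The label propagation step described above can be sketched as follows, in plain Python. The function name propagate_labels, the majority-vote rule, and the sample data are assumptions for illustration; the cluster assignments are taken as given and could come from any clustering method.

```python
from collections import Counter

def propagate_labels(cluster_ids, labels):
    """Fill None entries with the majority known label of the same cluster."""
    # Count the known labels inside each cluster.
    votes = {}
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            votes.setdefault(cid, Counter())[lab] += 1
    # Give every unlabeled point the most common label of its cluster.
    filled = []
    for cid, lab in zip(cluster_ids, labels):
        if lab is None and cid in votes:
            lab = votes[cid].most_common(1)[0][0]
        filled.append(lab)
    return filled

# Hypothetical example: three clusters, only three labeled observations.
cluster_ids = [0, 0, 0, 1, 1, 1, 2]
labels = ["cat", None, None, None, "dog", "dog", None]
print(propagate_labels(cluster_ids, labels))
# ['cat', 'cat', 'cat', 'dog', 'dog', 'dog', None]
```

The last observation stays unlabeled because its cluster contains no labeled member, which is the kind of case where more labeling effort is still needed.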

Unsupervised feature ranking assigns scores to features without a given prediction target or response. MATLAB® and Statistics and Machine Learning Toolbox™ support unsupervised ranking using Laplacian scores.
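
To make the idea of a Laplacian score concrete, here is a minimal plain-Python sketch of the classic formulation, in which a lower score means the feature better preserves the neighborhood structure of the data. The function name laplacian_scores, the fully connected heat-kernel similarity graph, and the kernel width t are assumptions for this example; it is not the toolbox implementation, and the scaling and orientation of scores in a given library function may differ.

```python
import math

def laplacian_scores(X, t=1.0):
    """Return one Laplacian score per feature (lower = more locality-preserving)."""
    n, d = len(X), len(X[0])
    # Assumption: fully connected similarity graph with a heat kernel.
    # Nearest-neighbor graphs are a common alternative.
    S = [
        [math.exp(-math.dist(X[i], X[j]) ** 2 / t) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [x[r] for x in X]
        # Remove the degree-weighted mean so constant features carry no signal.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        f = [fi - mean for fi in f]
        # f' L f with L = D - S equals half the similarity-weighted sum of
        # squared differences between all pairs of observations.
        num = 0.5 * sum(
            S[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n)
        )
        den = sum(di * fi * fi for fi, di in zip(f, deg))  # f' D f
        scores.append(num / den if den > 0 else float("inf"))
    return scores

# Hypothetical data: feature 0 separates two groups, feature 1 is noise.
X = [(0.0, 0.3), (0.1, 0.9), (0.2, 0.1), (5.0, 0.8), (5.1, 0.2), (5.2, 0.6)]
scores = laplacian_scores(X)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])
print(ranking)  # feature indices, most structure-preserving first
```

In this sketch the group-separating feature receives the lower score because nearby observations have nearly the same value for it, while the noise feature varies freely among neighbors.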

Key Points

  • Unsupervised learning is typically applied before supervised learning to identify features during exploratory data analysis and to establish classes based on groupings.
  • k-means and hierarchical clustering remain popular. Only some clustering methods can handle arbitrary non-convex shapes; those supported in MATLAB include DBSCAN, hierarchical, and spectral clustering.
  • Unsupervised learning (clustering) can also be used to compress data.
  • Unsupervised feature ranking can reduce the number of features so that distance-based clustering runs more efficiently on large data sets.
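
As a small illustration of the point that clustering can compress data, the sketch below quantizes a list of values to k representative levels using one-dimensional k-means, in plain Python. The function name quantize, the evenly spaced seeding, and the sample pixel values are assumptions for this example. The compression is lossy: each value is replaced by a short code plus a shared codebook.

```python
def quantize(values, k, iters=50):
    """Compress numbers to k codebook levels with 1-D k-means (lossy)."""
    lo, hi = min(values), max(values)
    # Assumption: seed the levels evenly across the value range.
    codebook = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]

    def encode():
        # Index of the nearest codebook level for every value.
        return [min(range(k), key=lambda j: abs(v - codebook[j])) for v in values]

    for _ in range(iters):
        codes = encode()
        # Move each level to the mean of the values assigned to it.
        for j in range(k):
            members = [v for v, c in zip(values, codes) if c == j]
            if members:
                codebook[j] = sum(members) / len(members)
    return codebook, encode()

# Hypothetical 8-bit grayscale samples with a dark and a bright group.
pixels = [12, 15, 11, 200, 205, 198, 14, 202]
codebook, codes = quantize(pixels, k=2)
restored = [codebook[c] for c in codes]
print(codebook)  # [13.0, 201.25]
print(codes)     # [0, 0, 0, 1, 1, 1, 0, 1]
```

With two levels, each sample needs a single bit instead of eight, at the cost of replacing every value with its cluster mean.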

See also: Statistics and Machine Learning Toolbox, Machine Learning with MATLAB, Image Processing Toolbox