Chapter 3

Applying Unsupervised Learning

When to Consider Unsupervised Learning

Unsupervised learning is useful when you want to explore your data but don’t yet have a specific goal or are not sure what information the data contains. It’s also a good way to reduce the dimensionality of your data.

Most unsupervised learning techniques are a form of cluster analysis, as we saw in Chapter 1.

In cluster analysis, data is partitioned into groups based on some measure of similarity or shared characteristic. Clusters are formed so that objects in the same cluster are very similar and objects in different clusters are very distinct.
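To make "similar within, distinct between" concrete, the following minimal sketch compares average distances inside clusters with average distances across clusters. The points and labels are made up for illustration, and Euclidean distance is assumed as the similarity measure; other measures are equally valid.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D points and cluster labels, chosen only for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

def mean_pairwise_distance(points, labels, same_cluster):
    """Average Euclidean distance over pairs that share (or do not share) a label."""
    distances = [
        dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if (labels[i] == labels[j]) == same_cluster
    ]
    return sum(distances) / len(distances)

within = mean_pairwise_distance(points, labels, same_cluster=True)
between = mean_pairwise_distance(points, labels, same_cluster=False)
print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```

A good partition gives a within-cluster distance that is small relative to the between-cluster distance.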

Clustering algorithms fall into two broad groups:

  • Hard clustering, where each data point belongs to only one cluster.
  • Soft clustering, where each data point can belong to more than one cluster.

You can use hard or soft clustering techniques if you already know the possible data groupings.

[Figure: Gaussian mixture model used to separate data into two clusters.]
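The difference between hard and soft assignment can be sketched in a few lines. This is an illustration only: the cluster centers and the spread `sigma` are hypothetical fixed values, whereas a real algorithm would estimate them from the data. The soft weights assume equal-weight spherical Gaussians, which is a simplification of a full Gaussian mixture model.

```python
from math import dist, exp

# Hypothetical cluster centers and a shared spread; in practice a clustering
# algorithm would estimate these from the data.
centers = [(0.0, 0.0), (4.0, 0.0)]
sigma = 1.0

def hard_assign(point):
    """Hard clustering: return the index of the single nearest center."""
    return min(range(len(centers)), key=lambda k: dist(point, centers[k]))

def soft_assign(point):
    """Soft clustering: return one membership weight per cluster, summing to 1.

    Weights come from equal-weight spherical Gaussians centered on each cluster.
    Points very far from every center would underflow; this sketch ignores that.
    """
    scores = [exp(-dist(point, c) ** 2 / (2 * sigma ** 2)) for c in centers]
    total = sum(scores)
    return [s / total for s in scores]

for point in [(0.5, 0.0), (2.0, 0.0), (3.5, 0.0)]:
    memberships = ", ".join(f"{w:.2f}" for w in soft_assign(point))
    print(point, "hard:", hard_assign(point), "soft:", memberships)
```

A point midway between the two centers still receives a single hard label, while its soft memberships are split evenly between the clusters.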

If you don’t yet know how the data might be grouped:

  • Use self-organizing feature maps or hierarchical clustering to look for possible structures in the data.
  • Use cluster evaluation to look for the “best” number of groups for a given clustering algorithm.
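Both ideas can be combined in a small sketch: build a hierarchy by repeatedly merging the closest clusters, then score each candidate number of groups. The data is made up, single linkage is assumed as the merge rule, and the mean silhouette value is assumed as the evaluation criterion; other linkages and criteria are common and may rank the candidates differently.

```python
from math import dist

# Hypothetical 2-D data, chosen only for illustration.
points = [
    (0.0, 0.0), (0.3, 0.2), (0.1, 0.4),
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.3),
    (10.0, 0.0), (10.2, 0.3), (9.8, 0.1),
]

def linkage_distance(a, b):
    """Single linkage: smallest distance between any member of a and any member of b."""
    return min(dist(points[i], points[j]) for i in a for j in b)

def agglomerate(k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a is unaffected by the pop
        clusters[a].extend(merged)
    return clusters

def mean_silhouette(clusters):
    """Average silhouette value; values near 1 indicate tight, well-separated clusters."""
    total = 0.0
    for own in clusters:
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        for i in own:
            a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for other in clusters if other is not own
            )
            total += (b - a) / max(a, b)
    return total / len(points)

# Score each candidate number of groups and keep the highest-scoring one.
scores = {k: mean_silhouette(agglomerate(k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: mean silhouette {score:.2f}")
print("best number of groups by this criterion:", best_k)
```

The "best" value is only best with respect to the chosen algorithm and criterion, which is why the result should be read as a suggestion rather than a fact about the data.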

Common Hard Clustering Algorithms

Common Soft Clustering Algorithms