Supervised learning is a type of machine learning that uses a known dataset (called the training dataset) to make predictions. The training dataset includes input data and the corresponding response values. From it, the supervised learning algorithm seeks to build a model that can predict the response values for a new dataset. Using larger training datasets and tuning model hyperparameters can often increase the model’s predictive power and help it generalize to new data. A separate test dataset, not used during training, is often used to validate the model.
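As a rough illustration of this workflow, the sketch below fits a k-nearest-neighbors model on a small training dataset and then measures its accuracy on a separate test dataset. The data points, the labels "low" and "high", the helper names `knn_predict` and `accuracy`, and the choice of k = 3 are all assumptions made for the example. Here k plays the role of a hyperparameter.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation points whose predicted label matches the known one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)


# Made-up training dataset: (input data, response value) pairs.
training = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((6.0, 6.0), "high"), ((6.5, 7.0), "high"), ((7.0, 6.5), "high"),
]

# Made-up test dataset, held out from training and used only for validation.
test = [
    ((1.2, 1.4), "low"), ((6.8, 6.2), "high"),
    ((2.2, 2.0), "low"), ((5.8, 6.4), "high"),
]

test_accuracy = accuracy(training, test, k=3)
print(f"Test accuracy with k=3: {test_accuracy:.2f}")
```

Because the two groups in this toy dataset are well separated, the test accuracy says little about real data; the point is only that the model is scored on examples it did not see during training.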
Supervised learning includes two categories of algorithms:
- Classification: for categorical response values, where the data can be separated into specific “classes”
- Regression: for continuous response values
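To make the distinction concrete, here is a minimal regression sketch. It uses ordinary least squares on a single input variable, a technique chosen for this example and not one named in the text. The data values and the names `fit_line` and `predict` are likewise illustrative. The model returns a continuous number, whereas a classifier trained on the same inputs would return one of a fixed set of class labels.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of the inputs.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: input values and continuous response values.
inputs = [1.0, 2.0, 3.0, 4.0]
responses = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(inputs, responses)


def predict(x):
    """Predict a continuous response for a new input value."""
    return slope * x + intercept


prediction = predict(5.0)
print(f"slope={slope}, intercept={intercept}, prediction for x=5: {prediction}")
```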
Common classification algorithms include:
- Support vector machines (SVM)
- Neural networks
- Naïve Bayes classifier
- Decision trees
- Discriminant analysis
- Nearest neighbors (kNN)
Common regression algorithms include:
Supervised learning is used in financial applications for credit scoring, algorithmic trading, and bond classification; in biological applications for tumor detection and drug discovery; in energy applications for price and load forecasting; and in pattern recognition applications for speech and images.