# Statistics Toolbox

## Multivariate Statistics

Statistics Toolbox provides multivariate algorithms and functions for analyzing several variables at once. Typical applications include dimensionality reduction by feature transformation and feature selection, and exploring relationships between variables with visualization techniques such as scatter plot matrices and classical multidimensional scaling.

Fitting an Orthogonal Regression Using Principal Component Analysis (Example)
Implement Deming regression (total least squares).
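
For two variables, the idea can be sketched in a few lines: the orthogonal regression line passes through the centroid of the data and follows the first principal component of the 2-by-2 covariance matrix. The Python sketch below is illustrative only and is not toolbox code. The function name `orthogonal_regression` and the sample data are assumptions, and the sketch treats the errors in both variables as having equal variance.

```python
import math

def orthogonal_regression(xs, ys):
    """Fit y = slope * x + intercept by minimizing perpendicular distances.

    Assumes equal error variance in x and y (hypothetical helper, not a
    toolbox function).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Angle of the first principal axis (direction of largest variance).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    if abs(dx) < 1e-12:
        raise ValueError("fitted line is vertical; slope is undefined")
    slope = dy / dx
    # The fitted line passes through the centroid of the data.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Sample points lying exactly on y = 2x + 1 (made-up data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = orthogonal_regression(xs, ys)
print(slope, intercept)
```

Unlike ordinary least squares, which minimizes vertical distances only, this fit treats both variables symmetrically, so swapping `xs` and `ys` describes the same line.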

### Feature Transformation

Feature transformation (sometimes called feature extraction) is a dimensionality reduction technique that transforms existing features into new features (predictor variables) so that the less descriptive ones can be dropped. The toolbox offers several approaches for feature transformation.

Partial Least Squares Regression and Principal Component Regression (Example)
Model a response variable in the presence of highly correlated predictors.
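
As a rough illustration of principal component regression, the Python sketch below projects the centered predictors onto their first principal component, regresses the response on that single score, and maps the result back to coefficients on the original predictors. It is a minimal sketch, not the toolbox implementation: the function names, the use of power iteration, the choice of a single component, and the toy data are all assumptions. The data are perfectly collinear, an extreme case of highly correlated predictors in which ordinary least squares has no unique solution.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return column means, centered rows, and the leading eigenvector
    of the sample covariance matrix, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Asymmetric start vector, to lower the chance of starting orthogonal
    # to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("power iteration collapsed; try another start vector")
        v = [x / norm for x in w]
    return means, centered, v

def principal_component_regression(rows, y):
    """Regress y on the first principal component score only."""
    means, centered, v = first_principal_component(rows)
    y_mean = sum(y) / len(y)
    scores = [sum(c[j] * v[j] for j in range(len(v))) for c in centered]
    # One-dimensional least squares of the centered response on the scores.
    gamma = (sum(t * (yi - y_mean) for t, yi in zip(scores, y))
             / sum(t * t for t in scores))
    coefs = [gamma * vj for vj in v]  # back to the original predictors
    intercept = y_mean - sum(b * m for b, m in zip(coefs, means))
    return coefs, intercept

# Toy data: the second predictor is exactly twice the first, y = x1 + x2.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
y = [3.0, 6.0, 9.0, 12.0, 15.0]
coefs, intercept = principal_component_regression(X, y)
predictions = [intercept + sum(b * x for b, x in zip(coefs, row)) for row in X]
print(coefs, intercept)
```

Because the regression runs on one uncorrelated score instead of two collinear columns, the fit stays well defined even though the predictors carry the same information.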

### Feature Selection

Feature selection is a dimensionality reduction technique that selects only the subset of measured features (predictor variables) that provides the best predictive power in modeling the data. It is useful when you are dealing with high-dimensional data or when collecting data for all features is cost-prohibitive.

Feature selection methods include:

• Stepwise regression sequentially adds or removes features until there is no improvement in prediction accuracy; it can be used with linear regression or generalized linear regression algorithms.
• Sequential feature selection is similar to stepwise regression but can be used with any supervised learning algorithm and a custom performance measure.
• Regularization (lasso and elastic nets) uses shrinkage estimators to remove redundant features by reducing their weights (coefficients) to zero.
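
To make the sequential approach concrete, here is a minimal Python sketch of forward selection driven by a user-supplied criterion. It is not the toolbox routine: `sequential_forward_selection`, `loo_1nn_error`, the stopping tolerance, and the toy data are assumptions chosen for illustration. The criterion used here is the leave-one-out error of a 1-nearest-neighbor classifier, but any function that scores a list of feature indices (lower is better) would work.

```python
def sequential_forward_selection(n_features, criterion, tol=1e-12):
    """Greedily add the feature that lowers criterion(columns) the most;
    stop when no candidate improves the score by more than tol."""
    selected = []
    best = float("inf")
    remaining = list(range(n_features))
    while remaining:
        # Score every candidate feature added to the current selection.
        score, j = min((criterion(selected + [j]), j) for j in remaining)
        if best - score <= tol:
            break  # no further improvement
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

def loo_1nn_error(X, y):
    """Build a criterion: leave-one-out error rate of a 1-nearest-neighbor
    classifier restricted to the given feature columns."""
    def criterion(cols):
        errors = 0
        for i, xi in enumerate(X):
            nearest = min(
                (k for k in range(len(X)) if k != i),
                key=lambda k: sum((xi[c] - X[k][c]) ** 2 for c in cols),
            )
            errors += y[nearest] != y[i]
        return errors / len(X)
    return criterion

# Toy data: feature 0 separates the classes, feature 1 is unrelated to the
# class, and feature 2 is constant.
X = [
    [0.0, 5.0, 1.0], [0.1, 1.0, 1.0], [0.2, 9.0, 1.0],
    [1.0, 4.8, 1.0], [1.1, 1.2, 1.0], [1.2, 9.1, 1.0],
]
y = [0, 0, 0, 1, 1, 1]
selected, best = sequential_forward_selection(3, loo_1nn_error(X, y))
print(selected, best)
```

On this toy data the search keeps only feature 0, since adding either of the other features does not lower the error any further.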

Feature selection can be used to:

• Improve the accuracy of a machine learning algorithm
• Boost performance on very high-dimensional data
• Improve model interpretability
• Reduce the risk of overfitting

Selecting Features for Classifying High-Dimensional Data (Example)
Select important features for cancer detection.

### Multivariate Visualization

Statistics Toolbox provides graphs and charts to explore multivariate data visually, including:

• Scatter plot matrices
• Dendrograms
• Biplots
• Parallel coordinate charts
• Andrews plots
• Glyph plots

Group scatter plot matrix showing how model year impacts different variables.

Andrews plot showing how country of origin impacts the variables.
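
An Andrews plot draws each observation as a curve whose coefficients are the observation's variable values, so observations with similar values produce similar curves. The Python sketch below evaluates the classical form of that curve, f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., for t between -π and π. The function name `andrews_curve` and the sample observation are assumptions, and the toolbox's own plotting function may scale t differently.

```python
import math

def andrews_curve(x, t):
    """Evaluate the classical Andrews curve of observation x at angle t."""
    value = x[0] / math.sqrt(2)
    for k, xk in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            value += xk * math.sin(harmonic * t)
        else:
            value += xk * math.cos(harmonic * t)
    return value

# One made-up observation with three variables, sampled at 5 angles.
observation = [1.0, 2.0, 3.0]
ts = [-math.pi + i * (2 * math.pi / 4) for i in range(5)]
curve = [andrews_curve(observation, t) for t in ts]
print(curve)
```

Plotting one such curve per observation, colored by group, gives the kind of display described in the caption above.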