
# ClassificationDiscriminant class

Superclasses: CompactClassificationDiscriminant

Discriminant analysis classification

## Description

A `ClassificationDiscriminant` object encapsulates a discriminant analysis classifier, which is a Gaussian mixture model for data generation. A `ClassificationDiscriminant` object can predict responses for new data using the `predict` method. The object contains the data used for training, so it can compute resubstitution predictions.

## Construction

`Mdl = fitcdiscr(Tbl,ResponseVarName)` returns a fitted discriminant analysis model based on the input variables (also known as predictors, features, or attributes) contained in the table `Tbl` and output (response or labels) contained in `ResponseVarName`.

`Mdl = fitcdiscr(Tbl,formula)` returns a fitted discriminant analysis model based on the predictor data and class labels in the table `Tbl`. `formula` is an explanatory model of the response and a subset of predictor variables in `Tbl` used to fit `Mdl`.

`Mdl = fitcdiscr(Tbl,Y)` returns a fitted discriminant analysis model based on the input variables contained in the table `Tbl` and response `Y`.

`Mdl = fitcdiscr(X,Y)` returns a discriminant analysis classifier based on the input variables `X` and response `Y`.

`Mdl = fitcdiscr(___,Name,Value)` fits a classifier with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can specify the cost of misclassification, prior probabilities for each class, or observation weights.
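The calling forms above can be sketched with Fisher's iris data (`fisheriris`, a sample data set that ships with Statistics and Machine Learning Toolbox). The table variable names `SepalLength`, `SepalWidth`, `PetalLength`, `PetalWidth`, and `Species` are illustrative choices made here, not names defined by the data set.

```matlab
load fisheriris                          % meas (150-by-4 numeric), species (150-by-1 cellstr)

% Build a table; the variable names are illustrative choices.
Tbl = array2table(meas, 'VariableNames', ...
    {'SepalLength','SepalWidth','PetalLength','PetalWidth'});
Tbl.Species = species;                   % response stored inside the table

Mdl1 = fitcdiscr(Tbl, 'Species');        % table + ResponseVarName
Mdl2 = fitcdiscr(Tbl(:,1:4), species);   % table without the response + separate Y
Mdl3 = fitcdiscr(meas, species);         % numeric matrix X + Y

% Name-value pair: equal prior probabilities instead of empirical ones
Mdl4 = fitcdiscr(meas, species, 'Prior', 'uniform');
```

All four calls fit a linear discriminant (the default `DiscrimType`); they differ only in how predictors and response are passed.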

### Input Arguments


`Tbl`: Sample data used to train the model, specified as a table. Each row of `Tbl` corresponds to one observation, and each column corresponds to one predictor variable. Optionally, `Tbl` can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.

If `Tbl` contains the response variable, and you want to use all remaining variables in `Tbl` as predictors, then specify the response variable using `ResponseVarName`.

If `Tbl` contains the response variable, and you want to use only a subset of the remaining variables in `Tbl` as predictors, then specify a formula using `formula`.

If `Tbl` does not contain the response variable, then specify a response variable using `Y`. The length of the response variable and the number of rows of `Tbl` must be equal.

Data Types: `table`

`ResponseVarName`: Response variable name, specified as the name of a variable in `Tbl`.

You must specify `ResponseVarName` as a character vector. For example, if the response variable `Y` is stored as `Tbl.Y`, then specify it as `'Y'`. Otherwise, the software treats all columns of `Tbl`, including `Y`, as predictors when training the model.

The response variable must be a categorical or character array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element of the response must correspond to one row of the array.

It is good practice to specify the order of the classes using the `ClassNames` name-value pair argument.

Data Types: `char`
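A short sketch of fixing the class order with `ClassNames`, again assuming the `fisheriris` sample data. The order chosen here is arbitrary; the point is that properties indexed by class (such as `Cost`, `Prior`, and the columns of the scores returned by `predict`) follow this order.

```matlab
load fisheriris

% Fix the class order explicitly (arbitrary order chosen for illustration).
Mdl = fitcdiscr(meas, species, ...
    'ClassNames', {'virginica','versicolor','setosa'});

disp(Mdl.ClassNames)      % virginica, versicolor, setosa, in that order
```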

`formula`: Explanatory model of the response and a subset of the predictor variables, specified as a character vector in the form of `'Y~X1+X2+X3'`. In this form, `Y` represents the response variable, and `X1`, `X2`, and `X3` represent the predictor variables. The variables must be variable names in `Tbl` (`Tbl.Properties.VariableNames`).

To specify a subset of variables in `Tbl` as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in `Tbl` that do not appear in `formula`.

Data Types: `char`
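A minimal sketch of the formula form, assuming an iris table whose variable names (`PetalLength`, `PetalWidth`, `Species`, and so on) are chosen here for illustration. Only the two predictors named in the formula enter the model.

```matlab
load fisheriris
Tbl = array2table(meas, 'VariableNames', ...
    {'SepalLength','SepalWidth','PetalLength','PetalWidth'});
Tbl.Species = species;

% Use only the two petal measurements as predictors.
Mdl = fitcdiscr(Tbl, 'Species~PetalLength+PetalWidth');

Mdl.PredictorNames        % {'PetalLength','PetalWidth'}
size(Mdl.Mu)              % one row per class, one column per used predictor
```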

`Y`: Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of `Y` represents the classification of the corresponding row of `X`.

The software considers `NaN`, `''` (empty character vector), and `<undefined>` values in `Y` to be missing values. Consequently, the software does not train using observations with a missing response.

Data Types: `single` | `double` | `logical` | `char` | `cell`

`X`: Predictor values, specified as a numeric matrix. Each column of `X` represents one variable, and each row represents one observation.

`fitcdiscr` treats `NaN` values in `X` as missing values and does not use observations with missing values in `X` in the fit.

Data Types: `single` | `double`
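The missing-value rules for `X` and `Y` can be seen in `NumObservations`. In this sketch, one predictor value and one label of the iris data are blanked out by hand (an illustrative modification, not part of the sample data), so two rows should be excluded from the fit.

```matlab
load fisheriris
X = meas;
X(1,2) = NaN;                  % one missing predictor value (row 1)
Y = species;
Y{2} = '';                     % one missing label, an empty character vector (row 2)

Mdl = fitcdiscr(X, Y);
Mdl.NumObservations            % rows 1 and 2 are not used in the fit
```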

## Properties

`BetweenSigma`: `p`-by-`p` matrix, the between-class covariance, where `p` is the number of predictors.

`CategoricalPredictors`: List of categorical predictors, which is always empty (`[]`) for SVM and discriminant analysis classifiers.

`ClassNames`: List of the elements in the training data `Y` with duplicates removed. `ClassNames` can be a categorical array, cell array of character vectors, character array, logical vector, or numeric vector. `ClassNames` has the same data type as the data in the argument `Y`.

`Coeffs`: `k`-by-`k` structure of coefficient matrices, where `k` is the number of classes. `Coeffs(i,j)` contains coefficients of the linear or quadratic boundaries between classes `i` and `j`. Fields in `Coeffs(i,j)`:

• `DiscrimType`

• `Class1`: `ClassNames(i)`

• `Class2`: `ClassNames(j)`

• `Const`: A scalar

• `Linear`: A vector with `p` components, where `p` is the number of columns in `X`

• `Quadratic`: `p`-by-`p` matrix, which exists only for quadratic `DiscrimType`

The equation of the boundary between class `i` and class `j` is `Const + Linear * x + x' * Quadratic * x = 0`, where `x` is a column vector of length `p`. If `fitcdiscr` had the `FillCoeffs` name-value pair set to `'off'` when constructing the classifier, `Coeffs` is empty (`[]`).

`Cost`: Square matrix, where `Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i` (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`. The number of rows and columns in `Cost` is the number of unique classes in the response. Change a `Cost` matrix using dot notation: `obj.Cost = costMatrix`.

`Delta`: Value of the Delta threshold for a linear discriminant model, a nonnegative scalar. If a coefficient of `obj` has magnitude smaller than `Delta`, `obj` sets this coefficient to `0`, so you can eliminate the corresponding predictor from the model. Set `Delta` to a higher value to eliminate more predictors. `Delta` must be `0` for quadratic discriminant models. Change `Delta` using dot notation: `obj.Delta = newDelta`.

`DeltaPredictor`: Row vector of length equal to the number of predictors in `obj`. If `DeltaPredictor(i) < Delta`, then coefficient `i` of the model is `0`. If `obj` is a quadratic discriminant model, all elements of `DeltaPredictor` are `0`.

`DiscrimType`: Character vector specifying the discriminant type. One of:

• `'linear'`

• `'quadratic'`

• `'diagLinear'`

• `'diagQuadratic'`

• `'pseudoLinear'`

• `'pseudoQuadratic'`

Change `DiscrimType` using dot notation: `obj.DiscrimType = newDiscrimType`. You can change between linear types or between quadratic types, but you cannot change between linear and quadratic types.

`Gamma`: Value of the Gamma regularization parameter, a scalar from `0` to `1`. Change `Gamma` using dot notation: `obj.Gamma = newGamma`.

• If you set `Gamma` to `1` for a linear discriminant, the discriminant sets its type to `'diagLinear'`.

• If you set a value between `MinGamma` and `1` for a linear discriminant, the discriminant sets its type to `'linear'`.

• You cannot set values below the value of the `MinGamma` property.

• For a quadratic discriminant, you can set either `0` (for `DiscrimType` `'quadratic'`) or `1` (for `DiscrimType` `'diagQuadratic'`).

`HyperparameterOptimizationResults`: Description of the cross-validation optimization of hyperparameters, stored as a `BayesianOptimization` object or a table of hyperparameters and associated values. Nonempty when the `OptimizeHyperparameters` name-value pair is nonempty at creation. The value depends on the setting of the `HyperparameterOptimizationOptions` name-value pair at creation:

• `'bayesopt'` (default): Object of class `BayesianOptimization`

• `'gridsearch'` or `'randomsearch'`: Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

`LogDetSigma`: Logarithm of the determinant of the within-class covariance matrix. The type of `LogDetSigma` depends on the discriminant type:

• Scalar for linear discriminant analysis

• Vector of length `K` for quadratic discriminant analysis, where `K` is the number of classes

`MinGamma`: Nonnegative scalar, the minimal value of the Gamma parameter so that the correlation matrix is invertible. If the correlation matrix is not singular, `MinGamma` is `0`.

`ModelParameters`: Parameters used in training `obj`.

`Mu`: Class means, specified as a `K`-by-`p` matrix, where `K` is the number of classes and `p` is the number of predictors. Each row of `Mu` represents the mean of the multivariate normal distribution of the corresponding class. The class order is given by the `ClassNames` property.

`NumObservations`: Number of observations in the training data, a numeric scalar. `NumObservations` can be less than the number of rows of input data `X` when there are missing values in `X` or response `Y`.

`PredictorNames`: Cell array of names for the predictor variables, in the order in which they appear in the training data `X`.

`Prior`: Numeric vector of prior probabilities for each class. The order of the elements of `Prior` corresponds to the order of the classes in `ClassNames`. Add or change a `Prior` vector using dot notation: `obj.Prior = priorVector`.

`ResponseName`: Character vector describing the response variable `Y`.

`ScoreTransform`: Function handle for transforming scores, or character vector representing a built-in transformation function. `'none'` means no transformation; equivalently, `'none'` means `@(x)x`. For a list of built-in transformation functions and the syntax of custom transformation functions, see `fitcdiscr`. Add or change a `ScoreTransform` function using dot notation, in one of these forms:

• `obj.ScoreTransform = 'function'`

• `obj.ScoreTransform = @function`

`Sigma`: Within-class covariance matrix or matrices. The dimensions depend on `DiscrimType`:

• `'linear'` (default): Matrix of size `p`-by-`p`, where `p` is the number of predictors

• `'quadratic'`: Array of size `p`-by-`p`-by-`K`, where `K` is the number of classes

• `'diagLinear'`: Row vector of length `p`

• `'diagQuadratic'`: Array of size `1`-by-`p`-by-`K`

• `'pseudoLinear'`: Matrix of size `p`-by-`p`

• `'pseudoQuadratic'`: Array of size `p`-by-`p`-by-`K`

`W`: Scaled `weights`, a vector with length `n`, the number of rows in `X`.

`X`: Matrix of predictor values. Each column of `X` represents one predictor (variable), and each row represents one observation.

`Xcentered`: `X` data with class means subtracted. If `Y(i)` is of class `j`, then `Xcentered(i,:) = X(i,:) - Mu(j,:)`, where `Mu` is the class mean property.

`Y`: A categorical array, cell array of character vectors, character array, logical vector, or numeric vector with the same number of rows as `X`. Each row of `Y` represents the classification of the corresponding row of `X`.
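A sketch of reading and changing several of these properties, assuming the `fisheriris` sample data. The query point `x`, the cost matrix, and the `Gamma` and `Delta` values are arbitrary illustrations. Because the model is linear, the `Quadratic` term of the boundary equation is absent; `c.Linear(:)'` is used so the product works whether `Linear` is stored as a row or a column.

```matlab
load fisheriris
Mdl = fitcdiscr(meas, species);          % linear discriminant (default)

% Boundary between classes 2 and 3 (versicolor and virginica)
c = Mdl.Coeffs(2,3);
x = [5.9; 3.0; 4.8; 1.7];                % arbitrary column vector, p = 4
f = c.Const + c.Linear(:)'*x;            % left side of the boundary equation
% f is 0 on the boundary; its sign indicates the side on which x lies.

% Change properties with dot notation (illustrative values).
Mdl.Cost = [0 1 1; 1 0 2; 1 2 0];        % rows: true class, columns: predicted class
Mdl.Gamma = 0.5;                         % between MinGamma and 1: stays 'linear'
Mdl.Delta = 0.1;                         % zero out small linear coefficients
nUsed = nLinearCoeffs(Mdl);              % number of nonzero linear coefficients
```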

## Methods

• `compact`: Compact discriminant analysis classifier

• `crossval`: Cross-validated discriminant analysis classifier

• `cvshrink`: Cross-validate regularization of linear discriminant

• `resubEdge`: Classification edge by resubstitution

• `resubLoss`: Classification error by resubstitution

• `resubMargin`: Classification margins by resubstitution

• `resubPredict`: Predict resubstitution response of classifier

### Inherited Methods

• `compareHoldout`: Compare accuracies of two classification models using new data

• `edge`: Classification edge

• `logP`: Log unconditional probability density for discriminant analysis classifier

• `loss`: Classification error

• `mahal`: Mahalanobis distance to class means

• `margin`: Classification margins

• `nLinearCoeffs`: Number of nonzero linear coefficients

• `predict`: Predict labels using discriminant analysis classification model
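A sketch combining a few of the listed methods on the `fisheriris` sample data. `kfoldLoss` is a method of the cross-validated model returned by `crossval`, not of `ClassificationDiscriminant` itself; the comment that it is usually more honest than `resubLoss` reflects general practice, not a guarantee.

```matlab
load fisheriris
Mdl = fitcdiscr(meas, species);

resubErr = resubLoss(Mdl);           % misclassification rate on the training data

rng('default')                       % reproducible fold assignment
CVMdl = crossval(Mdl);               % 10-fold cross-validation by default
cvErr = kfoldLoss(CVMdl);            % error estimated on held-out folds

CMdl = compact(Mdl);                 % drops the stored training data
labels = predict(CMdl, meas(1:5,:)); % inherited predict works on the compact model
d = mahal(CMdl, meas(1:5,:));        % squared Mahalanobis distances, 5-by-3
```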

## Definitions

### Discriminant Classification

The model for discriminant analysis is:

• Each class (`Y`) generates data (`X`) using a multivariate normal distribution. That is, the model assumes `X` has a Gaussian mixture distribution (`gmdistribution`).

• For linear discriminant analysis, the model has the same covariance matrix for each class; only the means vary.

• For quadratic discriminant analysis, both means and covariances of each class vary.
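The generative model above implies the posterior through Bayes' rule, $\hat{P}(k \mid x) \propto P(k)\,\mathcal{N}(x;\mu_k,\Sigma)$. A sketch that recomputes it by hand for one iris observation and compares it with the scores from `predict` (which equal posterior probabilities when `ScoreTransform` is `'none'`), assuming the default linear model so that `Sigma` is a single `p`-by-`p` matrix:

```matlab
load fisheriris
Mdl = fitcdiscr(meas, species);          % linear: one shared covariance Sigma
x = meas(51,:);                          % one observation, as a row vector

K = numel(Mdl.ClassNames);
joint = zeros(1, K);
for k = 1:K
    % prior times class-conditional multivariate normal density
    joint(k) = Mdl.Prior(k) * mvnpdf(x, Mdl.Mu(k,:), Mdl.Sigma);
end
post = joint / sum(joint);               % normalize: posterior P(k|x)

[~, score] = predict(Mdl, x);            % scores are posteriors by default
```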

`predict` classifies so as to minimize the expected classification cost:

$$\hat{y} = \underset{y=1,\dots,K}{\arg\min} \sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k),$$

where

• $\hat{y}$ is the predicted classification.

• $K$ is the number of classes.

• $\hat{P}(k \mid x)$ is the posterior probability of class $k$ for observation $x$.

• $C(y \mid k)$ is the cost of classifying an observation as $y$ when its true class is $k$.
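The decision rule is a vector-matrix product followed by an argmin. In this sketch the posterior values and both cost matrices are hypothetical; `Cost(k,y)` follows the `Cost` property convention (rows are true classes, columns are predicted classes), so $C(y \mid k)$ is `Cost(k,y)`.

```matlab
% Hypothetical posterior probabilities P(k|x) for one observation, K = 3.
P = [0.6 0.3 0.1];

% Cost(k,y): cost of predicting class y when the true class is k.
Cost01   = ones(3) - eye(3);         % 0-1 loss
CostAsym = [0 1 1; 4 0 1; 4 1 0];    % wrongly predicting class 1 is expensive

% Expected cost of each candidate prediction y: sum over k of P(k|x)*Cost(k,y)
expected01   = P * Cost01;           % [0.4 0.7 0.9]
expectedAsym = P * CostAsym;         % [1.6 0.7 0.9]

[~, yhat01]   = min(expected01);     % 1, the most probable class
[~, yhatAsym] = min(expectedAsym);   % 2, the cost matrix shifts the decision
```

With 0-1 loss the rule reduces to picking the class with the largest posterior; an asymmetric cost matrix can override that choice.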

For details, see How the predict Method Classifies.

### Regularization

Regularization is the process of finding a small set of predictors that yield an effective predictive model. For linear discriminant analysis, there are two parameters, γ and δ, that control regularization as follows. `cvshrink` helps you select appropriate values of the parameters.

Let $\Sigma$ represent the covariance matrix of the data $X$, and let $\hat{X}$ be the centered data (the data $X$ minus the mean by class). Define

$$D = \operatorname{diag}\left(\hat{X}^{T}\hat{X}\right).$$

The regularized covariance matrix $\tilde{\Sigma}$ is

$$\tilde{\Sigma} = (1-\gamma)\,\Sigma + \gamma D.$$

Whenever $\gamma \ge$ `MinGamma`, $\tilde{\Sigma}$ is nonsingular.

Let $\mu_k$ be the mean vector for those elements of $X$ in class $k$, and let $\mu_0$ be the global mean vector (the mean of the rows of $X$). Let $C$ be the correlation matrix of the data $X$, and let $\tilde{C}$ be the regularized correlation matrix:

$$\tilde{C} = (1-\gamma)\,C + \gamma I,$$

where I is the identity matrix.

The linear term in the regularized discriminant analysis classifier for a data point x is

$$(x-\mu_0)^{T}\,\tilde{\Sigma}^{-1}(\mu_k-\mu_0) = \left[(x-\mu_0)^{T} D^{-1/2}\right]\left[\tilde{C}^{-1} D^{-1/2}(\mu_k-\mu_0)\right].$$
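The factorization holds because, with $C = D^{-1/2}\Sigma D^{-1/2}$, the regularized covariance satisfies $\tilde{\Sigma} = D^{1/2}\tilde{C}D^{1/2}$, so $\tilde{\Sigma}^{-1} = D^{-1/2}\tilde{C}^{-1}D^{-1/2}$. The sketch below checks this numerically on small hypothetical data. It assumes $\Sigma$ is the within-class scatter matrix $\hat{X}^{T}\hat{X}$ (the text does not state the normalization), so that $D$ is exactly the diagonal of $\Sigma$; the data, labels, `g` (for $\gamma$), and query point are all made up.

```matlab
% Hypothetical toy data: 6 observations, p = 3 predictors, K = 2 classes.
X = [1.0 2.0 0.5; 2.0 1.0 1.5; 3.0 4.5 2.0;
     4.0 3.0 3.5; 5.5 7.0 4.0; 6.0 5.0 6.5];
y = [1; 1; 1; 2; 2; 2];
g = 0.3;                                   % regularization parameter gamma
[n, p] = size(X);
K = max(y);

mu0 = mean(X, 1);                          % global mean (row vector)
mu = zeros(K, p);
Xc = zeros(n, p);
for k = 1:K
    mu(k,:) = mean(X(y == k,:), 1);        % class mean
    Xc(y == k,:) = bsxfun(@minus, X(y == k,:), mu(k,:));   % subtract class mean
end

Sigma = Xc' * Xc;                          % assumed scaling: scatter matrix
D = diag(diag(Xc' * Xc));                  % diagonal matrix D
Dm12 = diag(1 ./ sqrt(diag(D)));           % D^(-1/2)
C = Dm12 * Sigma * Dm12;                   % correlation matrix (unit diagonal)

SigmaTilde = (1 - g) * Sigma + g * D;
CTilde = (1 - g) * C + g * eye(p);

x = [3.5 4.0 2.5];                         % arbitrary query point (row)
k = 2;
lhs = (x - mu0) * (SigmaTilde \ (mu(k,:) - mu0)');
rhs = ((x - mu0) * Dm12) * (CTilde \ (Dm12 * (mu(k,:) - mu0)'));
fprintf('lhs = %.12f, rhs = %.12f\n', lhs, rhs);
```

The two sides agree up to rounding error; only the right-hand form exposes the vector that the threshold $\delta$ acts on.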

The parameter $\delta$ enters into this equation as a threshold on the final term in square brackets. Each component of the vector $\left[\tilde{C}^{-1} D^{-1/2}(\mu_k-\mu_0)\right]$ is set to zero if it is smaller in magnitude than the threshold $\delta$. Therefore, for class $k$, if component $j$ is thresholded to zero, component $j$ of $x$ does not enter into the evaluation of the posterior probability.

The `DeltaPredictor` property is a vector related to this threshold. When $\delta \ge$ `DeltaPredictor(i)`, all classes $k$ have

$$\left|\left[\tilde{C}^{-1} D^{-1/2}(\mu_k-\mu_0)\right]_i\right| \le \delta.$$

Therefore, when $\delta \ge$ `DeltaPredictor(i)`, the regularized classifier does not use predictor `i`.
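A sketch of choosing $\gamma$ and $\delta$ with `cvshrink` on the `fisheriris` sample data. It assumes the output form `[err,gamma,delta] = cvshrink(obj,Name,Value)` with `'NumGamma'` and `'NumDelta'` options, where `err(i,j)` is the cross-validated error for `gamma(i)` and `delta(i,j)`; the grid sizes of 9 are arbitrary. The last line applies the rule stated above: predictors with `DeltaPredictor(i)` at or below `Delta` drop out.

```matlab
load fisheriris
Mdl = fitcdiscr(meas, species);              % linear discriminant (default)

rng('default')                               % reproducible cross-validation folds
[err, gam, del] = cvshrink(Mdl, 'NumGamma', 9, 'NumDelta', 9);

% Pick the (gamma, delta) pair with the lowest cross-validated error.
[minErr, idx] = min(err(:));
[ig, id] = ind2sub(size(err), idx);

Mdl.Gamma = gam(ig);
Mdl.Delta = del(ig, id);

% Predictors still used by the regularized linear term
keptPredictors = find(Mdl.DeltaPredictor > Mdl.Delta);
```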

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.

## Examples


Load Fisher's iris data set.

```
load fisheriris
```

Train a discriminant analysis model using the entire data set.

```
Mdl = fitcdiscr(meas,species)
```

```
Mdl = 
  ClassificationDiscriminant
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
              DiscrimType: 'linear'
                       Mu: [3×4 double]
                   Coeffs: [3×3 struct]
```

`Mdl` is a `ClassificationDiscriminant` model. To access its properties, use dot notation. For example, display the group means for each predictor.

```
Mdl.Mu
```

```
ans =

    5.0060    3.4280    1.4620    0.2460
    5.9360    2.7700    4.2600    1.3260
    6.5880    2.9740    5.5520    2.0260
```

To predict labels for new observations, pass `Mdl` and predictor data to `predict`.
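A minimal sketch of that call. The two rows of `Xnew` are hypothetical flower measurements chosen near the setosa and virginica rows of `Mdl.Mu`; `predict` returns the labels and, by default, the posterior probability of each class in `ClassNames` order.

```matlab
load fisheriris
Mdl = fitcdiscr(meas, species);

% Two hypothetical new flowers (sepal length, sepal width, petal length, petal width)
Xnew = [5.0 3.4 1.5 0.2;
        6.5 3.0 5.5 2.0];
[label, score] = predict(Mdl, Xnew);   % label: cell array; score: 2-by-3 posteriors
```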
