ClassificationDiscriminant class

Superclasses: CompactClassificationDiscriminant

Discriminant analysis classification

Description

A ClassificationDiscriminant object encapsulates a discriminant analysis classifier, which is a Gaussian mixture model for data generation. A ClassificationDiscriminant object can predict responses for new data using the predict method. Because the object also stores the data used for training, it can compute resubstitution predictions.

Construction

Mdl = fitcdiscr(Tbl,ResponseVarName) returns a fitted discriminant analysis model based on the input variables (also known as predictors, features, or attributes) contained in the table Tbl and output (response or labels) contained in ResponseVarName.

Mdl = fitcdiscr(Tbl,formula) returns a fitted discriminant analysis model based on the predictor data and class labels in the table Tbl. formula is an explanatory model of the response and a subset of predictor variables in Tbl used to fit Mdl.

Mdl = fitcdiscr(Tbl,Y) returns a fitted discriminant analysis model based on the input variables contained in the table Tbl and response Y.

Mdl = fitcdiscr(X,Y) returns a discriminant analysis classifier based on the input variables X and response Y.

Mdl = fitcdiscr(___,Name,Value) fits a classifier with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can specify the cost of misclassification, prior probabilities for each class, or observation weights.
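As a minimal sketch of the Name,Value syntax, the following fits a quadratic discriminant with uniform class priors. It assumes the Fisher iris data set (fisheriris) that ships with the toolbox; the particular choice of 'DiscrimType' and 'Prior' values is illustrative only.

```matlab
% Sketch: pass name-value pair arguments after the data arguments.
load fisheriris                      % provides meas (150x4) and species (150x1 cell)

Mdl = fitcdiscr(meas,species, ...
    'DiscrimType','quadratic', ...   % separate covariance matrix for each class
    'Prior','uniform');              % equal prior probability for every class

disp(Mdl.DiscrimType)
disp(Mdl.Prior)
```

The fitted options are stored as properties of Mdl, so they can be inspected with dot notation after training.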

Input Arguments

Tbl: Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.

If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable using ResponseVarName.

If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula using formula.

If Tbl does not contain the response variable, then specify a response variable using Y. The length of the response variable and the number of rows of Tbl must be equal.

Data Types: table
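The two ways of supplying the response with a table can be sketched as follows. The table is built from the Fisher iris data; the variable names SL, SW, PL, PW, and Species are hypothetical names chosen for this illustration.

```matlab
% Sketch: response stored outside the table versus inside it.
load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});

% Case 1: Tbl holds only predictors, so the response is passed as Y.
MdlY = fitcdiscr(Tbl,species);

% Case 2: Tbl also holds the response, so its name is passed instead.
Tbl.Species = species;
MdlName = fitcdiscr(Tbl,'Species');

disp(MdlName.PredictorNames)         % all remaining variables are used as predictors
```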

ResponseVarName: Response variable name, specified as the name of a variable in Tbl.

You must specify ResponseVarName as a character vector. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

The response variable must be a categorical or character array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element of the response must correspond to one row of the array.

It is good practice to specify the order of the classes using the ClassNames name-value pair argument.

Data Types: char
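A short sketch of fixing the class order with ClassNames follows. The table variable names and the chosen order are assumptions made for the example.

```matlab
% Sketch: set the class order explicitly instead of relying on the default.
load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;

order = {'virginica','versicolor','setosa'};   % hypothetical preferred order
Mdl = fitcdiscr(Tbl,'Species','ClassNames',order);

disp(Mdl.ClassNames)                 % classes are stored in the requested order
```

Specifying the order up front makes it unambiguous which column of a score or cost matrix belongs to which class.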

formula: Explanatory model of the response and a subset of the predictor variables, specified as a character vector in the form of 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. The variables must be variable names in Tbl (Tbl.Properties.VariableNames).

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

Data Types: char
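The formula syntax can be illustrated with a model that uses only two of the four predictors. As before, the table variable names are hypothetical and chosen for this sketch.

```matlab
% Sketch: train on a subset of the table variables by using a formula.
load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;

Mdl = fitcdiscr(Tbl,'Species~PL+PW');   % SL and SW are ignored

disp(Mdl.PredictorNames)
```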

Y: Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.

The software considers NaN, '' (empty character vector), and <undefined> values in Y to be missing values. Consequently, the software does not train using observations with a missing response.

Data Types: single | double | logical | char | cell

X: Predictor values, specified as a numeric matrix. Each column of X represents one variable, and each row represents one observation.

fitcdiscr considers NaN values in X as missing values. fitcdiscr does not use observations with missing values for X in the fit.

Data Types: single | double
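The handling of missing values described for Y and X can be checked with a small experiment. This sketch deliberately corrupts two observations of the Fisher iris data and then inspects NumObservations, which should come out smaller than the 150 rows supplied.

```matlab
% Sketch: observations with a missing predictor or response are not used in the fit.
load fisheriris
X = meas;
Y = species;

X(1,2) = NaN;                        % missing predictor value in observation 1
Y{2}   = '';                         % missing class label in observation 2

Mdl = fitcdiscr(X,Y);
disp(Mdl.NumObservations)            % number of observations actually used for training
```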

Methods

compact       Compact discriminant analysis classifier
crossval      Cross-validated discriminant analysis classifier
cvshrink      Cross-validate regularization of linear discriminant
resubEdge     Classification edge by resubstitution
resubLoss     Classification error by resubstitution
resubMargin   Classification margins by resubstitution
resubPredict  Predict resubstitution response of classifier
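A few of these methods can be combined into a quick model check, sketched here on the Fisher iris data. The call to kfoldLoss is not part of this class; it is a method of the cross-validated model that crossval returns and is used only to summarize the result. The five-fold setting and the random seed are arbitrary choices.

```matlab
% Sketch: resubstitution loss, cross-validation loss, and a compact copy.
load fisheriris
Mdl = fitcdiscr(meas,species);

resubErr  = resubLoss(Mdl);          % classification loss on the stored training data
resubPred = resubPredict(Mdl);       % predicted labels for the training data

rng(1)                               % fix the random partition for reproducibility
CVMdl = crossval(Mdl,'KFold',5);     % 5-fold cross-validated model
cvErr = kfoldLoss(CVMdl);            % out-of-fold classification loss

CMdl = compact(Mdl);                 % same classifier without the training data

fprintf('resubstitution loss %.3f, cross-validation loss %.3f\n',resubErr,cvErr)
```

The resubstitution loss is usually optimistic, so the cross-validated value is the better estimate of performance on new data.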

Inherited Methods

compareHoldout  Compare accuracies of two classification models using new data
edge            Classification edge
logP            Log unconditional probability density for discriminant analysis classifier
loss            Classification error
mahal           Mahalanobis distance to class means
margin          Classification margins
nLinearCoeffs   Number of nonzero linear coefficients
predict         Predict labels using discriminant analysis classification model
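Two of the inherited methods are sketched below on the Fisher iris data. The training data is reused as the input purely for illustration; in practice these methods are meant for data the model has not seen.

```matlab
% Sketch: inherited methods take the model plus predictor data (and labels for loss).
load fisheriris
Mdl = fitcdiscr(meas,species);

D = mahal(Mdl,meas(1:5,:));          % one row per observation, one column per class
L = loss(Mdl,meas,species);          % classification loss on the supplied data

disp(size(D))
disp(L)
```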

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Load Fisher's iris data set.

load fisheriris

Train a discriminant analysis model using the entire data set.

Mdl = fitcdiscr(meas,species)
Mdl =

  ClassificationDiscriminant
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
              DiscrimType: 'linear'
                       Mu: [3x4 double]
                   Coeffs: [3x3 struct]

Mdl is a ClassificationDiscriminant model. To access its properties, use dot notation. For example, display the group means for each predictor.

Mdl.Mu
ans =

5.0060    3.4280    1.4620    0.2460
5.9360    2.7700    4.2600    1.3260
6.5880    2.9740    5.5520    2.0260

To predict labels for new observations, pass Mdl and predictor data to predict.
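A minimal sketch of that call follows. The "new" observation here is a hypothetical one, built as the mean of the training measurements, and the second output is requested only to show the per-class scores.

```matlab
% Sketch: predict the class of one new observation.
load fisheriris
Mdl = fitcdiscr(meas,species);

Xnew = mean(meas);                   % hypothetical new flower: the average measurement
[label,score] = predict(Mdl,Xnew);   % score holds one posterior probability per class

disp(label)
disp(score)
```

The columns of score follow the order of Mdl.ClassNames.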
