Documentation


ClassificationDiscriminant.fit

Class: ClassificationDiscriminant

Fit discriminant analysis classifier (to be removed)

ClassificationDiscriminant.fit will be removed in a future release. Use fitcdiscr instead.

Syntax

obj = ClassificationDiscriminant.fit(x,y)
obj = ClassificationDiscriminant.fit(x,y,Name,Value)

Description

obj = ClassificationDiscriminant.fit(x,y) returns a discriminant analysis classifier based on the input variables (also known as predictors, features, or attributes) x and output (response) y.

obj = ClassificationDiscriminant.fit(x,y,Name,Value) fits a classifier with additional options specified by one or more Name,Value pair arguments. If you use one of the following five options, obj is of class ClassificationPartitionedModel: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. Otherwise, obj is of class ClassificationDiscriminant.
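For example, adding a cross-validation option changes the class of the returned object (a minimal sketch using the fisheriris sample data set):

```matlab
load fisheriris    % meas: 150-by-4 measurements, species: 150-by-1 cell array

% No cross-validation options: the result is a ClassificationDiscriminant.
obj = ClassificationDiscriminant.fit(meas,species);
class(obj)      % ClassificationDiscriminant

% A cross-validation option: the result is a ClassificationPartitionedModel.
cvObj = ClassificationDiscriminant.fit(meas,species,'KFold',5);
class(cvObj)    % ClassificationPartitionedModel
```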

Input Arguments


Predictor values, specified as a matrix of numeric values. Each column of x represents one variable, and each row represents one observation.

ClassificationDiscriminant.fit considers NaN values in x as missing values. ClassificationDiscriminant.fit does not use observations with missing values for x in the fit.

Data Types: single | double

Classification values, specified as a numeric vector, categorical vector (nominal or ordinal), logical vector, character array, or cell array of character vectors. Each row of y represents the classification of the corresponding row of x.

ClassificationDiscriminant.fit considers NaN values in y to be missing values. ClassificationDiscriminant.fit does not use observations with missing values for y in the fit.

Data Types: single | double | logical | char | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.


Class names, specified as the comma-separated pair consisting of 'ClassNames' and an array. Use the data type that exists in y. The default is the class names that exist in y. Use ClassNames to order the classes or to select a subset of classes for training.

Data Types: single | double | logical | char

Cost of misclassification, specified as the comma-separated pair consisting of 'Cost' and a square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. Alternatively, Cost can be a structure S having two fields: S.ClassNames containing the group names as a variable of the same type as y, and S.ClassificationCosts containing the cost matrix.

The default is Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i == j.

Data Types: single | double | struct
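For instance, to make misclassifying one class more costly than the other in a two-class problem (a sketch with synthetic data; the cost values are arbitrary illustrations):

```matlab
rng(1);                                    % reproducible synthetic data
x = [randn(50,2); randn(50,2) + 2];        % two overlapping classes
y = [ones(50,1); 2*ones(50,1)];

% Cost(i,j) is the cost of classifying into class j when the true class is i.
% Misclassifying a true class 2 observation is ten times as costly here.
C = [0 1; 10 0];
obj = ClassificationDiscriminant.fit(x,y,'Cost',C);
```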

Cross-validation flag, specified as the comma-separated pair consisting of 'CrossVal' and 'on' or 'off'.

If you specify 'on', then the software implements 10-fold cross-validation.

To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, KFold, or Leaveout. To create a cross-validated model, you can use only one cross-validation name-value pair argument at a time.

Alternatively, cross-validate later by passing obj to crossval.

Example: 'CrossVal','on'

Cross-validated model partition, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Linear coefficient threshold, specified as the comma-separated pair consisting of 'Delta' and a nonnegative scalar value. If a coefficient of obj has magnitude smaller than Delta, obj sets this coefficient to 0, and you can eliminate the corresponding predictor from the model. Set Delta to a higher value to eliminate more predictors.

Delta must be 0 for quadratic discriminant models.

Data Types: single | double
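Raising Delta zeroes out small linear coefficients, which drops the corresponding predictors (a sketch using fisheriris; the threshold 0.5 is an arbitrary illustration):

```matlab
load fisheriris
obj = ClassificationDiscriminant.fit(meas,species,'Delta',0.5);

% Linear coefficients with magnitude below Delta are set to 0; zeros here
% mark predictors that can be eliminated from the model.
obj.Coeffs(1,2).Linear
```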

Discriminant type, specified as the comma-separated pair consisting of 'DiscrimType' and one of the character vectors in this table. The description of each value notes how it treats the predictor covariance.

'linear' (regularized linear discriminant analysis, LDA):

  • All classes have the same covariance matrix.

  • Σ̂_γ = (1 − γ) Σ̂ + γ diag(Σ̂), where Σ̂ is the empirical, pooled covariance matrix and γ is the amount of regularization.

'diaglinear' (LDA): All classes have the same, diagonal covariance matrix.

'pseudolinear' (LDA): All classes have the same covariance matrix. The software inverts the covariance matrix using the pseudoinverse.

'quadratic' (quadratic discriminant analysis, QDA): The covariance matrices can vary among classes.

'diagquadratic' (QDA): The covariance matrices are diagonal and can vary among classes.

'pseudoquadratic' (QDA): The covariance matrices can vary among classes. The software inverts the covariance matrix using the pseudoinverse.

    Note:   To use regularization, you must specify 'linear'. To specify the amount of regularization, use the Gamma name-value pair argument.

Example: 'DiscrimType','quadratic'
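For example, a quadratic discriminant lets each class keep its own covariance matrix:

```matlab
load fisheriris
obj = ClassificationDiscriminant.fit(meas,species,'DiscrimType','quadratic');
obj.DiscrimType    % 'quadratic'
```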

Coeffs property flag, specified as the comma-separated pair consisting of 'FillCoeffs' and 'on' or 'off'. Setting the flag to 'on' populates the Coeffs property in the classifier object. This can be computationally intensive, especially when cross-validating. The default is 'on', unless you specify a cross-validation name-value pair argument, in which case the flag is set to 'off' by default.

Example: 'FillCoeffs','off'

Amount of regularization to apply when estimating the covariance matrix of the predictors, specified as the comma-separated pair consisting of 'Gamma' and a scalar value in the interval [0,1]. Gamma provides finer control over the covariance matrix structure than DiscrimType.

  • If you specify 0, then the software does not use regularization to adjust the covariance matrix. That is, the software estimates and uses the unrestricted, empirical covariance matrix.

    • For linear discriminant analysis, if the empirical covariance matrix is singular, then the software automatically applies the minimal regularization required to invert the covariance matrix. You can display the chosen regularization amount by entering obj.Gamma at the command line.

    • For quadratic discriminant analysis, if at least one class has an empirical covariance matrix that is singular, then the software throws an error.

  • If you specify a value in the interval (0,1), then you must use linear discriminant analysis; otherwise, the software throws an error. The software sets DiscrimType to 'linear'.

  • If you specify 1, then the software uses maximum regularization for covariance matrix estimation. That is, the software restricts the covariance matrix to be diagonal. Alternatively, you can set DiscrimType to 'diagLinear' or 'diagQuadratic' for diagonal covariance matrices.

Example: 'Gamma',1

Data Types: single | double
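As a sketch, compare an unregularized fit with a fully regularized one, whose pooled covariance estimate is restricted to its diagonal:

```matlab
load fisheriris
obj0 = ClassificationDiscriminant.fit(meas,species,'Gamma',0);
obj1 = ClassificationDiscriminant.fit(meas,species,'Gamma',1);

% With Gamma = 1 the off-diagonal entries of the pooled covariance
% estimate are shrunk to zero.
obj0.Sigma
obj1.Sigma
```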

Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software:

  1. Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data

  2. Stores the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single
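For example, reserving 30% of the observations for validation:

```matlab
load fisheriris
cvObj = ClassificationDiscriminant.fit(meas,species,'Holdout',0.3);

% The model trained on the remaining 70% is in the Trained property;
% kfoldLoss reports the misclassification rate on the held-out data.
kfoldLoss(cvObj)
```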

Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, e.g., 'KFold',k, then the software:

  1. Randomly partitions the data into k sets

  2. For each set, reserves the set as validation data, and trains the model using the other k – 1 sets

  3. Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four options only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',8

Data Types: single | double
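Continuing with the fisheriris data, an 8-fold cross-validated fit stores one compact model per fold:

```matlab
load fisheriris
cvObj = ClassificationDiscriminant.fit(meas,species,'KFold',8);

numel(cvObj.Trained)    % 8 compact models, one per fold
kfoldLoss(cvObj)        % misclassification rate averaged over the folds
```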

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, where n is size(obj.X,1), the software:

  1. Reserves the observation as validation data, and trains the model using the other n – 1 observations

  2. Stores the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four options only: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Example: 'Leaveout','on'

Data Types: char

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of character vectors containing the names for the predictor variables, in the order in which they appear in X.

Data Types: cell

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following.

  • 'empirical' determines class probabilities from class frequencies in y. If you pass observation weights, they are used to compute the class probabilities.

  • 'uniform' sets all class probabilities equal.

  • A vector containing one scalar value for each class.

  • A structure S with two fields:

    • S.ClassNames containing the class names as a variable of the same type as y

    • S.ClassProbs containing a vector of corresponding probabilities

Example: 'Prior','uniform'

Data Types: single | double | char | struct
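For example, the structure form pairs each class name with its probability (a sketch using fisheriris):

```matlab
load fisheriris
obj = ClassificationDiscriminant.fit(meas,species,'Prior','uniform');
obj.Prior    % equal probabilities for the three species

% Equivalent structure form:
S.ClassNames = {'setosa','versicolor','virginica'};
S.ClassProbs = [1/3 1/3 1/3];
obj = ClassificationDiscriminant.fit(meas,species,'Prior',S);
```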

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector containing the name of the response variable y.

Example: 'ResponseName','Response'

Data Types: char

Flag to save covariance matrix, specified as the comma-separated pair consisting of 'SaveMemory' and either 'on' or 'off'. If you specify 'on', then the software does not store the full covariance matrix, but instead stores enough information to compute the matrix. The predict method computes the full covariance matrix for prediction, and does not store the matrix. If you specify 'off', then the software computes and stores the full covariance matrix in obj.

Specify SaveMemory as 'on' when the input matrix contains thousands of predictors.

Example: 'SaveMemory','on'

Score transform function, specified as the comma-separated pair consisting of 'ScoreTransform' and a function handle or value in this table.

'doublelogit': 1/(1 + e^(−2x))
'invlogit': log(x / (1 − x))
'ismax': Set the score for the class with the largest score to 1, and scores for all other classes to 0.
'logit': 1/(1 + e^(−x))
'none' or 'identity': x (no transformation)
'sign': −1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric': 2x − 1
'symmetriclogit': 2/(1 + e^(−x)) − 1
'symmetricismax': Set the score for the class with the largest score to 1, and scores for all other classes to −1.

For a MATLAB® function, or a function that you define, enter its function handle.

obj.ScoreTransform = @function;

The function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: 'ScoreTransform','logit'

Data Types: function_handle | char
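A custom transform can also be assigned after fitting; this sketch applies a logistic transform through an anonymous function:

```matlab
load fisheriris
obj = ClassificationDiscriminant.fit(meas,species);

% The handle must map an N-by-K score matrix to a matrix of the same size.
obj.ScoreTransform = @(s) 1./(1 + exp(-s));
```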

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The length of Weights is the number of rows in X. The software normalizes the weights to sum to 1.

Data Types: single | double

Output Arguments


Discriminant analysis classifier, returned as a classifier object.

Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in an object of class ClassificationPartitionedModel. You cannot use a partitioned model to predict responses for new data directly, so this kind of object does not have a predict method.

Otherwise, obj is of class ClassificationDiscriminant, and you can use the predict method to predict the response of new data.

Definitions

Discriminant Classification

The model for discriminant analysis is:

  • Each class (Y) generates data (X) using a multivariate normal distribution. That is, the model assumes X has a Gaussian mixture distribution (gmdistribution).

    • For linear discriminant analysis, the model has the same covariance matrix for each class, only the means vary.

    • For quadratic discriminant analysis, both means and covariances of each class vary.

predict classifies so as to minimize the expected classification cost:

ŷ = argmin_(y=1,...,K) Σ_(k=1)^(K) P̂(k|x) C(y|k),

where

  • y^ is the predicted classification.

  • K is the number of classes.

  • P^(k|x) is the posterior probability of class k for observation x.

  • C(y|k) is the cost of classifying an observation as y when its true class is k.

For details, see How the predict Method Classifies.

Examples


Load the sample data.

load fisheriris

Construct a discriminant analysis classifier using the sample data.

obj = ClassificationDiscriminant.fit(meas,species)
obj = 

  ClassificationDiscriminant
     PredictorNames: {'x1'  'x2'  'x3'  'x4'}
       ResponseName: 'Y'
         ClassNames: {'setosa'  'versicolor'  'virginica'}
     ScoreTransform: 'none'
    NumObservations: 150
        DiscrimType: 'linear'
                 Mu: [3x4 double]
             Coeffs: [3x3 struct]


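Continuing the example, you can classify new observations with the predict method:

```matlab
% Classify an "average" iris; obj is the classifier fitted above.
label = predict(obj,mean(meas))
```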

Alternatives

The classify function also performs discriminant analysis. classify is usually more awkward to use:

  • classify requires you to fit the classifier every time you make a new prediction.

  • classify does not perform cross validation.

  • classify requires you to fit the classifier when changing prior probabilities.
