ClassificationDiscriminant.fit

Class: ClassificationDiscriminant

Fit discriminant analysis classifier (to be removed)

ClassificationDiscriminant.fit will be removed in a future release. Use fitcdiscr instead.

Syntax

obj = ClassificationDiscriminant.fit(x,y)
obj = ClassificationDiscriminant.fit(x,y,Name,Value)

Description

obj = ClassificationDiscriminant.fit(x,y) returns a discriminant analysis classifier based on the input variables (also known as predictors, features, or attributes) x and output (response) y.

obj = ClassificationDiscriminant.fit(x,y,Name,Value) fits a classifier with additional options specified by one or more Name,Value pair arguments. If you use one of the following five options, obj is of class ClassificationPartitionedModel: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. Otherwise, obj is of class ClassificationDiscriminant.
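For example, a minimal sketch using the fisheriris sample data (the same data as in the example later on this page):

load fisheriris
obj   = ClassificationDiscriminant.fit(meas,species);            % ClassificationDiscriminant
cvobj = ClassificationDiscriminant.fit(meas,species,'KFold',5);  % ClassificationPartitionedModel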

Input Arguments

x — Predictor values
matrix of numeric values

Predictor values, specified as a matrix of numeric values. Each column of x represents one variable, and each row represents one observation.

ClassificationDiscriminant.fit considers NaN values in x as missing values. ClassificationDiscriminant.fit does not use observations with missing values for x in the fit.

Data Types: single | double

y — Classification values
numeric vector | categorical vector | logical vector | character array | cell array of strings

Classification values, specified as a numeric vector, categorical vector (nominal or ordinal), logical vector, character array, or cell array of strings. Each row of y represents the classification of the corresponding row of x.

ClassificationDiscriminant.fit considers NaN values in y to be missing values. ClassificationDiscriminant.fit does not use observations with missing values for y in the fit.

Data Types: single | double | logical | char | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'ClassNames' — Class names
array

Class names, specified as the comma-separated pair consisting of 'ClassNames' and an array. Use the data type that exists in y. The default is the class names that exist in y. Use ClassNames to order the classes or to select a subset of classes for training.

Data Types: single | double | logical | char
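
For example, to specify an explicit class order (a minimal sketch using the fisheriris data, where y is a cell array of strings):

load fisheriris
obj = ClassificationDiscriminant.fit(meas,species, ...
    'ClassNames',{'virginica','versicolor','setosa'});   % reorder the classes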

'Cost' — Cost of misclassification
square matrix | structure

Cost of misclassification, specified as the comma-separated pair consisting of 'Cost' and a square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. Alternatively, Cost can be a structure S having two fields: S.ClassNames containing the group names as a variable of the same type as y, and S.ClassificationCosts containing the cost matrix.

The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.

Data Types: single | double | struct
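
For example, a sketch for a two-class problem in which misclassifying the second class is twice as costly as misclassifying the first (x2 and y2 are hypothetical two-class training data):

% Matrix form: Cost(i,j) is the cost of predicting class j when the true class is i
cost = [0 1; 2 0];
obj = ClassificationDiscriminant.fit(x2,y2,'Cost',cost);

% Equivalent structure form
S.ClassNames = {'a';'b'};            % same type as y2
S.ClassificationCosts = cost;
obj = ClassificationDiscriminant.fit(x2,y2,'Cost',S);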

'CrossVal' — Flag to train cross-validated classifier
'off' (default) | 'on'

Flag to train a cross-validated classifier, specified as the comma-separated pair consisting of 'CrossVal' and either 'on' or 'off'.

If you specify 'on', then ClassificationDiscriminant.fit creates a cross-validated classifier with 10 folds.

You can override this cross-validation setting using one of the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' name-value pair arguments.

You can only use one of these four options at a time to create a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Alternatively, cross validate obj later using the crossval method.

Example: 'CrossVal','on'
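
A minimal sketch of the alternative mentioned above, using the crossval method after fitting:

obj   = ClassificationDiscriminant.fit(x,y);   % fit the classifier first
cvobj = crossval(obj);                         % then cross validate (10 folds by default)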

'CVPartition' — Cross-validated model partition
cvpartition object

Cross-validated model partition, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition. You can only use one option at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.
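
For example, a minimal sketch using the fisheriris data and a 5-fold partition:

load fisheriris
c = cvpartition(species,'KFold',5);
cvobj = ClassificationDiscriminant.fit(meas,species,'CVPartition',c);
err = kfoldLoss(cvobj);   % cross-validated classification error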

'Delta' — Linear coefficient threshold
0 (default) | nonnegative scalar value

Linear coefficient threshold, specified as the comma-separated pair consisting of 'Delta' and a nonnegative scalar value. If a coefficient of obj has magnitude smaller than Delta, obj sets this coefficient to 0, and you can eliminate the corresponding predictor from the model. Set Delta to a higher value to eliminate more predictors.

Delta must be 0 for quadratic discriminant models.

Data Types: single | double

'DiscrimType' — Discriminant type
'linear' (default) | 'quadratic' | 'diagLinear' | 'diagQuadratic' | 'pseudoLinear' | 'pseudoQuadratic'

Discriminant type, specified as the comma-separated pair consisting of 'DiscrimType' and one of the following:

  • 'linear'

  • 'quadratic'

  • 'diagLinear'

  • 'diagQuadratic'

  • 'pseudoLinear'

  • 'pseudoQuadratic'

Example: 'DiscrimType','quadratic'

'FillCoeffs' — Coeffs property flag
'on' | 'off'

Coeffs property flag, specified as the comma-separated pair consisting of 'FillCoeffs' and 'on' or 'off'. Setting the flag to 'on' populates the Coeffs property in the classifier object. This can be computationally intensive, especially when cross validating. The default is 'on', unless you specify a cross validation name-value pair, in which case the flag is set to 'off' by default.

Example: 'FillCoeffs','off'

'Gamma' — Regularization parameter
scalar value in the range [0,1]

Parameter for regularizing the correlation matrix of predictors, specified as the comma-separated pair consisting of 'Gamma' and a scalar value in the range [0,1].

  • Linear discriminant — Scalar value in the range [0,1].

    • If you pass a value strictly between 0 and 1, fitcdiscr sets the discriminant type to 'Linear'.

    • If you pass 0 for Gamma and 'Linear' for DiscrimType, and if the correlation matrix is singular, fitcdiscr sets Gamma to the minimal value required for inverting the covariance matrix.

    • If you set Gamma to 1, fitcdiscr sets the discriminant type to 'DiagLinear'.

  • Quadratic discriminant — Either 0 or 1.

    • If you pass 0 for Gamma and 'Quadratic' for DiscrimType, and if one of the classes has a singular covariance matrix, fitcdiscr errors.

    • If you set Gamma to 1, fitcdiscr sets the discriminant type to 'DiagQuadratic'.

    • If you set Gamma to a value between 0 and 1 for a quadratic discriminant, fitcdiscr errors.

Example: 'Gamma',1

Data Types: single | double
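
For example, a sketch using the fisheriris data; per the rules above, setting Gamma to 1 for a linear discriminant yields a diagonal-covariance model:

load fisheriris
obj = ClassificationDiscriminant.fit(meas,species,'Gamma',1);
obj.DiscrimType   % expected to report a diagonal linear discriminant type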

'Holdout' — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software:

  1. Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data

  2. Stores the compact, trained model in the Trained property of the cross-validated model (obj.Trained)

If you specify Holdout, then you cannot specify any of CVPartition, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single
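
For example, a minimal sketch that reserves 30% of the fisheriris data for validation:

load fisheriris
cvobj = ClassificationDiscriminant.fit(meas,species,'Holdout',0.3);
compactObj = cvobj.Trained{1};   % compact classifier trained on the remaining 70%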

'KFold' — Number of folds
10 (default) | positive integer value

Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value.

You can only use one of these four options at a time to create a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Example: 'KFold',8

Data Types: single | double

'Leaveout' — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. If you specify 'on', then the software implements leave-one-out cross validation.

If you use 'Leaveout', you cannot use the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.

Example: 'Leaveout','on'

Data Types: char

'PredictorNames' — Predictor variable names
{'x1','x2',...} (default) | cell array of strings

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of strings containing the names for the predictor variables, in the order in which they appear in x.

Data Types: cell

'Prior' — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following.

  • A string:

    • 'empirical' determines class probabilities from class frequencies in y. If you pass observation weights, they are used to compute the class probabilities.

    • 'uniform' sets all class probabilities equal.

  • A vector containing one scalar value for each class.

  • A structure S with two fields:

    • S.ClassNames containing the class names as a variable of the same type as y

    • S.ClassProbs containing a vector of corresponding probabilities

Example: 'Prior','uniform'

Data Types: single | double | struct
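
For example, the structure form (an illustrative sketch using the fisheriris class names and arbitrary probabilities):

load fisheriris
S.ClassNames = {'setosa';'versicolor';'virginica'};   % same type as y
S.ClassProbs = [0.5; 0.25; 0.25];                     % corresponding prior probabilities
obj = ClassificationDiscriminant.fit(meas,species,'Prior',S);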

'ResponseName' — Response variable name
'Y' (default) | string

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a string containing the name of the response variable y.

Example: 'ResponseName','Response'

Data Types: char

'SaveMemory' — Flag to save covariance matrix
'off' (default) | 'on'

Flag to save covariance matrix, specified as the comma-separated pair consisting of 'SaveMemory' and either 'on' or 'off'. If you specify 'on', then fitcdiscr does not store the full covariance matrix, but instead stores enough information to compute the matrix. The predict method computes the full covariance matrix for prediction, and does not store the matrix. If you specify 'off', then fitcdiscr computes and stores the full covariance matrix in obj.

Specify SaveMemory as 'on' when the input matrix contains thousands of predictors.

Example: 'SaveMemory','on'

'ScoreTransform' — Score transform function
'none' (default) | valid score transform string | function handle

Score transform function, specified as the comma-separated pair consisting of 'ScoreTransform' and one of the following.

String               Formula
'doublelogit'        1/(1 + e^(–2x))
'invlogit'           log(x / (1 – x))
'ismax'              Set the score for the class with the largest score to 1, and scores for all other classes to 0.
'logit'              1/(1 + e^(–x))
'none'               x (no transformation)
'sign'               –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric'          2x – 1
'symmetriclogit'     2/(1 + e^(–x)) – 1
'symmetricismax'     Set the score for the class with the largest score to 1, and scores for all other classes to –1.

Alternatively, you can use your own function handle for transforming scores. Your function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: 'ScoreTransform','logit'

Data Types: function_handle
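
For example, with a custom handle (a minimal sketch; mySigmoid is a name chosen for illustration, and x and y are the training data from the syntax above):

mySigmoid = @(s) 1./(1 + exp(-s));   % elementwise transform; output has the same size as the input
obj = ClassificationDiscriminant.fit(x,y,'ScoreTransform',mySigmoid);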

'Weights' — Observation weights
ones(size(x,1),1) (default) | vector of scalar values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The length of Weights is the number of rows in x. fitcdiscr normalizes the weights to sum to 1.

Data Types: single | double

Output Arguments

obj — Discriminant analysis classifier
classifier object

Discriminant analysis classifier, returned as a classifier object.

Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a model of class ClassificationPartitionedModel. You cannot use a partitioned model for prediction, so this kind of model does not have a predict method.

Otherwise, obj is of class ClassificationDiscriminant, and you can use the predict method to predict the response of new data.

Definitions

Discriminant Classification

The model for discriminant analysis is:

  • Each class (Y) generates data (X) using a multivariate normal distribution. That is, the model assumes X has a Gaussian mixture distribution (gmdistribution).

    • For linear discriminant analysis, the model has the same covariance matrix for each class, only the means vary.

    • For quadratic discriminant analysis, both means and covariances of each class vary.

predict classifies so as to minimize the expected classification cost:

\hat{y} = \underset{y=1,\ldots,K}{\arg\min} \sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k),

where

  • y^ is the predicted classification.

  • K is the number of classes.

  • P^(k|x) is the posterior probability of class k for observation x.

  • C(y|k) is the cost of classifying an observation as y when its true class is k.
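
As an illustrative sketch of this rule for K = 3 classes, with arbitrary posterior probabilities and a hypothetical cost matrix:

Phat = [0.2 0.5 0.3];            % estimated posteriors P^(k|x), k = 1..K, for one observation
Cost = [0 1 4; 1 0 1; 4 1 0];    % Cost(k,y): cost of predicting class y when the true class is k
expectedCost = Phat*Cost;        % expected cost of each candidate prediction y
[~,yhat] = min(expectedCost)     % predicted class index; yhat is 2 for these numbers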

For details, see How the predict Method Classifies.

Examples

Construct a Discriminant Analysis Classifier

Load the sample data.

load fisheriris

Construct a discriminant analysis classifier using the sample data.

obj = ClassificationDiscriminant.fit(meas,species)
obj = 

  ClassificationDiscriminant
     PredictorNames: {'x1'  'x2'  'x3'  'x4'}
       ResponseName: 'Y'
         ClassNames: {'setosa'  'versicolor'  'virginica'}
     ScoreTransform: 'none'
    NumObservations: 150
        DiscrimType: 'linear'
                 Mu: [3x4 double]
             Coeffs: [3x3 struct]


  Properties, Methods
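
You can then use obj to classify new measurements, for example (a minimal sketch; the query values are arbitrary):

label = predict(obj,[5.8 2.7 4.0 1.3])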

Alternatives

The classify function also performs discriminant analysis. classify is usually more awkward to use:

  • classify requires you to fit the classifier every time you make a new prediction.

  • classify does not perform cross validation.

  • classify requires you to fit the classifier when changing prior probabilities.
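
For comparison, an illustrative sketch using the fisheriris data (sample is a hypothetical matrix of new observations):

% classify refits from the training data on every call
label1 = classify(sample,meas,species);

% a fitted ClassificationDiscriminant object is reusable without refitting
obj    = ClassificationDiscriminant.fit(meas,species);
label2 = predict(obj,sample);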
