kfoldMargin

Classification margins for cross-validated kernel classification model

Syntax

margin = kfoldMargin(CVMdl)

Description

margin = kfoldMargin(CVMdl) returns the classification margins obtained by the cross-validated, binary kernel model (ClassificationPartitionedKernel) CVMdl. For every fold, kfoldMargin computes the classification margins for validation-fold observations using a model trained on training-fold observations.

Examples

Estimate k-Fold Cross-Validation Margins

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled as either bad ('b') or good ('g').

load ionosphere

Cross-validate a binary kernel classification model using the data.

CVMdl = fitckernel(X,Y,'CrossVal','on')

CVMdl = 
  ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1×1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'


CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'CrossVal'.

Estimate the classification margins for validation-fold observations.

m = kfoldMargin(CVMdl);
size(m)

ans = 1×2

   351     1

m is a 351-by-1 vector. m(j) is the classification margin for observation j.
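
Because a negative margin means the true-class score is lower than the false-class score, the sign of each margin indicates whether the observation was misclassified in its validation fold. The following is a minimal sketch of that check, not part of the original example. It assumes the default loss settings; kfoldLoss averages the per-fold losses, so the two printed numbers should be close but are not guaranteed to be identical.

```matlab
% Sketch: relate the sign of the k-fold margins to the k-fold error.
load ionosphere                                   % X is 351-by-34, Y holds 'b'/'g' labels
rng(1)                                            % for reproducibility
CVMdl = fitckernel(X,Y,'CrossVal','on');          % 10-fold cross-validated kernel model
m = kfoldMargin(CVMdl);                           % one margin per observation

fracNegative = mean(m < 0);                       % share of observations with a negative margin
cvError = kfoldLoss(CVMdl,'LossFun','classiferror'); % misclassification rate averaged over folds
fprintf('Negative margins: %.4f, k-fold error: %.4f\n',fracNegative,cvError)
```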

Plot the k-fold margins using a box plot.

boxplot(m,'Labels','All Observations')
title('Distribution of Margins')

[Figure: box plot of the k-fold margins for all observations, titled "Distribution of Margins".]

Feature Selection Using k-Fold Margins

Perform feature selection by comparing k-fold margins from multiple models. Based solely on this criterion, the classifier with the greatest margins is the best classifier.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled as either bad ('b') or good ('g').

load ionosphere

Randomly choose 10% of the predictor variables.

rng(1); % For reproducibility
p = size(X,2); % Number of predictors
idxPart = randsample(p,ceil(0.1*p));

Cross-validate two binary kernel classification models: one that uses all of the predictors, and one that uses 10% of the predictors.

CVMdl = fitckernel(X,Y,'CrossVal','on');
PCVMdl = fitckernel(X(:,idxPart),Y,'CrossVal','on');

CVMdl and PCVMdl are ClassificationPartitionedKernel models. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'CrossVal'.

Estimate the k-fold margins for each classifier.

fullMargins = kfoldMargin(CVMdl);
partMargins = kfoldMargin(PCVMdl);

Plot the distribution of the margin sets using box plots.

boxplot([fullMargins partMargins], ...
    'Labels',{'All Predictors','10% of the Predictors'});
title('Distribution of Margins')

[Figure: side-by-side box plots of the k-fold margins, labeled "All Predictors" and "10% of the Predictors", titled "Distribution of Margins".]

The quartiles of the PCVMdl margin distribution are situated higher than the quartiles of the CVMdl margin distribution, indicating that the PCVMdl model is the better classifier.
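
A box plot comparison can be backed up with a few summary numbers. The sketch below is an illustrative addition rather than part of the original example: it repeats the model fitting so that it runs on its own, then tabulates the median margin, the k-fold edge (from kfoldEdge), and the fraction of negative margins for each model. The table name summaryTbl and its column names are chosen here for illustration.

```matlab
% Sketch: numeric comparison of the two margin distributions.
load ionosphere                                   % X is 351-by-34, Y holds 'b'/'g' labels
rng(1)                                            % for reproducibility
p = size(X,2);                                    % number of predictors
idxPart = randsample(p,ceil(0.1*p));              % random 10% of the predictors
CVMdl = fitckernel(X,Y,'CrossVal','on');
PCVMdl = fitckernel(X(:,idxPart),Y,'CrossVal','on');
fullMargins = kfoldMargin(CVMdl);
partMargins = kfoldMargin(PCVMdl);

% One row per model: median margin, k-fold edge, share of negative margins
summaryTbl = table( ...
    [median(fullMargins); median(partMargins)], ...
    [kfoldEdge(CVMdl); kfoldEdge(PCVMdl)], ...
    [mean(fullMargins < 0); mean(partMargins < 0)], ...
    'VariableNames',{'MedianMargin','Edge','FractionNegative'}, ...
    'RowNames',{'All Predictors','10% of the Predictors'})
```

Larger median margins and edges, together with a smaller fraction of negative margins, point to the same model that the box plots favor.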

Input Arguments

`CVMdl` — Cross-validated, binary kernel classification model
`ClassificationPartitionedKernel` model object

Cross-validated, binary kernel classification model, specified as a ClassificationPartitionedKernel model object. You can create a ClassificationPartitionedKernel model by using fitckernel and specifying any one of the cross-validation name-value pair arguments.

To obtain estimates, kfoldMargin applies the same data used to cross-validate the kernel classification model (X and Y).
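
As a minimal sketch of how such a model might be created, the following code uses two of the cross-validation name-value arguments of fitckernel, 'KFold' and 'CVPartition'. The variable names CVMdl5, c, and CVMdlP are illustrative, and the choice of five folds is arbitrary.

```matlab
% Sketch: two ways to request cross-validation from fitckernel.
load ionosphere                                   % X is 351-by-34, Y holds 'b'/'g' labels
rng(1)                                            % for reproducibility

CVMdl5 = fitckernel(X,Y,'KFold',5);               % 5-fold cross-validation

c = cvpartition(Y,'KFold',5);                     % stratified 5-fold partition built explicitly
CVMdlP = fitckernel(X,Y,'CVPartition',c);         % reuse the same partition for the model

m5 = kfoldMargin(CVMdl5);                         % margins from the 5-fold model
mP = kfoldMargin(CVMdlP);                         % margins from the partition-based model
```

Passing an explicit cvpartition object is useful when several models must be compared on identical folds.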

Output Arguments

`margin` — Classification margins
numeric vector

Classification margins, returned as a numeric vector. margin is an n-by-1 vector, where each row is the margin of the corresponding observation and n is the number of observations (size(CVMdl.Y,1)).

More About

Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class.

The software defines the classification margin for binary classification as

$m = 2yf(x).$

x is an observation. y is 1 if the true label of x is the positive class, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x), so the margins that the software returns are twice as large as those given by the common definition.

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
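
The factor of 2 follows from the two descriptions of the margin given in this section. As a short worked derivation, write the true-class and false-class scores as $s_{\text{true}}(x)$ and $s_{\text{false}}(x)$ (notation introduced here for illustration), and use the fact that the negative-class score is $-f(x)$:

$y = 1: \quad m = s_{\text{true}}(x) - s_{\text{false}}(x) = f(x) - (-f(x)) = 2f(x)$

$y = -1: \quad m = s_{\text{true}}(x) - s_{\text{false}}(x) = -f(x) - f(x) = -2f(x)$

Both cases combine into $m = 2yf(x)$.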

Classification Score

For kernel classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

$f(x) = T(x)\beta + b.$

$T (\cdot)$ is a transformation of an observation for feature expansion.
β is the estimated column vector of coefficients.
b is the estimated scalar bias.

The raw classification score for classifying x into the negative class is −f(x). The software classifies observations into the class that yields a positive score.

If the kernel classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
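
The relationship between scores and margins can be checked numerically. The following is a minimal sketch that assumes the default SVM learner (so no score transformation is applied) and assumes that the second entry of CVMdl.ClassNames is the positive class. The names isPos, y, and mManual are illustrative. Under those assumptions, the difference printed at the end is expected to be near zero.

```matlab
% Sketch: rebuild the margins from the cross-validated scores.
load ionosphere                                   % X is 351-by-34, Y holds 'b'/'g' labels
rng(1)                                            % for reproducibility
CVMdl = fitckernel(X,Y,'CrossVal','on');          % default learner is SVM
[~,score] = kfoldPredict(CVMdl);                  % n-by-2 scores, columns follow CVMdl.ClassNames
m = kfoldMargin(CVMdl);

isPos = strcmp(Y,CVMdl.ClassNames{2});            % assumption: second class is the positive class
y = 2*isPos - 1;                                  % +1 for the positive class, -1 otherwise
mManual = 2*y.*score(:,2);                        % m = 2*y*f(x), with f(x) in the second column
maxDiff = max(abs(m - mManual))                   % largest discrepancy between the two computations
```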

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. (since R2025a)

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2018b

R2025a: Specify GPU arrays (requires Parallel Computing Toolbox)

kfoldMargin fully supports GPU arrays.

R2023b: Observations with missing predictor values are used in resubstitution and cross-validation computations

Starting in R2023b, the following classification model object functions use observations with missing predictor values as part of resubstitution ("resub") and cross-validation ("kfold") computations for classification edges, losses, margins, and predictions.

Model Type	Model Objects	Object Functions
Discriminant analysis classification model	`ClassificationDiscriminant`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Discriminant analysis classification model	`ClassificationPartitionedModel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Ensemble of discriminant analysis learners for classification	`ClassificationEnsemble`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Ensemble of discriminant analysis learners for classification	`ClassificationPartitionedEnsemble`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Gaussian kernel classification model	`ClassificationPartitionedKernel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Gaussian kernel classification model	`ClassificationPartitionedKernelECOC`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Linear classification model	`ClassificationPartitionedLinear`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Linear classification model	`ClassificationPartitionedLinearECOC`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Neural network classification model	`ClassificationNeuralNetwork`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Neural network classification model	`ClassificationPartitionedModel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Support vector machine (SVM) classification model	`ClassificationSVM`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Support vector machine (SVM) classification model	`ClassificationPartitionedModel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`

In previous releases, the software omitted observations with missing predictor values from the resubstitution and cross-validation computations.
