ClassificationPartitionedEnsemble
Cross-validated classification ensemble
Description
ClassificationPartitionedEnsemble is a set of classification
ensembles trained on cross-validated folds. You can estimate the quality of the
classification by using one or more kfold functions: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.
Every kfold function uses models trained on training-fold (in-fold)
observations to predict the response for validation-fold (out-of-fold) observations. For
example, when you use kfoldPredict with a
k-fold cross-validated model, the software estimates a response for
every observation using the model trained without that observation. For more
information, see Partitioned Models.
Creation
You can create a ClassificationPartitionedEnsemble object in two ways:
- Create a cross-validated model from a ClassificationEnsemble or ClassificationBaggedEnsemble model object by using the crossval object function.
- Create a cross-validated classification model by using the fitcensemble or fitensemble function and specifying one of the name-value arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout.
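As a sketch of both approaches, using the Fisher iris data set shipped with the toolbox (the ensemble settings here are illustrative):

```matlab
% Load sample data: 150 iris observations with species labels.
load fisheriris

% Approach 1: train a full ensemble, then cross-validate it.
ens = fitcensemble(meas,species,Method="Bag");
cvens1 = crossval(ens,KFold=5);   % ClassificationPartitionedEnsemble

% Approach 2: request cross-validation directly when fitting.
cvens2 = fitcensemble(meas,species,Method="Bag",KFold=5);

% Both objects support the kfold functions.
L1 = kfoldLoss(cvens1);
L2 = kfoldLoss(cvens2);
```

Either way, the resulting object contains the trained per-fold ensembles and the cvpartition used to create them.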
Properties
Cross-Validation Properties
This property is read-only.
Name of the cross-validated model, returned as a character vector.
Data Types: char
This property is read-only.
Number of folds in the cross-validated ensemble, returned as a positive integer.
Data Types: double
This property is read-only.
Parameters of the cross-validated ensemble, returned as an object.
This property is read-only.
Number of weak learners used to train each trained learner in
Trained, returned as a positive integer.
Data Types: double
This property is read-only.
Partition used in the cross-validation, returned as a cvpartition object.
This property is read-only.
Trained learners, returned as a KFold-length cell array of full
ensembles. Every ensemble is full, meaning it contains its training data and
weights.
Data Types: cell
This property is read-only.
Trained learners, returned as a KFold-length cell array of
compact ensembles.
Data Types: cell
Other Classification Properties
This property is read-only.
Bin edges for numeric predictors, returned as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.
The software bins numeric predictors only if you specify the NumBins
name-value argument as a positive integer scalar when training a model with tree learners.
The BinEdges property is empty if the NumBins value
is empty (default).
You can reproduce the binned predictor data Xbinned by using the
BinEdges property of the trained model
mdl.
X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
idxNumeric = idxNumeric';
end
for j = idxNumeric
x = X(:,j);
% Convert x to array if x is a table.
if istable(x)
x = table2array(x);
end
% Group x into bins by using the discretize function.
xbinned = discretize(x,[-inf; edges{j}; inf]);
Xbinned(:,j) = xbinned;
end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for the numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
Data Types: cell
This property is read-only.
Categorical predictor
indices, returned as a vector of positive integers. CategoricalPredictors
contains index values indicating that the corresponding predictors are categorical. The index
values are between 1 and p, where p is the number of
predictors used to train the model. If none of the predictors are categorical, then this
property is empty ([]).
Data Types: single | double
This property is read-only.
Unique class labels used in training, returned as a categorical or
character array, logical or numeric vector, or cell array of
character vectors. ClassNames has the same
data type as the class labels Y.
(The software treats string arrays as cell arrays of character
vectors.)
ClassNames also determines the class
order.
Data Types: categorical | char | logical | single | double | cell
This property is read-only.
Misclassification costs, returned as a square numeric matrix.
Cost has K rows and columns, where
K is the number of classes.
Cost(i,j) is the cost of classifying a point into class
j if its true class is i. The order of the
rows and columns of Cost corresponds to the order of the classes in
ClassNames.
Data Types: double
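For instance, with three classes ordered as in ClassNames, the entry Cost(i,j) is read as follows (the cost values below are hypothetical):

```matlab
% Hypothetical cost matrix for three classes; rows are true classes,
% columns are predicted classes, ordered as in ClassNames.
Cost = [0 1 4;
        1 0 1;
        1 1 0];

% Cost(1,3) = 4: misclassifying a true class-1 observation
% as class 3 is four times as costly as the other mistakes.
c13 = Cost(1,3);
```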
This property is read-only.
Number of observations in the training data, returned as a positive integer.
NumObservations can be less than the number of rows of input data
when there are missing values in the input data or response data.
Data Types: double
This property is read-only.
Predictor names in order of their appearance in the predictor data
X, returned as a cell array of
character vectors. The length of
PredictorNames is equal to the
number of columns in X.
Data Types: cell
This property is read-only.
Prior probabilities for each class, returned as a K-element numeric
vector, where K is the number of unique classes in the response. The
order of the elements of Prior corresponds to the order of the
classes in ClassNames.
Data Types: double
This property is read-only.
Name of the response variable, returned as a character vector.
Data Types: char
Score transformation function, specified as a character vector, string scalar, or function
handle. ScoreTransform represents a built-in transformation function or a
function handle for transforming predicted classification scores.
To change the score transformation function, use dot notation. For a built-in function, assign a character vector or string scalar:
Mdl.ScoreTransform = "function";
This table lists the values for the available built-in functions.

Value | Description
"doublelogit" | 1/(1 + e^(–2x))
"invlogit" | log(x / (1 – x))
"ismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit" | 1/(1 + e^(–x))
"none" or "identity" | x (no transformation)
"sign" | –1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric" | 2x – 1
"symmetricismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
"symmetriclogit" | 2/(1 + e^(–x)) – 1

For a MATLAB® function or a function that you define, enter its function handle.
Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Data Types: char | string | function_handle
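For example, a built-in transformation or a custom function handle can be assigned as follows (the model Mdl and the anonymous function here are illustrative):

```matlab
load fisheriris
Mdl = fitcensemble(meas,species,Method="Bag");

% Built-in transformation, specified by name.
Mdl.ScoreTransform = "logit";

% Custom function handle; it must map an n-by-K score matrix
% to a matrix of the same size.
Mdl.ScoreTransform = @(x) 2*x - 1;   % same effect as "symmetric"
```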
This property is read-only.
Scaled weights in the model, returned as a numeric vector. W has length n, the number of rows in the training data.
Data Types: double
This property is read-only.
Predictor values, returned as a real matrix or table. Each column of
X represents one variable (predictor), and each row represents
one observation.
Data Types: double | table
This property is read-only.
Class labels corresponding to the observations in X, returned as
a categorical array, cell array of character vectors, character array, logical vector,
or numeric vector. Each row of Y represents the classification of the
corresponding row of X.
Data Types: single | double | logical | char | string | cell | categorical
Object Functions
gather | Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldEdge | Classification edge for cross-validated classification model
kfoldLoss | Classification loss for cross-validated classification model
kfoldMargin | Classification margins for cross-validated classification model
kfoldPredict | Classify observations in cross-validated classification model
kfoldfun | Cross-validate function for classification
resume | Resume training of cross-validated classification ensemble model
Examples
Evaluate the 10-fold cross-validation error for a classification ensemble that models the Fisher iris data.
Load the sample data set.
load fisheriris

Train an ensemble of 100 boosted classification trees using AdaBoostM2.

t = templateTree(MaxNumSplits=1); % Weak learner template tree object
ens = fitcensemble(meas,species,Method="AdaBoostM2",Learners=t);

Create a cross-validated ensemble from ens and find the 10-fold cross-validation error.

rng(10,"twister") % For reproducibility
cvens = crossval(ens);
L = kfoldLoss(cvens)
L = 0.0533
Algorithms
You can create partitioned models by using k-fold cross-validation, holdout validation, leave-one-out cross-validation, or resubstitution.
- k-fold cross-validation — The software divides the observations into KFold disjoint folds, each of which has approximately the same number of observations. The software trains KFold models (Trained), and each model is trained on KFold – 1 of the folds. When you use kfoldPredict, each model predicts the response values for the remaining fold.
- Holdout validation — The software partitions the observations into a training set and a validation set. The software trains one model (Trained) using the training set. When you use kfoldPredict, the model predicts the response values for the validation set.
- Leave-one-out cross-validation — The software creates NumObservations folds, where each observation is a fold. The software trains NumObservations models (Trained), and each model is trained on NumObservations – 1 of the folds. When you use kfoldPredict, each model predicts the response for the remaining fold (observation).
- Resubstitution — The software does not partition the data. The software trains one model (Trained) on the entire data set. When you use kfoldPredict, the model predicts the response values for all observations.
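The first three schemes correspond to name-value arguments of fitcensemble; a minimal sketch (the fold count and holdout fraction are illustrative):

```matlab
load fisheriris

% k-fold cross-validation with 5 folds.
cvKF = fitcensemble(meas,species,Method="Bag",KFold=5);

% Holdout validation, reserving 30% of observations for validation.
cvHO = fitcensemble(meas,species,Method="Bag",Holdout=0.3);

% Leave-one-out cross-validation (one model per observation; slow).
cvLOO = fitcensemble(meas,species,Method="Bag",Leaveout="on");
```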
Extended Capabilities
Usage notes and limitations:
The object functions of a ClassificationPartitionedEnsemble model fully support GPU arrays.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2011a

Starting in R2022a, the Cost property stores the user-specified cost
matrix, so that you can compute the observed misclassification cost using the specified cost
value. The software stores normalized prior probabilities (Prior)
and observation weights (W) that do not reflect the penalties described
in the cost matrix. To compute the observed misclassification cost, specify the
LossFun name-value argument as "classifcost"
when you call the kfoldLoss function.
Note that model training has not changed and, therefore, the decision boundaries between classes have not changed.
For training, the fitting function updates the specified prior probabilities by
incorporating the penalties described in the specified cost matrix, and then normalizes the
prior probabilities and observation weights. This behavior has not changed. In previous
releases, the software stored the default cost matrix in the Cost
property and stored the prior probabilities and observation weights used for training in the
Prior and W properties, respectively. Starting
in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized
prior probabilities and observation weights that do not reflect the cost penalties. For more
details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.
Some object functions use the Cost and W properties:
- The kfoldLoss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
- The kfoldLoss and kfoldEdge functions use the observation weights stored in the W property.
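For example, the observed misclassification cost for a cross-validated ensemble can be computed like this (the cost matrix below is hypothetical):

```matlab
load fisheriris

% Hypothetical cost matrix: predicting class 2 for a true class-3
% observation is penalized five times as heavily as other errors.
C = [0 1 1;
     1 0 1;
     1 5 0];

cvens = fitcensemble(meas,species,Method="Bag",KFold=5,Cost=C);

% Observed misclassification cost, using the stored Cost matrix.
Lcost = kfoldLoss(cvens,LossFun="classifcost");

% Default classification error, for comparison.
Lerr = kfoldLoss(cvens);
```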
If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.
If you want the software to handle the cost matrix, prior
probabilities, and observation weights in the same way as in previous releases, adjust the prior
probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a
classification model, specify the adjusted prior probabilities and observation weights by using
the Prior and Weights name-value arguments, respectively,
and use the default cost matrix.