ClassificationPartitionedModel
Cross-validated classification model
Description
ClassificationPartitionedModel is a set of classification
models trained on cross-validated folds. You can estimate the quality of the
classification by using one or more kfold functions: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.
Every kfold function uses models trained on training-fold (in-fold)
observations to predict the response for validation-fold (out-of-fold) observations. For
example, when you use kfoldPredict with a
k-fold cross-validated model, the software estimates a response for
every observation using the model trained without that observation. For more
information, see Partitioned Models.
Creation
You can create a ClassificationPartitionedModel object in two ways:
Create a cross-validated model from a full classification model object by using the
crossval object function.Create a cross-validated model by using the function
fitcdiscr, fitcknn, fitcnb, fitcsvm, or fitctree, and specifying one of the name-value arguments CrossVal, KFold, Holdout, Leaveout, or CVPartition.
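As a minimal sketch, both creation approaches look like this for a classification tree trained on Fisher's iris data (variable names are illustrative):

```matlab
load fisheriris                  % predictor matrix meas, class labels species

% Approach 1: train a full model, then cross-validate it with the
% crossval object function (10-fold by default).
Mdl = fitctree(meas,species);
CVMdl1 = crossval(Mdl);

% Approach 2: cross-validate at fitting time by specifying a
% cross-validation name-value argument.
CVMdl2 = fitctree(meas,species,KFold=5);
```

Both CVMdl1 and CVMdl2 are ClassificationPartitionedModel objects.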
Properties
Cross-Validation Properties
This property is read-only.
Name of the cross-validated model (CrossValidatedModel), returned as a character vector.
Data Types: char
This property is read-only.
Number of folds in the cross-validated model (KFold), returned as a positive integer.
Data Types: double
This property is read-only.
Parameters used to cross-validate the model (ModelParameters), returned as an object.
This property is read-only.
Partition used in the cross-validation, returned as a cvpartition object.
This property is read-only.
Trained learners, returned as a cell array of compact classification models. For more information, see Partitioned Models.
Data Types: cell
Other Classification Properties
This property is read-only.
Bin edges for numeric predictors, returned as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.
The software bins numeric predictors only if you specify the NumBins
name-value argument as a positive integer scalar when training a model with tree learners.
The BinEdges property is empty if the NumBins value
is empty (default).
You can reproduce the binned predictor data Xbinned by using the
BinEdges property of the trained model
mdl.
X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
idxNumeric = idxNumeric';
end
for j = idxNumeric
x = X(:,j);
% Convert x to array if x is a table.
if istable(x)
x = table2array(x);
end
% Group x into bins by using the discretize function.
xbinned = discretize(x,[-inf; edges{j}; inf]);
Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for the numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
Data Types: cell
This property is read-only.
Categorical predictor
indices, returned as a vector of positive integers. CategoricalPredictors
contains index values indicating that the corresponding predictors are categorical. The index
values are between 1 and p, where p is the number of
predictors used to train the model. If none of the predictors are categorical, then this
property is empty ([]).
Data Types: single | double
This property is read-only.
Unique class labels used in training, returned as a categorical or
character array, logical or numeric vector, or cell array of
character vectors. ClassNames has the same
data type as the class labels Y.
(The software treats string arrays as cell arrays of character
vectors.)
ClassNames also determines the class
order.
Data Types: categorical | char | logical | single | double | cell
Misclassification costs, specified as a square numeric matrix.
Cost has K rows and columns,
where K is the number of classes.
Cost(i,j) is the cost of classifying a point into
class j if its true class is i.
The order of the rows and columns of Cost
corresponds to the order of the classes in
ClassNames.
If the model is a cross-validated ClassificationDiscriminant,
ClassificationKNN, or ClassificationNaiveBayes
model, then you can change its cost matrix using dot notation. For
example, for a cross-validated model CVMdl and a cost
matrix costMatrix, you can
specify:
CVMdl.Cost = costMatrix;
Data Types: double
This property is read-only.
Number of observations in the training data, returned as a positive integer.
NumObservations can be less than the number of rows of input data
when there are missing values in the input data or response data.
Data Types: double
This property is read-only.
Predictor names in order of their appearance in the predictor data
X, returned as a cell array of
character vectors. The length of
PredictorNames is equal to the
number of columns in X.
Data Types: cell
Prior probabilities for each class, specified as a numeric vector. The
order of the elements of Prior corresponds to the
order of the classes in ClassNames.
If the model is a cross-validated ClassificationDiscriminant
or ClassificationNaiveBayes
model, then you can change its vector of priors using dot notation. For
example, for a cross-validated model CVMdl and a
vector of prior probabilities priorVector, you can
specify:
CVMdl.Prior = priorVector;
Data Types: double
This property is read-only.
Name of the response variable (ResponseName), returned as a character vector.
Data Types: char
Score transformation function, specified as a character vector, string scalar, or function
handle. ScoreTransform represents a built-in transformation function or a
function handle for transforming predicted classification scores.
To change the score transformation function to function, for example, use dot notation.
For a built-in function, enter a character vector or string scalar.
Mdl.ScoreTransform = "function";
This table lists the values for the available built-in functions.
Value | Description
"doublelogit" | 1/(1 + e^(–2x))
"invlogit" | log(x / (1 – x))
"ismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit" | 1/(1 + e^(–x))
"none" or "identity" | x (no transformation)
"sign" | –1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric" | 2x – 1
"symmetricismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
"symmetriclogit" | 2/(1 + e^(–x)) – 1
For a MATLAB® function or a function that you define, enter its function handle.
Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Data Types: char | string | function_handle
This property is read-only.
Scaled weights in the model, returned as a numeric vector. W has length n, the number of rows in the training data.
Data Types: double
This property is read-only.
Predictor values, returned as a real matrix or table. Each column of
X represents one variable (predictor), and each row represents
one observation.
Data Types: double | table
This property is read-only.
Class labels corresponding to the observations in X, returned as
a categorical array, cell array of character vectors, character array, logical vector,
or numeric vector. Each row of Y represents the classification of the
corresponding row of X.
Data Types: single | double | logical | char | string | cell | categorical
Object Functions
gather | Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldEdge | Classification edge for cross-validated classification model
kfoldLoss | Classification loss for cross-validated classification model
kfoldMargin | Classification margins for cross-validated classification model
kfoldPredict | Classify observations in cross-validated classification model
kfoldfun | Cross-validate function for classification
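The kfold functions cover common metrics; kfoldfun generalizes them by applying your own function to each fold. As a minimal sketch, the following computes the per-fold misclassification rate by hand (the anonymous function must accept the compact in-fold model and the training- and validation-fold data, and return a numeric value):

```matlab
load fisheriris
CVMdl = crossval(fitctree(meas,species));   % 10-fold partitioned model

% CMP is the compact model trained on the in-fold observations; the
% function evaluates it on the corresponding validation fold.
foldErr = kfoldfun(CVMdl, ...
    @(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest) ...
        mean(~strcmp(predict(CMP,Xtest),Ytest)));
meanErr = mean(foldErr)   % comparable to kfoldLoss(CVMdl)
```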
Examples
Evaluate the 10-fold cross-validation error for a classification tree model.
Load Fisher's iris data set.
load fisheriris

Train a classification tree using default options.
Mdl = fitctree(meas,species);
Cross-validate the classification tree model.
CVMdl = crossval(Mdl);
Estimate the 10-fold cross-validation loss.
L = kfoldLoss(CVMdl)
L = 0.0533
Estimate positive class posterior probabilities for the test set of an SVM algorithm.
Load the ionosphere data set.
load ionosphere

Train an SVM classifier. Specify a 20% holdout sample. Standardize the predictors and specify the class order.
rng(1) % For reproducibility
CVSVMModel = fitcsvm(X,Y,Holdout=0.2,Standardize=true, ...
    ClassNames={'b','g'});
CVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier.
Estimate the optimal score function for mapping observation scores to posterior probabilities of an observation being classified as g.
ScoreCVSVMModel = fitSVMPosterior(CVSVMModel);
ScoreCVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier containing the optimal score transformation function estimated from the training data.
Estimate the out-of-sample positive class posterior probabilities. Display the results for the first 10 out-of-sample observations.
[~,OOSPostProbs] = kfoldPredict(ScoreCVSVMModel);
indx = ~isnan(OOSPostProbs(:,2));
hoObs = find(indx); % Holdout observation numbers
OOSPostProbs = [hoObs, OOSPostProbs(indx,2)];
table(OOSPostProbs(1:10,1),OOSPostProbs(1:10,2), ...
    VariableNames=["ObservationIndex","PosteriorProbability"])
ans=10×2 table
ObservationIndex PosteriorProbability
________________ ____________________
6 0.17378
7 0.89637
8 0.0076583
9 0.91602
16 0.026715
22 4.609e-06
23 0.9024
24 2.4135e-06
38 0.00042673
41 0.86427
Compute the loss and the predictions for a classification model, first partitioned using holdout validation and then partitioned using 3-fold cross-validation. Compare the two sets of losses and predictions.
Create a table from the fisheriris data set, which contains length and width measurements from the sepals and petals of three species of iris flowers. View the first eight observations.
fisheriris = readtable("fisheriris.csv");
head(fisheriris)

    SepalLength    SepalWidth    PetalLength    PetalWidth     Species
___________ __________ ___________ __________ __________
5.1 3.5 1.4 0.2 {'setosa'}
4.9 3 1.4 0.2 {'setosa'}
4.7 3.2 1.3 0.2 {'setosa'}
4.6 3.1 1.5 0.2 {'setosa'}
5 3.6 1.4 0.2 {'setosa'}
5.4 3.9 1.7 0.4 {'setosa'}
4.6 3.4 1.4 0.3 {'setosa'}
5 3.4 1.5 0.2 {'setosa'}
Partition the data using cvpartition. First, create a partition for holdout validation, using approximately 70% of the observations for the training data and 30% for the validation data. Then, create a partition for 3-fold cross-validation.
rng(0,"twister") % For reproducibility
holdoutPartition = cvpartition(fisheriris.Species,Holdout=0.30);
kfoldPartition = cvpartition(fisheriris.Species,KFold=3);
holdoutPartition and kfoldPartition are both stratified random partitions. You can use the training and test functions to find the indices for the observations in the training and validation sets, respectively.
Train a classification tree model using the fisheriris data. Specify Species as the response variable.
Mdl = fitctree(fisheriris,"Species");

Create the partitioned classification models using crossval.
holdoutMdl = crossval(Mdl,CVPartition=holdoutPartition)
holdoutMdl =
ClassificationPartitionedModel
CrossValidatedModel: 'Tree'
PredictorNames: {'SepalLength' 'SepalWidth' 'PetalLength' 'PetalWidth'}
ResponseName: 'Species'
NumObservations: 150
KFold: 1
Partition: [1×1 cvpartition]
ClassNames: {'setosa' 'versicolor' 'virginica'}
ScoreTransform: 'none'
Properties, Methods
kfoldMdl = crossval(Mdl,CVPartition=kfoldPartition)
kfoldMdl =
ClassificationPartitionedModel
CrossValidatedModel: 'Tree'
PredictorNames: {'SepalLength' 'SepalWidth' 'PetalLength' 'PetalWidth'}
ResponseName: 'Species'
NumObservations: 150
KFold: 3
Partition: [1×1 cvpartition]
ClassNames: {'setosa' 'versicolor' 'virginica'}
ScoreTransform: 'none'
Properties, Methods
holdoutMdl and kfoldMdl are ClassificationPartitionedModel objects.
Compute the minimal expected misclassification cost for holdoutMdl and kfoldMdl using kfoldLoss. Because both models use the default cost matrix, this cost is the same as the classification error.
holdoutL = kfoldLoss(holdoutMdl)
holdoutL = 0.0889
kfoldL = kfoldLoss(kfoldMdl)
kfoldL = 0.0600
holdoutL is the error computed using the predictions for one validation set, while kfoldL is an average error computed using the predictions for three folds of validation data. Cross-validation metrics tend to be better indicators of a model's performance on unseen data.
Compute the validation data predictions for the two models using kfoldPredict.
[holdoutLabels,holdoutScores] = kfoldPredict(holdoutMdl);
[kfoldLabels,kfoldScores] = kfoldPredict(kfoldMdl);
holdoutClassNames = holdoutMdl.ClassNames;
holdoutScores = array2table(holdoutScores,VariableNames=holdoutClassNames);
kfoldClassNames = kfoldMdl.ClassNames;
kfoldScores = array2table(kfoldScores,VariableNames=kfoldClassNames);
predictions = table(holdoutLabels,kfoldLabels, ...
    holdoutScores,kfoldScores, ...
    VariableNames=["holdoutMdl Labels","kfoldMdl Labels", ...
    "holdoutMdl Scores","kfoldMdl Scores"])
predictions=150×4 table
holdoutMdl Labels kfoldMdl Labels holdoutMdl Scores kfoldMdl Scores
_________________ _______________ _________________________________ _________________________________
setosa versicolor virginica setosa versicolor virginica
______ __________ _________ ______ __________ _________
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} 1 0 0 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} 1 0 0 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
{'setosa'} {'setosa'} 1 0 0 1 0 0
{'setosa'} {'setosa'} 1 0 0 1 0 0
{'setosa'} {'setosa'} NaN NaN NaN 1 0 0
⋮
kfoldPredict returns NaN scores for the observations used to train holdoutMdl.Trained. For these observations, the function selects the class label with the highest frequency as the predicted label. In this case, because all classes have the same frequency, the function selects the first class (setosa) as the predicted label. The function uses the trained model to return predictions for the validation set observations. kfoldPredict returns each kfoldMdl prediction using the model in kfoldMdl.Trained that was trained without that observation.
To predict responses for unseen data, use the model trained on the entire data set (Mdl) and its predict function rather than a partitioned model such as holdoutMdl or kfoldMdl.
Tips
To estimate posterior probabilities of trained, cross-validated SVM classifiers, use
fitSVMPosterior.
Algorithms
You can create partitioned models by using k-fold cross-validation, holdout validation, leave-one-out cross-validation, or resubstitution.
k-fold cross-validation: The software divides the observations into KFold disjoint folds, each of which has approximately the same number of observations. The software trains KFold models (Trained), and each model is trained on KFold – 1 of the folds. When you use kfoldPredict, each model predicts the response values for the remaining fold.
Holdout validation: The software partitions the observations into a training set and a validation set. The software trains one model (Trained) using the training set. When you use kfoldPredict, the model predicts the response values for the validation set.
Leave-one-out cross-validation: The software creates NumObservations folds, where each observation is a fold. The software trains NumObservations models (Trained), and each model is trained on NumObservations – 1 of the folds. When you use kfoldPredict, each model predicts the response for the remaining fold (observation).
Resubstitution: The software does not partition the data. The software trains one model (Trained) on the entire data set. When you use kfoldPredict, the model predicts the response values for all observations.
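As a sketch, the first three schemes map directly to name-value arguments of the fitting functions (shown here for fitctree; the same arguments apply to the other fitting functions listed under Creation):

```matlab
load fisheriris

CVMdl1 = fitctree(meas,species,KFold=5);        % 5-fold cross-validation
CVMdl2 = fitctree(meas,species,Holdout=0.3);    % 30% holdout validation
CVMdl3 = fitctree(meas,species,Leaveout="on");  % leave-one-out cross-validation
numel(CVMdl3.Trained)                           % one trained model per observation
```

For resubstitution, pass a cvpartition object of type "resubstitution" through the CVPartition name-value argument.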
Extended Capabilities
Usage notes and limitations:
A ClassificationPartitionedModel can be one of several cross-validated model objects. The object functions of a
ClassificationPartitionedModel model fully support GPU arrays.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2011a

ClassificationPartitionedModel fully supports GPU arrays for
ClassificationNeuralNetwork models.
fitcnet supports misclassification costs and prior probabilities for
neural network classifiers. Specify the Cost and
Prior name-value arguments when you create a model. Alternatively,
you can specify misclassification costs after training a model by using dot notation to
change the Cost property value of the
model.
Mdl.Cost = [0 2; 1 0];
Starting in R2022a, the Cost property of a cross-validated
SVM classification model stores the user-specified cost matrix, so that you can
compute the observed misclassification cost using the specified cost value. The
software stores normalized prior probabilities (Prior) and
observation weights (W) that do not reflect the penalties
described in the cost matrix. Other cross-validated models already had this
behavior. To compute the observed misclassification cost, specify the
LossFun name-value argument as
"classifcost" when you call the
kfoldLoss function.
Note that model training has not changed and, therefore, the decision boundaries between classes have not changed.
For training an SVM model, the fitting function updates the specified prior
probabilities by incorporating the penalties described in the specified cost matrix,
and then normalizes the prior probabilities and observation weights. This behavior
has not changed. In previous releases, the software stored the default cost matrix
in the Cost property and stored the prior probabilities and
observation weights used for training in the Prior and
W properties, respectively. Starting in R2022a, the
software stores the user-specified cost matrix without modification, and stores
normalized prior probabilities and observation weights that do not reflect the cost
penalties. For more details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.
Some object functions use the Cost and W properties:
The kfoldLoss function uses the cost matrix stored in the Cost property if you specify the LossFun name-value argument as "classifcost" or "mincost".
The kfoldLoss and kfoldEdge functions use the observation weights stored in the W property.
If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.
If you want the software to handle the cost matrix, prior
probabilities, and observation weights in the same way as in previous releases, adjust the prior
probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a
classification model, specify the adjusted prior probabilities and observation weights by using
the Prior and Weights name-value arguments, respectively,
and use the default cost matrix.