
fitPosterior

Class: ClassificationSVM

Fit posterior probabilities

Syntax

• ScoreSVMModel = fitPosterior(SVMModel)
• [ScoreSVMModel,ScoreTransform] = fitPosterior(SVMModel)
• [ScoreSVMModel,ScoreTransform] = fitPosterior(SVMModel,Name,Value)

Description

ScoreSVMModel = fitPosterior(SVMModel) returns a trained support vector machine (SVM) classifier ScoreSVMModel containing the optimal score-to-posterior-probability transformation function for two-class learning.

The software fits the appropriate score-to-posterior-probability transformation function using the SVM classifier SVMModel and 10-fold cross validation on the stored predictor data (SVMModel.X) and class labels (SVMModel.Y), as outlined in [1]. The transformation function computes the posterior probability that an observation is classified into the positive class (SVMModel.ClassNames(2)).

• If the classes are inseparable, then the transformation function is the sigmoid function.

• If the classes are perfectly separable, then the transformation function is the step function.

• In two-class learning, if one of the two classes has a relative frequency of 0, then the transformation function is the constant function. fitPosterior is not appropriate for one-class learning.

• The software stores the optimal score transformation function in ScoreSVMModel.ScoreTransform.

[ScoreSVMModel,ScoreTransform] = fitPosterior(SVMModel) additionally returns the optimal score-to-posterior-probability transformation function parameters (ScoreTransform).

[ScoreSVMModel,ScoreTransform] = fitPosterior(SVMModel,Name,Value) returns the optimal score-to-posterior-probability transformation function and its parameters with additional options specified by one or more Name,Value pair arguments.

Tips

Here is one way to predict positive class posterior probabilities.

1. Train an SVM classifier by passing the data to fitcsvm. The result is a trained SVM classifier, for example, SVMModel, that stores the data. The software sets the score transformation function property (SVMModel.ScoreTransform) to 'none'.

2. Pass the trained SVM classifier SVMModel to fitSVMPosterior or fitPosterior. The result, for example, ScoreSVMModel, is the same trained SVM classifier as SVMModel, except that the software sets ScoreSVMModel.ScoreTransform to the optimal score transformation function.

If you skip step 2, then predict returns the positive class score rather than the positive class posterior probability.

3. Pass the trained SVM classifier containing the optimal score transformation function (ScoreSVMModel) and predictor data matrix to predict. The second column of the second output argument stores the positive class posterior probabilities corresponding to each row of the predictor data matrix.
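The three steps above can be sketched as follows, using the ionosphere sample data set (the variable names here are illustrative):

```matlab
load ionosphere                                   % Predictors X and labels Y
% Step 1: train; 'none' score transformation by default
SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},'Standardize',true);
% Step 2: fit the optimal score-to-posterior-probability transformation
ScoreSVMModel = fitPosterior(SVMModel);
% Step 3: the second output now holds posterior probabilities
[label,postProbs] = predict(ScoreSVMModel,X);
postProbs(1:5,2)   % Positive class ('g') posteriors for the first five rows
```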

Input Arguments

SVMModel — Trained SVM classifier
ClassificationSVM classifier

Trained SVM classifier, specified as a ClassificationSVM classifier.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'CVPartition' — Cross-validation partition
[] (default) | cvpartition partition

Cross-validation partition used to compute the transformation function, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition as created by cvpartition. You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

The software splits the data into subsets using cvpartition.
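As a sketch, you can pass a cvpartition object instead of relying on the default 10-fold cross validation (the 5-fold choice here is illustrative):

```matlab
load ionosphere                                   % Predictors X and labels Y
SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},'Standardize',true);
c = cvpartition(Y,'KFold',5);                     % Custom 5-fold partition
% Fit the transformation function using the supplied partition
ScoreSVMModel = fitPosterior(SVMModel,'CVPartition',c);
```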

'Holdout' — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of data for holdout validation used to compute the transformation function, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). Holdout validation tests the specified fraction of the data, and uses the remaining data for training.

You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Example: 'Holdout',0.1

Data Types: double | single

'KFold' — Number of folds
10 (default) | positive integer value

Number of folds to use when computing the transformation function, specified as the comma-separated pair consisting of 'KFold' and a positive integer value.

You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Example: 'KFold',8

Data Types: single | double

'Leaveout' — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag indicating whether to use leave-one-out cross validation to compute the transformation function, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Specify 'on' to use leave-one-out cross validation.

You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Example: 'Leaveout','on'

Output Arguments

ScoreSVMModel — Trained SVM classifier
ClassificationSVM classifier

Trained SVM classifier containing the estimated score-to-posterior-probability transformation function, returned as a ClassificationSVM classifier.

To estimate posterior probabilities for the training set observations, pass ScoreSVMModel to resubPredict.

To estimate posterior probabilities for new observations, pass the new observations and ScoreSVMModel to predict. If you set 'Standardize',true in fitcsvm to train SVMModel, then predict standardizes the columns of X using the corresponding means in SVMModel.Mu and standard deviations in SVMModel.Sigma.

ScoreTransform — Optimal score transformation function parameters
structure array

Optimal score-to-posterior-probability transformation function parameters, returned as a structure array.

• If field Type is sigmoid, then ScoreTransform has the following other fields:

• Slope: The value of A in the sigmoid function

• Intercept: The value of B in the sigmoid function

• If field Type is step, then ScoreTransform has the following other fields:

• PositiveClassProbability: The value of π in the step function. It represents the probability that an observation is in the positive class. It is also the posterior probability that an observation is in the positive class given that its score is in the interval (LowerBound,UpperBound).

• LowerBound: The value $\underset{{y}_{n}=-1}{\mathrm{max}}{s}_{n}$ in the step function. It is the lower bound of the score interval within which observations have the posterior probability PositiveClassProbability of being in the positive class. Any observation with a score less than LowerBound has a posterior probability of 0 of being in the positive class.

• UpperBound: The value $\underset{{y}_{n}=+1}{\mathrm{min}}{s}_{n}$ in the step function. It is the upper bound of the score interval within which observations have the posterior probability PositiveClassProbability of being in the positive class. Any observation with a score greater than UpperBound has a posterior probability of 1 of being in the positive class.

• If field Type is constant, then ScoreTransform.PredictedClass contains the name of the class prediction, which is one of the values in SVMModel.ClassNames. The posterior probability that an observation is in ScoreTransform.PredictedClass is always 1.
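As a sketch, you can inspect the fitted parameters through the second output argument. With the ionosphere data set the classes are inseparable, so Type is sigmoid and the Slope and Intercept fields are populated:

```matlab
load ionosphere                                   % Predictors X and labels Y
SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},'Standardize',true);
% The second output is the transformation parameter structure
[~,ScoreTransform] = fitPosterior(SVMModel);
ScoreTransform.Type        % 'sigmoid' for inseparable classes
ScoreTransform.Slope       % The value of A in the sigmoid function
ScoreTransform.Intercept   % The value of B in the sigmoid function
```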

Definitions

Sigmoid Function

The sigmoid function that maps the score $s_j$ for observation $j$ to the positive class posterior probability is

$P\left({s}_{j}\right)=\frac{1}{1+\mathrm{exp}\left(A{s}_{j}+B\right)}.$

If the output argument ScoreTransform.Type is sigmoid, then parameters A and B correspond to the fields Slope and Intercept of ScoreTransform, respectively.
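A minimal numeric sketch of this map, with illustrative (not fitted) values for A and B; in practice these come from the Slope and Intercept fields of ScoreTransform:

```matlab
A = -0.95;                 % Illustrative slope (negative: P grows with score)
B = -0.12;                 % Illustrative intercept
s = 1.5;                   % An SVM score for one observation
P = 1/(1 + exp(A*s + B))   % Posterior probability of the positive class
```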

Step Function

The step function that maps the score $s_j$ for observation $j$ to the positive class posterior probability is

$P\left({s}_{j}\right)=\left\{\begin{array}{ll}0; & {s}_{j}<\underset{{y}_{k}=-1}{\mathrm{max}}{s}_{k}\\ \pi ; & \underset{{y}_{k}=-1}{\mathrm{max}}{s}_{k}\le {s}_{j}\le \underset{{y}_{k}=+1}{\mathrm{min}}{s}_{k}\\ 1; & {s}_{j}>\underset{{y}_{k}=+1}{\mathrm{min}}{s}_{k}\end{array}\right.,$

where:

• $s_j$ is the score of observation $j$.

• +1 and –1 denote the positive and negative classes, respectively.

• π is the prior probability that an observation is in the positive class.

If the output argument ScoreTransform.Type is step, then the quantities $\underset{{y}_{k}=-1}{\mathrm{max}}{s}_{k}$ and $\underset{{y}_{k}=+1}{\mathrm{min}}{s}_{k}$ correspond to the fields LowerBound and UpperBound of ScoreTransform, respectively.
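A sketch of the step map in code, using illustrative values for the fitted bounds and π (in practice these come from the LowerBound, UpperBound, and PositiveClassProbability fields of ScoreTransform):

```matlab
lb = -0.33;  ub = 0.67;  piPos = 0.5;   % Illustrative fitted values
s = 0.1;                                % An SVM score for one observation
if s < lb
    P = 0;        % Score below every positive training score: negative class
elseif s > ub
    P = 1;        % Score above every negative training score: positive class
else
    P = piPos;    % Score inside the separating gap: prior of positive class
end
```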

Constant Function

The constant function maps all scores in a sample to posterior probabilities 1 or 0.

If all observations have posterior probability 1, then they are expected to come from the positive class.

If all observations have posterior probability 0, then they are not expected to come from the positive class.

Examples

Estimate In-Sample Posterior Probabilities of SVM Classifiers

Load the ionosphere data set.

load ionosphere


Train an SVM classifier. It is good practice to specify the class order and standardize the data.

SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},'Standardize',true);


SVMModel is a ClassificationSVM classifier. The positive class is 'g'.

Fit the optimal score-to-posterior-probability transformation function.

rng(1); % For reproducibility
ScoreSVMModel = fitPosterior(SVMModel)

ScoreSVMModel = 

  ClassificationSVM
           PredictorNames: {1x34 cell}
             ResponseName: 'Y'
               ClassNames: {'b'  'g'}
           ScoreTransform: '@(S)sigmoid(S,-9.481559e-01,-1.218511e-01)'
          NumObservations: 351
                    Alpha: [89x1 double]
                     Bias: -0.1341
         KernelParameters: [1x1 struct]
                       Mu: [1x34 double]
                    Sigma: [1x34 double]
           BoxConstraints: [351x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [351x1 logical]
                   Solver: 'SMO'



Since the classes are inseparable, the score transformation function (ScoreSVMModel.ScoreTransform) is the sigmoid function.

Estimate scores and positive class posterior probabilities for the training data. Display the results for the first 10 observations.

[label,scores] = resubPredict(SVMModel);
[~,postProbs] = resubPredict(ScoreSVMModel);
table(Y(1:10),label(1:10),scores(1:10,2),postProbs(1:10,2),'VariableNames',...
{'TrueLabel','PredictedLabel','Score','PosteriorProbability'})

ans =

TrueLabel    PredictedLabel     Score     PosteriorProbability
_________    ______________    _______    ____________________

'g'          'g'                1.4862      0.82216
'b'          'b'               -1.0004      0.30434
'g'          'g'                1.8686      0.86916
'b'          'b'               -2.6462     0.084158
'g'          'g'                1.2808      0.79187
'b'          'b'               -1.4617      0.22028
'g'          'g'                2.1674      0.89816
'b'          'b'               -5.7087    0.0050121
'g'          'g'                2.4798      0.92224
'b'          'b'               -2.7809     0.074823



Plot Posterior Probability Contours for Multiple Classes Using SVM

This example steps through the process of one-versus-all (OVA) classification to train a multiclass SVM classifier, and then plots probability contours for each class. To implement OVA directly, see fitcecoc.

Load Fisher's iris data set. Use the petal lengths and widths.

load fisheriris
X = meas(:,3:4);
Y = species;


Examine a scatter plot of the data.

figure
gscatter(X(:,1),X(:,2),Y);
title('{\bf Scatter Diagram of Iris Measurements}');
xlabel('Petal length');
ylabel('Petal width');
legend('Location','Northwest');
axis tight


Train three binary SVM classifiers that separate each type of iris from the others. Assume that a radial basis function is an appropriate kernel for each, and allow the algorithm to choose a kernel scale. It is good practice to define the class order and standardize the predictors.

classNames = {'setosa'; 'virginica'; 'versicolor'};
numClasses = size(classNames,1);
inds = cell(3,1); % Preallocation
SVMModel = cell(3,1);

rng(1); % For reproducibility
for j = 1:numClasses
    inds{j} = strcmp(Y,classNames{j});  % OVA classification
    SVMModel{j} = fitcsvm(X,inds{j},'ClassNames',[false true],...
        'Standardize',true,'KernelFunction','rbf','KernelScale','auto');
end


fitcsvm uses a heuristic procedure that involves subsampling to compute the value of the kernel scale.

Fit the optimal score-to-posterior-probability transformation function for each classifier.

for j = 1:numClasses
    SVMModel{j} = fitPosterior(SVMModel{j});
end

Warning: Classes are perfectly separated. The optimal score-to-posterior
transformation is a step function.


Define a grid to plot the posterior probability contours. Estimate the posterior probabilities over the grid for each classifier.

d = 0.02;
[x1Grid,x2Grid] = meshgrid(min(X(:,1)):d:max(X(:,1)),...
    min(X(:,2)):d:max(X(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];

posterior = cell(3,1);
for j = 1:numClasses
    [~,posterior{j}] = predict(SVMModel{j},xGrid);
end


For each SVM classifier, plot the posterior probability contour under the scatter plot of the data.

figure
h = zeros(numClasses + 1,1); % Preallocation for graphics handles
for j = 1:numClasses
    subplot(2,2,j)
    contourf(x1Grid,x2Grid,reshape(posterior{j}(:,2),size(x1Grid,1),size(x1Grid,2)));
    hold on
    h(1:numClasses) = gscatter(X(:,1),X(:,2),Y);
    title(sprintf('Posteriors for %s Class',classNames{j}));
    xlabel('Petal length');
    ylabel('Petal width');
    legend off
    axis tight
    hold off
end
h(numClasses + 1) = colorbar('Location','EastOutside',...
    'Position',[0.8,0.1,0.05,0.4]);
set(get(h(numClasses + 1),'YLabel'),'String','Posterior','FontSize',16);
legend(h(1:numClasses),'Location',[0.6,0.2,0.1,0.1]);


Fit Optimal Posterior Probability Function Using Holdout Cross Validation

Platt (2000) outlines a bias-reducing method of estimating the score-to-posterior-probability transformation function. This method estimates the transformation function after the SVM classifier is trained, and uses cross validation to reduce bias. By default, fitPosterior and fitSVMPosterior use 10-fold cross validation when they estimate the transformation function. To reduce run time for larger data sets, you can specify holdout cross validation instead.

Load the ionosphere data set.

load ionosphere


Train an SVM classifier. It is good practice to specify the class order and standardize the data.

SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},'Standardize',true);


SVMModel is a ClassificationSVM classifier. The positive class is 'g'.

Fit the optimal score-to-posterior-probability transformation function. For comparison, use 10-fold cross validation (default) and specify a 10% holdout test sample.

rng(1); % For reproducibility

tic;    % Start the stopwatch
SVMModel_10FCV = fitPosterior(SVMModel);
toc     % Stop the stopwatch and display the run time

tic;
SVMModel_HO = fitPosterior(SVMModel,'Holdout',0.10);
toc

Elapsed time is 1.208565 seconds.
Elapsed time is 0.207077 seconds.


Though both run times are short because the data set is relatively small, SVMModel_HO fits the score transformation function much faster than SVMModel_10FCV.

Algorithms

If you reestimate the score-to-posterior-probability transformation function, that is, if you pass an SVM classifier to fitPosterior or fitSVMPosterior and its ScoreTransform property is not 'none', then the software:

• Displays a warning

• Resets the original transformation function to 'none' before estimating the new one

Alternatives

You can also fit the posterior probability function using fitSVMPosterior. This function is similar to fitPosterior, except that it is broader because it accepts a wider range of SVM classifier types.

References

[1] Platt, J. "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods". In: Advances in Large Margin Classifiers. Cambridge, MA: The MIT Press, 2000, pp. 61–74.