classperf

Evaluate performance of classifier

Syntax

classperf
CP = classperf(truelabels)
CP = classperf(truelabels, classout)
CP = classperf(..., 'Positive', PositiveValue, 'Negative', NegativeValue)
classperf(CP, classout)
classperf(CP, classout, testidx)

Input Arguments

truelabels

True class labels for each observation, specified by one of the following:

  • Numeric vector

  • Cell array of strings

    Note:   When used in a cross-validation design experiment, truelabels must contain as many elements as the total number of observations.

classout

Classifier output, specified by one of the following:

  • Numeric vector

  • Cell array of strings

    Note:   classout must contain the same number of elements as truelabels, unless you supply testidx, in which case classout contains one element for each tested observation.

PositiveValue

Numeric vector or cell array of strings that specifies the positive labels to identify the target class(es). Default is the first class returned by grp2idx(truelabels).

NegativeValue

Numeric vector or cell array of strings that specifies the negative labels to identify the control class(es). Default is all classes other than the first class returned by grp2idx(truelabels).

testidx

Vector that indicates the observations that were used in the current validation. Choices are:

  • Index vector

  • Logical index vector of the same size as truelabels, the vector or cell array used to construct the classifier performance object

Output Arguments

CP

Classifier performance object with the performance properties listed in the table under Properties of a Classifier Performance Object, below.

Description

classperf provides an interface to keep track of performance during the validation of classifiers. classperf creates and, optionally, updates a classifier performance object, CP, which accumulates the results of the classifier. The performance properties of a classifier performance object are listed in the table under Properties of a Classifier Performance Object, below.

classperf, without input arguments, displays all the performance properties of a classifier performance object.

CP = classperf(truelabels) creates and initializes an empty classifier performance object. CP is the handle to the object. truelabels is a vector or cell array of strings containing the true class labels for every observation. When used in a cross-validation design experiment, truelabels must contain as many elements as the total number of observations.

CP = classperf(truelabels, classout) creates CP using truelabels, then updates CP using the classifier output, classout.

    Tip   This syntax is useful when you want to know the performance of a single validation.
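
For instance, a minimal sketch of a single validation (the label values here are invented for illustration):

truelabels = [1 1 2 2 2];    % known classes
classout   = [1 2 2 2 NaN];  % classifier output; NaN marks an inconclusive result
CP = classperf(truelabels, classout);
CP.CorrectRate               % 3 of the 4 classified samples are correct: 0.75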

CP = classperf(..., 'Positive', PositiveValue, 'Negative', NegativeValue) specifies the positive and negative labels to identify the target and the control classes, respectively. These labels are used to compute clinical diagnostic test performance.

If truelabels is a numeric vector, PositiveValue and NegativeValue must be numeric vectors whose entries are subsets of grp2idx(truelabels). If truelabels is a cell array of strings, PositiveValue and NegativeValue can be cell arrays of strings or numeric vectors whose entries are subsets of grp2idx(truelabels). PositiveValue defaults to the first class returned by grp2idx(truelabels), while NegativeValue defaults to all other classes.

PositiveValue and NegativeValue must consist of disjoint sets of the labels used in truelabels. For example, if

truelabels = [1 2 2 1 3 4 4 1 3 3 3 2]

you could set

p = [1 2];
n = [3 4];

For example, suppose you have a data set with six classes: five types of cancer (ovarian, lung, prostate, skin, brain) and no cancer, so that ClassLabels = {'Ovarian', 'Lung', 'Prostate', 'Skin', 'Brain', 'Healthy'}.

You could test a detector for lung cancer by setting PositiveValue = 2 and NegativeValue = [1 3 4 5 6].

Or you could test for any type of cancer by setting PositiveValue = [1 2 3 4 5] and NegativeValue = 6.
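
In code, these two diagnostic tests might be set up as follows (a sketch; here truelabels and classout are assumed to be numeric vectors encoded in the class order of ClassLabels above):

% Lung cancer detector: class 2 ('Lung') is the target class
cpLung = classperf(truelabels, classout, 'Positive', 2, 'Negative', [1 3 4 5 6]);

% Any-cancer detector: all five cancer classes are targets, 'Healthy' is control
cpAny = classperf(truelabels, classout, 'Positive', [1 2 3 4 5], 'Negative', 6);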

In clinical tests, inconclusive values such as '' or NaN are counted as false negatives for the computation of the sensitivity, and as false positives for the computation of the specificity. That is, inconclusive results may decrease the diagnostic value of the test. Tested observations whose true class is not within the union of PositiveValue and NegativeValue are not considered. However, tested observations that result in a class not covered by the vector truelabels are counted as inconclusive.
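
A small sketch of this behavior (invented labels; class 1 is the default positive class):

truelabels = [1 1 1 2 2 2];
classout   = [1 1 NaN 2 2 NaN];   % one inconclusive result in each class
cp = classperf(truelabels, classout);
[cp.Sensitivity cp.Specificity]   % expected: 0.6667 and 0.6667, since each
                                  % NaN counts as an error against its own class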

classperf(CP, classout) updates CP, the classifier performance object, with the classifier output classout. classout must be the same size as truelabels, the vector or cell array used to construct the classifier performance object. When classout is a cell array of strings, an empty string, '', represents an inconclusive result of the classifier. For numeric arrays, NaN represents an inconclusive result.

classperf(CP, classout, testidx) updates CP, the classifier performance object, with the classifier output classout. classout has a smaller size than truelabels. testidx is an index vector or a logical index vector of the same size as truelabels, the vector or cell array used to construct the classifier performance object. testidx indicates the observations that were used in the current validation.

    Note:   In the two previous syntaxes, you do not need to create a separate output variable to update the classifier performance object, CP.

Properties of a Classifier Performance Object

You can access classifier performance object properties by using the get function

get(CP, 'ControlClasses')

or using dot notation

CP.ControlClasses

You cannot directly modify the classifier performance object properties by using the set function, with the exception of the Label and Description properties.
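
For example, assuming CP is an existing classifier performance object:

set(CP, 'Label', 'LDA, 10-fold CV')           % writable via set
CP.Description = 'Iris data, third feature';  % or via dot assignment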

    Tip   To update the remaining (read-only) properties, supply new classification results using either of the following syntaxes:

    classperf(CP, classout)
    classperf(CP, classout, testidx)

Label

String to label the classifier performance object. Default is ''.

Description

String to describe the classifier performance object. Default is ''.

ClassLabels

Numeric vector or cell array of strings specifying the unique set of class labels, as derived from unique(truelabels).

GroundTruth

Numeric vector or cell array of strings that specifies the true class labels for each observation. The number of elements = NumberOfObservations.

NumberOfObservations

Positive integer specifying the number of observations in the study.

ControlClasses

Indices into the ClassLabels vector or cell array, indicating which classes are considered the control (negative) classes in a diagnostic test.

    Tip   You set the ControlClasses property with the 'Negative' property name/value pair. If you do not specify the 'Negative' property, ControlClasses defaults to all classes other than the first class returned by grp2idx(truelabels).

TargetClasses

Indices into the ClassLabels vector or cell array, indicating which classes are considered the target (positive) classes in a diagnostic test.

    Tip   You set the TargetClasses property with the 'Positive' property name/value pair. If you do not specify the 'Positive' property, TargetClasses defaults to the first class returned by grp2idx(truelabels).

ValidationCounter

Positive integer specifying the number of validations performed.

SampleDistribution

Numeric vector indicating how many times each sample was considered in the validation.

For example, if you use resubstitution, SampleDistribution is a vector of ones and ValidationCounter = 1. If you have a ten-fold cross-validation, SampleDistribution is also a vector of ones, but ValidationCounter = 10.

    Tip   SampleDistribution is most useful when doing Monte Carlo partitions of the test sets, because it helps determine whether all the samples are being tested equally often.
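
For instance, a sketch of a Monte Carlo (repeated holdout) validation on the fisheriris data, reusing classify and crossvalind as in the Examples section below:

load fisheriris
cp = classperf(species);
for k = 1:25
    % draw a new random 70/30 train/test split on every iteration
    [train, test] = crossvalind('HoldOut', species, 0.3);
    class = classify(meas(test,3), meas(train,3), species(train));
    classperf(cp, class, test);
end
cp.SampleDistribution  % roughly uniform counts indicate balanced testing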

ErrorDistribution

Numeric vector indicating how many times each sample was misclassified.

SampleDistributionByClass

Numeric vector indicating the frequency of the true classes in the validation.

ErrorDistributionByClass

Numeric vector indicating the frequency of errors for each class in the validation.

CountingMatrix

The classification confusion matrix. The order of rows and columns is the same as grp2idx(truelabels). Columns represent the true classes, and rows represent the classifier predictions. The last row in CountingMatrix is reserved for counting inconclusive results: some families of classifiers may withhold a hard class assignment, based on metrics such as the posterior probabilities, or on how close a sample is to the class boundaries.
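
A small two-class sketch of the layout (invented labels):

truelabels = [1 2 2 1];
classout   = [1 2 1 NaN];   % NaN = inconclusive
cp = classperf(truelabels, classout);
cp.CountingMatrix
% Expected result (columns = true classes 1 and 2; last row = inconclusive):
%      1     1
%      0     1
%      1     0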

CorrectRate

Correctly Classified Samples / Classified Samples

    Note:   Inconclusive results are not counted.

ErrorRate

Incorrectly Classified Samples / Classified Samples

    Note:   Inconclusive results are not counted.
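
For example, with one inconclusive result among four samples (a sketch):

truelabels = [1 1 2 2];
classout   = [1 2 2 NaN];
cp = classperf(truelabels, classout);
[cp.CorrectRate cp.ErrorRate]   % expected: 0.6667 and 0.3333
                                % (3 classified samples, 2 of them correct)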

LastCorrectRate

The following equation applies only to samples considered the last time the classifier performance object was updated:

Correctly Classified Samples / Classified Samples

LastErrorRate

The following equation applies only to samples considered the last time the classifier performance object was updated:

Incorrectly Classified Samples / Classified Samples

InconclusiveRate

Nonclassified Samples / Total Number of Samples

ClassifiedRate

Classified Samples / Total Number of Samples

Sensitivity

Correctly Classified Positive Samples / True Positive Samples

    Note:   Inconclusive results that are true positives are counted as errors for computing Sensitivity (following a conservative approach). This is the same as being incorrectly classified as negatives.

Specificity

Correctly Classified Negative Samples / True Negative Samples

    Note:   Inconclusive results that are true negatives are counted as errors for computing Specificity (following a conservative approach). This is the same as being incorrectly classified as positives.

PositivePredictiveValue

Correctly Classified Positive Samples / Positive Classified Samples

    Note:   Inconclusive results that are true negatives are counted as positive classifications when computing PositivePredictiveValue (following the same conservative approach). This is the same as being incorrectly classified as positives.

NegativePredictiveValue

Correctly Classified Negative Samples / Negative Classified Samples

    Note:   Inconclusive results that are true positives are counted as negative classifications when computing NegativePredictiveValue. This is the same as being incorrectly classified as negatives.

PositiveLikelihood

Sensitivity / (1 – Specificity)

NegativeLikelihood

(1 – Sensitivity) / Specificity

Prevalence

True Positive Samples / Total Number of Samples

DiagnosticTable

A 2-by-2 numeric array with diagnostic counts. The first row indicates the number of samples that were classified as positive, with the number of true positives in the first column, and the number of false positives in the second column. The second row indicates the number of samples that were classified as negative, with the number of false negatives in the first column, and the number of true negatives in the second column.

Correct classifications appear in the diagonal elements, and errors appear in the off-diagonal elements. Inconclusive results are considered errors and counted in the off-diagonal elements.

For an illustration of a diagnostic table, see below.

Example Diagnostic Table

In a cancer study of ten patients, suppose we get the following results:

Patient    Classifier Output    Has Cancer
1          Positive             Yes
2          Positive             Yes
3          Positive             Yes
4          Positive             No
5          Negative             Yes
6          Negative             No
7          Negative             No
8          Negative             No
9          Negative             No
10         Inconclusive         Yes

The diagnostic table would look as follows, with the inconclusive result for patient 10 (a true positive) counted as a false negative:

                       Has Cancer    No Cancer
Classified Positive        3             1
Classified Negative        2             4
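
These counts can be reproduced with classperf (a sketch; 1 encodes cancer, the positive class, and 2 encodes no cancer):

truelabels = [1 1 1 2 1 2 2 2 2 1];     % patients 1-10, true state
classout   = [1 1 1 1 2 2 2 2 2 NaN];   % classifier output; NaN = inconclusive
cp = classperf(truelabels, classout, 'Positive', 1, 'Negative', 2);
cp.DiagnosticTable                      % expected: [3 1; 2 4]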

Examples

% Classify the fisheriris data with a K-Nearest Neighbor classifier
load fisheriris
c = knnclassify(meas,meas,species,4,'euclidean','Consensus');
cp = classperf(species,c)
get(cp)
 
% 10-fold cross-validation on the fisheriris data using linear
% discriminant analysis and the third column as only feature for
% classification
load fisheriris
indices = crossvalind('Kfold',species,10);
cp = classperf(species); % initializes the CP object
for i = 1:10
    test = (indices == i); train = ~test;
    class = classify(meas(test,3),meas(train,3),species(train));
    % updates the CP object with the current classification results
    classperf(cp,class,test)  
end
cp.CorrectRate % queries for the correct classification rate

 
cp =
 
	biolearning.classperformance

                        Label: ''
                  Description: ''
                  ClassLabels: {3x1 cell}
                  GroundTruth: [150x1 double]
         NumberOfObservations: 150
               ControlClasses: [2x1 double]
                TargetClasses: 1
            ValidationCounter: 1
           SampleDistribution: [150x1 double]
            ErrorDistribution: [150x1 double]
    SampleDistributionByClass: [3x1 double]
     ErrorDistributionByClass: [3x1 double]
               CountingMatrix: [4x3 double]
                  CorrectRate: 1
                    ErrorRate: 0
             InconclusiveRate: 0.0733
               ClassifiedRate: 0.9267
                  Sensitivity: 1
                  Specificity: 0.8900
      PositivePredictiveValue: 0.8197
      NegativePredictiveValue: 1
           PositiveLikelihood: 9.0909
           NegativeLikelihood: 0
                   Prevalence: 0.3333
              DiagnosticTable: [2x2 double]


ans =
    0.9467