# perfcurve

Compute Receiver Operating Characteristic (ROC) curve or other performance curve for classifier output

## Syntax

[X,Y] = perfcurve(labels,scores,posclass)
[X,Y] = perfcurve(labels,scores,posclass,'Name',value)
[X,Y,T,AUC,OPTROCPT,SUBY,SUBYNAMES] = perfcurve(labels,scores,posclass)
[X,Y,T,AUC] = perfcurve(labels,scores,posclass)

## Description

[X,Y] = perfcurve(labels,scores,posclass) computes a ROC curve for a vector of classifier predictions, scores, given the true class labels, labels. labels can be a numeric vector, logical vector, character matrix, cell array of strings, or categorical vector. scores is a numeric vector of scores returned by a classifier for some data. posclass is the positive class label (scalar), and its type must match labels: numeric (for numeric labels), logical (for logical labels), char, or categorical (for categorical labels). The returned values X and Y are coordinates for the performance curve and can be visualized with plot(X,Y). For more information on labels, scores, and posclass, see Input Arguments. For more information on X and Y, see Output Arguments.

[X,Y] = perfcurve(labels,scores,posclass,'Name',value) specifies one or more optional parameter name/value pairs, with Name in single quotes. See Input Arguments for a list of inputs, parameter name/value pairs, and respective explanations.

[X,Y,T,AUC,OPTROCPT,SUBY,SUBYNAMES] = perfcurve(labels,scores,posclass) returns:

• An array of thresholds on classifier scores for the computed values of X and Y (T).

• The area under curve (AUC) for the computed values of X and Y.

• The optimal operating point of the ROC curve (OPTROCPT).

• An array of Y values for negative subclasses (SUBY).

• A cell array of negative class names (SUBYNAMES).
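As a minimal sketch of this syntax, the following call requests all seven outputs for a toy problem with one positive class ('a') and two negative classes ('b' and 'c'). The labels and scores are made-up illustration data, not output from a real classifier.

```matlab
% Toy data (hypothetical): cell array of strings for labels, numeric scores
labels = {'a';'a';'b';'c';'a';'b';'c';'c'};
scores = [0.9 0.8 0.75 0.6 0.55 0.4 0.3 0.1]';

% Request every output; 'a' is the positive class, 'b' and 'c' are negative
[X,Y,T,AUC,OPTROCPT,SUBY,SUBYNAMES] = perfcurve(labels,scores,'a');

% SUBY has one column per negative class, named in SUBYNAMES
disp(size(SUBY))
disp(SUBYNAMES)
```

With two negative classes, SUBY should have two columns and SUBYNAMES two entries, in matching order.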

[X,Y,T,AUC] = perfcurve(labels,scores,posclass) also returns pointwise confidence bounds for the computed values X, Y, T, and AUC if you supply cell arrays for labels and scores or set NBoot (see Input Arguments) to a positive integer. To compute the confidence bounds, perfcurve uses either vertical averaging (VA) or threshold averaging (TA). The returned values Y are an m-by-3 array in which the first element in every row gives the mean value, the second element gives the lower bound, and the third element gives the upper bound. The returned AUC is a row vector with 3 elements following the same convention. For VA, the returned values T are an m-by-3 array and X is a column vector. For TA, the returned values X are an m-by-3 matrix and T is a column vector.

perfcurve computes confidence bounds using either cross validation or bootstrap. If you supply cell arrays for labels and scores, perfcurve uses cross validation and treats elements in the cell arrays as cross validation folds. labels can be a cell array of numeric vectors, logical vectors, character matrices, cell arrays of strings or categorical vectors. All elements in labels must have the same type. scores is a cell array of numeric vectors. The cell arrays for labels and scores must have the same number of elements, and the number of labels in cell k must be equal to the number of scores in cell k for any k in the range from 1 to the number of elements in scores.

If you set NBoot to a positive integer, perfcurve generates NBoot bootstrap replicas to compute pointwise confidence bounds. You cannot supply cell arrays for labels and scores and set NBoot to a positive integer at the same time.

perfcurve returns pointwise confidence bounds. It does not return a simultaneous confidence band for the entire curve.

If you use the 'xCrit' or 'yCrit' options described below to set the criterion for X or Y to an anonymous function, perfcurve can only compute confidence bounds by bootstrap.
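The two ways of obtaining confidence bounds described above can be sketched as follows. The data are synthetic, and the fold count, fold size, random seed, and number of bootstrap replicas are arbitrary choices made only for illustration.

```matlab
rng(1);                       % arbitrary seed, for reproducibility only
nFolds = 5; nPerFold = 40;    % hypothetical fold layout
labels = cell(nFolds,1);
scores = cell(nFolds,1);
for k = 1:nFolds
    y = rand(nPerFold,1) > 0.5;                 % logical true labels
    labels{k} = y;
    scores{k} = double(y) + randn(nPerFold,1);  % noisy scores, higher for the positive class
end

% Cross validation: each cell is treated as one fold
[Xcv,Ycv,Tcv,AUCcv] = perfcurve(labels,scores,true);

% Bootstrap: pool the data into plain vectors and set NBoot instead.
% Setting 'XVals' to 'all' requests vertical averaging.
allLabels = vertcat(labels{:});
allScores = vertcat(scores{:});
[Xb,Yb,Tb,AUCb] = perfcurve(allLabels,allScores,true,'NBoot',200,'XVals','all');

% Each AUC is [mean, lower bound, upper bound]
disp(AUCcv)
disp(AUCb)
```

In both calls Y has three columns (mean, lower bound, upper bound). In the bootstrap call with vertical averaging, X stays a single column.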

## Input Arguments

labels

labels can be a numeric vector, logical vector, character matrix, cell array of strings, or categorical vector.

scores

scores is a numeric vector of scores returned by a classifier for some data. This vector must have as many elements as labels does.

posclass

posclass is the positive class label. If labels is a:

• Numeric vector, then posclass is a numeric scalar

• Logical vector, then posclass is a logical scalar

• Character matrix, then posclass is a character string

• Cell array of strings, then posclass is a character string or cell containing a character string

• Categorical vector, then posclass is a categorical scalar

posclass must be a member of labels.

### Name-Value Pair Arguments

Each entry gives the parameter name, followed by its value and description.

negClass

List of negative classes. Can be a numeric array, an array of chars, or a cell array of strings. By default, negClass is set to 'all' and all classes found in the input array of labels that are not the positive class are considered negative. If negClass is a subset of the classes found in the input array of labels, instances with labels that do not belong to either positive or negative classes are discarded.

xCrit

Criterion to compute for X. This criterion must be a monotone function of the positive class score. perfcurve supports the following criteria:
• TP — Number of true positive instances.

• FN — Number of false negative instances.

• FP — Number of false positive instances.

• TN — Number of true negative instances.

• TP+FP — Sum of TP and FP.

• RPP — Rate of positive predictions. RPP=(TP+FP)/(TP+FN+FP+TN)

• RNP — Rate of negative predictions. RNP=(TN+FN)/(TP+FN+FP+TN)

• accu — Accuracy. accu = (TP+TN)/(TP+FN+FP+TN)

• TPR, sens, reca — True positive rate, sensitivity, recall. TPR, sens, reca = TP/(TP+FN)

• FNR, miss — False negative rate, miss. FNR,miss=FN/(TP+FN)

• FPR, fall — False positive rate, fallout. FPR,fall=FP/(TN+FP)

• TNR, spec — True negative rate, specificity. TNR,spec=TN/(TN+FP)

• PPV, prec — Positive predictive value, precision. PPV,prec=TP/(TP+FP)

• NPV — Negative predictive value. NPV=TN/(TN+FN)

• ecost — Expected cost. ecost=(TP*COST(P|P)+FN*COST(N|P)+FP*COST(P|N)+TN*COST(N|N))/(TP+FN+FP+TN)

In addition, you can define an arbitrary criterion by supplying an anonymous function of three arguments, (C,scale,cost), where C is a 2-by-2 confusion matrix, scale is a 2-by-1 array of class scales, and cost is a 2-by-2 misclassification cost matrix.
Caution: Some of these criteria return NaN values at one of the two special thresholds, 'reject all' and 'accept all'.
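To make the criteria above concrete, here is a small worked calculation in plain MATLAB arithmetic. It does not call perfcurve. The confusion counts are hypothetical, and the cost matrix is the documented default.

```matlab
% Hypothetical confusion counts at one threshold (illustration only)
TP = 40; FN = 10; FP = 5; TN = 45;
total = TP + FN + FP + TN;

TPR  = TP/(TP+FN);       % true positive rate: 0.8
FPR  = FP/(TN+FP);       % false positive rate: 0.1
PPV  = TP/(TP+FP);       % precision: 40/45
accu = (TP+TN)/total;    % accuracy: 0.85

% Expected cost with cost laid out as [C(P|P) C(N|P); C(P|N) C(N|N)]
cost  = [0 0.5; 0.5 0];
ecost = (TP*cost(1,1) + FN*cost(1,2) + FP*cost(2,1) + TN*cost(2,2))/total;  % 0.075

% At the 'reject all' threshold nothing is predicted positive, so
% TP = FP = 0 and precision becomes 0/0, which evaluates to NaN.
PPVrejectAll = 0/(0+0);
```

The last line shows why a criterion such as PPV yields NaN at a special threshold: its denominator TP+FP is zero there.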
yCrit

Criterion to compute for Y. perfcurve supports the same criteria as for X. This criterion does not have to be a monotone function of the positive class score.

XVals

Values for the X criterion. The default value for XVals is 'all' and perfcurve computes X and Y values for all scores. If the value for XVals is not 'all', it must be a numeric array. In this case, perfcurve computes X and Y only for the specified XVals.

TVals

Thresholds for the positive class score. By default, TVals is unset and perfcurve computes X, Y, and T values for all scores. You can set TVals to either 'all' or a numeric array. If TVals is set to 'all' or unset and XVals is unset, perfcurve returns X, Y, and T values for all scores and computes pointwise confidence bounds for Y and X using threshold averaging. If TVals is set to a numeric array, perfcurve returns X, Y, and T values for the specified thresholds and computes pointwise confidence bounds for Y and X at these thresholds using threshold averaging. You cannot set XVals and TVals at the same time.

UseNearest

'on' to use the nearest values found in the data instead of the specified numeric XVals or TVals, and 'off' otherwise. If you specify numeric XVals and set UseNearest to 'on', perfcurve returns the nearest unique values X found in the data, as well as the corresponding values of Y and T. If you specify numeric XVals and set UseNearest to 'off', perfcurve returns these XVals sorted. By default this parameter is set to 'on'. If you compute confidence bounds by cross validation or bootstrap, this parameter is always 'off'.

ProcessNaN

Specifies how perfcurve processes NaN scores. The default value is 'ignore' and perfcurve removes observations with NaN scores from the data. If you set the parameter to 'addtofalse', perfcurve adds instances with NaN scores to false classification counts in the respective class. That is, perfcurve always counts instances from the positive class as false negative (FN), and always counts instances from the negative class as false positive (FP).

Prior

Either a string or an array with two elements. It represents prior probabilities for the positive and negative class, respectively. The default is 'empirical', that is, perfcurve derives prior probabilities from class frequencies. If set to 'uniform', perfcurve sets all prior probabilities equal.

Cost

A 2-by-2 matrix of misclassification costs [C(P|P) C(N|P); C(P|N) C(N|N)], where C(I|J) is the cost of misclassifying class J as class I. By default set to [0 0.5; 0.5 0].

Alpha

A numeric value between 0 and 1. perfcurve returns 100*(1-Alpha) percent pointwise confidence bounds for X, Y, T, and AUC. By default set to 0.05 for a 95% confidence interval.

Weights

A numeric vector of nonnegative observation weights. This vector must have as many elements as scores or labels do. If you supply cell arrays for scores and labels and you need to supply weights, you must supply them as a cell array too. In this case, every element in weights must be a numeric vector with as many elements as the corresponding element in scores: numel(weights{1})==numel(scores{1}), and so on. To compute X, Y, and T, or to compute confidence bounds by cross validation, perfcurve uses these observation weights instead of observation counts. To compute confidence bounds by bootstrap, perfcurve samples N out of N with replacement using these weights as multinomial sampling probabilities.

NBoot

Number of bootstrap replicas for computation of confidence bounds. Set it to a positive integer to compute bootstrap confidence bounds. By default this parameter is set to zero, and bootstrap confidence bounds are not computed. If you supply cell arrays for labels and scores, this parameter must be set to zero because perfcurve cannot use both cross validation and bootstrap to compute confidence bounds.

BootType

Confidence interval type bootci uses to compute confidence bounds. You can specify any type supported by bootci. By default set to 'bca'.

BootArg

Optional input arguments for bootci used to compute confidence bounds. You can specify all arguments bootci supports. Empty by default.
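As one illustration of these name/value pairs, the sketch below swaps the default ROC criteria for recall on X and precision on Y, giving a precision-recall curve. The labels and scores are made-up toy values.

```matlab
% Toy data (hypothetical): numeric labels, 1 is the positive class
labels = [1 1 0 1 0 0 1 0]';
scores = [0.95 0.85 0.8 0.7 0.55 0.45 0.4 0.2]';

% X is recall ('reca'), Y is precision ('prec')
[rec,prec,T] = perfcurve(labels,scores,1,'xCrit','reca','yCrit','prec');

% Precision can be NaN at the 'reject all' threshold (see the caution
% under xCrit); plot skips NaN points.
plot(rec,prec)
xlabel('Recall'); ylabel('Precision')
```

Recall is a monotone function of the positive class score, so it is a valid choice for xCrit. Precision is not monotone, which is allowed for yCrit.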

## Output Arguments

X

x-coordinates for the performance curve. By default, X is the false positive rate, FPR (equivalently, fallout, or 1 - specificity). To change this output, use the 'xCrit' name/value input. For accepted criteria, see 'xCrit' in Input Arguments.

Y

y-coordinates for the performance curve. By default, Y is the true positive rate, TPR (equivalently, recall, or sensitivity). To change this output, use the 'yCrit' input. For accepted criteria, see 'xCrit' in Input Arguments.

T

An array of thresholds on classifier scores for the computed values of X and Y. It has the same number of rows as X and Y. For each threshold, TP is the count of true positive observations with scores greater than or equal to this threshold, and FP is the count of false positive observations with scores greater than or equal to this threshold. perfcurve defines negative counts, TN and FN, in a similar way, then sorts the thresholds in descending order, which corresponds to the ascending order of positive counts.

For the M distinct thresholds found in the array of scores, perfcurve returns the X, Y, and T arrays with M+1 rows. perfcurve sets elements T(2:M+1) to the distinct thresholds, and T(1) replicates T(2). By convention, T(1) represents the highest 'reject all' threshold and perfcurve computes the corresponding values of X and Y for TP=0 and FP=0. T(end) is the lowest 'accept all' threshold for which TN=0 and FN=0.

AUC

The area under curve (AUC) for the computed values of X and Y. If you set XVals to 'all' (the default), perfcurve computes AUC using the returned X and Y values. If XVals is a numeric array, perfcurve computes AUC using X and Y values found from all distinct scores in the interval specified by the smallest and largest elements of XVals. More precisely, perfcurve finds X values for all distinct thresholds as if XVals were set to 'all', then uses a subset of these (with corresponding Y values) between min(XVals) and max(XVals) to compute AUC.
The function uses trapezoidal approximation to estimate the area. If the first or last value of X or Y is NaN, perfcurve removes it to allow calculation of AUC. This takes care of criteria that produce NaNs for the special 'reject all' or 'accept all' thresholds, for example, positive predictive value (PPV) or negative predictive value (NPV).

OPTROCPT

The optimal operating point of the ROC curve as an array of size 1-by-2 with FPR and TPR values for the optimal ROC operating point. perfcurve computes OPTROCPT only for the standard ROC curve and sets it to NaNs otherwise. To obtain the optimal operating point for the ROC curve, perfcurve first finds the slope, S, using

$S=\frac{\mathrm{cost}(P|N)-\mathrm{cost}(N|N)}{\mathrm{cost}(N|P)-\mathrm{cost}(P|P)}\cdot\frac{N}{P}$

where cost(I|J) is the cost of assigning an instance of class J to class I, and P=TP+FN and N=TN+FP are the total instance counts in the positive and negative class, respectively. perfcurve then finds the optimal operating point by moving the straight line with slope S from the upper left corner of the ROC plot (FPR=0, TPR=1) down and to the right until it intersects the ROC curve.

SUBY

An array of Y values for negative subclasses. If you only specify one negative class, SUBY is identical to Y. Otherwise SUBY is a matrix of size M-by-K, where M is the number of returned values for X and Y, and K is the number of negative classes. perfcurve computes Y values by summing counts over all negative classes. SUBY gives values of the Y criterion for each negative class separately. For each negative class, perfcurve places a new column in SUBY and fills it with Y values for TN and FP counted just for this class.

SUBYNAMES

A cell array of negative class names. If you provide an input array, negClass, of negative class names, perfcurve copies it into SUBYNAMES. If you do not provide negClass, perfcurve extracts SUBYNAMES from input labels. The order of SUBYNAMES is the same as the order of columns in SUBY, that is, SUBY(:,1) is for negative class SUBYNAMES{1}, and so on.
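The threshold convention, the trapezoidal AUC, and the slope S described above can be traced by hand on a tiny data set. The following is a sketch in plain MATLAB that mimics the description, not the actual perfcurve implementation. The data and the cost matrix are hypothetical, and picking the point that maximizes Y - S*X is one reading of the "moving line" construction (assumed here, since the first point touched by a line of slope S is the one with the largest intercept).

```matlab
% Toy data (hypothetical): 1 = positive class, 0 = negative class
labels = [1 1 0 1 0 0]';
scores = [0.9 0.8 0.7 0.6 0.4 0.2]';

% M distinct thresholds in descending order; T(1) replicates T(2)
thr = sort(unique(scores),'descend');
M   = numel(thr);
T   = [thr(1); thr];

% Row 1 is 'reject all' (TP = FP = 0); rows 2..M+1 count scores >= threshold
TP = zeros(M+1,1); FP = zeros(M+1,1);
for k = 1:M
    TP(k+1) = sum(scores >= thr(k) & labels == 1);
    FP(k+1) = sum(scores >= thr(k) & labels == 0);
end

P = sum(labels == 1);    % total positives, TP+FN
N = sum(labels == 0);    % total negatives, TN+FP
X = FP/N;                % false positive rate
Y = TP/P;                % true positive rate

% Trapezoidal approximation of the area under the curve
AUC = trapz(X,Y);

% Slope S with cost laid out as [C(P|P) C(N|P); C(P|N) C(N|N)].
% Hypothetical costs: a missed positive costs twice a false alarm.
cost = [0 1; 0.5 0];
S = (cost(2,1)-cost(2,2))/(cost(1,2)-cost(1,1)) * N/P;

% Assumed reading of the moving-line rule: maximize the intercept Y - S*X
[~,idx] = max(Y - S*X);
optPt = [X(idx) Y(idx)];
```

For this data the sketch gives AUC = 8/9, S = 0.5, and the operating point (FPR, TPR) = (1/3, 1). Note that with equal misclassification costs, trading one false positive for one true positive always leaves Y - S*X unchanged, so ties are common. The unequal costs here avoid that.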

## Examples

### Plot a ROC Curve for Classification Algorithms

Plot the ROC curve for classification by logistic regression.

```
load fisheriris
x = meas(51:end,1:2);        % Iris data, 2 classes and 2 features
y = (1:100)'>50;             % Versicolor = 0, virginica = 1
b = glmfit(x,y,'binomial');  % Logistic regression
p = glmval(b,x,'logit');     % Fit probabilities for scores
[X,Y,T,AUC] = perfcurve(species(51:end,:),p,'virginica');
plot(X,Y)
xlabel('False positive rate'); ylabel('True positive rate')
title('ROC for classification by logistic regression')
```

Obtain errors on TPR by vertical averaging.

```
[X,Y] = perfcurve(species(51:end,:),p,'virginica',...
    'NBoot',1000,'XVals','All');
% Plot errors
errorbar(X,Y(:,1),Y(:,1)-Y(:,2),Y(:,3)-Y(:,1));
```
