# Confusion Matrix and its Derivations

## Syntax

`C = multic(A,B)`

`C = multic(A,B,MODE)`

`C = multic(A,B,MODE,BETA)`

`[C,T] = multic(A,...)`

`[C,T,D] = multic(A,...)`

`[C,T,D,M] = multic(A,...)`

`[C,T,D,M,N] = multic(A,...)`

`[C,T,D,M,N,ORDER] = multic(A,...)`

## Introduction

From the results of a multiclass classification, `multic` builds a confusion matrix and confusion tables, and calculates derivations (metrics derived from the confusion matrix) for each class and for the whole population. Each derivation measures a different aspect of classifier performance, so considering them together gives a comprehensive picture of how the classifier behaves.

## Description

`C = multic(A,B)` returns the *n* × *n* confusion matrix *C* determined by the known groups in *A* and the predicted groups in *B*. *A* and *B* are input vectors with the same number of observations, and *n* is the total number of distinct elements in *A* and *B*. The input vectors must be of the same type: either numeric arrays or cell arrays of strings. By default, missing classifications (observations) are not counted. A missing classification must be denoted by NaN in a numeric array and by an empty string in a cell array of strings. String comparison is case-insensitive: all strings are converted to lower case.
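
The label handling described above can be sketched with standard MATLAB functions. This is a minimal illustration, not the code of `multic` itself: the variable names (`pooled`, `labels`, `keep`) are hypothetical, and dropping every pair that contains a missing value is an assumption about what "not counted" means.

```matlab
% Sketch only: mimics the documented label rules, not multic's internals.
A = {'Cats','Cats','Rats','Rats','Rats','Rabbits','Rabbits'};   % known groups
B = {'Cats','Cats','Rats','Rats','Rabbits','Rats',''};          % predicted groups

a = lower(A);                     % comparison is case-insensitive
b = lower(B);

pooled = [a b];                   % all labels from both vectors
labels = unique(pooled(~cellfun(@isempty, pooled)));   % distinct, sorted, no empties
n = numel(labels);                % size of the confusion matrix

[~, ia] = ismember(a, labels);    % row index of each known label (0 if missing)
[~, ib] = ismember(b, labels);    % column index of each prediction (0 if missing)
keep = ia > 0 & ib > 0;           % assumption: pairs with a missing value are dropped

C = accumarray([ia(keep)' ib(keep)'], 1, [n n]);   % rows: known, columns: predicted
```

For these inputs `labels` is `{'cats','rabbits','rats'}` and `C` equals the first three columns of the matrix shown in Example 3.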

`C = multic(A,B,MODE)` with `MODE = 1` counts each missing classification as a "false negative" for the class of the observation whose classification is missing and as a "true negative" for all other classes. `MODE = 0` is the default.

`C = multic(A,B,1)` returns the *n* × (*n*+2) augmented confusion matrix. Column *n*+1 contains NaN in each row that corresponds to a class with missing classifications, and column *n*+2 contains the number of missing classifications for each class.
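
The layout of the augmented matrix can be reproduced by hand. The following is a sketch under stated assumptions, not the implementation of `multic`: the names `nMissing`, `flag` and `Caug` are hypothetical, the known vector `A` is assumed to contain no missing values, and column *n*+1 is assumed to be 0 wherever it is not NaN (as in the printed examples).

```matlab
% Sketch only: builds an n-by-(n+2) matrix with the documented layout.
A = [1 1 1 2 2 2 2 2 3 3  ];      % known groups (assumed complete)
B = [1 1 1 2 2 2 3 4 4 NaN];      % predicted groups, NaN = missing

labels = unique([A(~isnan(A)) B(~isnan(B))]);   % distinct labels: 1 2 3 4
n = numel(labels);

[~, ia] = ismember(A, labels);    % row index of each observation
[~, ib] = ismember(B, labels);    % column index, 0 where B is NaN
ok = ib > 0;                      % observations that were classified

C = accumarray([ia(ok)' ib(ok)'], 1, [n n]);    % ordinary confusion matrix
nMissing = accumarray(ia(~ok)', 1, [n 1]);      % missing count per known class

flag = zeros(n, 1);
flag(nMissing > 0) = NaN;         % NaN marks classes with missing classifications

Caug = [C flag nMissing];         % n-by-(n+2) augmented matrix
```

With these inputs `Caug` matches the second matrix printed in Example 1.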

`C = multic(A,B,MODE,BETA)` also sets `BETA`, the β parameter of the *F*-score. The default value is `BETA = 1`.
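
For orientation, the usual textbook definition of the *F*-score in terms of precision (PPV) and recall (TPR) is given below. That `multic` uses exactly this form is an assumption, although it is consistent with the default-case result in Example 2.

$$
F_\beta = \frac{(1+\beta^2)\,\mathrm{PPV}\cdot\mathrm{TPR}}{\beta^2\,\mathrm{PPV}+\mathrm{TPR}},
\qquad
F_1 = \frac{2\,\mathrm{PPV}\cdot\mathrm{TPR}}{\mathrm{PPV}+\mathrm{TPR}} = \frac{2\,TP}{2\,TP+FP+FN}
$$

With the confusion table of Example 2 (*TP* = 11, *FN* = 5, *FP* = 3), the default β = 1 gives 22/30 ≈ 0.73333, the value reported there. Values of β above 1 weight recall more heavily, and values below 1 weight precision more heavily.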

`[C,T] = multic(A,...)` returns *C* and *T*, where *T* is the 2 × 2 confusion table:

| | Predicted positive | Predicted negative |
|---|---|---|
| Known positive | TP | FN |
| Known negative | FP | TN |

*TP* is true positive (aka hit), *FP* is false positive (aka false alarm, Type I error), *FN* is false negative (aka miss, Type II error) and *TN* is true negative (aka correct rejection).

`[C,T,D] = multic(A,...)` also returns *D*, an 18-element vector of derivations from the confusion matrix.

In order, they are:

- Accuracy,
- Precision (positive predictive value),
- False discovery rate,
- False omission rate,
- Negative predictive value,
- Prevalence,
- Recall (hit rate, sensitivity, true positive rate),
- False positive rate (fall-out),
- Positive likelihood ratio,
- False negative rate (miss rate),
- True negative rate (specificity),
- Negative likelihood ratio,
- Diagnostic odds ratio,
- Informedness,
- Markedness,
- F-score,
- G-measure,
- Matthews correlation coefficient.
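
The list above can be evaluated directly from a confusion table. The sketch below uses the standard textbook definitions and hypothetical variable names. Whether `multic` computes each value this way internally is an assumption, and returning `D` as a column is also assumed.

```matlab
% Sketch only: standard definitions applied to the table T of Example 2.
T = [11 5; 3 29];                 % [TP FN; FP TN]
beta = 1;                         % default BETA

TP = T(1,1); FN = T(1,2); FP = T(2,1); TN = T(2,2);
total = TP + FN + FP + TN;

acc  = (TP + TN) / total;         % accuracy
ppv  = TP / (TP + FP);            % precision
fdr  = FP / (TP + FP);            % false discovery rate
fomr = FN / (FN + TN);            % false omission rate
npv  = TN / (FN + TN);            % negative predictive value
prev = (TP + FN) / total;         % prevalence
tpr  = TP / (TP + FN);            % recall
fpr  = FP / (FP + TN);            % false positive rate
lrp  = tpr / fpr;                 % positive likelihood ratio
fnr  = FN / (TP + FN);            % false negative rate
tnr  = TN / (FP + TN);            % true negative rate
lrn  = fnr / tnr;                 % negative likelihood ratio
dor  = lrp / lrn;                 % diagnostic odds ratio
bm   = tpr + tnr - 1;             % informedness
mk   = ppv + npv - 1;             % markedness
fsc  = (1 + beta^2) * ppv * tpr / (beta^2 * ppv + tpr);   % F-score
gm   = sqrt(ppv * tpr);           % G-measure
mcc  = (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));   % Matthews

D = [acc; ppv; fdr; fomr; npv; prev; tpr; fpr; lrp; ...
     fnr; tnr; lrn; dor; bm; mk; fsc; gm; mcc];
```

These definitions reproduce the vector *D* printed in Example 2 to the displayed precision. Applied to the per-class tables in *M*, they do not reproduce every entry of *N* in that example (the recall row, for instance), so the per-class case should be treated as unverified.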

`[C,T,D,M] = multic(A,...)` also returns *M*, a 2*n* × 2 matrix that contains the confusion table of each class (one versus all). Rows 2*i*-1 and 2*i* correspond to the class with label number *i*, *i* = 1, 2, ..., *n*.
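
The one-versus-all tables can be derived from the augmented confusion matrix. The sketch below is an illustration with hypothetical names (`K`, `miss`), assuming `MODE = 1` semantics in which a missing classification is a false negative for its own class and a true negative for every other class. It is not the code of `multic`.

```matlab
% Sketch only: per-class tables from the augmented matrix of Example 2.
C = [5 0 0 0   0;
     1 4 0 NaN 2;
     1 1 2 0   0];

n    = size(C, 1);
K    = C(:, 1:n);                 % ordinary confusion matrix
miss = C(:, n+2);                 % missing classifications per known class
total = sum(K(:)) + sum(miss);    % number of observations

M = zeros(2*n, 2);
for i = 1:n
    TP = K(i,i);
    FN = sum(K(i,:)) - TP + miss(i);   % misclassified or missing members of class i
    FP = sum(K(:,i)) - TP;             % other classes predicted as class i
    TN = total - TP - FN - FP;         % everything else
    M(2*i-1:2*i, :) = [TP FN; FP TN];
end

% Element-wise sum of the per-class tables
T = [sum(M(1:2:end, :), 1); sum(M(2:2:end, :), 1)];
```

For this input `M` equals the matrix printed in Example 2, and the element-wise sum `T` equals the confusion table `[11 5; 3 29]` reported there.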

`[C,T,D,M,N] = multic(A,...)` also returns *N*, an 18 × *n* matrix of derivations from the confusion tables contained in *M*. Column *i* corresponds to the class with label number *i*, *i* = 1, 2, ..., *n*.

`[C,T,D,M,N,ORDER] = multic(A,...)` also returns the order of the rows and columns of *C* (the class labels) in the variable `ORDER`, which has the same type as the input vectors.

## Examples

`Example 1:` calculation of the confusion matrix for data with three misclassifications and one missing classification.

    A = [1 1 1 2 2 2 2 2 3 3  ];   % Known groups
    B = [1 1 1 2 2 2 3 4 4 NaN];   % Predicted groups
    disp('Missing classification is not counted:')
    C = multic(A,B)     % MODE=0
    disp('Missing classification is counted:')
    C = multic(A,B,1)   % MODE=1

    Missing classification is not counted:
    C =
         3     0     0     0
         0     3     1     1
         0     0     0     1
         0     0     0     0
    Missing classification is counted:
    C =
         3     0     0     0     0     0
         0     3     1     1     0     0
         0     0     0     1   NaN     1
         0     0     0     0     0     0

`Example 2:` calculation of the confusion matrix, the confusion table and its derivations, and the confusion tables and derivations for each class, for data with three misclassifications and two missing classifications. Missing classifications are counted.

    A = [1 1 3 1 1 2 2 2 2 2 2   2   3 1 3 3];   % Known groups
    B = [1 1 1 1 1 2 2 2 2 1 NaN NaN 2 1 3 3];   % Predicted groups
    [C,T,D,M,N,order] = multic(A,B,1)

    C =
         5     0     0     0     0
         1     4     0   NaN     2
         1     1     2     0     0
    T =
        11     5
         3    29
    D =
          0.83333
          0.78571
          0.21429
          0.14706
          0.85294
          0.33333
           0.6875
          0.09375
           7.3333
           0.3125
          0.90625
          0.34483
           21.267
          0.59375
          0.63866
          0.73333
          0.73497
          0.61579
    M =
         5     0
         2     9
         4     3
         1     8
         2     2
         0    12
    N =
            0.875         0.75        0.875
          0.71429          0.8            1
          0.28571          0.2            0
                0      0.27273      0.14286
                1      0.72727      0.85714
           0.3125       0.4375         0.25
           0.3125         0.25        0.125
          0.18182      0.11111            0
           1.7188         2.25          Inf
                0       0.1875        0.125
          0.81818      0.88889            1
                0      0.21094        0.125
              Inf       10.667          Inf
          0.13068      0.13889        0.125
          0.71429      0.52727      0.85714
          0.43478      0.38095      0.22222
          0.47246      0.44721      0.35355
          0.76447      0.49266      0.65465
    order =
         1     2     3

`Example 3:` calculation of the confusion matrix for data with two misclassifications and one missing classification.

    A = {'Cats','Cats','Rats','Rats','Rats','Rabbits','Rabbits'};   % Known groups
    B = {'Cats','Cats','Rats','Rats','Rabbits','Rats',''};          % Predicted groups
    [C,~,~,~,~,order] = multic(A,B,1)

    C =
         2     0     0     0     0
         0     0     1   NaN     1
         0     1     2     0     0
    order =
        'cats'    'rabbits'    'rats'