Classification loss functions measure the predictive
inaccuracy of classification models. When you compare the same type of loss among many
models, a lower loss indicates a better predictive model.
Suppose the following:
- L is the weighted average classification loss.
- n is the sample size.
- y_{j} is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
- f(X_{j}) is the raw classification score for the transformed observation (row) j of the predictor data X using feature expansion.
- m_{j} = y_{j}f(X_{j}) is the classification score for classifying observation j into the class corresponding to y_{j}. Positive values of m_{j} indicate correct classification and do not contribute much to the average loss. Negative values of m_{j} indicate incorrect classification and contribute to the average loss.
The weight for observation j is w_{j}. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so that they sum to 1. Therefore,
$${\displaystyle \sum _{j=1}^{n}{w}_{j}}=1.$$
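The margin and weight definitions above can be illustrated with a short Python sketch (not MATLAB code; the labels, scores, and weights below are made-up example values, and the helper names are hypothetical):

```python
# Illustrative sketch of the margin m_j = y_j * f(X_j) and of normalizing
# observation weights so they sum to 1. All values are made-up examples.

def normalize_weights(weights):
    """Scale observation weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def margins(y, scores):
    """Margin m_j = y_j * f(X_j); positive means correct classification."""
    return [yj * fj for yj, fj in zip(y, scores)]

y = [1, -1, 1, -1]               # observed labels, coded as -1 or 1
scores = [2.0, -0.5, -1.2, 0.3]  # raw classification scores f(X_j)
w = normalize_weights([1, 1, 1, 1])

m = margins(y, scores)  # [2.0, 0.5, -1.2, -0.3]
# The last two margins are negative: those observations are misclassified
# and contribute most of the average loss.
```

Uniform weights are used here for simplicity; with class priors, each weight would first be scaled toward its class's prior probability before the final normalization.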
This table describes the supported loss functions that you can specify by using the
'LossFun' name-value pair argument.
Loss Function  Value of LossFun  Equation 

Binomial deviance  'binodeviance'  $$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{log}\left\{1+\mathrm{exp}\left[-2{m}_{j}\right]\right\}}.$$ 
Exponential loss  'exponential'  $$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{exp}\left(-{m}_{j}\right)}.$$ 
Classification error  'classiferror'  $$L={\displaystyle \sum _{j=1}^{n}{w}_{j}I\left\{{\widehat{y}}_{j}\ne {y}_{j}\right\}}.$$ The classification error is the weighted fraction of misclassified observations, where $${\widehat{y}}_{j}$$ is the class label corresponding to the class with the maximal posterior probability, and I{x} is the indicator function. 
Hinge loss  'hinge'  $$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{max}\left\{0,1-{m}_{j}\right\}}.$$ 
Logit loss  'logit'  $$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{log}\left(1+\mathrm{exp}\left(-{m}_{j}\right)\right)}.$$ 
Minimal cost  'mincost'  The software computes the weighted minimal cost using this procedure for observations j = 1,...,n.
1. Estimate the 1-by-K vector of expected classification costs for observation j:
$${\gamma }_{j}=f{\left({X}_{j}\right)}^{\prime }C.$$
f(X_{j}) is the column vector of class posterior probabilities. C is the cost matrix that the input model stores in the Cost property.
2. For observation j, predict the class label corresponding to the minimum expected classification cost:
$${\widehat{y}}_{j}=\underset{k=1,...,K}{\mathrm{argmin}}\,{\gamma }_{jk}.$$
3. Using C, identify the cost incurred (c_{j}) for making the prediction.
The weighted average minimal cost loss is
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}{c}_{j}}.$$

Quadratic loss  'quadratic'  $$L={\displaystyle \sum _{j=1}^{n}{w}_{j}{\left(1-{m}_{j}\right)}^{2}}.$$ 
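The margin-based formulas in the table can be sketched in Python. This is an illustration only, not MATLAB code; the helper names are hypothetical, and the inputs are precomputed margins m_j and normalized weights w_j:

```python
import math

# Hedged Python sketch of the margin-based losses in the table above.
# Each helper takes normalized weights w and margins m = y * f(X).

def binodeviance(w, m):
    # L = sum_j w_j * log(1 + exp(-2 m_j))
    return sum(wj * math.log(1 + math.exp(-2 * mj)) for wj, mj in zip(w, m))

def exponential(w, m):
    # L = sum_j w_j * exp(-m_j)
    return sum(wj * math.exp(-mj) for wj, mj in zip(w, m))

def hinge(w, m):
    # L = sum_j w_j * max(0, 1 - m_j)
    return sum(wj * max(0.0, 1 - mj) for wj, mj in zip(w, m))

def logit(w, m):
    # L = sum_j w_j * log(1 + exp(-m_j))
    return sum(wj * math.log(1 + math.exp(-mj)) for wj, mj in zip(w, m))

def quadratic(w, m):
    # L = sum_j w_j * (1 - m_j)^2
    return sum(wj * (1 - mj) ** 2 for wj, mj in zip(w, m))

def classiferror(w, m):
    # I{yhat_j != y_j} reduces to I{m_j < 0} under the simplifying
    # assumption that yhat_j is the sign of the score f(X_j).
    return sum(wj for wj, mj in zip(w, m) if mj < 0)

w = [0.5, 0.5]      # normalized weights (made-up)
m = [2.0, -1.0]     # margins: one correct, one incorrect observation
print(hinge(w, m))  # 0.5*max(0, -1) + 0.5*max(0, 2) = 1.0
```

Note how only the second observation (negative margin) contributes to the hinge loss, matching the earlier remark that positive margins contribute little to the average loss.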
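The minimal-cost procedure can likewise be sketched in Python under stated assumptions: classes are 0-indexed, C[i][k] is the cost of predicting class k when the true class is i, and the posteriors, cost matrix, and labels below are made-up example values:

```python
# Hedged sketch of the minimal-cost procedure: expected costs
# gamma_j = f(X_j)' * C, predicted class = argmin of gamma_j, and
# weighted loss L = sum_j w_j * c_j, where c_j is the cost incurred
# given the true class. All data are made-up examples.

def expected_costs(posterior, C):
    """gamma_jk = sum_i posterior_i * C[i][k] for each candidate class k."""
    K = len(C[0])
    return [sum(p * C[i][k] for i, p in enumerate(posterior)) for k in range(K)]

def mincost_loss(posteriors, true_labels, C, weights):
    loss = 0.0
    for post, yj, wj in zip(posteriors, true_labels, weights):
        gamma = expected_costs(post, C)                     # step 1
        yhat = min(range(len(gamma)), key=gamma.__getitem__)  # step 2: argmin
        cj = C[yj][yhat]  # step 3: cost of predicting yhat given true class yj
        loss += wj * cj
    return loss

C = [[0, 1], [1, 0]]                # made-up 0/1 misclassification costs
posteriors = [[0.9, 0.1], [0.2, 0.8]]
print(mincost_loss(posteriors, [0, 0], C, [0.5, 0.5]))  # prints 0.5
```

In this example the second observation's posterior favors class 1 while its true class is 0, so it incurs cost 1; weighted by 0.5, the total loss is 0.5.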
This figure compares the loss functions (except minimal cost) for one observation over m. Some functions are normalized to pass through the point (0,1).