
**Superclasses:** `CompactClassificationNaiveBayes`

Naive Bayes classification

`ClassificationNaiveBayes` is a naive Bayes classifier for multiclass learning. Use `fitcnb` and the training data to train a `ClassificationNaiveBayes` classifier.

Trained `ClassificationNaiveBayes` classifiers store the training data, parameter values, data distribution, and prior probabilities. You can use these classifiers to:

- Estimate resubstitution predictions. For details, see `resubPredict`.
- Predict labels or posterior probabilities for new data. For details, see `predict`.

Create a `ClassificationNaiveBayes` object by using `fitcnb`.

| Function | Description |
| --- | --- |
| `compact` | Compact naive Bayes classifier |
| `crossval` | Cross-validated naive Bayes classifier |
| `resubEdge` | Classification edge for naive Bayes classifiers by resubstitution |
| `resubLoss` | Classification loss for naive Bayes classifiers by resubstitution |
| `resubMargin` | Classification margins for naive Bayes classifiers by resubstitution |
| `resubPredict` | Predict naive Bayes classifier resubstitution response |
| `edge` | Classification edge for naive Bayes classifiers |
| `logP` | Log unconditional probability density for naive Bayes classifier |
| `loss` | Classification error for naive Bayes classifier |
| `margin` | Classification margins for naive Bayes classifiers |
| `predict` | Predict labels using naive Bayes classification model |

**Copy Semantics:** Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

If you specify `'DistributionNames','mn'` when training `Mdl` using `fitcnb`, then the software fits a multinomial distribution using the bag-of-tokens model. The software stores the probability that token *j* appears in class *k* in the property `DistributionParameters{k,j}`. Using additive smoothing [2], the estimated probability is

$$P(\text{token }j|\text{class }k)=\frac{1+{c}_{j|k}}{P+{c}_{k}},$$

where:

- $${c}_{j|k}={n}_{k}\frac{{\displaystyle \sum _{i:{y}_{i}\in \text{class }k}{x}_{ij}{w}_{i}}}{{\displaystyle \sum _{i:{y}_{i}\in \text{class }k}{w}_{i}}},$$ which is the weighted number of occurrences of token *j* in class *k*.
- $${n}_{k}$$ is the number of observations in class *k*.
- $${w}_{i}$$ is the weight for observation *i*. The software normalizes weights within a class such that they sum to the prior probability for that class.
- $${c}_{k}={\displaystyle \sum _{j=1}^{P}{c}_{j|k}},$$ which is the total weighted number of occurrences of all tokens in class *k*.
- *P* is the number of tokens (predictors).
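The smoothed estimate can be checked numerically. The following Python sketch is illustrative only, not MathWorks code; the function and variable names mirror the formulas above rather than any MATLAB API:

```python
# Illustrative sketch of the additive-smoothing estimate
# P(token j | class k) = (1 + c_{j|k}) / (P + c_k) for the
# multinomial ('mn') model. Names follow the formulas above.

def token_probability(X_k, w_k, n_k, j):
    """Smoothed P(token j | class k).

    X_k -- token-count rows, one per observation in class k
    w_k -- per-observation weights (normalized within the class)
    n_k -- number of observations in class k
    j   -- token (column) index
    """
    P = len(X_k[0])                       # number of tokens
    w_sum = sum(w_k)
    # c_{j|k}: weighted occurrences of each token in class k
    c = [n_k * sum(x[t] * wi for x, wi in zip(X_k, w_k)) / w_sum
         for t in range(P)]
    c_k = sum(c)                          # total weighted occurrences
    return (1 + c[j]) / (P + c_k)

# Two observations of three tokens, equal weights:
X_k = [[2, 0, 1], [1, 1, 0]]
w_k = [0.5, 0.5]
p = token_probability(X_k, w_k, n_k=2, j=0)   # 0.5
```

Note that the estimates sum to 1 over the *P* tokens, since the numerators add up to *P* + *c*<sub>*k*</sub>.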

If you specify `'DistributionNames','mvmn'` when training `Mdl` using `fitcnb`, then:

- For each predictor, the software collects a list of the unique levels, stores the sorted list in `CategoricalLevels`, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.
- For predictor *j* in class *k*, the software counts instances of each categorical level using the list stored in `CategoricalLevels{j}`.
- The software stores the probability that predictor *j*, in class *k*, has level *L* in the property `DistributionParameters{k,j}`, for all levels in `CategoricalLevels{j}`. Using additive smoothing [2], the estimated probability is

$$P\left(\text{predictor }j=L|\text{class }k\right)=\frac{1+{m}_{j|k}(L)}{{m}_{j}+{m}_{k}},$$

where:

- $${m}_{j|k}(L)={n}_{k}\frac{{\displaystyle \sum _{i:{y}_{i}\in \text{class }k}I\{{x}_{ij}=L\}{w}_{i}}}{{\displaystyle \sum _{i:{y}_{i}\in \text{class }k}{w}_{i}}},$$ which is the weighted number of observations for which predictor *j* equals *L* in class *k*.
- $${n}_{k}$$ is the number of observations in class *k*.
- $$I\left\{{x}_{ij}=L\right\}=1$$ if $${x}_{ij}=L$$, and 0 otherwise.
- $${w}_{i}$$ is the weight for observation *i*. The software normalizes weights within a class such that they sum to the prior probability for that class.
- $${m}_{j}$$ is the number of distinct levels in predictor *j*.
- $${m}_{k}$$ is the weighted number of observations in class *k*.

[1] Hastie, T., R. Tibshirani, and J. Friedman. *The
Elements of Statistical Learning*, Second Edition. NY:
Springer, 2008.

[2] Manning, C. D., P. Raghavan, and M. Schütze. *Introduction
to Information Retrieval*, NY: Cambridge University Press,
2008.