Naive Bayes classification for multiclass classification

`ClassificationNaiveBayes` is a naive Bayes classifier for multiclass learning. Trained `ClassificationNaiveBayes` classifiers store the training data, parameter values, data distribution, and prior probabilities. Use these classifiers to perform tasks such as estimating resubstitution predictions (see `resubPredict`) and predicting labels or posterior probabilities for new data (see `predict`).

Create a `ClassificationNaiveBayes` object by using `fitcnb`.

`PredictorNames` — Predictor names

cell array of character vectors

This property is read-only.

Predictor names, specified as a cell array of character vectors. The order of the elements in `PredictorNames` corresponds to the order in which the predictor names appear in the training data `X`.

`ExpandedPredictorNames` — Expanded predictor names

cell array of character vectors

This property is read-only.

Expanded predictor names, specified as a cell array of character vectors.

If the model uses dummy variable encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.

`CategoricalPredictors` — Categorical predictor indices

`[]` | vector of positive integers

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values corresponding to the columns of predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty (`[]`).

**Data Types:** `single` | `double`

`CategoricalLevels` — Multivariate multinomial levels

cell array

This property is read-only.

Multivariate multinomial levels, specified as a cell array. The length of `CategoricalLevels` is equal to the number of predictors (`size(X,2)`).

The cells of `CategoricalLevels` correspond to predictors that you specify as `'mvmn'` during training, that is, they have a multivariate multinomial distribution. Cells that do not correspond to a multivariate multinomial distribution are empty (`[]`).

If predictor *j* is multivariate multinomial, then `CategoricalLevels{j}` is a list of all distinct values of predictor *j* in the sample. `NaN`s are removed from `unique(X(:,j))`.
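As a minimal sketch of how `CategoricalLevels` is populated, the following trains a classifier on a small made-up data set in which the first predictor is modeled as multivariate multinomial and the second as normal. The values in `X` and `Y` are illustrative assumptions and do not come from any real data set.

```matlab
% Toy data (hypothetical): column 1 takes the discrete levels 1, 2, and 3;
% column 2 is continuous.
X = [1 0.5; 2 1.2; 1 0.7; 3 2.0; 2 1.1; 3 1.9];
Y = [1; 1; 1; 2; 2; 2];

% Model predictor 1 as multivariate multinomial and predictor 2 as normal.
Mdl = fitcnb(X,Y,'DistributionNames',{'mvmn','normal'});

% One cell per predictor: the distinct levels for the 'mvmn' predictor,
% and an empty cell for the predictor that is not 'mvmn'.
levels = Mdl.CategoricalLevels;
disp(levels{1})            % distinct values observed in X(:,1)
disp(isempty(levels{2}))   % the 'normal' predictor has no stored levels
```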

`X` — Unstandardized predictors

numeric matrix

This property is read-only.

Unstandardized predictors used to train the naive Bayes classifier, specified as a numeric matrix. Each row of `X` corresponds to one observation, and each column corresponds to one variable. The software excludes observations containing at least one missing value, and removes corresponding elements from `Y`.

`DistributionNames` — Predictor distributions

`'normal'` (default) | `'kernel'` | `'mn'` | `'mvmn'` | cell array of character vectors

This property is read-only.

Predictor distributions, specified as a character vector or cell array of character vectors. `fitcnb` uses the predictor distributions to model the predictors. This table lists the available distributions.

Value | Description |
---|---|
`'kernel'` | Kernel smoothing density estimate |
`'mn'` | Multinomial distribution. If you specify `'mn'`, then all features are components of a multinomial distribution. Therefore, you cannot include `'mn'` as an element of a string array or a cell array of character vectors. For details, see Estimated Probability for Multinomial Distribution. |
`'mvmn'` | Multivariate multinomial distribution. For details, see Estimated Probability for Multivariate Multinomial Distribution. |
`'normal'` | Normal (Gaussian) distribution |

If `DistributionNames` is a 1-by-*P* cell array of character vectors, then `fitcnb` models the feature *j* using the distribution in element *j* of the cell array.

**Example:** `'mn'`

**Example:** `{'kernel','normal','kernel'}`

**Data Types:** `char` | `string` | `cell`

`DistributionParameters` — Distribution parameter estimates

cell array

This property is read-only.

Distribution parameter estimates, specified as a cell array. `DistributionParameters` is a *K*-by-*D* cell array, where cell (*k*,*d*) contains the distribution parameter estimates for instances of predictor *d* in class *k*. The order of the rows corresponds to the order of the classes in the property `ClassNames`, and the order of the predictors corresponds to the order of the columns of `X`.

If class *k* has no observations for predictor *j*, then `DistributionParameters{k,j}` is empty (`[]`).

The elements of `DistributionParameters` depend on the distributions of the predictors. This table describes the values in `DistributionParameters{k,j}`.

Distribution of Predictor *j* | Value of Cell Array for Predictor *j* and Class *k* |
---|---|
`kernel` | A `KernelDistribution` model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use `Mdl.DistributionParameters{3,2}.BandWidth`. |
`mn` | A scalar representing the probability that token *j* appears in class *k*. For details, see Estimated Probability for Multinomial Distribution. |
`mvmn` | A numeric vector containing the probabilities for each possible level of predictor *j* in class *k*. The software orders the probabilities by the sorted order of all unique levels of predictor *j* (stored in the property `CategoricalLevels`). For more details, see Estimated Probability for Multivariate Multinomial Distribution. |
`normal` | A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. |
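The indexing convention described above can be illustrated with a short sketch, assuming a model trained on the `fisheriris` data with the default `'normal'` distributions. The choice of class 2 and predictor 3 is arbitrary, and the variable names `params`, `mu`, `sigma`, and `inClass` are introduced here only for illustration.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);   % all predictors use 'normal' by default

% Rows follow Mdl.ClassNames; columns follow the columns of meas.
k = 2;                        % second class in Mdl.ClassNames
j = 3;                        % third predictor
params = Mdl.DistributionParameters{k,j};
mu    = params(1);            % sample mean of predictor j in class k
sigma = params(2);            % sample standard deviation

% Cross-check the stored mean against the raw training data for that class.
inClass = strcmp(species,Mdl.ClassNames{k});
fprintf('stored mean = %.4f, data mean = %.4f\n',mu,mean(meas(inClass,j)))
```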

`Kernel` — Kernel smoother type

`'normal'` (default) | `'box'` | cell array | ...

This property is read-only.

Kernel smoother type, specified as the name of a kernel or a cell array of kernel names. The length of `Kernel` is equal to the number of predictors (`size(X,2)`). `Kernel{j}` corresponds to predictor *j* and contains a character vector describing the type of kernel smoother. If a cell is empty (`[]`), then `fitcnb` did not fit a kernel distribution to the corresponding predictor.

This table describes the supported kernel smoother types. *I*{*u*} denotes the indicator function.

Value | Kernel | Formula |
---|---|---|
`'box'` | Box (uniform) | $$f(x)=0.5\,I\left\{\lvert x\rvert\le 1\right\}$$ |
`'epanechnikov'` | Epanechnikov | $$f(x)=0.75\left(1-x^{2}\right)I\left\{\lvert x\rvert\le 1\right\}$$ |
`'normal'` | Gaussian | $$f(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-0.5x^{2}\right)$$ |
`'triangle'` | Triangular | $$f(x)=\left(1-\lvert x\rvert\right)I\left\{\lvert x\rvert\le 1\right\}$$ |

**Example:** `'box'`

**Example:** `{'epanechnikov','normal'}`

**Data Types:** `char` | `string` | `cell`

`Support` — Kernel smoother density support

cell array

This property is read-only.

Kernel smoother density support, specified as a cell array. The length of `Support` is equal to the number of predictors (`size(X,2)`). The cells represent the regions to which `fitcnb` applies the kernel density. If a cell is empty (`[]`), then `fitcnb` did not fit a kernel distribution to the corresponding predictor.

This table describes the supported options.

Value | Description |
---|---|
1-by-2 numeric row vector | The density support applies to the specified bounds, for example `[L,U]`, where `L` and `U` are the finite lower and upper bounds, respectively. |
`'positive'` | The density support applies to all positive real values. |
`'unbounded'` | The density support applies to all real values. |

`Width` — Kernel smoother window width

numeric matrix

This property is read-only.

Kernel smoother window width, specified as a numeric matrix. `Width` is a *K*-by-*P* matrix, where *K* is the number of classes in the data, and *P* is the number of predictors (`size(X,2)`).

`Width(k,j)` is the kernel smoother window width for the kernel smoothing density of predictor *j* within class *k*. `NaN`s in column *j* indicate that `fitcnb` did not fit predictor *j* using a kernel density.
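The `Kernel`, `Support`, and `Width` properties can be inspected together after fitting kernel densities. The following is a minimal sketch, assuming the `fisheriris` data and an arbitrary choice of a box kernel on positive support; these settings are illustrative rather than recommended.

```matlab
load fisheriris

% Fit a kernel density to every predictor, using a box kernel on
% positive support (all iris measurements are positive).
Mdl = fitcnb(meas,species, ...
    'DistributionNames','kernel', ...
    'Kernel','box', ...
    'Support','positive');

Mdl.Kernel    % kernel smoother type for each predictor
Mdl.Support   % density support for each predictor
Mdl.Width     % K-by-P matrix of window widths (3 classes, 4 predictors)
```

Because every predictor uses a kernel distribution in this sketch, no column of `Mdl.Width` should contain `NaN` values.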

`ClassNames` — Unique class names

categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Unique class names used in the training model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

`ClassNames` has the same data type as `Y`, and has *K* elements (or rows) for character arrays. (The software treats string arrays as cell arrays of character vectors.)

**Data Types:** `categorical` | `char` | `string` | `logical` | `double` | `cell`

`ResponseName` — Response variable name

character vector

This property is read-only.

Response variable name, specified as a character vector.

**Data Types:** `char` | `string`

`Y` — Class labels

categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Class labels used to train the naive Bayes classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of `Y` represents the observed classification of the corresponding row of `X`.

`Y` has the same data type as the data in `Y` used for training the model. (The software treats string arrays as cell arrays of character vectors.)

**Data Types:** `single` | `double` | `logical` | `char` | `string` | `cell` | `categorical`

`ModelParameters` — Parameter values used to train model

structure array

This property is read-only.

Parameter values used to train the `ClassificationNaiveBayes` model, specified as a structure array. `ModelParameters` contains parameter values such as the name-value pair argument values used to train the naive Bayes classifier.

Access the fields of `ModelParameters` by using dot notation. For example, access the kernel support using `Mdl.ModelParameters.Support`.

`NumObservations` — Number of training observations

numeric scalar

This property is read-only.

Number of training observations in the training data stored in `X` and `Y`, specified as a numeric scalar.

`Prior` — Prior probabilities

numeric vector

Prior probabilities, specified as a numeric vector. The order of the elements in `Prior` corresponds to the elements of `Mdl.ClassNames`.

`fitcnb` normalizes the prior probabilities you set using the `'Prior'` name-value pair argument, so that `sum(Prior)` = `1`.

The value of `Prior` does not affect the best-fitting model. Therefore, you can reset `Prior` after training `Mdl` using dot notation.

**Example:** `Mdl.Prior = [0.2 0.8]`

**Data Types:** `double` | `single`
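A brief sketch of the normalization and reset behavior described above, assuming the `fisheriris` data. The prior values `[1 2 2]` and `[0.5 0.2 0.3]` are arbitrary illustrative choices.

```matlab
load fisheriris

% Unnormalized priors (illustrative values); fitcnb rescales them to sum to 1.
Mdl = fitcnb(meas,species,'Prior',[1 2 2]);
priorSum = sum(Mdl.Prior);                 % equals 1 up to rounding

% Reset the prior after training. The fitted distribution parameters
% are expected to stay the same, because Prior does not affect the fit.
paramsBefore = Mdl.DistributionParameters;
Mdl.Prior = [0.5 0.2 0.3];
paramsAfter = Mdl.DistributionParameters;
unchanged = isequal(paramsBefore,paramsAfter);
```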

`W` — Observation weights

vector of nonnegative values

This property is read-only.

Observation weights, specified as a vector of nonnegative values with the same number of rows as `Y`. Each entry in `W` specifies the relative importance of the corresponding observation in `Y`. `fitcnb` normalizes the value you set for the `'Weights'` name-value pair argument, so that the weights within a particular class sum to the prior probability for that class.

`Cost` — Misclassification cost

square matrix

Misclassification cost, specified as a numeric square matrix, where `Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

The misclassification cost matrix must have zeros on the diagonal.

The value of `Cost` does not influence training. You can reset `Cost` after training `Mdl` using dot notation.

**Example:** `Mdl.Cost = [0 0.5 ; 1 0]`

**Data Types:** `double` | `single`
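The row and column convention of `Cost` can be illustrated with a sketch that resets the cost matrix after training and inspects the expected costs returned by `predict`. The cost values are assumptions chosen only for illustration, and `expCost`, `idx`, and `agree` are hypothetical variable names.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);

% Illustrative cost matrix (assumed values): classifying a true
% 'virginica' (row 3) as 'versicolor' (column 2) costs 5; all other
% misclassifications cost 1. The diagonal is zero.
Mdl.Cost = [0 1 1; 1 0 1; 1 5 0];

% The third output of predict is the expected misclassification cost
% of each class for each observation.
[label,~,expCost] = predict(Mdl,meas);

% The predicted label is the class with the smallest expected cost.
[~,idx] = min(expCost,[],2);
agree = isequal(label,Mdl.ClassNames(idx));
```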

`HyperparameterOptimizationResults` — Cross-validation optimization of hyperparameters

`BayesianOptimization` object | table

This property is read-only.

Cross-validation optimization of hyperparameters, specified as a `BayesianOptimization` object or a table of hyperparameters and associated values. This property is nonempty if the `'OptimizeHyperparameters'` name-value pair argument is nonempty when you create the model. The value of `HyperparameterOptimizationResults` depends on the setting of the `Optimizer` field in the `HyperparameterOptimizationOptions` structure when you create the model.

Value of `Optimizer` Field | Value of `HyperparameterOptimizationResults` |
---|---|
`'bayesopt'` (default) | Object of class `BayesianOptimization` |
`'gridsearch'` or `'randomsearch'` | Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) |

`ScoreTransform` — Classification score transformation

`'none'` (default) | `'doublelogit'` | `'invlogit'` | `'ismax'` | `'logit'` | function handle | ...

Classification score transformation, specified as a character vector or function handle. This table summarizes the available character vectors.

Value | Description |
---|---|
`'doublelogit'` | 1/(1 + e^{–2x}) |
`'invlogit'` | log(x / (1 – x)) |
`'ismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
`'logit'` | 1/(1 + e^{–x}) |
`'none'` or `'identity'` | x (no transformation) |
`'sign'` | –1 for x < 0; 0 for x = 0; 1 for x > 0 |
`'symmetric'` | 2x – 1 |
`'symmetricismax'` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1 |
`'symmetriclogit'` | 2/(1 + e^{–x}) – 1 |

For a MATLAB® function or a function you define, use its function handle for the score transformation. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

**Example:** `Mdl.ScoreTransform = 'logit'`

**Data Types:** `char` | `string` | `function handle`
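As a hedged illustration of the two ways to specify a transformation, the sketch below sets `ScoreTransform` first by name and then with an equivalent function handle, and compares the second output of `predict` in both cases. It assumes the `fisheriris` data; the handle `@(s) 2*s - 1` is written here to mirror the `'symmetric'` entry in the table.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);

% Built-in transformation, specified by name.
Mdl.ScoreTransform = 'symmetric';            % 2x - 1
[~,scoreNamed] = predict(Mdl,meas(1:5,:));

% The same transformation written as a function handle. The handle
% accepts a matrix of scores and returns a matrix of the same size.
Mdl.ScoreTransform = @(s) 2*s - 1;
[~,scoreHandle] = predict(Mdl,meas(1:5,:));

% The two specifications are expected to give matching results.
maxDiff = max(abs(scoreNamed(:) - scoreHandle(:)));
```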

Function | Description |
---|---|
`compact` | Reduce size of naive Bayes classifier |
`crossval` | Cross-validate naive Bayes classifier |
`edge` | Classification edge for naive Bayes classifier |
`logp` | Log unconditional probability density for naive Bayes classifier |
`loss` | Classification loss for naive Bayes classifier |
`margin` | Classification margins for naive Bayes classifier |
`partialDependence` | Compute partial dependence |
`plotPartialDependence` | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
`predict` | Classify observations using naive Bayes classifier |
`resubEdge` | Resubstitution classification edge for naive Bayes classifier |
`resubLoss` | Resubstitution classification loss for naive Bayes classifier |
`resubMargin` | Resubstitution classification margins for naive Bayes classifier |
`resubPredict` | Classify training observations using naive Bayes classifier (resubstitution predictions) |

Create a naive Bayes classifier for Fisher's iris data set. Then, specify prior probabilities after training the classifier.

Load the `fisheriris` data set. Create `X` as a numeric matrix that contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. Create `Y` as a cell array of character vectors that contains the corresponding iris species.

```
load fisheriris
X = meas;
Y = species;
```

Train a naive Bayes classifier using the predictors `X` and class labels `Y`. `fitcnb` assumes each predictor is independent and fits each predictor using a normal distribution by default.

```
Mdl = fitcnb(X,Y)
```

```
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}
```

`Mdl` is a trained `ClassificationNaiveBayes` classifier. Some of the `Mdl` properties appear in the Command Window.

Display the properties of `Mdl` using dot notation. For example, display the class names and prior probabilities.

```
Mdl.ClassNames
```

```
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
```

```
Mdl.Prior
```

```
ans = 1×3

    0.3333    0.3333    0.3333
```

The order of the class prior probabilities in `Mdl.Prior` corresponds to the order of the classes in `Mdl.ClassNames`. By default, the prior probabilities are the respective relative frequencies of the classes in the data. Alternatively, you can set the prior probabilities when calling `fitcnb` by using the `'Prior'` name-value pair argument.

Set the prior probabilities after training the classifier by using dot notation. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

```
Mdl.Prior = [0.5 0.2 0.3];
```

You can now use this trained classifier to perform additional tasks. For example, you can label new measurements using `predict` or cross-validate the classifier using `crossval`.

Train and cross-validate a naive Bayes classifier. When you specify `'CrossVal','on'`, `fitcnb` implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error.

Load the `ionosphere` data set. Remove the first two predictors for stability.

```
load ionosphere
X = X(:,3:end);
rng('default') % for reproducibility
```

Train and cross-validate a naive Bayes classifier using the predictors `X` and class labels `Y`. A recommended practice is to specify the class names. `fitcnb` assumes that each predictor is conditionally and normally distributed.

```
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')
```

```
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
```

`CVMdl` is a `ClassificationPartitionedModel` cross-validated, naive Bayes classifier. Alternatively, you can cross-validate a trained `ClassificationNaiveBayes` model by passing it to `crossval`.

Display the first training fold of `CVMdl` using dot notation.

```
CVMdl.Trained{1}
```

```
ans = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
```

Each fold is a `CompactClassificationNaiveBayes` model trained on 90% of the data.

The compact models stored in `CVMdl.Trained` are not intended for predicting on new data. Instead, use the cross-validated model to estimate the generalization error by passing `CVMdl` to `kfoldLoss`.

```
genError = kfoldLoss(CVMdl)
```

```
genError = 0.1852
```

On average, the generalization error is approximately 19%.

You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
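The alternative workflow mentioned in this example, cross-validating an already trained model with `crossval`, might look like the following sketch. The choice of 5 folds and the names `CVMdl5` and `genError5` are assumptions made for illustration.

```matlab
load ionosphere
X = X(:,3:end);
rng('default')   % for reproducibility

% Train on all of the data first, then cross-validate the trained model.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});
CVMdl5 = crossval(Mdl,'KFold',5);   % 5 folds is an arbitrary choice

% Average misclassification rate over the folds.
genError5 = kfoldLoss(CVMdl5);
```

The resulting estimate generally differs somewhat from the 10-fold value, because the data partition and the training set sizes are different.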

In the bag-of-tokens model, the value of predictor *j* is
the nonnegative number of occurrences of token *j* in the observation. The
number of categories (bins) in the multinomial model is the number of distinct tokens
(number of predictors).

*Naive Bayes* is a classification algorithm that applies density estimation to the data.

The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].

Naive Bayes classifiers assign observations to the most probable class (in other words, the *maximum a posteriori* decision rule). Explicitly, the algorithm takes these steps:

1. Estimate the densities of the predictors within each class.
2. Model posterior probabilities according to Bayes rule. That is, for all *k* = 1,...,*K*,

   $$\widehat{P}\left(Y=k|X_{1},\ldots,X_{P}\right)=\frac{\pi\left(Y=k\right)\prod_{j=1}^{P}P\left(X_{j}|Y=k\right)}{\sum_{k=1}^{K}\pi\left(Y=k\right)\prod_{j=1}^{P}P\left(X_{j}|Y=k\right)},$$

   where:

   - *Y* is the random variable corresponding to the class index of an observation.
   - $$X_{1},\ldots,X_{P}$$ are the random predictors of an observation.
   - $$\pi\left(Y=k\right)$$ is the prior probability that a class index is *k*.
3. Classify an observation by estimating the posterior probability for each class, and then assign the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability $$\widehat{P}\left(Y=k|X_{1},\ldots,X_{P}\right)\propto \pi\left(Y=k\right)P_{mn}\left(X_{1},\ldots,X_{P}|Y=k\right),$$ where $$P_{mn}\left(X_{1},\ldots,X_{P}|Y=k\right)$$ is the probability mass function of a multinomial distribution.
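The steps above can be sketched by hand for normally distributed predictors. The following example, which assumes the `fisheriris` data and the default `'normal'` distributions, rebuilds the posterior for one observation from the stored `Prior` and `DistributionParameters` values and compares it with the posterior returned by `predict`. The variable names are introduced here only for illustration.

```matlab
load fisheriris
Mdl = fitcnb(meas,species);
x = meas(1,:);                      % one observation to classify

K = numel(Mdl.ClassNames);          % number of classes
P = numel(x);                       % number of predictors
joint = zeros(1,K);
for k = 1:K
    lik = 1;
    for j = 1:P
        p = Mdl.DistributionParameters{k,j};       % [mean; std]
        lik = lik * normpdf(x(j),p(1),p(2));       % P(X_j | Y = k)
    end
    joint(k) = Mdl.Prior(k) * lik;                 % pi(Y = k) * product
end
posteriorManual = joint / sum(joint);              % Bayes rule

% Step 3: pick the class with the maximum posterior probability.
[~,kBest] = max(posteriorManual);
labelManual = Mdl.ClassNames{kBest};

% Compare with the built-in prediction.
[labelPredict,posteriorPredict] = predict(Mdl,x);
maxDiff = max(abs(posteriorManual - posteriorPredict));
```

Multiplying many small densities can underflow when the number of predictors is large, so a practical implementation would typically sum log densities instead; the direct product is used here only to mirror the formula.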

If you specify `'DistributionNames','mn'` when training `Mdl` using `fitcnb`, then the software fits a multinomial distribution using the Bag-of-Tokens Model. The software stores the probability that token *j* appears in class *k* in the property `DistributionParameters{k,j}`. With additive smoothing [2], the estimated probability is

$$P(\text{token } j|\text{class } k)=\frac{1+c_{j|k}}{P+c_{k}},$$

where:

- $$c_{j|k}=n_{k}\frac{\sum_{i:y_{i}\in \text{class } k}x_{ij}w_{i}}{\sum_{i:y_{i}\in \text{class } k}w_{i}},$$ which is the weighted number of occurrences of token *j* in class *k*.
- $$n_{k}$$ is the number of observations in class *k*.
- $$w_{i}$$ is the weight for observation *i*. The software normalizes weights within a class so that they sum to the prior probability for that class.
- $$c_{k}=\sum_{j=1}^{P}c_{j|k},$$ which is the total weighted number of occurrences of all tokens in class *k*.
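A small worked example of the smoothed multinomial estimate, written directly from the formula above rather than taken from the `fitcnb` implementation. The token counts in `Xk` are made up, and equal observation weights are assumed.

```matlab
% Token counts for the observations in class k (hypothetical values):
% rows are observations, columns are tokens.
Xk = [2 0 1;
      1 1 0;
      0 3 1];
w  = ones(size(Xk,1),1);          % equal weights (assumption)

n_k  = size(Xk,1);                % number of observations in class k
P    = size(Xk,2);                % number of tokens (predictors)
c_jk = n_k * (w' * Xk) / sum(w);  % weighted token counts: [3 4 2]
c_k  = sum(c_jk);                 % total weighted count: 9

% Additive smoothing: (1 + c_jk) / (P + c_k)
prob = (1 + c_jk) / (P + c_k);    % [4 5 3]/12
```

The smoothed probabilities sum to 1 across tokens, and a token that never appears in the class would still receive the nonzero probability 1/(P + c_k).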

If you specify `'DistributionNames','mvmn'` when training `Mdl` using `fitcnb`, then the software takes these steps:

1. For each predictor, the software collects a list of the unique levels, stores the sorted list in `CategoricalLevels`, and considers each level a bin. Each combination of predictor and class is a separate, independent multinomial random variable.
2. For predictor *j* in class *k*, the software counts instances of each categorical level using the list stored in `CategoricalLevels{j}`.
3. The software stores the probability that predictor *j* in class *k* has level *L* in the property `DistributionParameters{k,j}`, for all levels in `CategoricalLevels{j}`. With additive smoothing [2], the estimated probability is

   $$P\left(\text{predictor } j=L|\text{class } k\right)=\frac{1+m_{j|k}(L)}{m_{j}+m_{k}},$$

   where:

   - $$m_{j|k}(L)=n_{k}\frac{\sum_{i:y_{i}\in \text{class } k}I\{x_{ij}=L\}w_{i}}{\sum_{i:y_{i}\in \text{class } k}w_{i}},$$ which is the weighted number of observations for which predictor *j* equals *L* in class *k*.
   - $$n_{k}$$ is the number of observations in class *k*.
   - $$I\left\{x_{ij}=L\right\}=1$$ if $$x_{ij}=L$$, and 0 otherwise.
   - $$w_{i}$$ is the weight for observation *i*. The software normalizes weights within a class so that they sum to the prior probability for that class.
   - $$m_{j}$$ is the number of distinct levels in predictor *j*.
   - $$m_{k}$$ is the weighted number of observations in class *k*.
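The smoothed multivariate multinomial estimate can be worked through on a toy example. This sketch follows the formula above and is not the `fitcnb` implementation. The values in `xk` are hypothetical, equal weights are assumed, and the levels seen in class *k* are assumed to be all of the levels of the predictor.

```matlab
% Values of predictor j for the observations in class k (hypothetical).
xk = [1; 1; 2; 3; 1];
w  = ones(size(xk));              % equal weights (assumption)

levels = unique(xk);              % sorted distinct levels: 1, 2, 3
n_k = numel(xk);                  % number of observations in class k
m_j = numel(levels);              % number of distinct levels: 3
m_k = n_k;                        % weighted number of observations in
                                  % class k; equals n_k for equal weights

% Weighted count of each level within class k.
m_jk = zeros(size(levels));
for l = 1:numel(levels)
    m_jk(l) = n_k * sum((xk == levels(l)) .* w) / sum(w);
end

% Additive smoothing: (1 + m_jk(L)) / (m_j + m_k)
prob = (1 + m_jk) / (m_j + m_k);  % [4; 2; 2]/8
```

The resulting vector is ordered by the sorted levels, which matches how the probabilities are described as being stored for an `'mvmn'` predictor.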

[1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. 2nd ed. Springer Series in Statistics. New York, NY: Springer, 2009. https://doi.org/10.1007/978-0-387-84858-7.

[2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. *Introduction to Information Retrieval*. NY: Cambridge University Press, 2008.

Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

- The `predict` function supports code generation.
- When you train a naive Bayes model by using `fitcnb`, the following restrictions apply.
  - The class labels input argument value (`Y`) cannot be a categorical array.
  - Code generation does not support categorical predictors (`logical`, `categorical`, `char`, `string`, or `cell`). If you supply training data in a table, the predictors must be numeric (`double` or `single`). Also, you cannot use the `'CategoricalPredictors'` name-value pair argument.
  - The value of the `'DistributionNames'` name-value pair argument cannot contain `'mn'` or `'mvmn'`.
  - The value of the `'ClassNames'` name-value pair argument cannot be a categorical array.
  - The value of the `'ScoreTransform'` name-value pair argument cannot be an anonymous function.

For more information, see Introduction to Code Generation.
