# ClassificationNaiveBayes class

Superclasses: `CompactClassificationNaiveBayes`

Naive Bayes classification

## Description

`ClassificationNaiveBayes` is a naive Bayes classifier for multiclass learning. Use `fitcnb` and the training data to train a `ClassificationNaiveBayes` classifier.

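Trained `ClassificationNaiveBayes` classifiers store the training data, parameter values, data distribution, and prior probabilities. You can use these classifiers to estimate resubstitution predictions (see `resubPredict`) and to predict labels for new data (see `predict`).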

## Construction

Create a `ClassificationNaiveBayes` object by using `fitcnb`.

## Properties

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty (`[]`).

Data Types: `single` | `double`

Multivariate multinomial levels, specified as a cell vector of numeric vectors. `CategoricalLevels` has length equal to the number of predictors (`size(X,2)`).

The cells of `CategoricalLevels` correspond to predictors that you specified as `'mvmn'` (i.e., having a multivariate multinomial distribution) during training. Cells that do not correspond to a multivariate multinomial distribution are empty (`[]`).

If predictor j is multivariate multinomial, then `CategoricalLevels{j}` is a list of all distinct values of predictor j in the sample (`NaN`s removed from `unique(X(:,j))`).

Data Types: `cell`
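As a minimal sketch of how such a level list can be formed in base MATLAB, the following removes `NaN`s from the output of `unique`. The column `xj` is hypothetical illustration data, not taken from this page.

```matlab
% Hypothetical column j of the predictor data, including missing values.
xj = [3; 1; NaN; 2; 3; NaN; 1];

% unique returns the sorted distinct values and treats each NaN as distinct,
% so the NaNs are removed afterward.
levels = unique(xj);
levels = levels(~isnan(levels));
```

Here `levels` contains the sorted distinct non-missing values of `xj`.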

Distinct class names, specified as a categorical or character array, logical or numeric vector, or cell vector of character vectors.

`ClassNames` is the same data type as `Y`, and has K elements (or K rows, for character arrays). (The software treats string arrays as cell arrays of character vectors.)

Data Types: `categorical` | `char` | `logical` | `single` | `double` | `cell`

Misclassification cost, specified as a K-by-K square matrix.

The value of `Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

The value of `Cost` does not influence training. You can reset `Cost` after training `Mdl` using dot notation, e.g., `Mdl.Cost = [0 0.5; 1 0];`.

Data Types: `double` | `single`
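The following sketch illustrates how a cost matrix can change a predicted label, assuming the classifier assigns each observation to the class with the smallest expected misclassification cost. The posterior probabilities are hypothetical values chosen for illustration.

```matlab
% Cost matrix from the example above: rows are true classes, columns are
% predicted classes, both in the order of ClassNames.
Cost = [0 0.5; 1 0];

% Hypothetical posterior probabilities for one observation.
posterior = [0.6 0.4];

% Expected cost of predicting class j is sum over i of posterior(i)*Cost(i,j).
expectedCost = posterior * Cost;

% Choose the class with the smallest expected cost.
[~, predictedClass] = min(expectedCost);
```

Under these assumptions the second class is chosen even though the first class has the larger posterior probability, because misclassifying a true class 2 observation costs twice as much.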

Predictor distributions `fitcnb` uses to model the predictors, specified as a character vector or cell array of character vectors.

This table summarizes the available distributions.

| Value | Description |
| --- | --- |
| `'kernel'` | Kernel smoothing density estimate. |
| `'mn'` | Multinomial bag-of-tokens model. Indicates that all predictors have this distribution. |
| `'mvmn'` | Multivariate multinomial distribution. |
| `'normal'` | Normal (Gaussian) distribution. |

If `DistributionNames` is a 1-by-P cell array of character vectors, then the software models predictor j using the distribution in element j of the cell array.

Data Types: `char` | `cell`

Distribution parameter estimates, specified as a cell array. `DistributionParameters` is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property `ClassNames`, and the order of the predictors corresponds to the order of the columns of `X`.

If class `k` has no observations for predictor `j`, then `DistributionParameters{k,j}` is empty (`[]`).

The elements of `DistributionParameters` depend on the distributions of the predictors. This table describes the values in `DistributionParameters{k,j}`.

| Distribution of Predictor j | Value |
| --- | --- |
| `kernel` | A `KernelDistribution` model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use `Mdl.DistributionParameters{3,2}.BandWidth`. |
| `mn` | A scalar representing the probability that token j appears in class k. For details, see Algorithms. |
| `mvmn` | A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property `CategoricalLevels`). For more details, see Algorithms. |
| `normal` | A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. |

Data Types: `cell`
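For the `normal` case, the layout of the 2-by-1 vector can be sketched in base MATLAB as follows. This sketch ignores observation weights, and `xjk` is hypothetical data standing in for the values of predictor j among the observations of class k.

```matlab
% Hypothetical values of predictor j for the observations in class k.
xjk = [5.1; 4.9; 4.7; 4.6; 5.0];

% Unweighted sketch: the first element is the sample mean and the second
% is the sample standard deviation (normalized by N-1).
params = [mean(xjk); std(xjk)];
```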

Expanded predictor names, stored as a cell array of character vectors.

If the model uses encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.

Data Types: `cell`

Description of the cross-validation optimization of hyperparameters, specified as a `BayesianOptimization` object or a table of hyperparameters and associated values. This property is nonempty if the `'OptimizeHyperparameters'` name-value pair argument is nonempty when you create the model. The value of `HyperparameterOptimizationResults` depends on the setting of the `Optimizer` field in the `HyperparameterOptimizationOptions` structure when you create the model, as described in this table.

| Value of `Optimizer` Field | Value of `HyperparameterOptimizationResults` |
| --- | --- |
| `'bayesopt'` (default) | Object of class `BayesianOptimization` |
| `'gridsearch'` or `'randomsearch'` | Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) |

Kernel smoother types, specified as a character vector or cell array of character vectors. `Kernel` has length equal to the number of predictors (`size(X,2)`). `Kernel{j}` corresponds to predictor j, and contains a character vector describing the type of kernel smoother. This table describes the supported kernel smoother types. Let I{u} denote the indicator function.

| Value | Kernel | Formula |
| --- | --- | --- |
| `'box'` | Box (uniform) | $f(x)=0.5\,I\{\lvert x\rvert \le 1\}$ |
| `'epanechnikov'` | Epanechnikov | $f(x)=0.75\,(1-x^{2})\,I\{\lvert x\rvert \le 1\}$ |
| `'normal'` | Gaussian | $f(x)=\frac{1}{\sqrt{2\pi}}\exp(-0.5x^{2})$ |
| `'triangle'` | Triangular | $f(x)=(1-\lvert x\rvert)\,I\{\lvert x\rvert \le 1\}$ |

If a cell is empty (`[]`), then the software did not fit a kernel distribution to the corresponding predictor.

Data Types: `char` | `cell`
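The four kernel formulas can be written as anonymous functions in base MATLAB, which makes it easy to check that each one integrates to 1. This is only a sketch of the formulas in the table; the handle names are hypothetical and are not part of the `ClassificationNaiveBayes` interface.

```matlab
% Kernel smoother formulas from the table, written as anonymous functions.
% The logical comparison plays the role of the indicator function I{.}.
boxKernel    = @(x) 0.5 * (abs(x) <= 1);
epanechnikov = @(x) 0.75 * (1 - x.^2) .* (abs(x) <= 1);
gaussian     = @(x) exp(-0.5 * x.^2) / sqrt(2*pi);
triangle     = @(x) (1 - abs(x)) .* (abs(x) <= 1);

% Each kernel is a density, so its integral over its support should be 1.
areas = [integral(boxKernel, -1, 1), ...
         integral(epanechnikov, -1, 1), ...
         integral(gaussian, -Inf, Inf), ...
         integral(triangle, -1, 1)];
```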

Parameter values used to train the classifier (such as the name-value pair argument values), specified as an object. This table summarizes the properties of `ModelParameters`. The properties correspond to the name-value pair argument values set for training the classifier.

| Property | Purpose |
| --- | --- |
| `DistributionNames` | Data distribution or distributions. This is the same value as the property `DistributionNames`. |
| `Kernel` | Kernel smoother type. This is the same as the property `Kernel`. |
| `Method` | Training method. For naive Bayes, the value is `'NaiveBayes'`. |
| `Support` | Kernel-smoothing density support. This is the same as the property `Support`. |
| `Type` | Learning type. For classification, the value is `'classification'`. |
| `Width` | Kernel smoothing window width. This is the same as the property `Width`. |

Access fields of `ModelParameters` using dot notation. For example, access the kernel support using `Mdl.ModelParameters.Support`.

Number of training observations, specified as a numeric scalar.

If `X` or `Y` contains missing values, then `NumObservations` might be less than the length of `Y`.

Data Types: `double`

Predictor names, specified as a cell array of character vectors. The order of the elements in `PredictorNames` corresponds to the order in `X`.

Data Types: `cell`

Class prior probabilities, specified as a numeric row vector. `Prior` is a 1-by-K vector, and the order of its elements corresponds to the order of the elements of `ClassNames`.

`fitcnb` normalizes the prior probabilities you set using the name-value pair parameter `'Prior'` so that `sum(Prior)` = `1`.

The value of `Prior` does not change the best-fitting model. Therefore, you can reset `Prior` after training `Mdl` using dot notation, e.g., `Mdl.Prior = [0.2 0.8];`.

Data Types: `double` | `single`
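The normalization of prior probabilities amounts to dividing by the sum, as in this short base MATLAB sketch. The vector `rawPrior` is a hypothetical input, not a value from this page.

```matlab
% Hypothetical unnormalized prior values for three classes.
rawPrior = [2 1 1];

% Scale so that the elements sum to 1.
Prior = rawPrior / sum(rawPrior);
```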

Response name, specified as a character vector.

Data Types: `char`

Classification score transformation function, specified as a character vector or function handle.

To change the score transformation function to, for example, `function`, use dot notation.

• For a built-in function, enter this code and replace `function` with a value in the table.

`Mdl.ScoreTransform = 'function';`

| Value | Description |
| --- | --- |
| `'doublelogit'` | $1/(1+e^{-2x})$ |
| `'invlogit'` | $\log(x/(1-x))$ |
| `'ismax'` | Sets the score for the class with the largest score to `1`, and sets the scores for all other classes to `0` |
| `'logit'` | $1/(1+e^{-x})$ |
| `'none'` or `'identity'` | $x$ (no transformation) |
| `'sign'` | –1 for x < 0; 0 for x = 0; 1 for x > 0 |
| `'symmetric'` | $2x-1$ |
| `'symmetricismax'` | Sets the score for the class with the largest score to `1`, and sets the scores for all other classes to `–1` |
| `'symmetriclogit'` | $2/(1+e^{-x})-1$ |

• For a MATLAB® function, or a function that you define, enter its function handle.

`Mdl.ScoreTransform = @function;`

`function` should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Data Types: `char` | `function_handle`
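As a sketch of the function-handle form, the following base MATLAB code writes out a few of the listed transformations as anonymous functions and applies one to a score matrix. The handle names and the `scores` matrix are hypothetical and serve only as illustration.

```matlab
% Some of the listed transformations written as elementwise function handles.
logitFn       = @(x) 1 ./ (1 + exp(-x));
doublelogitFn = @(x) 1 ./ (1 + exp(-2*x));
symmetricFn   = @(x) 2*x - 1;

% Hypothetical score matrix: one row per observation, one column per class.
scores = [0 1; -1 2];

% The handle accepts a matrix and returns a matrix of the same size.
transformed = logitFn(scores);

% With a trained model Mdl, a handle like this would be assigned using
% dot notation:  Mdl.ScoreTransform = logitFn;
```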

Kernel smoother density support, specified as a cell vector. `Support` has length equal to the number of predictors (`size(X,2)`). The cells represent the regions over which the software applies the kernel density.

This table describes the supported options.

| Value | Description |
| --- | --- |
| 1-by-2 numeric row vector | For example, `[L,U]`, where `L` and `U` are the finite lower and upper bounds, respectively, for the density support. |
| `'positive'` | The density support is all positive real values. |
| `'unbounded'` | The density support is all real values. |

If a cell is empty (`[]`), then the software did not fit a kernel distribution to the corresponding predictor.

Observation weights, specified as a numeric vector.

The length of `W` is `NumObservations`.

`fitcnb` normalizes the value you set for the name-value pair parameter `'Weights'` so that the weights within a particular class sum to the prior probability for that class.

Data Types: `double`
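The within-class weight normalization described above can be sketched in base MATLAB as follows. The labels, raw weights, and priors are hypothetical, and the class labels are written as indices into `ClassNames` to keep the sketch short.

```matlab
% Hypothetical class indices, raw observation weights, and class priors.
y     = [1; 1; 2; 2; 2];
rawW  = [1; 3; 2; 2; 4];
Prior = [0.4 0.6];

W = zeros(size(rawW));
for k = 1:numel(Prior)
    inClass = (y == k);
    % Rescale so that the weights in class k sum to Prior(k).
    W(inClass) = rawW(inClass) / sum(rawW(inClass)) * Prior(k);
end
```

Because the priors sum to 1, the normalized weights also sum to 1 over all observations.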

Kernel smoother window width, specified as a numeric matrix. `Width` is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (`size(X,2)`).

`Width(k,j)` is the kernel smoother window width for the kernel smoothing density of predictor `j` within class `k`. `NaN`s in column `j` indicate that the software did not fit predictor `j` using a kernel density.

Unstandardized predictor data, specified as a numeric matrix. `X` has `NumObservations` rows and P columns.

Each row of `X` corresponds to one observation, and each column corresponds to one variable.

The software excludes rows removed due to missing values from `X`.

Data Types: `double`

Observed class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `Y` is the same data type as the input argument `Y` of `fitcnb`. (The software treats string arrays as cell arrays of character vectors.)

Each row of `Y` represents the observed classification of the corresponding row of `X`.

The software excludes elements removed due to missing values from `Y`.

Data Types: `categorical` | `char` | `logical` | `single` | `double` | `cell`

## Methods

| Method | Description |
| --- | --- |
| `compact` | Compact naive Bayes classifier |
| `crossval` | Cross-validated naive Bayes classifier |
| `resubEdge` | Classification edge for naive Bayes classifiers by resubstitution |
| `resubLoss` | Classification loss for naive Bayes classifiers by resubstitution |
| `resubMargin` | Classification margins for naive Bayes classifiers by resubstitution |
| `resubPredict` | Predict resubstitution labels of naive Bayes classifier |

### Inherited Methods

| Method | Description |
| --- | --- |
| `edge` | Classification edge for naive Bayes classifiers |
| `logP` | Log unconditional probability density for naive Bayes classifier |
| `loss` | Classification error for naive Bayes classifier |
| `margin` | Classification margins for naive Bayes classifiers |
| `predict` | Predict labels using naive Bayes classification model |

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

## Examples

Construct a naive Bayes classifier for Fisher's iris data. Also, specify prior probabilities after training.

```
load fisheriris
X = meas;
Y = species;
```

`X` is a numeric matrix that contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. `Y` is a cell array of character vectors that contains the corresponding iris species.

Train a naive Bayes classifier.

`Mdl = fitcnb(X,Y)`
```
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}
```

`Mdl` is a trained `ClassificationNaiveBayes` classifier, and some of its properties display in the Command Window. By default, the software treats each predictor as independent, and fits them using normal distributions.

To access the properties of `Mdl`, use dot notation.

`Mdl.ClassNames`
```
ans = 3x1 cell array
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
```
`Mdl.Prior`
```
ans = 1×3

    0.3333    0.3333    0.3333
```

`Mdl.Prior` contains the class prior probabilities, which are settable using the name-value pair argument `'Prior'` in `fitcnb`. The order of the class prior probabilities corresponds to the order of the classes in `Mdl.ClassNames`. By default, the prior probabilities are the respective relative frequencies of the classes in the data.

You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3 respectively.

`Mdl.Prior = [0.5 0.2 0.3];`

You can pass `Mdl` to, for example, `predict` to label new measurements, or to `crossval` to cross-validate the classifier.
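A minimal sketch of both uses follows. It requires Statistics and Machine Learning Toolbox, retrains the model so that it runs on its own, and uses hypothetical measurements in `Xnew`. The function `kfoldLoss`, which is not described on this page, is assumed here as the way to summarize the cross-validated model.

```matlab
% Requires Statistics and Machine Learning Toolbox.
load fisheriris
Mdl = fitcnb(meas, species);

% Hypothetical new measurements: one row per flower, four columns in the
% same order as the training predictors.
Xnew = [5.0 3.4 1.5 0.2;
        6.5 3.0 5.5 2.0];
labels = predict(Mdl, Xnew);

% Cross-validate the classifier and estimate its misclassification rate.
CVMdl  = crossval(Mdl);
cvLoss = kfoldLoss(CVMdl);
```

`labels` has one entry per row of `Xnew`, in the same data type as the training labels.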

## Algorithms

• If you specify `'DistributionNames','mn'` when training `Mdl` using `fitcnb`, then the software fits a multinomial distribution using the bag-of-tokens model. The software stores the probability that token `j` appears in class `k` in the property `DistributionParameters{k,j}`. Using additive smoothing [2], the estimated probability is

$P(\text{token } j \mid \text{class } k)=\frac{1+c_{j|k}}{P+c_{k}},$

where:

• $c_{j|k}=n_{k}\frac{\sum_{i:\,y_{i}\in \text{class } k} x_{ij}w_{i}}{\sum_{i:\,y_{i}\in \text{class } k} w_{i}},$ which is the weighted number of occurrences of token j in class k.

• $n_{k}$ is the number of observations in class k.

• $w_{i}$ is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.

• $c_{k}=\sum_{j=1}^{P}c_{j|k},$ which is the total weighted number of occurrences of all tokens in class k.

• If you specify `'DistributionNames','mvmn'` when training `Mdl` using `fitcnb`, then:

1. For each predictor, the software collects a list of the unique levels, stores the sorted list in `CategoricalLevels`, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.

2. For predictor `j` in class `k`, the software counts instances of each categorical level using the list stored in `CategoricalLevels{j}`.

3. The software stores the probability that predictor `j`, in class `k`, has level L in the property `DistributionParameters{k,j}`, for all levels in `CategoricalLevels{j}`. Using additive smoothing [2], the estimated probability is

$P(\text{predictor } j=L \mid \text{class } k)=\frac{1+m_{j|k}(L)}{m_{j}+m_{k}},$

where:

• $m_{j|k}(L)=n_{k}\frac{\sum_{i:\,y_{i}\in \text{class } k} I\{x_{ij}=L\}\,w_{i}}{\sum_{i:\,y_{i}\in \text{class } k} w_{i}},$ which is the weighted number of observations for which predictor j equals L in class k.

• $n_{k}$ is the number of observations in class k.

• $I\{x_{ij}=L\}=1$ if $x_{ij}=L$, 0 otherwise.

• $w_{i}$ is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.

• $m_{j}$ is the number of distinct levels in predictor j.

• $m_{k}$ is the weighted number of observations in class k.
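The additive smoothing used in both cases can be illustrated with a small base MATLAB sketch. It assumes the add-one form, (1 + weighted count) / (number of categories + total weighted count), and takes all observation weights as equal so that weighted counts reduce to plain counts. All data below are hypothetical.

```matlab
% --- 'mn' case: token counts for the observations in one class k ---------
% Rows are observations, columns are counts of P = 3 tokens.
Xk = [2 0 1;
      1 0 3];
P  = size(Xk, 2);

cjk = sum(Xk, 1);                 % occurrences of each token in class k
ck  = sum(cjk);                   % occurrences of all tokens in class k
pToken = (1 + cjk) / (P + ck);    % smoothed probability of each token

% --- 'mvmn' case: one categorical predictor j within class k -------------
levels = [1 2 3];                 % sorted distinct levels of predictor j
xjk    = [1; 1; 3; 1];            % observed levels in class k

% Count the observations at each level, including levels never observed.
counts = arrayfun(@(L) sum(xjk == L), levels);
pLevel = (1 + counts) / (numel(levels) + numel(xjk));
```

In both cases the smoothed probabilities sum to 1, and a token or level that never occurs in the class (the second one here) still receives a nonzero probability.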

## References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Manning, C. D., P. Raghavan, and M. Schütze. Introduction to Information Retrieval. NY: Cambridge University Press, 2008.