# Documentation


# fitNaiveBayes

(to be removed) Train naive Bayes classifier

`fitNaiveBayes` will be removed in a future release. Use `fitcnb` instead.

## Syntax

``NBModel = fitNaiveBayes(X,Y)``
``NBModel = fitNaiveBayes(X,Y,Name,Value)``

## Description

`NBModel = fitNaiveBayes(X,Y)` returns a naive Bayes classifier `NBModel`, trained using the predictors `X` and class labels `Y` for K-level classification. Predict labels for new data by passing the new data and `NBModel` to `predict`.

`NBModel = fitNaiveBayes(X,Y,Name,Value)` returns a naive Bayes classifier with additional options specified by one or more `Name,Value` pair arguments. For example, you can specify a distribution to model the data, prior probabilities for the classes, or the kernel smoothing window bandwidth.
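As a minimal sketch, you might train a classifier on the `fisheriris` sample data set (shipped with Statistics Toolbox) and then classify observations with `predict`:

```matlab
% Train a naive Bayes classifier on Fisher's iris data and
% predict labels for the training observations.
load fisheriris                     % loads meas (150-by-4) and species (150-by-1 cell)
NBModel = fitNaiveBayes(meas,species);
labels  = predict(NBModel,meas);    % classify the training data
```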

## Input Arguments


`X` — Predictor data, specified as a numeric matrix.

Each row of `X` corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).

The length of `Y` and the number of rows of `X` must be equal.

Data Types: `double`

`Y` — Class labels used to train the naive Bayes classifier, specified as a categorical or character array, a logical or numeric vector, or a cell array of character vectors. Each element of `Y` defines the class membership of the corresponding row of `X`. `Y` supports K class levels.

If `Y` is a character array, then each row must correspond to one class label.

The length of `Y` and the number of rows of `X` must be equal.

Data Types: `cell` | `char` | `double` | `logical`

### Note:

The software treats `NaN`, empty character vector (`''`), and `<undefined>` elements as missing values.

• If `Y` contains missing values, then the software removes them and the corresponding rows of `X`.

• If `X` contains any rows composed entirely of missing values, then the software removes those rows and the corresponding elements of `Y`.

• If `X` contains missing values and you set `'Distribution','mn'`, then the software removes those rows of `X` and the corresponding elements of `Y`.

• If a predictor is not represented in a class, that is, if all of its values are `NaN` within a class, then the software returns an error.

Removing rows of `X` and corresponding elements of `Y` decreases the effective training or cross-validation sample size.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Distribution','mn','Prior','uniform','KSWidth',0.5` specifies the following: the data distribution is multinomial, the prior probabilities for all classes are equal, and the kernel smoothing window bandwidth for all classes is `0.5` units.
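A sketch of such a call, using a hypothetical predictor matrix `X` and label vector `Y`, with every feature modeled by a kernel density:

```matlab
% Hypothetical two-feature data with two classes 'b' and 'g'.
rng(1)                               % for reproducibility
X = [randn(20,2); randn(20,2)+2];
Y = [repmat({'b'},20,1); repmat({'g'},20,1)];

% Model every feature with a kernel density, use uniform priors,
% and set a smoothing bandwidth of 0.5 for all features and classes.
NBModel = fitNaiveBayes(X,Y,'Distribution','kernel', ...
    'Prior','uniform','KSWidth',0.5);
```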


Data distributions `fitNaiveBayes` uses to model the data, specified as the comma-separated pair consisting of `'Distribution'` and `'kernel'`, `'mn'`, `'mvmn'`, `'normal'`, or a cell array of character vectors.

This table summarizes the available distributions.

| Value | Description |
| --- | --- |
| `'kernel'` | Kernel smoothing density estimate. |
| `'mn'` | Multinomial distribution. If you specify `'mn'`, then all features are components of a multinomial distribution. Therefore, you cannot include `'mn'` as an element of a cell array of character vectors. For details, see Algorithms. |
| `'mvmn'` | Multivariate multinomial distribution. For details, see Algorithms. |
| `'normal'` | Normal (Gaussian) distribution. |

If you specify a character vector, then the software models all the features using that distribution. If you specify a 1-by-D cell array of character vectors, then the software models feature j using the distribution in element j of the cell array.

Example: `'Distribution',{'kernel','normal'}`

Data Types: `cell` | `char`
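A sketch of per-feature distributions, using hypothetical data where `X` must have exactly two columns to match the two-element cell array:

```matlab
% Feature 1 (skewed, positive) is modeled with a kernel density;
% feature 2 is modeled as normal. X and Y are hypothetical.
X = [exprnd(1,50,1), randn(50,1)];
Y = [ones(25,1); 2*ones(25,1)];
NBModel = fitNaiveBayes(X,Y,'Distribution',{'kernel','normal'});
```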

Kernel smoothing density support, specified as the comma-separated pair consisting of `'KSSupport'` and a numeric row vector, `'positive'`, `'unbounded'`, or a cell array. The software applies the kernel smoothing density to this region.

If you do not specify `'Distribution','kernel'`, then the software ignores the values of `'KSSupport'`, `'KSType'`, and `'KSWidth'`.

This table summarizes the available options for setting the kernel smoothing density region.

| Value | Description |
| --- | --- |
| 1-by-2 numeric row vector | For example, `[L,U]`, where `L` and `U` are the finite lower and upper bounds, respectively, of the density support. |
| `'positive'` | The density support is all positive real values. |
| `'unbounded'` | The density support is all real values. |

If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel support in cell j for feature j in `X`.

Example: `'KSSupport',{[-10,20],'unbounded'}`

Data Types: `cell` | `char` | `double`
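A sketch of per-feature kernel supports, with hypothetical data: feature 1 is bounded on `[-10,20]` and feature 2 is unbounded.

```matlab
% Restrict the kernel density support per feature.
% X and Y are hypothetical two-class data.
X = [unifrnd(-10,20,40,1), randn(40,1)];
Y = [ones(20,1); 2*ones(20,1)];
NBModel = fitNaiveBayes(X,Y,'Distribution','kernel', ...
    'KSSupport',{[-10,20],'unbounded'});
```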

Kernel smoother type, specified as the comma-separated pair consisting of `'KSType'` and `'box'`, `'epanechnikov'`, `'normal'`, `'triangle'`, or a cell array of character vectors.

If you do not specify `'Distribution','kernel'`, then the software ignores the values of `'KSSupport'`, `'KSType'`, and `'KSWidth'`.

This table summarizes the available kernel smoother types. Let I{u} denote the indicator function.

| Value | Kernel | Formula |
| --- | --- | --- |
| `'box'` | Box (uniform) | $f(x)=0.5\,I\{\lvert x\rvert\le 1\}$ |
| `'epanechnikov'` | Epanechnikov | $f(x)=0.75\left(1-x^{2}\right)I\{\lvert x\rvert\le 1\}$ |
| `'normal'` | Gaussian | $f(x)=\frac{1}{\sqrt{2\pi}}\exp\left(-0.5x^{2}\right)$ |
| `'triangle'` | Triangular | $f(x)=\left(1-\lvert x\rvert\right)I\{\lvert x\rvert\le 1\}$ |

If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel smoother type in cell j for feature j in `X`.

Example: `'KSType',{'epanechnikov','normal'}`

Data Types: `cell` | `char`

Kernel smoothing window bandwidth, specified as the comma-separated pair consisting of `'KSWidth'` and a matrix of numeric values, numeric row vector, numeric column vector, scalar, or structure array.

If you do not specify `'Distribution','kernel'`, then the software ignores the values of `'KSSupport'`, `'KSType'`, and `'KSWidth'`.

Suppose there are K class levels and D predictors. This table summarizes the available options for setting the kernel smoothing window bandwidth.

| Value | Description |
| --- | --- |
| K-by-D matrix of numeric values | Element (k,d) specifies the bandwidth for predictor d in class k. |
| K-by-1 numeric column vector | Element k specifies the bandwidth for all predictors in class k. |
| 1-by-D numeric row vector | Element d specifies the bandwidth in all class levels for predictor d. |
| scalar | Specifies the bandwidth for all features in all classes. |
| structure array | A structure array `S` containing class levels and their bandwidths. `S` must have two fields: `S.width`, a numeric row vector of bandwidths or a matrix of numeric values with D columns; and `S.group`, a vector of the same type as `Y` containing unique class levels that indicate the class for the corresponding element (or row) of `S.width`. |

By default, the software automatically selects a bandwidth for each combination of feature and class, using a value that is optimal for a Gaussian distribution.

Example: `'KSWidth',struct('width',[0.5,0.25],'group',{{'b';'g'}})`

Data Types: `double` | `struct`
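As a sketch of the matrix form, with hypothetical data containing K = 2 classes and D = 2 predictors:

```matlab
% Hypothetical data: 2 classes, 2 predictors.
X = randn(40,2);
Y = [ones(20,1); 2*ones(20,1)];

% 2-by-2 bandwidth matrix: element (k,d) is the bandwidth for
% predictor d in class k.
W = [0.5 0.25; 1.0 0.75];
NBModel = fitNaiveBayes(X,Y,'Distribution','kernel','KSWidth',W);
```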

Class prior probabilities, specified as the comma-separated pair consisting of `'Prior'` and a numeric vector, structure array, `'uniform'`, or `'empirical'`.

This table summarizes the available options for setting prior probabilities.

| Value | Description |
| --- | --- |
| `'empirical'` | The software uses the class relative frequencies for the prior probabilities. |
| numeric vector | A numeric vector of length K specifying the prior probabilities for each class. The order of the elements of `Prior` must correspond to the order of the class levels. For details on the order of the classes, see Algorithms. The software normalizes the prior probabilities to sum to `1`. |
| structure array | A structure array `S` containing class levels and their prior probabilities. `S` must have two fields: `S.prob`, a numeric vector of prior probabilities (the software normalizes them to sum to `1`); and `S.group`, a vector of the same type as `Y` containing unique class levels that indicate the class for the corresponding element of `S.prob`. `S.group` must contain all K levels in `Y`. It can also contain classes that do not appear in `Y`, which can be useful if `X` is a subset of a larger training set. The software ignores any classes that appear in `S.group` but not in `Y`. |
| `'uniform'` | The prior probabilities are equal for all classes. |

Example: `'Prior',struct('prob',[1,2],'group',{{'b';'g'}})`

Data Types: `char` | `double` | `struct`
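A sketch of the numeric-vector form, with hypothetical numeric labels; `sort(unique(Y))` gives the class order `[1 2]`, so the vector below assigns probability 0.3 to class 1 and 0.7 to class 2 (the software normalizes the vector if it does not sum to 1):

```matlab
% Hypothetical two-class data with numeric labels.
X = randn(40,2);
Y = [ones(20,1); 2*ones(20,1)];
NBModel = fitNaiveBayes(X,Y,'Prior',[0.3 0.7]);
```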

## Output Arguments


Trained naive Bayes classifier, returned as a `NaiveBayes` classifier.

## Definitions

### Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the corresponding observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.
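For example, a bag-of-tokens model might be trained as follows. The token counts and the `'spam'`/`'ham'` labels are hypothetical; each element `X(i,j)` counts occurrences of token j in document i.

```matlab
% Toy bag-of-tokens data: 4 documents, 3 distinct tokens.
X = [2 0 1;      % document 1
     1 3 1;      % document 2
     1 1 0;      % document 3
     0 1 4];     % document 4
Y = {'spam';'ham';'spam';'ham'};

% 'mn' treats each row as trials of one multinomial distribution.
NBModel = fitNaiveBayes(X,Y,'Distribution','mn');
```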

## Tips

• For classifying count-based data, such as the bag-of-tokens model, use the multinomial distribution (that is, set `'Distribution','mn'`).

• The following list defines the order of the classes. This order is useful when you specify prior probabilities by setting `'Prior',prior`, where `prior` is a numeric vector.

• If `Y` is a categorical array, then the order of the class levels matches the output of `categories(Y)`.

• If `Y` is a numeric or logical vector, then the order of the class levels matches the output of `sort(unique(Y))`.

• For cell arrays of character vectors and character arrays, the order of the class labels is the order in which each label first appears in `Y`.
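To see the class order before supplying a numeric `'Prior'` vector, you can query it directly. The labels below are hypothetical:

```matlab
% Class order for categorical labels: matches categories(Y).
Y = categorical({'setosa';'virginica';'setosa';'versicolor'});
classOrder = categories(Y)

% Class order for numeric labels: matches sort(unique(Y)).
Ynum = [3 1 2 1];
numericOrder = sort(unique(Ynum))
```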

## Algorithms

• If you specify `'Distribution','mn'`, then the software considers each observation as multiple trials of a multinomial distribution, and considers each occurrence of a token as one trial (see Bag-of-Tokens Model).

• If you specify `'Distribution','mvmn'`, then the software assumes that each individual predictor follows a multinomial model within a class. The parameters for a predictor include the probabilities of all possible values that the corresponding feature can take.