fitNaiveBayes

(to be removed) Train naive Bayes classifier

fitNaiveBayes will be removed in a future release. Use fitcnb instead.

Syntax

NBModel = fitNaiveBayes(X,Y)
NBModel = fitNaiveBayes(X,Y,Name,Value)

Description

NBModel = fitNaiveBayes(X,Y) returns a naive Bayes classifier NBModel, trained using the predictors X and the class labels Y, for K-level classification.

Predict labels for new data by passing the data and NBModel to predict.

NBModel = fitNaiveBayes(X,Y,Name,Value) returns a naive Bayes classifier with additional options specified by one or more Name,Value pair arguments.

For example, you can specify a distribution to model the data, prior probabilities for the classes, or the kernel smoothing window bandwidth.
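As a minimal sketch of the basic syntax, the following trains a classifier on the fisheriris sample data set that ships with the product and classifies the training observations (fitNaiveBayes issues a deprecation warning in releases where fitcnb is available):

```matlab
% Sketch: train a naive Bayes classifier on Fisher's iris data and
% classify the training observations. The same call pattern works
% with the recommended replacement, fitcnb.
load fisheriris                           % meas: 150-by-4, species: 150-by-1 cell
NBModel  = fitNaiveBayes(meas,species);   % Gaussian model for each predictor
labels   = predict(NBModel,meas);         % predicted class labels
resubErr = mean(~strcmp(labels,species))  % resubstitution error rate
```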

Input Arguments

X

Predictor data, specified as a numeric matrix.

Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).

The length of Y and the number of rows of X must be equal.

Data Types: double

Y

Class labels to which the naive Bayes classifier is trained, specified as a categorical or character array, a logical or numeric vector, or a cell array of character vectors. Each element of Y defines the class membership of the corresponding row of X. Y supports K class levels.

If Y is a character array, then each row must correspond to one class label.

The length of Y and the number of rows of X must be equal.

Data Types: cell | char | double | logical

    Note:   The software treats NaN, empty character vector (''), and <undefined> elements as missing values.

    • If Y contains missing values, then the software removes them and the corresponding rows of X.

    • If X contains any rows composed entirely of missing values, then the software removes those rows and the corresponding elements of Y.

    • If X contains missing values and you set 'Distribution','mn', then the software removes those rows of X and the corresponding elements of Y.

    • If a predictor is not represented in a class, that is, if all of its values are NaN within a class, then the software returns an error.

    Removing rows of X and corresponding elements of Y decreases the effective training or cross-validation sample size.
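The missing-value rules above can be checked before training. As a sketch, with hypothetical data, you can count how many observations would be discarded:

```matlab
% Sketch: count the observations the software would remove, assuming
% X is a numeric matrix and Y is a cell array of character vectors.
X = [1 2; NaN NaN; 5 6; 7 8];     % row 2 is entirely NaN
Y = {'a'; 'b'; ''; 'a'};          % '' labels count as missing
missingY = cellfun(@isempty,Y);   % observations with missing labels
allNaNX  = all(isnan(X),2);       % rows of X that are entirely NaN
nDropped = sum(missingY | allNaNX)
```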

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Distribution','mn','Prior','uniform','KSWidth',0.5 specifies the following: the data distribution is multinomial, the prior probabilities for all classes are equal, and the kernel smoothing window bandwidth for all classes is 0.5 units.


Data distributions fitNaiveBayes uses to model the data, specified as the comma-separated pair consisting of 'Distribution' and 'kernel', 'mn', 'mvmn', 'normal', or a cell array of character vectors.

This table summarizes the available distributions.

Value      Description
'kernel'   Kernel smoothing density estimate.
'mn'       Multinomial distribution. If you specify 'mn', then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a cell array of character vectors. For details, see Algorithms.
'mvmn'     Multivariate multinomial distribution. For details, see Algorithms.
'normal'   Normal (Gaussian) distribution.

If you specify a character vector, then the software models all the features using that distribution. If you specify a 1-by-D cell array of character vectors, then the software models feature j using the distribution in element j of the cell array.

Example: 'Distribution',{'kernel','normal'}

Data Types: cell | char
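The cell-array form above can be sketched on the first two iris measurements, modeling each feature with a different distribution:

```matlab
% Sketch: kernel density for feature 1, Gaussian for feature 2.
load fisheriris
X = meas(:,1:2);                  % two predictors
NBModel = fitNaiveBayes(X,species,'Distribution',{'kernel','normal'});
```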

Kernel smoothing density support, specified as the comma-separated pair consisting of 'KSSupport' and a numeric row vector, 'positive', 'unbounded', or a cell array. The software applies the kernel smoothing density to this region.

If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.

This table summarizes the available options for setting the kernel smoothing density region.

Value                       Description
1-by-2 numeric row vector   [L,U], where L and U are the finite lower and upper bounds, respectively, of the density support.
'positive'                  The density support is all positive real values.
'unbounded'                 The density support is all real values.

If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel support in cell j for feature j in X.

Example: 'KSSupport',{[-10,20],'unbounded'}

Data Types: cell | char | double
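A minimal sketch of restricting the support: the iris measurements are all positive, so 'positive' is a natural choice for every feature here.

```matlab
% Sketch: kernel densities supported on the positive reals only.
load fisheriris
NBModel = fitNaiveBayes(meas,species, ...
    'Distribution','kernel','KSSupport','positive');
```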

Kernel smoother type, specified as the comma-separated pair consisting of 'KSType' and 'box', 'epanechnikov', 'normal', 'triangle', or a cell array of character vectors.

If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.

This table summarizes the available kernel smoother types. Let I{u} denote the indicator function, which equals 1 when u holds and 0 otherwise.

Value            Kernel          Formula
'box'            Box (uniform)   f(x) = 0.5 I{|x| <= 1}
'epanechnikov'   Epanechnikov    f(x) = 0.75 (1 - x^2) I{|x| <= 1}
'normal'         Gaussian        f(x) = (1/sqrt(2*pi)) exp(-0.5 x^2)
'triangle'       Triangular      f(x) = (1 - |x|) I{|x| <= 1}

If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel smoother type in cell j for feature j in X.

Example: 'KSType',{'epanechnikov','normal'}

Data Types: cell | char
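As a sketch, the following replaces the default Gaussian smoother with the Epanechnikov kernel for all features:

```matlab
% Sketch: Epanechnikov kernel smoother for every predictor.
load fisheriris
NBModel = fitNaiveBayes(meas,species, ...
    'Distribution','kernel','KSType','epanechnikov');
```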

Kernel smoothing window bandwidth, specified as the comma-separated pair consisting of 'KSWidth' and a matrix of numeric values, numeric row vector, numeric column vector, scalar, or structure array.

If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.

Suppose there are K class levels and D predictors. This table summarizes the available options for setting the kernel smoothing window bandwidth.

Value                          Description
K-by-D numeric matrix          Element (k,d) specifies the bandwidth for predictor d in class k.
K-by-1 numeric column vector   Element k specifies the bandwidth for all predictors in class k.
1-by-D numeric row vector      Element d specifies the bandwidth for predictor d in all class levels.
scalar                         Specifies the bandwidth for all features in all classes.
structure array                A structure array S containing class levels and their bandwidths. S must have two fields:

  • S.width: A numeric row vector of bandwidths, or a matrix of numeric values with D columns.

  • S.group: A vector of the same type as Y, containing unique class levels that indicate the class for the corresponding element (or row) of S.width.

By default, the software automatically selects a bandwidth for each combination of feature and class, using a value that is optimal for a Gaussian distribution.

Example: 'KSWidth',struct('width',[0.5,0.25],'group',{{'b';'g'}})

Data Types: double | struct
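Two of the forms above can be sketched on the iris data, which has four predictors (the bandwidth values themselves are illustrative):

```matlab
% Sketch: one common bandwidth for all features and classes (scalar),
% then one bandwidth per predictor (1-by-4 row vector for 4 predictors).
load fisheriris
NBModel1 = fitNaiveBayes(meas,species,'Distribution','kernel', ...
    'KSWidth',0.5);
NBModel2 = fitNaiveBayes(meas,species,'Distribution','kernel', ...
    'KSWidth',[0.5 0.4 0.6 0.3]);
```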

Class prior probabilities, specified as the comma-separated pair consisting of 'Prior' and a numeric vector, structure array, 'uniform', or 'empirical'.

This table summarizes the available options for setting prior probabilities.

Value             Description
'empirical'       The software uses the relative class frequencies as the prior probabilities.
numeric vector    A numeric vector of length K specifying the prior probabilities for each class. The order of the elements of Prior must correspond to the order of the class levels. For details on the order of the classes, see Algorithms. The software normalizes the prior probabilities to sum to 1.
structure array   A structure array S containing class levels and their prior probabilities. S must have two fields:

  • S.prob: A numeric vector of prior probabilities. The software normalizes the prior probabilities to sum to 1.

  • S.group: A vector of the same type as Y containing unique class levels that indicate the class for the corresponding element of S.prob. S.group must contain all K levels in Y. It can also contain classes that do not appear in Y, which can be useful if X is a subset of a larger training set. The software ignores any classes that appear in S.group but not in Y.

'uniform'         The prior probabilities are equal for all classes.

Example: 'Prior',struct('prob',[1,2],'group',{{'b';'g'}})

Data Types: char | double | struct
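A sketch of two of the forms above on the iris data; for a cell array of labels, the class order is the order of first appearance in Y (here setosa, versicolor, virginica), and the numeric vector is normalized to sum to 1:

```matlab
% Sketch: uniform priors, then explicit priors given in class order.
load fisheriris
NBModel1 = fitNaiveBayes(meas,species,'Prior','uniform');
NBModel2 = fitNaiveBayes(meas,species,'Prior',[1 1 2]);
```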

Output Arguments


NBModel

Trained naive Bayes classifier, returned as a NaiveBayes object.

More About


Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in this observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.
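A toy sketch of the bag-of-tokens model with hypothetical count data: each row of X is a document, each column counts one token, and the multinomial distribution models the counts.

```matlab
% Sketch: term-count matrix for 4 documents over 3 distinct tokens
% (hypothetical data and labels).
X = [2 0 1;     % doc 1
     0 3 0;     % doc 2
     1 1 4;     % doc 3
     0 2 1];    % doc 4
Y = {'spam';'ham';'spam';'ham'};
NBModel = fitNaiveBayes(X,Y,'Distribution','mn');
```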

Tips

  • For classifying count-based data, such as the bag-of-tokens model, use the multinomial distribution (e.g., set 'Distribution','mn').

  • This list defines the order of the classes. It is useful when you specify prior probabilities by setting 'Prior',prior, where prior is a numeric vector.

    • If Y is a categorical array, then the order of the class levels matches the output of categories(Y).

    • If Y is a numeric or logical vector, then the order of the class levels matches the output of sort(unique(Y)).

    • For cell arrays of character vectors and character arrays, the order of the class labels is the order in which each label first appears in Y.
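The ordering rules above can be sketched for a numeric Y, where a prior vector must align with the sorted class order rather than the order of appearance:

```matlab
% Sketch: for numeric labels, the class order is sort(unique(Y)).
Y = [3;1;2;1;3;2];
classOrder = sort(unique(Y))   % class order is 1, 2, 3
% 'Prior',[0.5 0.3 0.2] would then assign 0.5 to class 1,
% 0.3 to class 2, and 0.2 to class 3.
```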

Algorithms

  • If you specify 'Distribution','mn', then the software considers each observation as multiple trials of a multinomial distribution, and considers each occurrence of a token as one trial (see Bag-of-Tokens Model).

  • If you specify 'Distribution','mvmn', then the software assumes that each individual predictor follows a multinomial model within a class. The parameters for a predictor include the probabilities of all possible values that the corresponding feature can take.
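The 'mvmn' case can be sketched with hypothetical discrete-valued predictors, where each feature takes a small set of levels and the model estimates the probability of each level within each class:

```matlab
% Sketch: multivariate multinomial model for discrete predictors
% (hypothetical data; each predictor takes levels in {1,2}).
X = [1 2; 2 1; 1 1; 2 2; 1 2; 2 1];
Y = [1;1;1;2;2;2];
NBModel = fitNaiveBayes(X,Y,'Distribution','mvmn');
```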

Introduced in R2014a
