Train naive Bayes classifier
NBModel = fitNaiveBayes(X,Y) returns a naive Bayes classifier NBModel, trained using predictors X and class labels Y for K-level classification.
Predict labels for new data by passing the data and NBModel to predict.
NBModel = fitNaiveBayes(X,Y,Name,Value) returns a naive Bayes classifier with additional options specified by one or more Name,Value pair arguments.
For example, you can specify a distribution to model the data, prior probabilities for the classes, or the kernel smoothing window bandwidth.
Load Fisher's iris data set.
load fisheriris
X = meas(:,3:4);
Y = species;
tabulate(Y)
       Value    Count   Percent
      setosa       50    33.33%
  versicolor       50    33.33%
   virginica       50    33.33%
The software can classify data with more than two classes using naive Bayes methods.
Train a naive Bayes classifier.
NBModel = fitNaiveBayes(X,Y)
NBModel = 

Naive Bayes classifier with 3 classes for 2 dimensions.
Feature Distribution(s): normal
Classes: setosa, versicolor, virginica
NBModel is a trained NaiveBayes classifier.
By default, the software models the predictor distribution within each class using a Gaussian distribution having some mean and standard deviation. Use dot notation to display the parameters of a particular Gaussian fit, e.g., display the fit for the first feature within setosa.
setosaIndex = strcmp(NBModel.ClassLevels,'setosa');
estimates = NBModel.Params{setosaIndex,1}
estimates =

    1.4620    0.1737
The mean is 1.4620 and the standard deviation is 0.1737.
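Fitting a Gaussian naive Bayes model reduces to computing, for each class and each feature independently, the sample mean and standard deviation. The sketch below (in Python for illustration; the function name is hypothetical and not part of the toolbox) shows that computation on a toy subset:

```python
import math

def gaussian_fit_per_class(X, y):
    """For each class, estimate the mean and standard deviation of each feature."""
    params = {}
    for label in set(y):
        rows = [x for x, cls in zip(X, y) if cls == label]
        n = len(rows)
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(n_features)]
        # Use the unbiased (n - 1) normalization, as MATLAB's std does
        stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
                for j in range(n_features)]
        params[label] = (means, stds)
    return params

X = [[1.4, 0.2], [1.5, 0.3], [4.5, 1.5], [4.7, 1.4]]
y = ['setosa', 'setosa', 'versicolor', 'versicolor']
params = gaussian_fit_per_class(X, y)
```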
Plot the Gaussian contours.
figure
gscatter(X(:,1),X(:,2),Y);
xylim = cell2mat(get(gca,{'Xlim','YLim'})); % Get the current axis limits
hold on
Params = cell2mat(NBModel.Params);
Mu = Params(2*(1:3)-1,1:2); % Extract the means
Sigma = zeros(2,2,3);
for j = 1:3
    Sigma(:,:,j) = diag(Params(2*j,:)); % Extract the standard deviations
    % Draw contours for the multivariate normal distributions
    ezcontour(@(x1,x2)mvnpdf([x1,x2],Mu(j,:),Sigma(:,:,j)),...
        xylim+0.5*[-1,1,-1,1])
end
title('Naive Bayes Classifier -- Fisher''s Iris Data')
xlabel('Petal Length (cm)')
ylabel('Petal Width (cm)')
hold off
You can change the default distribution using the name-value pair argument 'Distribution'. For example, if some predictors are count-based, then you can specify that they are multinomial random variables using 'Distribution','mn'.
Load Fisher's iris data set.
load fisheriris
X = meas;
Y = species;
Train a naive Bayes classifier using every predictor.
NBModel1 = fitNaiveBayes(X,Y);
NBModel1.ClassLevels % Display the class order
NBModel1.Params
NBModel1.Params{1,2}
ans = 

    'setosa'
    'versicolor'
    'virginica'

ans = 

    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]

ans =

    3.4280
    0.3791
By default, the software models the predictor distribution within each class as a Gaussian with some mean and standard deviation. There are four predictors and three class levels. Each cell of NBModel1.Params contains a numeric vector holding the mean and standard deviation of the corresponding fit; e.g., the mean and standard deviation of setosa iris sepal widths are 3.4280 and 0.3791, respectively.
Estimate the confusion matrix for NBModel1.
predictLabels1 = predict(NBModel1,X);
[ConfusionMat1,labels] = confusionmat(Y,predictLabels1)
ConfusionMat1 =

    50     0     0
     0    47     3
     0     3    47

labels = 

    'setosa'
    'versicolor'
    'virginica'
Element (j,k) of ConfusionMat1 is the number of observations that the software classifies into class k, but that the data label as class j.
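Because correct classifications lie on the diagonal, the training-sample misclassification rate can be read directly off the confusion matrix. A quick check (in Python for illustration; the helper name is hypothetical):

```python
def misclassification_rate(confusion):
    """Fraction of observations lying off the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

# The confusion matrix reported for NBModel1 above
ConfusionMat1 = [[50, 0, 0],
                 [0, 47, 3],
                 [0, 3, 47]]
rate = misclassification_rate(ConfusionMat1)  # 6 of 150 observations misclassified
```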
Retrain the classifier using the Gaussian distribution for predictors 1 and 3 (the sepal and petal lengths), and the default normal kernel density for predictors 2 and 4 (the sepal and petal widths).
NBModel2 = fitNaiveBayes(X,Y,...
    'Distribution',{'normal','kernel','normal','kernel'});
NBModel2.Params{1,2}
ans = 

    KernelDistribution

    Kernel = normal
    Bandwidth = 0.179536
    Support = unbounded
The software does not fit parameters to the kernel density. Rather, it chooses an optimal bandwidth. However, you can specify a bandwidth using the 'KSWidth' name-value pair argument.
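The automatically chosen bandwidth is one that is optimal when the true density is Gaussian. A common normal-reference rule of this kind is h = σ·(4/(3n))^(1/5); the sketch below (Python, illustrative only, not the toolbox's internal implementation) shows the idea:

```python
import math

def normal_reference_bandwidth(x):
    """Bandwidth optimal for Gaussian data: h = sigma * (4 / (3n))^(1/5)."""
    n = len(x)
    mean = sum(x) / n
    # Sample standard deviation with (n - 1) normalization
    sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return sigma * (4.0 / (3.0 * n)) ** 0.2

h = normal_reference_bandwidth([1.4, 1.5, 1.3, 1.6, 1.4])
```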
Estimate the confusion matrix for NBModel2.
predictLabels2 = predict(NBModel2,X);
ConfusionMat2 = confusionmat(Y,predictLabels2)
ConfusionMat2 =

    50     0     0
     0    47     3
     0     3    47
Based on the confusion matrices, the two classifiers perform similarly in the training sample.
Some spam filters classify an incoming email as spam based on how many times particular words or punctuation marks (called tokens) occur in the email. The predictors are the frequencies of these tokens in an email, so the predictors jointly follow a multinomial distribution.
This example illustrates classification using naive Bayes and multinomial predictors.
Suppose you observed 1000 emails and classified them as spam or not spam. Do this by randomly assigning a label of -1 or 1 to y for each email.
n = 1000;                       % Sample size
rng(1);                         % For reproducibility
y = randsample([-1 1],n,true);  % Random labels
To build the predictor data, suppose that there are five tokens in the vocabulary, and 20 observed tokens per email. Generate predictor data from the five tokens by drawing multinomial deviates. The relative frequencies for tokens corresponding to spam emails should differ from emails that are not spam.
tokenProbs = [0.2 0.3 0.1 0.15 0.25;...
    0.4 0.1 0.3 0.05 0.15];     % Token relative frequencies
tokensPerEmail = 20;
X = zeros(n,5);
X(y == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),sum(y == 1));
X(y == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),sum(y == -1));
Train a naive Bayes classifier. Specify that the predictors are multinomial.
NBModel = fitNaiveBayes(X,y,'Distribution','mn');
NBModel is a trained NaiveBayes classifier.
Assess the in-sample performance of NBModel by estimating the misclassification rate.
predSpam = predict(NBModel,X);
misclass = sum(y'~=predSpam)/n
misclass = 0.0200
The in-sample misclassification rate is 2%.
Randomly generate deviates that represent a new batch of emails.
nOut = 500;
yOut = randsample([-1 1],nOut,true);
XOut = zeros(nOut,5);
XOut(yOut == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),...
    sum(yOut == 1));
XOut(yOut == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),...
    sum(yOut == -1));
Classify the new emails using the trained naive Bayes classifier NBModel, and determine whether the algorithm generalizes.
predSpamOut = predict(NBModel,XOut);
genRate = sum(yOut'~=predSpamOut)/nOut
genRate = 0.0260
The out-of-sample misclassification rate is 2.6%, indicating that the classifier generalizes fairly well.
Predictor data to which the naive Bayes classifier is trained, specified as a matrix of numeric values.
Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).
The length of Y and the number of rows of X must be equivalent.
Data Types: double
Class labels to which the naive Bayes classifier is trained, specified as a categorical or character array, logical or numeric vector, or cell array of strings. Each element of Y defines the class membership of the corresponding row of X. Y supports K class levels.
If Y is a character array, then each row must correspond to one class label.
The length of Y and the number of rows of X must be equivalent.
Data Types: cell | char | double | logical
Note: The software treats NaN, empty string (''), and <undefined> elements as missing values.
The software removes rows of X and corresponding elements of Y that contain at least one missing value. Removing rows of X and corresponding elements of Y decreases the effective training or cross-validation sample size.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Distribution','mn','Prior','uniform','KSWidth',0.5 specifies the following: the data distribution is multinomial, the prior probabilities for all classes are equal, and the kernel smoothing window bandwidth for all classes is 0.5 units.

Data distributions fitNaiveBayes uses to model the data, specified as the comma-separated pair consisting of 'Distribution' and a string or cell array of strings.
This table summarizes the available distributions.
Value     | Description
----------|------------
'kernel'  | Kernel smoothing density estimate.
'mn'      | Multinomial distribution. If you specify 'mn', then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a cell array of strings. For details, see Algorithms.
'mvmn'    | Multivariate multinomial distribution. For details, see Algorithms.
'normal'  | Normal (Gaussian) distribution.
If you specify a string, then the software models all the features using that distribution. If you specify a 1-by-D cell array of strings, then the software models feature j using the distribution in element j of the cell array.
Example: 'Distribution',{'kernel','normal'}
Data Types: cell | char
Kernel smoothing density support, specified as the comma-separated pair consisting of 'KSSupport' and a numeric row vector, a string, or a cell array. The software applies the kernel smoothing density over this region.
If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.
This table summarizes the available options for setting the kernel smoothing density region.
Value                     | Description
--------------------------|------------
1-by-2 numeric row vector | For example, [L,U], where L and U are the finite lower and upper bounds, respectively, of the density support.
'positive'                | The density support is all positive real values.
'unbounded'               | The density support is all real values.
If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel support in cell j for feature j in X.
Example: 'KSSupport',{[10,20],'unbounded'}
Data Types: cell | char | double
Kernel smoother type, specified as the comma-separated pair consisting of 'KSType' and a string or cell array of strings.
If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.
This table summarizes the available kernel smoother types. Let I{u} denote the indicator function.
Value           | Kernel         | Formula
----------------|----------------|--------
'box'           | Box (uniform)  | $$f(x)=0.5I\left\{\left|x\right|\le 1\right\}$$
'epanechnikov'  | Epanechnikov   | $$f(x)=0.75\left(1-{x}^{2}\right)I\left\{\left|x\right|\le 1\right\}$$
'normal'        | Gaussian       | $$f(x)=\frac{1}{\sqrt{2\pi}}\mathrm{exp}\left(-0.5{x}^{2}\right)$$
'triangle'      | Triangular     | $$f(x)=\left(1-\left|x\right|\right)I\left\{\left|x\right|\le 1\right\}$$
If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel smoother type in cell j for feature j in X.
Example: 'KSType',{'epanechnikov','normal'}
Data Types: cell | char
Kernel smoothing window bandwidth, specified as the comma-separated pair consisting of 'KSWidth' and a matrix of numeric values, a numeric row vector, a numeric column vector, a scalar, or a structure array.
If you do not specify 'Distribution','kernel', then the software ignores the values of 'KSSupport', 'KSType', and 'KSWidth'.
Suppose there are K class levels and D predictors. This table summarizes the available options for setting the kernel smoothing window bandwidth.
Value                           | Description
--------------------------------|------------
K-by-D matrix of numeric values | Element (k,d) specifies the bandwidth for predictor d in class k.
K-by-1 numeric column vector    | Element k specifies the bandwidth for all predictors in class k.
1-by-D numeric row vector       | Element d specifies the bandwidth in all class levels for predictor d.
scalar                          | Specifies the bandwidth for all features in all classes.
structure array                 | A structure array S containing class levels and their bandwidths. S must have two fields: S.width, a numeric array of bandwidths, and S.group, a vector of the corresponding class levels.
By default, the software automatically selects a bandwidth for each combination of feature and class, using a value that is optimal for a Gaussian distribution.
Example: 'KSWidth',struct('width',[0.5,0.25],'group',{{'b';'g'}})
Data Types: double | struct
Class prior probabilities, specified as the comma-separated pair consisting of 'Prior' and a numeric vector, structure array, or string.
This table summarizes the available options for setting prior probabilities.
Value           | Description
----------------|------------
'empirical'     | The software uses the class relative frequencies for the prior probabilities.
numeric vector  | A numeric vector of length K specifying the prior probabilities for each class. The order of the elements of Prior must correspond to the order of the class levels. For details on the order of the classes, see Algorithms. The software normalizes the prior probabilities to sum to 1.
structure array | A structure array S containing class levels and their prior probabilities. S must have two fields: S.prob, a numeric vector of prior probabilities, and S.group, a vector of the corresponding class levels.
'uniform'       | The prior probabilities are equal for all classes.
Example: 'Prior',struct('prob',[1,2],'group',{{'b';'g'}})
Data Types: char | double | struct
Trained naive Bayes classifier, returned as a NaiveBayes classifier.
In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in that observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.
For classifying count-based data, such as data fitting the bag-of-tokens model, use the multinomial distribution (that is, set 'Distribution','mn').
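Building bag-of-tokens predictors amounts to counting how often each vocabulary token occurs in each document. A minimal sketch (Python for illustration; function and variable names are hypothetical):

```python
def bag_of_tokens(documents, vocabulary):
    """Build a count matrix: entry (i, j) is how often token j occurs in document i."""
    index = {token: j for j, token in enumerate(vocabulary)}
    counts = [[0] * len(vocabulary) for _ in documents]
    for i, doc in enumerate(documents):
        for token in doc:
            if token in index:  # ignore out-of-vocabulary tokens
                counts[i][index[token]] += 1
    return counts

docs = [['free', 'offer', 'free'], ['meeting', 'agenda', 'offer']]
X = bag_of_tokens(docs, ['free', 'offer', 'meeting', 'agenda'])
# X is [[2, 1, 0, 0], [0, 1, 1, 1]]
```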
This list defines the order of the classes. It is useful when you specify prior probabilities by setting 'Prior',prior, where prior is a numeric vector.
If Y is a categorical array, then the order of the class levels matches the output of categories(Y).
If Y is a numeric or logical vector, then the order of the class levels matches the output of sort(unique(Y)).
If Y is a cell array of strings or a character array, then the order of the class levels is the order in which each label first appears in Y.
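The two ordering rules can produce different class orders for the same labels. This small sketch (a Python analogue of sort(unique(Y)) versus first-appearance order; function names are hypothetical) contrasts them:

```python
def sorted_unique(labels):
    """Class order for numeric or logical labels: sorted distinct values."""
    return sorted(set(labels))

def first_appearance(labels):
    """Class order for string labels: the order in which each label first appears."""
    seen, order = set(), []
    for label in labels:
        if label not in seen:
            seen.add(label)
            order.append(label)
    return order

numeric_order = sorted_unique([3, 1, 2, 1, 3])         # [1, 2, 3]
string_order = first_appearance(['b', 'a', 'b', 'c'])  # ['b', 'a', 'c']
```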
If you specify 'Distribution','mn', then the software considers each observation as multiple trials of a multinomial distribution, and considers each occurrence of a token as one trial (see Bag-of-Tokens Model).
If you specify 'Distribution','mvmn', then the software assumes that each individual predictor follows a multinomial model within a class. The parameters for a predictor include the probabilities of all possible values that the corresponding feature can take.
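Under the 'mn' assumption, classifying an observation x reduces to picking the class k that maximizes the log prior plus Σ_j x_j·log θ_{k,j}, where θ_{k,j} is the estimated probability of token j in class k (terms constant across classes can be dropped). An illustrative scorer (Python; not the toolbox implementation), using the token probabilities from the spam example above:

```python
import math

def multinomial_nb_predict(x, priors, token_probs):
    """Score each class by log prior + sum_j x_j * log(theta_kj); return the best class."""
    best_class, best_score = None, -math.inf
    for k in priors:
        score = math.log(priors[k])
        score += sum(xj * math.log(t) for xj, t in zip(x, token_probs[k]))
        if score > best_score:
            best_class, best_score = k, score
    return best_class

priors = {'spam': 0.5, 'not spam': 0.5}
token_probs = {'spam':     [0.4, 0.1, 0.3, 0.05, 0.15],
               'not spam': [0.2, 0.3, 0.1, 0.15, 0.25]}
label = multinomial_nb_predict([8, 2, 6, 1, 3], priors, token_probs)  # 'spam'
```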