Classification Learner is a new app that lets you train models to classify data using supervised machine learning. You can explore your data, select features, specify cross-validation schemes, train models, and assess results. You can choose from several classification types including decision trees, support vector machines, nearest neighbors, and ensemble classification.
Perform supervised machine learning by supplying a known set of input data (observations or examples) and known responses to the data (i.e., labels or classes). Use the data to train a model that generates predictions for the response to new data. To use the model with new data, or to learn about programmatic classification, you can export the model to the workspace or generate MATLAB® code to recreate the trained model.
For details, see Explore Classification Models Interactively.
compareHoldout, testcholdout, and testckfold functions
You can statistically assess the predictive accuracies of two classification models using holdout sample predictions or repeated cross-validation.
The testcholdout function accepts holdout sample predicted labels from both classification models and the true labels. This function implements the asymptotic, exact, or mid-p version of McNemar's test. If you specify misclassification costs, testcholdout compares the models using a likelihood ratio or a chi-square test.
The compareHoldout object function accepts any two trained classification model objects in Statistics and Machine Learning Toolbox™, sets of holdout predictor data for both models, and corresponding true labels. Like testcholdout, this object function implements the asymptotic, exact, or mid-p version of McNemar's test. If you specify misclassification costs, compareHoldout compares the models using a likelihood ratio or a chi-square test.
The testckfold function accepts any two trained classification model objects or templates in Statistics and Machine Learning Toolbox, and repeatedly applies k-fold cross-validation using two sets of out-of-sample predictor data and true labels. Then, testckfold assesses the resulting accuracies using a t or an F test.
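The following is a minimal sketch of the holdout comparison described above, not a definitive workflow. It assumes the fisheriris sample data set that ships with the toolbox; the choice of a tree versus a 5-nearest-neighbor model, the 30% holdout fraction, and all variable names are illustrative assumptions.

```matlab
% Sketch: compare two classifiers on a common holdout set.
load fisheriris                      % meas (150x4 predictors), species (labels)
rng(1);                              % reproducible partition
cvp = cvpartition(species,'HoldOut',0.3);
idxTrain = training(cvp);
idxTest  = test(cvp);

% Two candidate models trained on the same training rows
mdlTree = fitctree(meas(idxTrain,:),species(idxTrain));
mdlKNN  = fitcknn(meas(idxTrain,:),species(idxTrain),'NumNeighbors',5);

% Option 1: testcholdout takes the predicted labels and the true labels
yhatTree = predict(mdlTree,meas(idxTest,:));
yhatKNN  = predict(mdlKNN,meas(idxTest,:));
[h,p,e1,e2] = testcholdout(yhatTree,yhatKNN,species(idxTest));

% Option 2: compareHoldout takes the trained models and the holdout predictors
[h2,p2] = compareHoldout(mdlTree,mdlKNN, ...
    meas(idxTest,:),meas(idxTest,:),species(idxTest));
```

Here h is the test decision, p is the p-value, and e1 and e2 are the holdout classification losses of the two models.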
kmedoids, fitcknn, and other functions when using cosine, correlation, or spearman distance calculations
Pairwise distance calculations (by pdist and pdist2) in kmedoids and fitcknn use Basic Linear Algebra Subroutines (BLAS) libraries based on the Intel® Math Kernel Library (MKL). For details on Intel MKL, see https://software.intel.com/en-us/intel-mkl.
For dual-core systems and above, fitctree, fitrtree, and fitensemble parallelize training decision trees using Intel Threading Building Blocks (TBB). For details on Intel TBB, see https://software.intel.com/en-us/intel-tbb.
You can parallelize computation of the pointwise confidence intervals that perfcurve returns for the x- and y-coordinates, thresholds, or the area under the curve measure. You need Parallel Computing Toolbox™ to use this option.
'MaxNumSplits' argument in fitctree, fitrtree, and templateTree functions
You can control the depth of a decision tree by choosing the maximal number of splits (branch nodes) rather than choosing the minimum leaf size or minimum parent size. Specify this option using the 'MaxNumSplits' name-value pair argument in fitctree, fitrtree, or templateTree.
Full trees (ClassificationTree or RegressionTree classifiers) contain the field MaxNumSplits in the property ModelParameters to store the specified maximal number of splits.
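As a brief illustration of limiting tree depth by split count, the sketch below grows a shallow classification tree and builds a matching template. The fisheriris data set and the cap of two splits are assumptions chosen only for the example.

```matlab
% Sketch: cap the number of branch nodes instead of tuning leaf/parent size.
load fisheriris
shallowTree = fitctree(meas,species,'MaxNumSplits',2);

% The same option on a template, for later use in an ensemble
t = templateTree('MaxNumSplits',2);
```

A tree with at most two splits has at most five nodes, so shallowTree.NumNodes gives a quick check that the cap took effect.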
pca and probability distribution functions (using MATLAB Coder)
pca, betafit, betalike, and pearsrnd are now supported for code generation. For a full list of Statistics and Machine Learning Toolbox functions that are supported by MATLAB Coder™, see Statistics and Machine Learning Toolbox.
sampsizepwr function
sampsizepwr returns the power, sample size, or alternative hypothesis value for a two-sample t-test for populations with equal variances. Specify the two-sample t-test using 't2' as the 'testtype' input variable.
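A short sketch of the 't2' test type follows. The means, the common standard deviation, and the 80% target power are made-up values for illustration, and equal group sizes are assumed.

```matlab
% Sketch: sample size and power for a two-sample pooled t-test.
mu0   = 100;     % mean under the null hypothesis (illustrative)
sigma = 10;      % common standard deviation (illustrative)
mu1   = 105;     % mean under the alternative (illustrative)

n   = sampsizepwr('t2',[mu0 sigma],mu1,0.80);   % sample size for 80% power
pwr = sampsizepwr('t2',[mu0 sigma],mu1,[],n);   % power achieved at that n
```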
You can reduce the memory footprint of a linear support vector machine (SVM) model by discarding its support vectors. Pass a trained SVM model (i.e., a ClassificationSVM or CompactClassificationSVM object) to discardSupportVectors to discard:
The α coefficients (stored in the Alpha property)
The support vectors (stored in the SupportVectors property)
The support vector labels (stored in the SupportVectorLabels property)
By default, fitcsvm and compact do not discard the α coefficients, support vectors, and the support vector labels.
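The sketch below shows one way to apply discardSupportVectors to a compact linear SVM. It assumes the ionosphere sample data set (predictors X, labels Y) and the default linear kernel; the variable names are illustrative.

```matlab
% Sketch: shrink a linear SVM by dropping its support-vector data.
load ionosphere                       % X (predictors), Y (class labels)
Mdl  = fitcsvm(X,Y);                  % linear kernel by default
CMdl = compact(Mdl);                  % compact copy still stores support vectors
CMdl = discardSupportVectors(CMdl);   % empties Alpha, SupportVectors, SupportVectorLabels

% Prediction still works because the linear coefficients (Beta) are kept
labels = predict(CMdl,X(1:5,:));
```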
You can pass a trained error-correcting output codes (ECOC) model (i.e., a ClassificationECOC or CompactClassificationECOC object) to discardSupportVectors to similarly discard the α coefficients, support vectors, and the support vector labels from all linear SVM binary learners. To control whether linear SVM binary learners store support vectors, create an SVM template using templateSVM and set the 'SaveSupportVectors' name-value pair argument.
By default, fitcecoc discards the α coefficients, support vectors, and the support vector labels from all linear SVM binary learners. To store these estimates, create an SVM template and specify 'SaveSupportVectors',true. Then, pass the SVM template to fitcecoc.
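As a sketch of the template approach just described, the following keeps the support vectors in every linear SVM binary learner of an ECOC model. The fisheriris data set and the variable names are assumptions for illustration.

```matlab
% Sketch: ask fitcecoc to retain support vectors in its linear SVM learners.
load fisheriris
t   = templateSVM('SaveSupportVectors',true);
Mdl = fitcecoc(meas,species,'Learners',t);

% Inspect the first binary learner
firstLearner = Mdl.BinaryLearners{1};
numSV = size(firstLearner.SupportVectors,1);
```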
The default minimum leaf size for boosted regression trees is 5.
To train boosted regression trees using the previous defaults, construct a regression tree template using templateTree, and specify 'MinLeafSize',1 and 'MaxNumSplits',1. Then, pass the regression tree template to fitensemble.
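A minimal sketch of restoring the previous defaults is shown below. It assumes the carsmall sample data set, the 'LSBoost' method, and 50 learning cycles; these choices are illustrative only.

```matlab
% Sketch: boosted regression trees with the earlier leaf-size and split settings.
load carsmall
X = [Horsepower Weight];
Y = MPG;

t   = templateTree('MinLeafSize',1,'MaxNumSplits',1);
ens = fitensemble(X,Y,'LSBoost',50,t);
```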
scatterhist and gplotmatrix functions
The scatterhist function includes two new name-value pair arguments that allow you to display grouped histograms of the marginal distributions along the x- and y-axes of the scatter plot:
'PlotGroup' allows you to specify whether to plot the marginal distributions by group or for the entire data set.
'Style' allows you to specify whether to display a stairstep plot, which shows the outline of a histogram without filling in the bars, or a histogram bar plot.
If you specify a grouping variable that contains more than one group, then by default scatterhist displays grouped stairstep plots. If you specify a grouping variable that contains only one group, then scatterhist displays a histogram bar plot. To display kernel density plots, use the 'Kernel' name-value pair argument.
The positional argument 'dispopt' in gplotmatrix supports two additional options for controlling the appearance of the plots along the diagonal of the plot matrix:
'stairs' displays a stairstep plot, which shows the outline of the grouped histograms without filling in the bars.
'grpbars' displays a standard grouped histogram bar plot.
regress
regress computes the confidence intervals for studentized residuals using n – p – 1 degrees of freedom, where n is the number of observations and p is the number of predictor variables.
The degrees of freedom in the computation of confidence intervals for studentized residuals that regress returns are n – p – 1 rather than n – p. Results may differ from those in previous releases.
The following functionality will be removed in a future release. Use the newer functionality instead.
Functionality  What Happens When You Use This Functionality?  Use This Instead  Compatibility Considerations

princomp  Warns  pca  Replace instances of princomp with pca.
treedisp  Warns  view (ClassificationTree) or view (RegressionTree)  Use fitctree or fitrtree to grow a tree. Replace instances of treedisp with view (ClassificationTree) or view (RegressionTree).
treefit  Warns  fitctree or fitrtree  Replace instances of treefit with fitctree or fitrtree.
treeprune  Warns  prune (ClassificationTree) or prune (RegressionTree)  Use fitctree or fitrtree to grow a tree. Replace instances of treeprune with prune (ClassificationTree) or prune (RegressionTree).
treetest  Warns  Use fitctree or fitrtree to grow a tree. Replace instances of
treeval  Warns  predict (ClassificationTree) or predict (RegressionTree)  Use fitctree or fitrtree to grow a tree. Replace instances of treeval with predict (ClassificationTree) or predict (RegressionTree).
classify  Still runs  fitcdiscr  Replace instances of classify with fitcdiscr.
classregtree  Still runs  fitctree or fitrtree  Replace instances of classregtree with fitctree or fitrtree.
fitNaiveBayes  Still runs  fitcnb  Replace instances of fitNaiveBayes with fitcnb.
ProbDist  Still runs  makedist and fitdist  To create and fit probability distribution objects, use makedist and fitdist instead.
ProbDistParametric  Still runs  makedist and fitdist  To create and fit probability distribution objects, use makedist and fitdist instead.
ProbDistKernel  Still runs  makedist and fitdist  To create and fit probability distribution objects, use makedist and fitdist instead.
ProbDistUnivKernel  Still runs  makedist and fitdist  To create and fit probability distribution objects, use makedist and fitdist instead.
ProbDistUnivParam  Still runs  makedist and fitdist  To create and fit probability distribution objects, use makedist and fitdist instead.
svmclassify  Still runs  fitcsvm  Replace instances of svmclassify with fitcsvm.
svmtrain  Still runs  fitcsvm  Replace instances of svmtrain with fitcsvm.
fitcecoc function
The fitcecoc function fits an error-correcting output code (ECOC) model for multiclass learning. Using training data and a coding scheme, fitcecoc combines a set of binary learners, such as SVM classifiers, using a coding design to create a multiclass model. You can use a supported coding scheme, or specify your own using the designecoc function.
The new functionality also supports fitting posterior probabilities for most methods. fitcecoc creates an object of the new class ClassificationECOC or ClassificationPartitionedECOC.
ClassificationECOC is a new class for accessing and performing operations on the training data. CompactClassificationECOC is a new class for storing configurations of trained models without storing training data. ClassificationPartitionedECOC is a new class for a set of cross-validated ECOC models trained on cross-validated folds.
ClassificationECOC is built on the same framework as ClassificationTree, ClassificationDiscriminant, ClassificationKNN, and ClassificationSVM, so you have a variety of options and methods, including:
Cross validation
Resubstitution statistics
Generalization statistics
Weighted classification
For all methods and properties of the new objects, see the ClassificationECOC, CompactClassificationECOC, and ClassificationPartitionedECOC class pages.
fitglme function
GeneralizedLinearMixedModel is a new class for fitting generalized linear mixed-effects (GLME) models. Fit GLME models using fitglme.
You can:
Specify GLME models using the formula notation.
Fit GLME models for a response with conditional distribution of normal, binomial, poisson, gamma, or inverse Gaussian.
Specify the link function using a string or a structure.
Fit GLME models using maximum pseudo likelihood (MPL), restricted maximum pseudo likelihood (REMPL), maximum likelihood using Laplace approximation, or maximum likelihood using approximate Laplace approximation with fixed effects profiled out.
Specify a covariance pattern for the random effects.
Calculate estimates of the empirical Bayes predictors (EBPs) for random effects.
Perform custom hypothesis tests on fixed effects.
Compute confidence intervals on fixed effects, random effects, and covariance parameters.
Examine residuals, diagnostic plots, fitted values, and design matrices.
Compare two different models using the theoretical likelihood ratio test.
Make predictions on new data using the fitted GLME model.
Generate random data using the fitted GLME model at new design points.
Refit a new GLME model based on the previously fitted model, using a new response vector.
For the properties and methods of this object, see the class page for GeneralizedLinearMixedModel.
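The sketch below fits a small Poisson GLME model to simulated data, to show the formula notation and the fit-method option in use. The data-generating values, the variable names y, x, and g, and the choice of the Laplace fit method are all assumptions made for the example.

```matlab
% Sketch: Poisson GLME with a random intercept per group, on simulated data.
rng(1);
nGroups = 10;
nPer    = 30;
g = kron((1:nGroups)',ones(nPer,1));     % group index for each observation
x = randn(nGroups*nPer,1);               % one fixed-effect predictor
b = 0.5*randn(nGroups,1);                % simulated random intercepts
y = poissrnd(exp(0.2 + 0.5*x + b(g)));   % Poisson response with log link

tbl  = table(y,x,categorical(g),'VariableNames',{'y','x','g'});
glme = fitglme(tbl,'y ~ 1 + x + (1|g)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace');

beta = fixedEffects(glme);               % fixed-effect estimates
ci   = coefCI(glme);                     % confidence intervals on fixed effects
```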
kmedoids function
The kmedoids function partitions data into k clusters using the k-medoids algorithm. This functionality provides clustering on categorical data, clustering using arbitrary distance metrics, robustness to outliers, and scaling to large data sets.
The fishertest function performs Fisher's exact test on 2-by-2 contingency tables. The new functionality is appropriate for small sample sizes.
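A minimal sketch of a call to fishertest on a small table follows. The counts are made up for illustration.

```matlab
% Sketch: Fisher's exact test on a 2-by-2 table of counts.
x = [3 1;
     1 3];                       % illustrative contingency table
[h,p,stats] = fishertest(x);     % h: test decision, p: two-sided p-value by default
```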
templateEnsemble function for creating ensemble learning template
You can use the templateEnsemble function to create an ensemble learning template suitable for training error-correcting output code (ECOC) multiclass classifiers. In particular, you can perform multiclass classification by specifying binary learners that use the ensemble methods GentleBoost, LogitBoost, and RobustBoost.
templateSVM function for creating SVM learning template
You can use the templateSVM function to create an SVM learning template suitable for training error-correcting output code (ECOC) multiclass classifiers. In particular, you can perform multiclass classification by specifying binary learners that standardize predictor data or use a particular box constraint.
You can standardize the training data before fitting the model in k-nearest neighbor classification. The standardization takes into account the observation weights and missing data. You can specify this option using the 'Standardize' name-value pair argument in the fitcknn function.
fitcnb function for naive Bayes classification
You can use the fitcnb function to train a multiclass naive Bayes model. fitcnb creates an object of the new class ClassificationNaiveBayes.
ClassificationNaiveBayes is a new class for accessing and performing operations on the training data. CompactClassificationNaiveBayes is a new class for storing configurations of trained models without storing training data.
The fitcnb function and ClassificationNaiveBayes and CompactClassificationNaiveBayes classes include the functionality of the fitNaiveBayes function and NaiveBayes class. ClassificationNaiveBayes is built on the same framework as ClassificationTree, ClassificationDiscriminant, ClassificationKNN, and ClassificationSVM, so you have a variety of additional options and methods, including:
Cross validation
Resubstitution statistics
Generalization statistics
Weighted classification
You can also use the templateNaiveBayes function to create a naive Bayes classifier template suitable for training error-correcting output code (ECOC) multiclass classifiers.
For all methods and properties of the new objects, see the ClassificationNaiveBayes and CompactClassificationNaiveBayes class pages.
fitrm is a new function for fitting models to repeated measures data, where each subject has multiple response measurements. It produces an object of the new RepeatedMeasuresModel class.
You can:
Perform analysis of variance for between-subjects factors using anova.
Perform multivariate analysis of variance using manova.
Perform hypothesis tests on the coefficients using coeftest.
Perform repeated measures analysis of variance using ranova.
Test for sphericity (compound symmetry) with Mauchly's test using mauchly.
Plot data and estimated marginal means with optional grouping using plot and plotprofile.
Compute summary statistics organized by group using grpstats.
Perform multiple comparisons of marginal means using multcompare.
Make predictions on new data with the fitted repeated measures model using predict.
Generate random data with the fitted repeated measures model at new design points using random.
For the properties and methods of this object, see the RepeatedMeasuresModel class page.
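The sketch below fits a repeated measures model and runs two of the analyses listed above. Treating the four fisheriris measurements as repeated measures on each flower is an assumption made purely so the example is self-contained; the variable names are illustrative.

```matlab
% Sketch: repeated measures model with species as the between-subjects factor.
load fisheriris
t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4), ...
    'VariableNames',{'species','meas1','meas2','meas3','meas4'});
Meas = table([1 2 3 4]','VariableNames',{'Measurements'});  % within-subject design

rm = fitrm(t,'meas1-meas4 ~ species','WithinDesign',Meas);

ranovatbl = ranova(rm);     % repeated measures ANOVA table
sphTest   = mauchly(rm);    % Mauchly's test for sphericity
```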
fitcsvm function for enhanced performance of support vector machines (SVMs) for binary classification
You can now use the new fitcsvm function to train an SVM classifier for one-class or two-class learning. fitcsvm creates an object of the new class ClassificationSVM or existing class ClassificationPartitionedModel.
ClassificationSVM is a new class for accessing and performing operations on the training data. CompactClassificationSVM is a new class for storing configurations of trained models without storing training data. The syntax and methods resemble those in the existing ClassificationTree and CompactClassificationTree classes.
The new fitcsvm function and ClassificationSVM and CompactClassificationSVM classes include the functionality of the svmtrain and svmclassify functions. ClassificationSVM provides several benefits compared to the svmtrain and svmclassify functions:
The new functionality
Supports computation of soft classification scores
Supports fitting posterior probabilities
Has improved training speed, especially on big data with well-separated classes by providing shrinkage
Allows a warm restart by accepting an initial α value
Allows training to resume after the maximum number of iterations is exceeded
Supports robust learning in the presence of outliers
ClassificationSVM is built on the same framework as ClassificationTree, ClassificationDiscriminant, and ClassificationKNN, so you have a variety of options and methods, including:
Cross validation
Resubstitution statistics
Generalization statistics
Weighted classification
For all methods and properties of the new objects, see the ClassificationSVM and CompactClassificationSVM class pages.
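As a sketch of a few of the capabilities listed above (cross-validation and posterior probabilities), the following trains a two-class SVM on the ionosphere sample data. The RBF kernel, automatic kernel scale, and standardization are illustrative choices, not recommendations from these notes.

```matlab
% Sketch: train, cross-validate, and calibrate a two-class SVM.
load ionosphere                            % X (predictors), Y (class labels)
rng(1);                                    % reproducible folds
Mdl = fitcsvm(X,Y,'KernelFunction','rbf', ...
    'KernelScale','auto','Standardize',true);

CVMdl = crossval(Mdl);                     % 10-fold by default
cvErr = kfoldLoss(CVMdl);                  % cross-validated misclassification rate

ScoreMdl = fitPosterior(Mdl);              % fit a score-to-posterior transformation
[label,post] = predict(ScoreMdl,X(1:5,:)); % post holds class posterior probabilities
```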
evalclusters methods to expand the number of clusters and number of gap criterion simulations
There are two new methods for the objects created using the evalclusters function:
addK adds additional numbers of clusters to be evaluated. This method applies to all classes of cluster evaluation (i.e., clustering.evaluation.GapEvaluation, clustering.evaluation.SilhouetteEvaluation, clustering.evaluation.CalinskiHarabaszEvaluation, and clustering.evaluation.DaviesBouldinEvaluation).
increaseB increases the number of reference data sets for gap criterion simulations. This method applies to the clustering.evaluation.GapEvaluation class.
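The two methods can be sketched as follows, assuming the fisheriris data set and small, illustrative values for the cluster list and the number of reference data sets.

```matlab
% Sketch: extend an existing cluster evaluation instead of recomputing it.
load fisheriris
rng(1);

% Evaluate k = 1..3 first, then add k = 4..6
eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',1:3);
eva = addK(eva,4:6);

% For the gap criterion, start with 10 reference data sets and add 10 more
gapEva = evalclusters(meas,'kmeans','gap','KList',1:3,'B',10);
gapEva = increaseB(gapEva,10);
```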
The default value of the 'SearchMethod' name-value pair argument for clustering.evaluation.GapEvaluation objects is now always 'globalMaxSE'.
The default value of the 'SearchMethod' name-value pair argument for clustering.evaluation.GapEvaluation objects is now always 'globalMaxSE' and does not change depending on the value of the 'KList' name-value pair argument.
multcompare function
multcompare now returns the p-value of each pairwise comparison of group means. multcompare returns the p-value in the sixth column of its first output argument. The p-value is the overall significance level at which the individual comparison is borderline significant.
The first output argument of multcompare now has six columns, instead of five. The sixth column contains the p-value.
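A short sketch of reading the new column is shown below. It assumes the carsmall sample data set and a one-way ANOVA of MPG by Origin; both are illustrative choices.

```matlab
% Sketch: extract pairwise p-values from the multcompare output.
load carsmall
[~,~,stats] = anova1(MPG,Origin,'off');      % 'off' suppresses the figures
c = multcompare(stats,'Display','off');      % one row per pair of groups

pairs = c(:,1:2);    % indices of the two groups being compared
pvals = c(:,6);      % p-value for each pairwise comparison
```

Code that indexes the output by column count, for example c(:,end), may need updating because the last column is now the p-value rather than the upper confidence bound.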
table inputs as an alternative to dataset array inputs
The following functions and methods now accept table inputs as an alternative to dataset array inputs.
Functions and Methods  Class

fitlm, fitglm, fitlme, fitnlm, stepwiseglm, stepwiselm, grpstats, datasample  N/A
predict, random, feval  LinearModel
devianceTest, random, predict, feval  GeneralizedLinearModel
random, predict, feval  NonLinearModel
random, predict  LinearMixedModel
table rather than a dataset array
The following functions, methods, and model properties now return a table rather than a dataset array.
Functions and Methods  Class

xptread, grpstats*  N/A
anova  LinearModel
devianceTest  GeneralizedLinearModel
fixedEffects, randomEffects  LinearMixedModel
Property  Class

VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Coefficients  LinearModel
VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Fitted, Coefficients  GeneralizedLinearModel
VariableInfo, ObservationInfo, Variables, Diagnostics, Residuals, Coefficients  NonLinearModel
VariableInfo, ObservationInfo, Variables, Coefficients, ModelCriterion  LinearMixedModel
*grpstats now matches the output with input type.
The functions and properties listed now return a table instead of a dataset array. You can convert them to dataset arrays using the table2dataset function.
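For code that still expects dataset arrays, the conversion can be sketched as follows. The carsmall data set and the model formula are assumptions for illustration.

```matlab
% Sketch: convert a table-valued model property back to a dataset array.
load carsmall
tbl = table(Weight,Horsepower,MPG);
mdl = fitlm(tbl,'MPG ~ Weight + Horsepower');

coefTbl = mdl.Coefficients;           % now returned as a table
coefDs  = table2dataset(coefTbl);     % dataset array for older code paths
```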
'EmptyAction' on kmeans is now 'singleton'.
The default value of the 'EmptyAction' name-value pair argument of the kmeans function is now 'singleton'.
To set the value of 'EmptyAction' to 'error', you must explicitly specify 'EmptyAction','error'.
The following are new functions for classification and regression trees, discriminant analysis, nearest neighbors, Naive Bayes classification, and Gaussian mixture models.
New Function  Replacing 

fitcdiscr  ClassificationDiscriminant.fit 
fitcknn  ClassificationKNN.fit 
fitctree  ClassificationTree.fit 
fitrtree  RegressionTree.fit 
fitNaiveBayes  NaiveBayes.fit 
fitgmdist  gmdistribution.fit 
templateDiscriminant  ClassificationDiscriminant.template 
templateKNN  ClassificationKNN.template 
templateTree  ClassificationTree.template or RegressionTree.template 
makecdiscr  ClassificationDiscriminant.make 
Functionality  What Happens When You Use This Functionality?  Use This Instead  Compatibility Considerations 

ClassificationDiscriminant.fit  Still runs  fitcdiscr  Replace instances of ClassificationDiscriminant.fit with fitcdiscr . 
ClassificationKNN.fit  Still runs  fitcknn  Replace instances of ClassificationKNN.fit with fitcknn . 
ClassificationTree.fit  Still runs  fitctree  Replace instances of ClassificationTree.fit with fitctree . 
RegressionTree.fit  Still runs  fitrtree  Replace instances of RegressionTree.fit with fitrtree . 
NaiveBayes.fit  Still runs  fitNaiveBayes  Replace instances of NaiveBayes.fit with fitNaiveBayes . 
gmdistribution.fit  Still runs  fitgmdist  Replace instances of gmdistribution.fit with fitgmdist . 
ClassificationDiscriminant.template  Still runs  templateDiscriminant  Replace instances of ClassificationDiscriminant.template with templateDiscriminant . 
ClassificationKNN.template  Still runs  templateKNN  Replace instances of ClassificationKNN.template with templateKNN . 
ClassificationTree.template or RegressionTree.template  Still runs  templateTree  Replace instances of ClassificationTree.template or RegressionTree.template with templateTree . 
ClassificationDiscriminant.make  Still runs  makecdiscr  Replace instances of ClassificationDiscriminant.make with makecdiscr . 
LinearMixedModel is a new class for fitting linear mixed-effects (LME) models. Fit multilevel LME models or LME models with nested and/or crossed random effects using the fitlme or fitlmematrix function.
You can:
Specify LME models using either the formula notation or via matrix input.
Fit LME models using maximum likelihood (ML) or restricted maximum likelihood (REML).
Specify a covariance pattern for the random effects.
Calculate estimates of best linear unbiased predictors (BLUPs) for random effects.
Perform custom joint hypothesis tests on fixed and random effects.
Compute confidence intervals on fixed effects, random effects, and covariance parameters.
Examine residuals, diagnostic plots, fitted values, and design matrices.
Compare two different models via theoretical or simulated likelihood ratio tests.
Make predictions on new data using the fitted LME model.
Generate random data using the fitted LME model at new design points.
For the properties and methods of this object, see the class page for LinearMixedModel.
Many probability distribution and descriptive statistics functions are now supported for code generation. For a full list of Statistics Toolbox™ functions that are supported by MATLAB Coder, see Statistics Toolbox Functions.
evalclusters function for estimating the optimal number of clusters in data
The new function evalclusters estimates the optimal number of clusters for various criterion values, and returns the clustering solution corresponding to the estimated optimal value. You can provide clustering solutions, ask evalclusters to use one of the built-in clustering algorithms, 'kmeans', 'linkage', or 'gmdistribution', or provide a function handle.
The following criteria are available:
The Calinski-Harabasz (CH) index
The Silhouette index
The Gap statistic
The Davies-Bouldin (DB) index
mvregress function that now accepts a design matrix even if Y has multiple columns
mvregress now accepts an n-by-(p + 1) design matrix X, when the response Y is an n-by-d matrix with d > 1, where n is the number of observations, p is the number of predictor variables, d is the number of dimensions in the response, and X includes a column of ones for the intercept (constant) term.
Statistics Toolbox now provides upper tail probability calculations for cumulative distribution functions. You can compute the upper tail probabilities using a trailing 'upper' argument in the following functions:
cdf function for probability distribution objects, returned by pd = makedist(distname) or pd = fitdist(X,distname):
cdf(pd,X,'upper')
cdf function:
Y = cdf('name',X,A,'upper')
Y = cdf('name',X,A,B,'upper')
Y = cdf('name',X,A,B,C,'upper')
Distribution-specific cdf functions:
Distribution  New Syntax

Beta  p = betacdf(X,A,B,'upper')
Binomial  Y = binocdf(X,N,P,'upper')
Chi-square  p = chi2cdf(X,V,'upper')
Extreme Value  P = evcdf(X,mu,sigma,'upper')
Exponential  P = expcdf(X,mu,'upper')
F  P = fcdf(X,V1,V2,'upper')
Gamma  P = gamcdf(X,A,B,'upper')
Geometric  Y = geocdf(X,P,'upper')
Generalized Extreme Value  P = gevcdf(X,k,sigma,mu,'upper')
Generalized Pareto  P = gpcdf(X,sigma,theta,'upper')
Hypergeometric  P = hygecdf(X,M,K,N,'upper')
Lognormal  P = logncdf(X,mu,sigma,'upper')
Negative Binomial  Y = nbincdf(X,R,P,'upper')
Noncentral F  P = ncfcdf(X,NU1,NU2,DELTA,'upper')
Noncentral t  P = nctcdf(X,NU,DELTA,'upper')
Noncentral Chi-square  P = ncx2cdf(X,V,DELTA,'upper')
Normal  P = normcdf(X,mu,sigma,'upper')
Poisson  P = poisscdf(X,lambda,'upper')
t  P = tcdf(X,V,'upper')
Rayleigh  P = raylcdf(X,B,'upper')
Uniform Discrete  P = unidcdf(X,N,'upper')
Uniform Continuous  P = unifcdf(X,A,B,'upper')
Weibull  P = wblcdf(X,A,B,'upper')
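The value of the 'upper' option shows up far in the tail, where subtracting a cdf value from 1 loses all precision. The sketch below compares the two approaches for a standard normal distribution; the evaluation point x = 10 is an arbitrary illustrative choice.

```matlab
% Sketch: upper tail probability computed directly versus by subtraction.
x = 10;
pNaive = 1 - normcdf(x,0,1);        % evaluates to 0 in double precision
pUpper = normcdf(x,0,1,'upper');    % small but nonzero tail probability

% The same option on a probability distribution object
pd = makedist('Normal');
pObj = cdf(pd,x,'upper');
```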

partialcorri function for partial correlation with asymmetric treatment of inputs and outputs
The new function partialcorri computes linear partial correlation coefficients with internal adjustments. You can compute partial correlation between pairs of variables in Y and X, adjusting for the remaining variables in X, or between pairs of variables in Y and X, adjusting for the remaining variables in X, after first controlling both X and Y for the variables in Z.
You can also:
Specify whether to use Pearson or Spearman partial correlations.
Specify how to handle missing values.
Perform hypothesis tests of zero correlation against a one-sided or two-sided alternative.
There are new functions for the fitting and stepwise algorithms of linear and generalized linear models, and the fitting algorithm of nonlinear models. The new functions are as follows.
New Function  Replacing 

fitlm  LinearModel.fit 
stepwiselm  LinearModel.stepwise 
fitglm  GeneralizedLinearModel.fit 
stepwiseglm  GeneralizedLinearModel.stepwise 
fitnlm  NonLinearModel.fit 
Functionality  What Happens When You Use This Functionality?  Use This Instead  Compatibility Considerations 

LinearModel.fit  Still runs  fitlm  Replace instances of LinearModel.fit with fitlm 
LinearModel.stepwise  Still runs  stepwiselm  Replace instances of LinearModel.stepwise with stepwiselm 
GeneralizedLinearModel.fit  Still runs  fitglm  Replace instances of GeneralizedLinearModel.fit with fitglm 
GeneralizedLinearModel.stepwise  Still runs  stepwiseglm  Replace instances of GeneralizedLinearModel.stepwise with stepwiseglm 
NonLinearModel.fit  Still runs  fitnlm  Replace instances of NonLinearModel.fit with fitnlm 
Support vector machines are now in Statistics Toolbox. Train a support vector machine classifier using svmtrain and classify data using svmclassify.
Two new features handle missing data in principal component analysis:
The new function adtest performs the Anderson-Darling goodness-of-fit test. adtest can perform:
Simple test: Test against a specific distribution with parameters specified. You can test against any continuous univariate parametric distribution.
Composite test: Test against a specified distribution family (also called an omnibus test). You can test against the normal, exponential, extreme-value, lognormal, or Weibull distribution families.
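Both kinds of test can be sketched on simulated data as follows. The exponential sample with mean 2 and the sample size are assumptions made for the example.

```matlab
% Sketch: composite and simple Anderson-Darling tests on an exponential sample.
rng(1);
x = exprnd(2,100,1);                          % simulated data, mean 2

% Composite tests: only the distribution family is specified
[hNorm,pNorm] = adtest(x);                    % normal family (the default)
[hExp,pExp]   = adtest(x,'Distribution','exp');

% Simple test: a fully specified distribution object
pdKnown = makedist('Exponential','mu',2);
[hSimple,pSimple] = adtest(x,'Distribution',pdKnown);
```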
The training speed for decision trees and their ensembles is improved. The improvement is best seen in decision tree ensembles obtained using the fitensemble function or TreeBagger class.
The efficiency of TreeBagger is improved when used in parallel mode.
You can specify the number of surrogate splits saved in decision trees using the 'surrogate' name-value pair argument in the fit and template methods of the ClassificationTree and RegressionTree classes.
ClassificationTree.fit and ClassificationTree.template provide several heuristic methods for splitting on categorical predictors with many levels. Use the 'AlgorithmForCategorical' name-value pair argument to specify the algorithm to find the best split and the 'MaxCat' name-value pair argument to specify the maximum number of categories you allow.
scatterhist function
The scatterhist function has these name-value pair arguments:
'Group' lets you specify a grouping variable and produces a grouped scatter plot.
'Kernel' lets you use grouped kernel density plots instead of overall histograms for the marginal distributions.
Additional options let you change colors, line properties, legends, and more.
These functions now accept additional error models and fixed or fit-dependent weights:
NonLinearModel methods
nlinfit
nlpredci
Additional functionality changes are:
disp (NonLinearModel method) shows only estimable coefficients, and shows NaN for inestimable coefficients.
F-test (NonLinearModel method) automatically decides whether to compare the full model against an intercept-only model or zero.
NonLinearModel properties such as Diagnostics, Residuals, LogLikelihood, SSE, and SST account for weights and error models.
Parametric hypothesis test functions accept optional input arguments as name-value pair arguments.
adtest  Anderson-Darling goodness-of-fit test
ansaribradley  Ansari-Bradley test
dwtest  Durbin-Watson test
kstest  One-sample Kolmogorov-Smirnov test
kstest2  Two-sample Kolmogorov-Smirnov test
lillietest  Lilliefors test
ttest  One-sample t-test
ttest2  Two-sample t-test
vartest  One-sample variance chi-square test
vartest2  Two-sample variance F-test
vartestn  Variance test across multiple groups
ztest  z-test
New probability distribution objects provide the following new functionality:
Create a distribution without fitting to data using the new makedist function.
Assign directly to parameter values.
Create truncated distributions.
Create and operate on arrays of distribution objects.
Create custom distributions. To begin, use dfittool and select Edit > Define Custom Distributions. Use the provided template to define the 'Laplace' distribution, or modify it to create your own.
Compute and plot likelihood ratio confidence intervals and profile likelihood for fitted probability distributions.
Additional distributions in the probability distribution framework:
Multinomial
Piecewise Linear
Triangular
Uniform
You can continue fitting distributions to data using the existing fitdist function.
The class names of probability distribution objects returned by fitdist are different than in earlier releases.
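A few of the capabilities listed above (creating a distribution without data, assigning a parameter, and truncating) can be sketched as follows. The normal distribution, the parameter values, and the truncation limits are illustrative assumptions.

```matlab
% Sketch: build, modify, and truncate a distribution object without fitting.
rng(1);
pd = makedist('Normal','mu',0,'sigma',1);   % created directly, no data needed
pd.sigma = 2;                               % assign a parameter value directly

tpd = truncate(pd,-1,1);                    % restrict the support to [-1, 1]
r   = random(tpd,1000,1);                   % samples from the truncated distribution
```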
There are three new boosting algorithms for classification:
RUSBoost (boosting by random undersampling) for imbalanced data (data in which one class has many more observations than the other).
LPBoost (linear programming) and TotalBoost (totally corrective boosting), which self-terminate, can lead to a sparse ensemble, and can be used for multiclass boosting.
There is a new probability distribution object for the Burr Type XII distribution, a three-parameter family of continuous distributions on the positive real line. Use fitdist to fit this distribution to data. Use ProbDistUnivParam to specify the distribution parameters directly. Either function produces a distribution you can use to generate random samples or compute functions such as pdf and cdf.
You can now import data from a file directly into a dataset array using the MATLAB Import Tool.
The new pca function includes additional functionality for principal component analysis. Features of pca include:
Handling of NaN as missing data values.
Weighted principal component analysis with user-specified weights.
Choice of SVD or EIG algorithm for computing principal components.
Option to specify number of components to return.
Option to not center before computing principal components.
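A short sketch of how these options could be combined in one call follows. The data matrix and the choice of two components are assumptions for illustration only.

```matlab
rng(0);
X = randn(100,4);                % illustrative data, 100 observations
X(3,2) = NaN;                    % one missing value

% Drop rows containing NaN, use the eigenvalue algorithm, and return
% only the first two components.
[coeff,score,latent] = pca(X,'Rows','complete', ...
                             'Algorithm','eig', ...
                             'NumComponents',2);

% Same data without centering the columns first.
coeffRaw = pca(X,'Centered',false);
```

Here coeff has one row per variable and one column per requested component.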
Statistics Toolbox now supports parallel execution for kmeans.
The dendrogram function has new options for reordering the nodes of hierarchical binary cluster trees:
The reorder option allows you to specify a permutation vector for the order of nodes in a dendrogram plot.
The checkcrossings option checks whether a requested permutation vector leads to crossing branches in a dendrogram plot.
The function optimalleaforder generates an optimal permutation of nodes.
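As a hedged illustration, the leaf order from optimalleaforder can be passed to the reorder option of dendrogram. The random data and the 'average' linkage method are assumptions chosen for the example.

```matlab
rng(0);
X = rand(20,2);                        % illustrative 2-D points
D = pdist(X);                          % pairwise distances
Z = linkage(D,'average');              % hierarchical cluster tree
order = optimalleaforder(Z,D);         % permutation of the 20 leaves
dendrogram(Z,0,'Reorder',order);       % 0 means show every leaf
```

The returned order is a permutation of the leaf indices, so it can be checked or stored before plotting.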
You can add a vector of observation weights, or a handle to a function that returns a vector of observation weights, to these functions:
For an example of weighted fitting, see Weighted Nonlinear Regression.
Use either Weights or RobustWgtFun when performing weighted nonlinear regression.
LinearModel diagnostics
The diagnostics in the Diagnostics dataset array for LinearModel objects are in a new order, and no longer appear in the Variables editor. The new order is:
Leverage
CooksDistance
Dffits
S2_i
CovRatio
Dfbetas
HatMatrix
To access the correct diagnostics, you should update any code that indexes the diagnostics dataset array columns by number.
Functionality  What Happens When You Use This Functionality?  Use This Instead  Compatibility Considerations 

princomp  Still runs  pca  Replace instances of princomp with pca 
LinearModel is a new class for performing linear regression. LinearModel.fit creates a model that:
Lets you fit models with both categorical and continuous predictor variables
Contains information about the quality of the fit, such as residuals and ANOVA tables
Lets you easily plot the fit
Allows for automatic or manual exclusion of unimportant variables
Enables robust fitting for reduced influence of outliers
Lets you specify quadratic and other models using a symbolic formula
Enables stepwise model selection
There are similar improvements for generalized linear and nonlinear modeling using the GeneralizedLinearModel and NonLinearModel classes.
For details, see the class reference pages in the reference material,
or Linear
Regression, Stepwise Regression, Robust Regression
— Reduce Outlier Effects, Generalized Linear
Regression, or Nonlinear Regression in
the User's Guide.
You can now edit, sort, plot, and select portions of dataset arrays from the MATLAB Variable Editor. For details, see Using Dataset Arrays in the User's Guide.
The lassoglm function regularizes generalized linear models. Use lassoglm to examine model alternatives and to constrain or remove redundant or unimportant variables in generalized linear regression. For details, see the function reference page, or Lasso Regularization of Generalized Linear Models in the User's Guide.
ClassificationKNN.fit creates a classification model that performs k-nearest neighbor classification. You can check the quality of the model with cross validation or resubstitution. For details, see the ClassificationKNN page in the reference material, or Classification Using Nearest Neighbors in the User's Guide.
fitensemble can construct random subspace ensembles to improve the classification accuracy of both k-nearest neighbor classifiers and discriminant analysis classifiers. For details, see Ensemble Methods or Random Subspace Classification in the User's Guide.
ClassificationDiscriminant models now have two parameters, Gamma and Delta, for regularization and lowering the number of variables. Set Gamma to regularize the discriminant. Set Delta to eliminate variables. Use cvshrink to obtain optimal Gamma and Delta parameters by cross validation. For details, see the reference pages, or Regularize a Discriminant Analysis Classifier in the User's Guide.
The stepwisefit function now returns the fitted coefficient history in the history.B field.
The WgtFun option is now called RobustWgtFun in the nlinfit, statget, and statset functions. RobustWgtFun also makes the Robust option superfluous.
The WgtFun and Robust options are currently accepted by all functions. To avoid potential future incompatibilities, update code that uses the WgtFun and Robust options to use the RobustWgtFun option.
The ClassificationTree predict method now chooses the class with minimal expected misclassification cost. Previously, it chose the class with maximal posterior probability. The new behavior is consistent with the cvLoss method. Furthermore, both ClassificationDiscriminant and ClassificationKNN predict using minimal expected misclassification cost. For details, see predict and loss.
If you use a nondefault cost matrix, some ClassificationTree classification predictions can differ from those in previous versions.
The lasso function incorporates both the lasso regularization algorithm and the elastic net regularization algorithm. Use lasso to remove redundant or unimportant variables in linear regression. The lassoPlot function helps you visualize lasso results, with a variety of coefficient trace plots and a cross-validation plot.
For details, see Lasso and Elastic Net.
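A minimal sketch of an elastic net fit with cross-validation follows. The synthetic data, the Alpha value of 0.5, and the five folds are all assumptions for illustration.

```matlab
rng(0);
X = randn(100,5);                            % five candidate predictors
y = X(:,1) - 2*X(:,3) + 0.1*randn(100,1);    % only two of them matter

% Alpha = 1 is lasso; values between 0 and 1 give elastic net.
[B,FitInfo] = lasso(X,y,'Alpha',0.5,'CV',5);

lassoPlot(B,FitInfo,'PlotType','CV');        % cross-validated MSE plot
bBest = B(:,FitInfo.IndexMinMSE);            % coefficients at minimum MSE
```

Each column of B holds the coefficients for one value of the regularization parameter, so bBest is the column selected by cross-validation.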
You can now use the ClassificationDiscriminant and CompactClassificationDiscriminant classes for classification via discriminant analysis. The syntax and methods resemble those in the existing ClassificationTree and CompactClassificationTree classes. The ClassificationDiscriminant class includes the functionality of the classify function. ClassificationDiscriminant provides several benefits compared to the classify function:
After you fit a classifier, you can predict without refitting.
ClassificationDiscriminant is built on the same framework as ClassificationTree, so you have a variety of options and methods, including:
Cross validation
Resubstitution statistics
A choice of cost functions
Weighted classification
ClassificationDiscriminant can fit several models, including linear, quadratic, and linear or quadratic with pseudoinverse.
For details, see Discriminant Analysis.
The rangesearch function finds all members of a data set that are within a specified distance of members of another data set. As with the knnsearch function, you can set a variety of distance metrics, or program your own. rangesearch has counterparts that are methods of the ExhaustiveSearcher and KDTreeSearcher classes.
The datasample function samples with or without replacement from a data set. It can also perform weighted sampling, with or without replacement.
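The two sampling modes might look like this in practice. The data vector and the weights are illustrative assumptions.

```matlab
rng(0);
data = (1:10)';                              % illustrative data set

% Five distinct observations, drawn without replacement.
s1 = datasample(data,5,'Replace',false);

% Twenty draws with replacement; the last five values are five times
% as likely to be chosen as the first five.
w  = [ones(5,1); 5*ones(5,1)];
s2 = datasample(data,20,'Weights',w);
```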
The fracfactgen function now allows up to 52 factors, instead of the previous limit of 26 factors. Specify factors as case-sensitive strings, using 'a' through 'z' for the first 26 factors, and 'A' through 'Z' for the remaining factors.
fracfact now checks for an arbitrary level of interaction in confounding, instead of the previous limit of confounding up to products of two factors. Set the MaxInt name-value pair to the level of interaction you want. You can also set names for the factors using the FactorNames name-value pair.
The nlmefit function now returns the covariance matrix of the estimated coefficients as the covb field of the stats structure.
The signrank test now defines ties to be entries that differ by 2*eps or less. Previously, ties were entries that were identical to machine precision.
For R2011b, error and warning message identifiers have changed in Statistics Toolbox.
If you have scripts or functions that use message identifiers that changed, you must update the code to use the new identifiers. Typically, message identifiers are used to turn off specific warning messages, or in code that uses a try/catch statement and performs an action based on a specific error identifier.
For example, if you use the 'resubstitution' method, the 'stats:plsregress:InvalidMCReps' identifier has changed to 'stats:plsregress:InvalidResubMCReps'. If you use the 'resubstitution' method and your code checks for 'stats:plsregress:InvalidMCReps', you must update it to check for 'stats:plsregress:InvalidResubMCReps' instead.
To determine the identifier for a warning, run the following command just after you see the warning:
[MSG,MSGID] = lastwarn;
This command saves the message identifier to the variable MSGID.
To determine the identifier for an error, run the following command just after you see the error:
exception = MException.last; MSGID = exception.identifier;
Tip: Warning messages indicate a potential issue with your code. While you can turn off a warning, a suggested alternative is to change your code so it runs warning free.
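The sketch below shows the two typical uses of message identifiers mentioned above: turning off one specific warning, and branching on an error identifier inside try/catch. The identifiers beginning with 'demo:' are hypothetical and exist only for this illustration; substitute the identifier reported by lastwarn or MException.last in real code.

```matlab
% Hypothetical identifier, used only for illustration.
id = 'demo:example:IllustrativeWarning';
warning(id,'This is an illustrative warning.');
[msg,msgid] = lastwarn;           % recover the identifier just issued
warning('off',msgid);             % suppress only this warning

try
    % Hypothetical error, raised only to demonstrate the pattern.
    error('demo:example:IllustrativeError','Illustrative error.');
catch ME
    if strcmp(ME.identifier,'demo:example:IllustrativeError')
        disp('Handled the expected error.');
    else
        rethrow(ME);              % unknown errors still propagate
    end
end
```

Code written this way depends on exact identifier strings, which is why it needs updating when identifiers change between releases.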
The new fitensemble function constructs ensembles of decision trees. It provides:
Several popular boosting algorithms (AdaBoostM1, AdaBoostM2, GentleBoost, LogitBoost, and RobustBoost) for classification
Least-squares boosting (LSBoost) for regression
Most TreeBagger functionality for ensembles of bagged decision trees
There is also an improved interface for classification trees (ClassificationTree) and regression trees (RegressionTree), encompassing the functionality of classregtree.
For details, see Ensemble Methods.
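A minimal sketch of a boosted classification ensemble follows. It assumes the ionosphere sample data set that ships with the toolbox, and the choice of 100 learners is arbitrary.

```matlab
load ionosphere                   % sample data: predictors X, labels Y

% Boost 100 decision trees with AdaBoostM1 (a two-class algorithm).
ens = fitensemble(X,Y,'AdaBoostM1',100,'Tree');

err = resubLoss(ens);             % resubstitution misclassification rate
```

Resubstitution error is optimistic; cross-validation gives a more realistic estimate of predictive accuracy.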
The linkage and clusterdata functions have a new savememory option that can use less memory than before. With savememory set to 'on', the functions do not build a pairwise distance matrix, so they use less memory and, depending on problem size, can use less time. You can use the savememory option when:
The linkage method is 'ward', 'centroid', or 'median'
The linkage distance metric is 'euclidean' (default)
For details, see the linkage and clusterdata function reference pages.
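For illustration, a call that satisfies both conditions could look like the following. The data size and the number of clusters are assumptions.

```matlab
rng(0);
X = rand(500,3);                  % illustrative observations

% Ward linkage with Euclidean distance meets both conditions, so the
% pairwise distance matrix is never formed.
Z = linkage(X,'ward','euclidean','savememory','on');

T = cluster(Z,'maxclust',4);      % cut the tree into four clusters
```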
The nlmefit and nlmefitsa functions now provide the conditional weighted residuals of the fit. Use this information to assess the quality of the model; see Example: Examining Residuals for Model Verification.
The statset Options structure now includes 'DerivStep', which enables you to set finite differences for gradient estimation.
knnsearch now optionally returns all kth nearest neighbors of points, instead of just one. The knnsearch methods for ExhaustiveSearcher and KDTreeSearcher also have this option.
MATLAB functions generated with the Distribution Fitting Tool now use the fitdist function to create fitted probability distribution objects. The generated functions return probability distribution objects as output arguments.
ncx2cdf is now faster and more accurate for large values of the noncentrality parameter.
If the two categories in a binomial regression model (such as logit or probit) are perfectly separated, the best-fitting model is degenerate with infinite coefficients. In this case, the glmfit function is likely to exceed its iteration limit. glmfit now tries to detect this perfect separation and display a diagnostic message.
mdscale now enforces that, in each column of the output Y, the value with the largest magnitude has a positive sign. This change makes results consistent across releases and platforms; previously, small changes could lead to sign reversals.
Statistics Toolbox now supports parallel execution for the following functions:
For more information, see the Parallel Statistics chapter in the User's Guide.
The new filter algorithm, relieff, is based on nearest neighbors. The ReliefF algorithm accounts for correlations among predictors by computing the effect of every predictor on the class label (or true response for regression) locally and then integrating these local estimates over the entire predictor space.
nlmefit now supports the following error models:
combined
constant
exponential
proportional
You can specify an error model with both nlmefitsa and nlmefit.
The nlmefit bic calculation has changed. Now the degrees of freedom value is based on the number of groups rather than the number of observations. This conforms with the bic definition used by the nlmefitsa function.
Both nlmefit and nlmefitsa now store the estimated error parameters in the errorparm field of the output stats structure. The rmse field of the structure now contains the root mean squared residual for all error models; this value is computed on the log scale for the exponential model.
In the previous release, the rmse field was used by nlmefitsa for both mean squared residual and the estimated error parameter. Change your code, if necessary, to address the appropriate field in the stats structure.
As described in nlmefit Support for Error Models, and nlmefitsa changes, nlmefit now calculates different bic values than in previous releases.
The new surrogate splits feature in classregtree
allows for
better handling of missing values, more accurate estimation of variable
importance, and calculation of the predictive measure of association
between variables.
TreeBagger and CompactTreeBagger classes have two new properties:
NVarSplit provides the number of decision splits for each predictor variable.
VarAssoc provides a measure of association between pairs of predictor variables.
The distribution fitting GUI (dfittool) now allows you to export fits to the MATLAB workspace as probability distribution fit objects. For more information, see Modeling Data Using the Distribution Fitting Tool.
If you load a distribution fitting session that was created with previous versions of Statistics Toolbox, you cannot save an existing fit. Fit the distribution again to enable saving.
partialcorr now accepts a new syntax, RHO = partialcorr(X), which returns the sample linear partial correlation coefficients between pairs of variables in X, controlling for the remaining variables in X. For more information, see the function reference page.
quantile now accepts a new syntax, Y = quantile(X,N,...), which returns quantiles at the cumulative probabilities (1:N)/(N+1) where N is a scalar positive integer value.
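As a small worked case, N = 3 gives the cumulative probabilities (1:3)/4, that is 0.25, 0.50, and 0.75, so the new syntax returns the three quartiles. The data vector below is an assumption for illustration.

```matlab
x  = (1:99)';                         % illustrative data
q3 = quantile(x,3);                   % quantiles at (1:3)/(3+1)
qp = quantile(x,[0.25 0.50 0.75]);    % the same probabilities, spelled out
```

Both calls evaluate the same three probabilities, so q3 and qp hold the same values.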
scatterhist now accepts three parameter name/value pairs that control where and how the histogram plots appear. The new parameter names are NBins, Location, and Direction. For more information, see the function reference page.
bootci has a new output option which returns the bootstrapped statistic computed for each of the NBoot bootstrap replicate samples. For more information, see the function reference page.
A new stochastic algorithm for fitting NLME models is more robust with respect to starting values, enables parameter transformations, and relaxes the assumption of constant error variance. See nlmefitsa.
New functions for k-nearest neighbor (kNN) search efficiently find the closest points to any query point. For information, see k-Nearest Neighbor Search and Radius Search.
A new option in the perfcurve function computes confidence intervals for classifier performance curves.
Statistics Toolbox now supports parallel execution for the following functions:
For more information on parallel computing in the Statistics Toolbox, see Parallel Computing Support for Resampling Methods.
dataset.unstack converts a "tall" dataset array to an equivalent dataset array that is in "wide format", by "unstacking" a single variable in the tall dataset array into multiple variables in wide. dataset.stack reverses this manipulation by converting a "wide" dataset array to an equivalent dataset array that is in "tall format", by "stacking up" multiple variables in the wide dataset array into a single variable in tall.
Statistics Toolbox now supports importing and exporting files in SAS Transport (.xpt) format. For more information, see the xptread and dataset.export reference pages.
An enhanced dataset.join method provides additional types of join operations:
join can now perform more complicated inner and outer join operations that allow a many-to-many correspondence between dataset arrays A and B, and allow unmatched observations in either A or B.
join can be of Type 'inner', 'leftouter', 'rightouter', 'fullouter', or 'outer' (which is a synonym for 'fullouter'). For an inner join, the dataset array, C, only contains observations corresponding to a combination of key values that occurred in both A and B. For a left (or right) outer join, C also contains observations corresponding to keys in A (or B) that did not match any in B (or A).
join can now return index vectors indicating the correspondence between observations in C and those in A and B.
join now supports using multiple keys.
join now supports an optional parameter for specifying missing key behavior rather than raising an error.
An enhanced dataset.export method now supports exporting directly to Microsoft® Excel® files.
The NaiveBayes classification object is suitable for data sets that contain many predictors or features. It supports normal, kernel, multinomial, and multivariate multinomial distributions.
New classification objects, TreeBagger and CompactTreeBagger, provide improved performance through bootstrap aggregation (bagging). They include Breiman's "random forest" method.
The enhanced classregtree has more options for growing and pruning trees.
The new perfcurve function provides a graphical method to evaluate classification results. It includes ROC (receiver operating characteristic) and other curves.
Provides a consistent interface for working with probability distributions.
Can be created directly using the ProbDistUnivParam constructor, or fit to data using the fitdist function.
Option to fit distributions by group.
Includes kernel object methods and parametric object methods that you can use to analyze the distribution represented by the object.
Includes kernel object properties and parametric object properties that you can access to determine the fit results and evaluate their accuracy.
Related enhancements in the chi2gof, histfit, kstest, probplot, and qqplot functions.
The new confusionmat function tabulates misclassifications by comparing known and predicted classes of observations.
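A small worked case, using made-up labels, shows how the tabulation reads. Rows correspond to the known class and columns to the predicted class.

```matlab
known     = [1 1 2 2 3 3]';       % illustrative true classes
predicted = [1 2 2 2 3 1]';       % illustrative predictions

[C,order] = confusionmat(known,predicted);
% C = [1 1 0
%      0 2 0
%      1 0 1]
% Off-diagonal entries are misclassifications: one class-1 observation
% was predicted as class 2, and one class-3 observation as class 1.
```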
Dataset arrays constructed by the dataset function can now be written to an external text file using the new export function.
When reading external text files into a dataset array, dataset has a new 'TreatAsEmpty' parameter for specifying strings to be treated as empty.
In previous versions, dataset used eval to evaluate strings in external text files before writing them into a dataset array. As a result, strings such as '1/1/2008' were treated as numerical expressions with two divides. Now, dataset treats such expressions as strings, and writes a string variable into the dataset array whenever a column in the external file contains a string that does not represent a valid scalar value.
The cross-validation function, crossval, has new options for directly specifying loss functions for mean-squared error or misclassification rate, without having to provide a separate function M-file.
The procrustes function has new options for computing linear transformations without scale or reflection components.
The multivariate normal functions mvnpdf, mvncdf, and mvnrnd now accept vector specification of diagonal covariance matrices, with corresponding gains in computational efficiency.
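To illustrate the vector specification, the sketch below evaluates the same density twice, once with a vector of variances and once with the full diagonal matrix. The evaluation points and variances are assumptions.

```matlab
X  = [0 0; 1 2; -1 0.5];          % illustrative evaluation points
mu = [0 0];
v  = [1 4];                       % variances along the diagonal

p1 = mvnpdf(X,mu,v);              % vector form of a diagonal covariance
p2 = mvnpdf(X,mu,diag(v));        % equivalent full-matrix form
```

The two results agree; the vector form simply avoids building and factoring the full matrix.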
The hypergeometric distribution has been added to both the disttool and randtool graphical user interfaces.
The ksdensity function may give different answers for the case where there are censoring times beyond the last observed value. In this case, ksdensity tries to reduce the bias in its density estimate by folding kernel functions across a folding point so that they do not extend into the area that is completely censored. Two things have changed for this release:
In previous releases the folding point was the last observed value. In this release it is the first censoring time after the last observed value.
The folding procedure is applied not just when the 'function' parameter is 'pdf', but for all 'function' values.
The new nlmefit function fits nonlinear mixed-effects models to data with both fixed and random sources of variation. Mixed-effects models are commonly used with data over multiple groups, where measurements are correlated within groups but independent between groups.
The boxplot function has new options for handling multiple grouping variables and extreme outliers.
The lsline, gline, refline, and refcurve functions now work with scatter plots produced by the scatter function. In previous versions, these functions worked only with scatter plots produced by the plot function.
The following visualization functions now have custom data cursors, displaying information such as observation numbers, group numbers, and the values of related variables:
Changes to boxplot have altered a number of default behaviors:
Box labels are now drawn as text objects rather than tick labels. Any code that customizes the box labels by changing tick marks should now set the tick locations as well as the tick labels.
The function no longer returns a handles array with a fixed number of handles, and the order and meaning of the handles now depend on which options are selected. To locate a handle of interest, search for its 'Tag' property using findobj. 'Tag' values for box plot components are listed on the boxplot reference page.
There are now valid handles for outliers, even when boxes have no outliers. In previous releases, the handles array returned by the function had NaN values in place of handles when boxes had no outliers. Now the 'xdata' and 'ydata' for outliers are NaN when there are no outliers.
For small groups, the 'notch' parameter sometimes produces notches that extend outside of the box. In previous releases, the notch was truncated to the extent of the box, which could produce a misleading display. A new value of 'markers' for this parameter avoids the display issue.
As a consequence, the anova1 function, which displays notched box plots for grouped data, may show notches that extend outside the boxes.
The statistics options structure created by statset now includes a Jacobian field to specify whether or not an objective function can return the Jacobian as a second output.
Bootstrap confidence intervals computed by bootci are now more accurate for lumpy data.
The formula for bootci confidence intervals of type 'bca' or 'cper' involves the proportion of bootstrap statistics less than the observed statistic. The formula now takes into account cases where there are many bootstrap statistics exactly equal to the observed statistic.
Two new cross-validation functions, cvpartition and crossval, partition data and assess models in regression, classification, and clustering applications.
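One way to use a partition directly is sketched below: a five-fold split drives a hand-written loop that fits a least-squares model on each training set and scores it on the matching test set. The synthetic data and the choice of five folds are assumptions.

```matlab
rng(0);
X = randn(60,2);                          % illustrative predictors
y = X*[1; -1] + 0.1*randn(60,1);          % illustrative response

c = cvpartition(60,'KFold',5);            % five-fold partition of 60 rows
mseFold = zeros(c.NumTestSets,1);
for i = 1:c.NumTestSets
    tr = training(c,i);                   % logical index of training rows
    te = test(c,i);                       % logical index of test rows
    b  = X(tr,:) \ y(tr);                 % least-squares fit on training set
    mseFold(i) = mean((y(te) - X(te,:)*b).^2);
end
cvMSE = mean(mseFold);                    % average test error over folds
```

crossval wraps this kind of loop; writing it out shows what the partition object contributes.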
A new sequential feature selection function, sequentialfs, selects predictor subsets that optimize user-defined prediction criteria.
The new nnmf function performs nonnegative matrix factorization (NMF) for dimension reduction.
The new sobolset and haltonset functions produce quasi-random point sets for applications in Monte Carlo integration, space-filling experimental designs, and global optimization. Options allow you to skip, leap over, and scramble the points. The qrandstream function provides corresponding quasi-random number streams for intermittent sampling.
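A brief sketch of the skip, leap, and scramble options, together with a stream for intermittent sampling, follows. The dimension, skip and leap values, and sample counts are arbitrary assumptions.

```matlab
% Two-dimensional Sobol set: skip the first 1000 points and then keep
% every 101st point.
p = sobolset(2,'Skip',1000,'Leap',100);
p = scramble(p,'MatousekAffineOwen');     % scrambling method for Sobol sets

X0 = net(p,256);                          % first 256 points of the set

q  = qrandstream(p);                      % stream over the same point set
x1 = qrand(q,10);                         % draw 10 points now...
x2 = qrand(q,10);                         % ...and the next 10 later
```

All points lie in the unit square, and successive qrand calls continue where the previous call stopped.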
The new plsregress function performs partial least-squares regression for data with correlated predictors.
The normspec function now shades regions of a normal density curve that are either inside or outside specification limits.
The statistics options structure created by statset now includes fields for TolTypeFun and TolTypeX, to specify tolerances on objective functions and parameter values, respectively.
The new gmdistribution class represents Gaussian mixture distributions, where random points come from different multivariate normal distributions with certain probabilities. The gmdistribution constructor creates mixture models with specified means, covariances, and mixture proportions, or by fitting a mixture model with a specified number of components to data. Methods for the class include:
The cluster function for hierarchical clustering now accepts a vector of cutoff values, and returns a matrix of cluster assignments, with one column per cutoff value.
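For illustration, the following sketch cuts one tree at three distance thresholds in a single call. The data, linkage method, and cutoff values are assumptions.

```matlab
rng(0);
X = rand(30,2);                           % illustrative observations
Z = linkage(X,'single');                  % hierarchical cluster tree

% One column of cluster assignments per cutoff value.
T = cluster(Z,'cutoff',[0.05 0.10 0.20],'criterion','distance');
```

Comparing columns of T shows how clusters merge as the cutoff grows.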
The kmeans function now returns a vector of cluster indices of length n, where n is the number of rows in the input data matrix X, even when X contains NaN values. In the past, rows of X with NaN values were ignored, and the vector of cluster indices was correspondingly reduced in size. Now the vector of cluster indices contains NaN values where rows have been ignored, consistent with other toolbox functions.
The kstest function now uses a more accurate method to calculate the p-value for a single-sample Kolmogorov-Smirnov test.
kstest now compares the computed p-value to the desired cutoff, rather than comparing the test statistic to a table of values. Results may differ from those in previous releases, especially for small samples in two-sided tests where an asymptotic formula was used in the past.
A new fitting function, copulafit, has been added to the family of functions that describe dependencies among variables using copulas. The function fits parametric copulas to data, providing a link between models of marginal distributions and models of data correlations.
A number of probability functions now have improved accuracy, especially for extreme parameter values. The functions are:
betainv: More accurate for probabilities in P near 1.
binocdf: More efficient and less likely to run out of memory for large values in X.
binopdf: More accurate when the probabilities in P are on the order of eps.
fcdf: More accurate when the parameter ratios V2./V1 are much less than the values in X.
ncx2cdf: More accurate in some extreme cases that previously returned 0.
poisscdf: More efficient and less likely to run out of memory for large values in X.
tcdf: More accurate when the squares of the values in X are much less than the parameters in V.
tinv: More accurate when the probabilities in P are very close to 0.5 and the outputs are very small in magnitude.
Function-style syntax for paretotails objects has been removed.
The changes to the probability functions listed above may lead to different, but more accurate, outputs than in previous releases.
In previous releases, syntax of the form obj(x) for a paretotails object obj invoked the cdf method. This syntax now produces a warning. To evaluate the cumulative distribution function, use the syntax cdf(obj,x).
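The recommended call can be sketched as follows. The heavy-tailed sample and the 10% and 90% tail boundaries are assumptions chosen for illustration.

```matlab
rng(0);
x   = trnd(3,1000,1);                 % illustrative heavy-tailed sample
obj = paretotails(x,0.1,0.9);         % Pareto tails below 10%, above 90%

% Use the cdf method explicitly instead of the old obj(q) form.
q = [-2 0 2];
p = cdf(obj,q);
```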
The new corrcov function converts a covariance matrix to the corresponding correlation matrix.
The mvregress function now supports an option to force the estimated covariance matrix to be diagonal.
In previous releases the mvregress function, when using the 'cwls' algorithm, estimated the covariance of coefficients COVB using the estimated, rather than the initial, covariance of the responses SIGMA. The initial SIGMA is now used, and COVB differs to a degree dependent on the difference between the initial and final estimates of SIGMA.
The boxplot function has a new 'compact' plot style suitable for displaying large numbers of groups.
New categorical and dataset arrays are available for organizing and processing statistical data.
Categorical arrays facilitate the use of nominal and ordinal categorical data.
Dataset arrays provide a natural way to encapsulate heterogeneous statistical data and metadata, so that it can be accessed and manipulated using familiar methods analogous to those for numerical matrices.
Categorical and dataset arrays are supported by a variety of new functions for manipulating the encapsulated data.
Categorical arrays are now accepted as input arguments in all Statistics Toolbox functions that make use of grouping variables.
Expanded options are available for linear hypothesis testing.
The new linhyptest function performs linear hypothesis tests on parameters such as regression coefficients. These tests have the form H*b = c for specified values of H and c, where b is a vector of unknown parameters.
The covb output from regstats and the SIGMA output from nlinfit are suitable for use as the covariance matrix input argument required by linhyptest. The following functions have been modified to return a covb output for use with linhyptest: coxphfit, glmfit, mnrfit, robustfit.
The new cholcov function computes a Cholesky-like decomposition of a covariance matrix, even if the matrix is not positive definite. Factors are useful in many of the same ways as Cholesky factors, such as imposing correlation on random number generators.
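A short sketch, using two made-up covariance matrices, shows the defining property of the factor T, namely T'*T reproducing the input. The second matrix is singular, which is a case where an ordinary Cholesky factorization fails.

```matlab
C1 = [2 1 1; 1 2 1; 1 1 2];       % positive definite
T1 = cholcov(C1);                 % square factor with T1'*T1 equal to C1

C2 = [1 1; 1 1];                  % positive semidefinite, singular
T2 = cholcov(C2);                 % non-square factor with T2'*T2 equal to C2

err1 = norm(T1'*T1 - C1);         % reconstruction errors, near zero
err2 = norm(T2'*T2 - C2);
```

For the singular case the factor has fewer rows than columns, so it still maps independent normal draws to samples with the requested covariance.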
The classify function for discriminant analysis has been improved.
The function now computes the coefficients of the discriminant functions that define boundaries between classification regions.
The output of the function is now of the same type as the input grouping variable group.
The classify function now returns outputs of different type than it did in the past. If the input argument group is a logical vector, output is now converted to a logical vector. In the past, output was returned as a cell array of 0s and 1s. If group is numeric, the output is now converted to the same type. For example, if group is of type uint8, the output will be of type uint8.
New paretotails objects are available for modeling distributions with an empirical cdf or similar distribution in the center and generalized Pareto distributions in the tails.
The paretotails function converts a data sample to a paretotails object. The objects are useful for generating random samples from a distribution similar to the data, but with tail behavior that is less discrete than the empirical distribution.
Objects from the paretotails class are supported by a variety of new methods for working with the piecewise distribution.
The paretotails class provides function-like behavior, so that p(x) evaluates the cdf of p at values x.
The new mvregresslike function is a utility related to the mvregress function for fitting regression models to multivariate data with missing values. The new function computes the objective (log likelihood) function, and can also compute the estimated covariance matrix for the parameter estimates.
New classregtree objects are available for creating and analyzing classification and regression trees.
The classregtree function fits a classification or regression tree to training data. The objects are useful for predicting response values from new predictors.
Objects from the classregtree class are supported by a variety of new methods for accessing information about the tree.
The classregtree class provides function-like behavior, so that t(X) evaluates the tree t at predictor values in X.
The following functions now create or operate on objects from the new classregtree class: treefit, treedisp, treeval, treeprune, treetest.
Objects from the classregtree class are intended to be compatible with the structure arrays that were produced in previous versions by the classification and regression tree functions listed above. In particular, classregtree supports dot indexing of the form t.property to obtain properties of the object t. The class also provides function-like behavior through parenthesis indexing, so that t(x) uses the tree t to classify or compute fitted values for predictors x, rather than index into t as a structure array as it did in the past. As a result, cell arrays should now be used to aggregate classregtree objects.
The new scatterhist function produces a scatterplot of 2-D data and illustrates the marginal distributions of the variables by drawing histograms along the two axes. The function is also useful for viewing properties of random samples produced by functions such as copularnd, mvnrnd, and lhsdesign.
The following demo has been updated:
Selecting a Sample Size: Modified to highlight the new sampsizepwr function
The following visualization functions, commonly used in the design of experiments, have been added:
interactionplot: Two-factor interaction plot for the mean
maineffectsplot: Main effects plot for the mean
multivarichart: Multivari chart for the mean
The following functions for hypothesis testing have been added or improved:
jbtest: Replaces the chi-square approximation of the test statistic, which is asymptotic, with a more accurate algorithm that interpolates p-values from a table of quantiles. A new option allows you to run Monte Carlo simulations to compute p-values outside of the table.
lillietest: Uses an improved version of Lilliefors' table of quantiles, covering a wider range of sample sizes and significance levels, with more accurate values. New options allow you to test for exponential and extreme value distributions, as well as normal distributions, and to run Monte Carlo simulations to compute p-values outside of the tables.
runstest: Adds a test for runs up and down to the existing test for runs above or below a specified value.
sampsizepwr: New function to compute the sample size necessary for a test to have a specified power. Options are available for choosing a variety of test types.
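The sample size calculation can be sketched as follows for a two-sided t-test. The null mean and standard deviation, the alternative mean, and the 80% power target are illustrative numbers, not values from these notes.

```matlab
% Null hypothesis: mean 100 with standard deviation 5 (illustrative values).
% Alternative: the true mean is 102. Find the sample size that gives 80% power.
n = sampsizepwr('t', [100 5], 102, 0.80);

% Conversely, compute the power actually achieved with that sample size.
pwr = sampsizepwr('t', [100 5], 102, [], n);
```

Because n is rounded up to a whole number of observations, the achieved power pwr is at least the requested 0.80.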
If the significance level for a test lies outside the range of tabulated values, [0.001, 0.5], then both jbtest and lillietest now return an error. In previous versions, jbtest returned an approximate p-value and lillietest returned an error outside a smaller range, [0.01, 0.2]. Error messages suggest using the new Monte Carlo option for computing values outside the range of tabulated values.
If the data sample for a test leads to a p-value outside the range of tabulated values, then both jbtest and lillietest now return, with a warning, either the smallest or largest tabulated value. In previous versions, jbtest returned an approximate p-value and lillietest returned NaN.
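A minimal sketch of the two ways of obtaining a p-value from jbtest, assuming the Monte Carlo option is supplied as the third input (the maximum Monte Carlo standard error for the p-value). The data and the tolerance of 0.01 are illustrative.

```matlab
rng(1);
x = randn(200, 1);                  % sample to test for normality

% Default: the p-value is interpolated from the table of quantiles, so it
% is confined to the tabulated range (a warning is issued at the limits).
[h1, p1] = jbtest(x);

% Monte Carlo option: simulate the p-value directly to the given tolerance.
[h2, p2] = jbtest(x, 0.05, 0.01);
```

A smaller tolerance gives a more precise p-value at the cost of more simulation replicates.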
Support has been added for multinomial regression modeling of discrete multicategory response data, including multinomial logistic regression. The following new functions supplement the regression models in glmfit and glmval by providing for a wider range of response values:
The new mvregress function carries out multivariate regression on data with missing response values. An option allows you to specify how missing data is handled.
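As a hedged sketch of fitting with missing responses, the following marks two response entries as NaN and asks for the ECM algorithm through the 'algorithm' option. The simulated data, the coefficients, and the choice of 'ecm' are assumptions made for illustration.

```matlab
rng(2);
n = 100;
x = randn(n, 1);
X = [ones(n, 1), x];                           % one design matrix shared by both responses
Y = [1 + 2*x, -1 + 0.5*x] + 0.1*randn(n, 2);   % two response dimensions
Y(5, 1)  = NaN;                                % missing response values
Y(17, 2) = NaN;

% Estimate coefficients and the response covariance despite the NaN entries.
[beta, Sigma] = mvregress(X, Y, 'algorithm', 'ecm');
```

Rows with a missing entry still contribute through their observed response, which is the main reason to prefer this over deleting incomplete rows.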
coxphfit: A new option allows you to specify the values at which the baseline hazard is computed.
The following new functions consolidate and expand upon existing functions for statistical process control:
capability: Computes a wider range of probabilities and capability indices than the capable function found in previous releases
controlchart: Displays a wider range of control charts than the ewmaplot, schart, and xbarplot functions found in previous releases
controlrules: Supplements the new controlchart function by providing for a wider range of control rules (Western Electric and Nelson)
gagerr: Performs a gage repeatability and reproducibility study on measurements grouped by operator and part
The capability function subsumes the capable function that appeared in previous versions of Statistics Toolbox software, and the controlchart function subsumes the functions ewmaplot, schart, and xbarplot. The older functions remain in the toolbox for backwards compatibility, but they are no longer documented or supported.
Support for nested and continuous factors has been added to the anovan function for N-way analysis of variance.
The following functions have been added to supplement the existing bootstrp function for bootstrap estimation:
The following demos have been added to the toolbox:
Bayesian Analysis for a Logistic Regression Model
Time Series Regression of Airline Passenger Data
The following demo has been updated to demonstrate new features:
Random Number Generation
The new fracfactgen function finds a set of fractional factorial design generators suitable for fitting a specified model.
The following functions for D-optimal designs have been enhanced:
cordexch, daugment, dcovary, rowexch: New options specify the range of values and the number of levels for each factor, exclude factor combinations, treat factors as categorical rather than continuous, control the number of iterations, and repeat the design generation process from random starting points
candexch: New options control the number of iterations and repeat the design generation process from random starting points
candgen: New options specify the range of values and the number of levels for each factor, and treat factors as categorical rather than continuous
x2fx: New option treats factors as categorical rather than continuous
The new dwtest function performs a Durbin-Watson test for autocorrelation in linear regression.
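A minimal sketch of the test, assuming the residuals come from an ordinary least-squares fit with regress. The simulated data and variable names are illustrative.

```matlab
rng(3);
n = 100;
x = (1:n)';
X = [ones(n, 1), x];            % design matrix, including the intercept column
y = 2 + 0.5*x + randn(n, 1);    % independent errors, so no autocorrelation by construction

[b, ~, r] = regress(y, X);      % r holds the residuals of the fit
[p, dw] = dwtest(r, X);         % p-value and Durbin-Watson statistic
```

The statistic lies between 0 and 4, and values near 2 are consistent with uncorrelated residuals.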
Two new functions have been added to compute multivariate cdfs. These supplement existing functions for pdfs and random number generators for the same distributions.
New functions have been added to the toolbox that allow you to use copulas to model correlated multivariate data and generate random numbers from multivariate distributions.
copulacdf: Cumulative distribution function for a copula
copulaparam: Copula parameters as a function of rank correlation
copulapdf: Probability density function for a copula
copularnd: Random numbers from a copula
copulastat: Rank correlation for a copula
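The functions listed above can be combined as in the following sketch for a bivariate Gaussian copula. It assumes the default rank correlation type (Kendall's tau) in copulaparam and copulastat; the target value 0.5 and the sample size are illustrative.

```matlab
rng(4);
tau = 0.5;                                    % target Kendall's tau (illustrative)
rho = copulaparam('Gaussian', tau);           % linear correlation parameter matching tau
u   = copularnd('Gaussian', rho, 500);        % 500-by-2 sample on the unit square
tauBack = copulastat('Gaussian', rho);        % converts the parameter back to tau

c = copulacdf('Gaussian', [0.5 0.5], rho);    % copula cdf at the center point
d = copulapdf('Gaussian', [0.5 0.5], rho);    % copula density at the same point
```

To obtain correlated data with arbitrary marginals, pass each column of u through the inverse cdf of the desired marginal distribution.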
The following functions generate random numbers from nonstandard distributions using Markov Chain Monte Carlo methods:
mhsample: Generate random numbers using the Metropolis-Hastings algorithm
slicesample: Generate random numbers using a slice sampling algorithm
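Both samplers can be sketched on the same target, here an unnormalized standard normal density. The target, the starting point, the sample size, and the uniform random-walk proposal with half-width delta are all assumptions chosen for illustration.

```matlab
rng(5);
target = @(x) exp(-0.5 * x.^2);     % standard normal density up to a constant

% Slice sampling: start at 0 and draw 2000 values.
xs = slicesample(0, 2000, 'pdf', target);

% Metropolis-Hastings with a symmetric uniform random-walk proposal.
delta = 0.5;
proprnd = @(x) x + (2*rand - 1) * delta;
xm = mhsample(0, 2000, 'pdf', target, 'proprnd', proprnd, 'symmetric', 1);
```

Successive draws from either sampler are correlated, so an effective sample is smaller than the number of values returned.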
The following demos have been added to the toolbox:
Curve Fitting and Distribution Fitting
Fitting a Univariate Distribution Using Cumulative Probabilities
Fitting an Orthogonal Regression Using Principal Components Analysis
Modelling Tail Data with the Generalized Pareto Distribution
Pitfalls in Fitting Nonlinear Models by Transforming to Linearity
Weighted Nonlinear Regression
The following demo has been updated:
Modelling Data with the Generalized Extreme Value Distribution
The new partialcorr function computes the correlation of one set of variables while controlling for a second set of variables.
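To illustrate what controlling for a second set of variables means, the following sketch builds two variables that are related only through a shared variable z. The simulated data and names are illustrative.

```matlab
rng(6);
n = 1000;
z = randn(n, 1);                            % variable to control for
X = [z + randn(n, 1), z + randn(n, 1)];     % both columns depend on z and nothing else

rawR  = corr(X);                            % ordinary correlation, inflated by z
partR = partialcorr(X, z);                  % correlation after removing the effect of z
```

The off-diagonal entry of rawR is clearly positive, while the corresponding entry of partR is close to zero.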
The grpstats function now computes a wider variety of descriptive statistics for grouped data. Choices include the mean, standard error of the mean, number of elements, group name, standard deviation, variance, confidence interval for the mean, and confidence interval for new observations. The function also supports the computation of user-defined statistics.
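A minimal sketch of requesting several statistics at once, together with a user-defined statistic passed as a function handle. The data are illustrative, and range is used here only as an example of a custom statistic.

```matlab
x = [1; 2; 3; 4; 5; 6];
g = [1; 1; 1; 2; 2; 2];          % two groups of three observations

% Group mean, standard error of the mean, and group size, one output each.
[m, s, cnt] = grpstats(x, g, {'mean', 'sem', 'numel'});

% User-defined statistic supplied as a function handle.
spread = grpstats(x, g, {@range});
```

Each output has one row per group, in sorted order of the group labels.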
The new chi2gof function tests if a sample comes from a specified distribution, against the alternative that it does not come from that distribution, using a chi-square test statistic.
Three functions have been added to test sample variances:
vartest: One-sample chi-square variance test. Tests if a sample comes from a normal distribution with specified variance, against the alternative that it comes from a normal distribution with a different variance.
vartest2: Two-sample F-test for equal variances. Tests if two independent samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
vartestn: Bartlett multiple-sample test for equal variances. Tests if multiple samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
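The three variance tests can be sketched side by side as follows. The simulated samples are illustrative, and the name-value form of the display option for vartestn is an assumption that may not hold in older releases, where the option was positional.

```matlab
rng(7);
x = randn(50, 1);                 % variance 1
y = 2 * randn(60, 1);             % variance 4

[h1, p1] = vartest(x, 1);         % H0: the variance of x equals 1
[h2, p2] = vartest2(x, y);        % H0: x and y have equal variances

% vartestn takes one sample per column; suppress its default table and plot.
[p3, stats] = vartestn([x, y(1:50)], 'Display', 'off');
```

With a fourfold difference in variance, the two-sample and multiple-sample tests should both reject equality at the 5% level.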
The new ansaribradley function tests if two independent samples come from the same distribution, against the alternative that they come from distributions that have the same median and shape but different variances.
The new runstest function tests if a sequence of values comes in random order, against the alternative that the ordering is not random.
Support has been added for two new distributions:
The Generalized Extreme Value distribution combines the Gumbel, Frechet, and Weibull distributions into a single distribution. It is used to model extreme values in data.
The following distribution functions have been added:
The Generalized Pareto distribution is used to model the tails of a data distribution.
The following distribution functions have been added:
The cophenet function now returns cophenetic distances as well as the cophenetic correlation coefficient.
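A minimal sketch of requesting both outputs, assuming the cluster tree is built with pdist and linkage. The two simulated clusters and the choice of average linkage are illustrative.

```matlab
rng(8);
X = [randn(10, 2); randn(10, 2) + 5];   % two well-separated clusters
Y = pdist(X);                           % pairwise distances between observations
Z = linkage(Y, 'average');              % hierarchical cluster tree

% c is the cophenetic correlation coefficient; d holds the cophenetic
% distances in the same pairwise layout as Y.
[c, d] = cophenet(Z, Y);
```

A value of c close to 1 indicates that the tree preserves the original pairwise distances well.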
Release   Features or Changes with Compatibility Considerations
R2015a
R2014b    None
R2014a
R2013b    None
R2013a    Probability distribution enhancements
R2012b
R2012a
R2011b    Conversion of Error and Warning Message Identifiers
R2011a    None
R2010b
R2010a    None
R2009b    None
R2009a    None
R2008b
R2008a    Descriptive Statistics
R2007b
R2007a
R2006b
R2006a    None
R14SP3    None
R14SP2    None