MATLAB Examples

Load the Fisher iris sample data. The data contains length and width measurements from the sepals and petals of three species of iris flowers. Store the petal length data for the versicolor

Use quantile-quantile (q-q) plots to determine whether two samples come from the same distribution family. Q-Q plots are scatter plots of quantiles computed from each sample, with a line

Use normal probability plots to assess whether data comes from a normal distribution. Many statistical procedures make the assumption that an underlying distribution is normal. Normal

Use anovan to fit models where a factor's levels represent a random selection from a larger (infinite) set of possible levels.

Perform N-way ANOVA on car data with mileage and other information on 406 cars made between 1970 and 1982.

Perform one-way ANOVA to determine whether data from several groups have a common mean.

Perform two-way ANOVA to determine the effect of car model and factory on the mileage rating of cars.

Load the sample data.

Perform statistical analysis and machine learning on out-of-memory data with MATLAB® and Statistics and Machine Learning Toolbox™.

Use logistic regression and other techniques to perform data analysis on tall arrays. Tall arrays represent data that is too large to fit into computer memory.

Use Bayesian optimization to select optimal parameters for training a kernel classifier by using the 'OptimizeHyperparameters' name-value pair argument. The sample data set

Generate a nonlinear classifier with Gaussian kernel function. First, generate one class of points inside the unit disk in two dimensions, and another class of points in the annulus from

Perform linear and quadratic classification of Fisher iris data.

Create a classification tree ensemble for the ionosphere data set, and use it to predict the classification of a radar return with average measurements.

Use a random subspace ensemble to increase the accuracy of classification. This example also shows how to use cross validation to determine good parameters for both the weak learner template and the

You can also use ensembles of decision trees for classification. For this example, use ionosphere data with 351 observations and 34 real-valued predictors. The response variable is

When you have missing data, trees and ensembles of trees give better predictions when they include surrogate splits. Furthermore, estimates of predictor importance are often different

Obtain the benefits of the LPBoost and TotalBoost algorithms. These algorithms share two beneficial characteristics:

The RobustBoost algorithm can make good classification predictions even when the training data has noise. However, the default RobustBoost parameters can produce an ensemble that does

Make a more robust and simpler model by trying to remove predictors without hurting the predictive power of the model. This is especially important when you have many predictors in your data.

Predict posterior probabilities of SVM models over a grid of observations, and then plot the posterior probabilities over the grid. Plotting posterior probabilities exposes decision

Determine which quadrant of an image a shape occupies by training an error-correcting output codes (ECOC) model comprised of linear SVM binary learners. This example also illustrates the

Train a basic discriminant analysis classifier to classify irises in Fisher's iris data.

Train an ensemble of classification trees using data containing predictors with many categorical levels.

Perform classification using discriminant analysis, naive Bayes classifiers, and decision trees. Suppose you have a data set containing observations with measurements on different

Build an automated credit rating tool.

Use a custom kernel function, such as the sigmoid kernel, to train SVM classifiers, and adjust custom kernel function parameters.

Train an ensemble of classification trees with unequal classification costs. This example uses data on patients with hepatitis to see if they live or die as a result of the disease. The data

Optimize an SVM classification using the bayesopt function. The classification works on locations of points from a Gaussian mixture model. In The Elements of Statistical Learning,

Optimize an SVM classification using the fitcsvm function and OptimizeHyperparameters name-value pair. The classification works on locations of points from a Gaussian mixture model. In

Visualize posterior classification probabilities predicted by a naive Bayes classification model.

Perform five-fold cross validation of a quadratic discriminant analysis classifier.

Plot the decision surface of different classification algorithms.

Classify when one class has many more observations than another. Try the RUSBoost algorithm first, because it is designed to handle this case.

Find the indices of the three nearest observations in X to each observation in Y with respect to the chi-square distance. This distance metric is used in correspondence analysis,

Classify query data by:

Predict classification for a k-nearest neighbor classifier.

Examine the quality of a k-nearest neighbor classifier using resubstitution and cross validation.

Modify a k-nearest neighbor classifier.

Construct a k-nearest neighbor classifier for the Fisher iris data.

Examine similarities and dissimilarities of observations or objects using cluster analysis in Statistics and Machine Learning Toolbox™. Data often fall naturally into groups, or

Add a MATLAB Function block to a Simulink® model for label prediction. The MATLAB Function block accepts streaming data, and predicts the label and classification score using a trained, support

Generate C code from a MATLAB function that classifies images of digits using a trained classification model. This example demonstrates an alternative workflow to Digit Classification

Specify variable-size input arguments when you generate code for the object functions of classification and regression model objects. Variable-size data is data whose size might change

Generate code for the prediction of classification and regression model objects at the command line. You can also generate code using the MATLAB® Coder™ app. See Code Generation for

Generate C code from a MATLAB® System object™ that classifies images of digits using a trained classification model. This example also shows how to use the System object for classification

Use a Stateflow® chart for label prediction. The example trains a discriminant analysis model for the Fisher iris data set by using fitcdiscr, and defines a function for code generation that

Generate C/C++ code for the prediction of classification and regression model objects by using the MATLAB® Coder™ app. You can also generate code at the command line using codegen. See Code

The object functions knnsearch and rangesearch of the nearest neighbor searcher objects, ExhaustiveSearcher and KDTreeSearcher, support code generation. This example shows how to

Create scatter plots using grouped sample data.

Use several common indexing and searching methods.

Use several indexing and searching methods for categorical arrays.

Compute and compare measures of dispersion for sample data that contains one outlier.

Explore the distribution of data using descriptive statistics.

Plot data grouped by the levels of a categorical variable.

Categorize numeric data into a categorical ordinal array using ordinal. This is useful for discretizing continuous data.

Work with dataset array variables and their data.

Select an observation or subset of observations from a dataset array.

Determine sorting order for ordinal arrays.

Sort observations (rows) in a dataset array using the command line. You can also sort rows using the Variables editor.

Create nominal arrays using nominal.

Compute and compare measures of location for sample data that contains one outlier.
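A minimal sketch of the idea, written in Python rather than MATLAB and using a hypothetical sample (not the data from the original example): the mean is pulled toward a single outlier, while the median barely moves.

```python
import statistics

# Hypothetical sample: five typical values plus one outlier.
# This is illustrative data, not the data set used in the original example.
sample = [10, 11, 12, 13, 14, 100]

mean_value = statistics.mean(sample)      # sensitive to the outlier
median_value = statistics.median(sample)  # robust to the outlier

print("mean:", mean_value)
print("median:", median_value)
```

Comparing the two values side by side is a quick way to see whether an outlier is distorting a location estimate.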

Compute summary statistics grouped by levels of a categorical variable. You can compute group summary statistics for a numeric array or a dataset array using grpstats.

Create a 3-by-3 matrix of sample data. Remove two data values by replacing them with NaN.

Create a dataset array from a numeric array existing in the MATLAB® workspace.

Change the labels for category levels in categorical arrays using setlabels. You also have the option to specify labels when creating a categorical array.

Reorder the category levels in nominal arrays using reorderlevels. By definition, nominal array categories have no natural ordering. However, you might want to change the order of levels

Merge categories in a nominal or ordinal array using mergelevels. This is useful for collapsing categories with few observations.

Create a dataset array from heterogeneous variables existing in the MATLAB® workspace.

Create ordinal arrays using ordinal.

Reorder the category levels in an ordinal array using reorderlevels.

Visualize multivariate data using various statistical plots. Many statistical analyses involve only two variables: a predictor variable and a response variable. Such data are easy to

Perform linear and stepwise regression analyses using dataset arrays.

Use cmdscale to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.

Analyze whether companies within the same sector experience similar week-to-week changes in stock price.

Perform nonnegative matrix factorization.

Use Procrustes analysis to compare two handwritten number threes. Visually and analytically explore the effects of forcing size and reflection changes.

Visualize dissimilarity data using nonclassical forms of multidimensional scaling (MDS).

Use Principal Components Analysis (PCA) to fit a linear regression. PCA minimizes the perpendicular distances from the data to the fitted model. This is the linear case of what is known as

Perform classical multidimensional scaling using the cmdscale function in Statistics and Machine Learning Toolbox™. Classical multidimensional scaling, also known as Principal

Select features for classifying high-dimensional data. More specifically, this example shows how to perform sequential feature selection, which is one of the most popular feature selection

Perform factor analysis using Statistics and Machine Learning Toolbox™.

Tune the regularization parameter in fscnca using cross-validation. Tuning the regularization parameter helps to correctly detect the relevant features in the data.

Perform feature selection that is robust to outliers using a custom robust loss function in NCA.

Use an output function in tsne.

Explore the effects of various tsne settings.

Follow a complete workflow for feature extraction from image data.

Visualize the MNIST data, which consists of images of handwritten digits, using the tsne function. The images are 28-by-28 pixels in grayscale. Each image has an associated label from 0

Use rica to disentangle mixed audio signals. You can use rica to perform independent component analysis (ICA) when prewhitening is included as a preprocessing step. The ICA model is

Hypothesis testing is a common method of drawing inferences about a population based on statistical evidence from a sample.

Determine the number of samples or observations needed to carry out a statistical test. This example illustrates sample size calculations for a simple problem, then shows how to use the sampsizepwr

Estimate and plot the cumulative hazard and survivor functions for different groups.

Find the empirical survivor functions and the parametric survivor functions using the Burr type XII distribution fit to data for two groups.

Construct a Cox proportional hazards model, and assess the significance of the predictor variables.

Convert survival data to counting process form and then construct a Cox proportional hazards model with time-dependent covariates.

Compute and plot the pdf of a Poisson distribution with parameter lambda = 5.
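As a rough illustration in Python (the original example is in MATLAB, and the helper name poisson_pmf is hypothetical), the Poisson probabilities follow directly from the formula P(X = k) = exp(-lambda) * lambda^k / k!. Plotting is omitted here.

```python
import math

def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals the nonnegative integer k."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 5  # parameter value taken from the example description
ks = list(range(0, 16))
pmf = [poisson_pmf(k, lam) for k in ks]

# Print a small table of k and P(X = k).
for k, p in zip(ks, pmf):
    print(k, round(p, 4))
```

For an integer lambda the distribution has two equal modes, at lambda - 1 and lambda, which the table makes visible at k = 4 and k = 5.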

Use copulafit to calibrate copulas with data. To generate data Xsim with a distribution "just like" (in terms of marginal distributions and correlations) the distribution of data in the

Similar to the bootstrap is the jackknife, which uses resampling to estimate the bias of a sample statistic. Sometimes it is also used to estimate the standard error of the sample statistic. The

Plot the pdf of a bivariate Student's t distribution. You can use this distribution for a higher number of dimensions as well, although visualization is not easy.

Compute and plot the pdf using four different values for the parameter r, the desired number of successes: 0.1, 1, 3, and 6. In each case, the probability of success p is 0.5.

As for all discrete distributions, the cdf is a step function. The plot shows the discrete uniform cdf for N = 10.

Compute and plot the pdf of a multivariate normal distribution.

The bootstrap procedure involves choosing random samples with replacement from a data set and analyzing each sample the same way. Sampling with replacement means that each observation is
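The resampling step can be sketched as follows. This is a Python illustration under stated assumptions, not the code from the original MATLAB example: the data values, the helper name bootstrap_statistics, the number of resamples, and the seed are all hypothetical.

```python
import random
import statistics

def bootstrap_statistics(data, statistic, n_resamples=1000, seed=0):
    """Apply `statistic` to n_resamples samples drawn with replacement from data."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(data)
    results = []
    for _ in range(n_resamples):
        # Sampling with replacement: the same observation may be drawn more than once.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        results.append(statistic(resample))
    return results

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]  # hypothetical observations
boot_means = bootstrap_statistics(data, statistics.mean)

# The spread of the bootstrap replicates estimates the standard error of the mean.
standard_error = statistics.stdev(boot_means)
print("bootstrap standard error of the mean:", standard_error)
```

Any function of a sample can be passed as the statistic, for example statistics.median, which is where the bootstrap is most useful because no simple standard-error formula is available.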

Compute the pdf of an F distribution with 5 numerator degrees of freedom and 3 denominator degrees of freedom.

Compute the pdf of a gamma distribution with parameters A = 100 and B = 10. For comparison, also compute the pdf of a normal distribution with parameters mu = 1000 and sigma = 100.

Compute the pdf of an exponential distribution with parameter mu = 2.

Pick a random sample of 10 from a list of 553 items.
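One way to sketch this in Python (the original example is in MATLAB; the item numbering 1 to 553 and the assumption that the sample is drawn without replacement are mine):

```python
import random

# Assumption: the 553 items are identified by the integers 1..553.
item_ids = range(1, 554)

# random.sample draws without replacement, so no item appears twice.
chosen = random.sample(item_ids, 10)
print(sorted(chosen))
```

The output differs from run to run unless the generator is seeded first with random.seed.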

Suppose the income of a family of four in the United States follows a lognormal distribution with mu = log(20,000) and sigma = 1. Compute and plot the income density.

Compute the pdf for a Student's t distribution with parameter nu = 5, and for a standard normal distribution.

Compute the pdf of a chi-square distribution with 4 degrees of freedom.

Compute and plot the cdf of a hypergeometric distribution.

Since the bivariate normal distribution is defined on the plane, you can also compute cumulative probabilities over rectangular regions.

Generate examples of probability density functions for the three basic forms of the generalized extreme value distribution.

Suppose the probability of a five-year-old car battery not starting in cold weather is 0.03. What is the probability of the car starting for 25 consecutive days during a long cold snap?
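The arithmetic behind this question can be sketched in a few lines of Python (not the MATLAB code of the original example). The sketch assumes that each day's start attempt is independent, which the description does not state explicitly.

```python
# Probability that the battery fails to start on any one cold day (from the text).
p_fail = 0.03
days = 25

# Assumption: days are independent, so the probability of starting on every one
# of the 25 days is the single-day success probability raised to the 25th power.
p_all_start = (1 - p_fail) ** days

print(round(p_all_start, 3))  # approximately 0.467
```

So even with a 97% chance of starting on any given day, the chance of getting through all 25 days without a failure is a little under one half.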

Use haltonset to construct a 2-D Halton quasi-random point set.
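For intuition about what a 2-D Halton point set contains, here is a small Python sketch of the textbook construction using the radical inverse in bases 2 and 3. It is not the haltonset implementation, and details such as the starting index, skipping, leaping, and scrambling may differ from what haltonset does; the function names are hypothetical.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of index about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result

def halton_2d(n_points):
    """First n_points of a 2-D Halton sequence (bases 2 and 3), starting at index 1."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3))
            for i in range(1, n_points + 1)]

points = halton_2d(5)
for x, y in points:
    print(round(x, 4), round(y, 4))
```

Because successive points fill the largest remaining gaps in each coordinate, the set covers the unit square more evenly than the same number of pseudorandom points typically does.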

Compute the pdf of an extreme value distribution.

Compute the pdf of three generalized Pareto distributions. The first has shape parameter k = -0.25, the second has k = 0, and the third has k = 1.

The lognrnd function simulates independent lognormal random variables. In the following example, the mvnrnd function generates n pairs of independent normal random variables, and then

In this example, use a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric

Use Cook's distance to determine the outliers in the data.

Test for the significance of the regression coefficients using the t-statistic.

Display R-squared (coefficient of determination) and adjusted R-squared. Load the sample data and define the response and independent variables.

Fit a linear regression model. A typical workflow involves the following: import data, fit a regression, test its quality, modify it to improve the quality, and share it.

Compute the covariance matrix and standard errors of the coefficients.

Use the CovRatio statistics to determine the influential points in data. Load the sample data and define the response and predictor variables.

Use a bagged ensemble to apply all three methods of evaluating ensemble quality.

Identify and remove redundant predictors from a generalized linear model.

Use data for predicting the insurance risk of a car based on its many attributes.

View a classification or regression tree. There are two ways to view a tree: view(tree) returns a text description and view(tree,'mode','graph') returns a graphic description of the tree.

Determine the observations that are influential on the fitted response values using Dffits values. Load the sample data and define the response and independent variables.

Test for autocorrelation among the residuals of a linear regression model.

Fit a generalized linear model and analyze the results. A typical workflow involves the following: import data, fit a generalized linear model, test its quality, modify it to improve the

Determine the observations that have large influence on coefficients using Dfbetas. Load the sample data and define the response and independent variables.

Compute coefficient confidence intervals.

Regularize binomial regression. The default (canonical) link function for binomial regression is the logistic function.

Compute leverage values and assess high leverage observations. Load the sample data and define the response and independent variables.

Assess the model assumptions by examining the residuals of a fitted linear regression model.

Assess the fit of the model and the significance of the regression coefficients using the F-statistic.

There are diagnostic plots to help you examine the quality of a model. plotDiagnostics(mdl) gives a variety of plots, including leverage and Cook's distance plots. plotResiduals(mdl)

Use the methods predict, feval, and random to predict and simulate responses to new data.

Rare event prediction in complex technical systems has been a very interesting and critical issue for many industrial and commercial fields due to the huge increase of sensors and rapid growth

This example was authored by the MathWorks community.

Among many statistical anomaly detection techniques, Hotelling’s T-square method, a multivariate statistical analysis technique, has been one of the most typical methods. This method

Though Hotelling’s T-square method is applicable for many multi-dimensional data sets, this method has a fundamental assumption that the data follow a unimodal distribution. So, when the

The time series from an SDOF (single-degree-of-freedom) system is computed using the central difference method, and white noise is used as the input force.

The previous methods, Hotelling’s T-square method and the Gaussian mixture model, use a Gaussian distribution-based parametric model. However, in practical situations, sometimes data

This demo showcases visualization and analysis (heavy statistics) for forecasting energy usage based on historical data. We have access to hour-by-hour utility usage for the month of

Demo file for the Data Management and Statistics Webinar. This demo requires the Statistics Toolbox and was created using MATLAB 7.7 (R2008b).

Statins are the most common class of drugs used for treating hyperlipidemia. However, studies have shown that even at their maximum dosage of 80 mg, many patients do not reach LDL cholesterol

Demonstration of the dot product and orthogonality, also including some vector addition. Information from this tutorial is used in QR decomposition and multiple regression approach

This tutorial describes multivariate Gaussians as it walks through the major functionality of the mmvn toolkit

After Cross-Validation, Optimal 'c' value, Yielding Best Performance (F-measure)

This tutorial will go over some of the functions available for making inferences and testing hypotheses. I assume that you know how to construct a model using encode. If not, see the

Linear Mixed-Effects (LME) models are generalizations of linear regression models for data that is collected and summarized in groups. Linear Mixed-Effects models offer a flexible

This HighChart object enables easy use of the JavaScript technology provided by http://www.highcharts.com/ to generate interactive and dynamic charts in the MATLAB web browser.

In this demo, we will perform statistical analysis on automotive fuel economy data provided by the United States Environmental Protection Agency. We will see how the Statistics Toolbox™

Regress_Bivariate:

The dynamic response of a 100 m high clamped-free steel beam is studied. Simulated time series are used, where the first three eigenmodes have been taken into account. More precisely, the

The Linstats package provides a uniform mechanism for building any supported linear model. Once built, the same model can be analyzed in many ways including least-squares regression, fit and

Consider the hypercube and an inscribed hypersphere with radius . Then the fraction of the volume of the cube contained in the hypersphere is given by:
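The symbols in the description above were lost, so the following Python sketch (not the original MATLAB code) uses its own assumed notation: d for the dimension and r for the radius of the hypersphere, inscribed in a hypercube of side 2r. With that notation, a standard closed form for the volume fraction is pi^(d/2) / (2^d * Gamma(d/2 + 1)), which does not depend on r.

```python
import math

def inscribed_sphere_fraction(d):
    """Fraction of a d-dimensional hypercube's volume inside its inscribed hypersphere.

    Assumed notation: sphere radius r, cube side 2r. The radius cancels,
    leaving pi^(d/2) / (2^d * Gamma(d/2 + 1)).
    """
    return math.pi ** (d / 2) / (2 ** d * math.gamma(d / 2 + 1))

for d in (1, 2, 3, 5, 10, 20):
    print(d, inscribed_sphere_fraction(d))
```

The fraction is pi/4 in two dimensions and pi/6 in three, and it shrinks rapidly toward zero as d grows, which is one common way of illustrating the curse of dimensionality.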

In this script, I reproduce the results presented by John D. Holmes in the first part of chapter 2 of his book, Wind Loading of Structures [1]. The notations he uses are slightly different in
