The bootstrap procedure involves choosing random samples with replacement from a data set and analyzing each sample the same way. Sampling with replacement means that each observation is selected separately at random from the original data set, so a particular data point from the original data set can appear multiple times in a given bootstrap sample. The number of elements in each bootstrap sample equals the number of elements in the original data set. The spread of the sample estimates you obtain lets you gauge the uncertainty of the quantity you are estimating.
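The resampling step described above can be sketched with base MATLAB functions only. This is a minimal illustration, not how the toolbox implements it; the data vector `x`, the choice of the mean as the statistic, and the value of `nboot` are all assumptions made for the example.

```matlab
% Hypothetical data, for illustration only (not the law school data).
x = [2.1 3.4 1.9 5.0 4.2 3.3 2.8 4.7]';
n = numel(x);
nboot = 1000;                      % number of bootstrap samples (arbitrary)

bootmeans = zeros(nboot,1);
for b = 1:nboot
    idx = randi(n,n,1);            % n indices drawn with replacement from 1..n
    bootmeans(b) = mean(x(idx));   % same analysis applied to every sample
end

% The spread of the bootstrap estimates indicates the uncertainty
% of the sample mean.
se = std(bootmeans);
```

Because `idx` is drawn with replacement, some observations appear several times in a given sample and others not at all, while each sample keeps the original size `n`.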
This example from Efron and Tibshirani [31] compares Law School Admission Test (LSAT) scores and subsequent law school grade point average (GPA) for a sample of 15 law schools.
load lawdata
plot(lsat,gpa,'+')
lsline

The least-squares fit line indicates that higher LSAT scores go with higher law school GPAs. But how certain is this conclusion? The plot provides some intuition, but nothing quantitative.
You can calculate the correlation coefficient of the variables using the corr function.
rhohat = corr(lsat,gpa)
rhohat =
0.7764
Now you have a number describing the positive connection between LSAT and GPA; though it may seem large, you still do not know if it is statistically significant.
Using the bootstrp function you can resample the lsat and gpa vectors as many times as you like and consider the variation in the resulting correlation coefficients.
Here is an example.
rhos1000 = bootstrp(1000,'corr',lsat,gpa);
This command resamples the lsat and gpa vectors 1000 times and computes the corr function on each sample. Here is a histogram of the result.
hist(rhos1000,30)
set(get(gca,'Children'),'FaceColor',[.8 .8 1])

Nearly all the estimates lie on the interval [0.4 1.0].
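When the data consist of two paired vectors, each bootstrap sample draws whole rows, so an observation's two values stay together. The loop below is a rough hand-written equivalent of that idea using the base function `corrcoef`; it is a sketch under assumptions, not the toolbox code, and the vectors `u` and `v` are hypothetical stand-ins for paired data.

```matlab
% Hypothetical paired data, for illustration only.
u = [1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0]';
v = [1.2 1.9 3.4 3.8 5.1 5.7 7.3 7.9]';
n = numel(u);
nboot = 1000;

rhos = zeros(nboot,1);
for b = 1:nboot
    idx = randi(n,n,1);            % one set of row indices per sample
    c = corrcoef(u(idx),v(idx));   % same rows from both vectors keeps pairs intact
    rhos(b) = c(1,2);              % off-diagonal entry is the correlation
end
```

Resampling `u` and `v` with separate index vectors would break the pairing and push the correlations toward zero, which is why a single `idx` is used for both.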
In statistical inference it is often desirable to construct a confidence interval for a parameter estimate. The bootci function uses bootstrapping to obtain one. The confidence interval for the lsat and gpa data is computed as:
ci = bootci(5000,@corr,lsat,gpa)
ci =
0.3313
0.9427
Therefore, a 95% confidence interval for the correlation coefficient between LSAT and GPA is [0.33 0.94]. This is strong quantitative evidence that LSAT and subsequent GPA are positively correlated. Moreover, this evidence does not require any strong assumptions about the probability distribution of the correlation coefficient.
Although the bootci function computes the bias-corrected and accelerated (BCa) interval by default, it can also compute other types of bootstrap confidence intervals, such as the studentized bootstrap confidence interval.
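As a sketch of how another interval type might be requested, the calls below pass the statistic and data as a cell array followed by the 'type' parameter. The number of resamples is an arbitrary choice here, and the resulting endpoints vary from run to run, so no output is shown.

```matlab
load lawdata

% Default interval type (BCa).
ci_bca  = bootci(2000,{@corr,lsat,gpa});

% Basic percentile interval.
ci_per  = bootci(2000,{@corr,lsat,gpa},'type','per');

% Studentized interval. With no standard-error function supplied,
% this relies on a nested bootstrap and is noticeably slower.
ci_stud = bootci(2000,{@corr,lsat,gpa},'type','student');
```

Comparing the three intervals on the same data gives a feel for how sensitive the conclusion is to the choice of method.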
Similar to the bootstrap is the jackknife, which uses resampling to estimate the bias of a sample statistic. Sometimes it is also used to estimate the standard error of the sample statistic. The jackknife is implemented by the Statistics Toolbox function jackknife.
The jackknife resamples systematically, rather than at random as the bootstrap does. For a sample with n points, the jackknife computes sample statistics on n separate samples of size n-1. Each sample is the original data with a single observation omitted.
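The leave-one-out scheme can be written out by hand for a tiny data set. The loop below is only an illustration in base MATLAB, with a hypothetical four-point vector `x` and the mean as the statistic; it is not the toolbox implementation.

```matlab
% Hypothetical data, for illustration only.
x = [10 20 30 40]';
n = numel(x);

jackmeans = zeros(n,1);
for i = 1:n
    xi = x([1:i-1, i+1:n]);     % original data with observation i omitted
    jackmeans(i) = mean(xi);    % statistic on a sample of size n-1
end
```

The loop visits every observation exactly once, so the n jackknife samples are fully determined by the data and repeated runs give identical results.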
In the previous bootstrap example you measured the uncertainty in estimating the correlation coefficient. You can use the jackknife to estimate the bias, which is the tendency of the sample correlation to over-estimate or under-estimate the true, unknown correlation. First compute the sample correlation on the data:
load lawdata
rhohat = corr(lsat,gpa)
rhohat =
0.7764
Next compute the correlations for jackknife samples, and compute their mean:
jackrho = jackknife(@corr,lsat,gpa);
meanrho = mean(jackrho)
meanrho =
0.7759
Now compute an estimate of the bias:
n = length(lsat);
biasrho = (n-1) * (meanrho-rhohat)
biasrho =
-0.0065
The sample correlation probably underestimates the true correlation by about this amount.
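The same bias formula can be checked on a statistic whose bias is known. The sketch below uses base MATLAB only and a hypothetical data vector `x`; it applies the jackknife to the plug-in variance `var(x,1)`, which divides by n and so underestimates the variance. Subtracting the jackknife bias estimate recovers the usual estimate `var(x)`, which divides by n-1.

```matlab
% Hypothetical data, for illustration only.
x = [2.1 3.4 1.9 5.0 4.2 3.3 2.8 4.7]';
n = numel(x);

thetahat = var(x,1);                  % plug-in variance (divides by n)

jackvals = zeros(n,1);
for i = 1:n
    jackvals(i) = var(x([1:i-1, i+1:n]),1);   % statistic with point i omitted
end

% Jackknife bias estimate, same formula as in the correlation example.
biasest = (n-1) * (mean(jackvals) - thetahat);

% Bias-corrected estimate; for this statistic it matches var(x).
corrected = thetahat - biasest;
```

A negative `biasest` means the statistic tends to underestimate, which is the same reading applied to the correlation coefficient above.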
Parallel computing is the technique of using multiple processors on a single problem. The primary reason to use parallel computing is to shorten the computation time.
Resampling methods all take as input a statistical function and a set of supplied data, and evaluate the statistical function repeatedly on multiple samples drawn from the supplied data. Resampling methods are statistically informative, but they can be very time-consuming. Because the repeated evaluations are independent of one another, you can reduce computation time by performing them in parallel. The following functions support parallel computing:
These functions use parallel resampling under the following conditions:
You have a license for Parallel Computing Toolbox™ software and the software is installed.
A group of processors has been prepared for parallel computation using the matlabpool command of the Parallel Computing Toolbox.
The option UseParallel is set to 'always'. The default value of this option is 'never'. You specify this option using the 'Options' argument that all of these resampling functions accept.
When these conditions hold, the functions resample in parallel. For more information on the Parallel Computing Toolbox, see Parallel Computing Toolbox User's Guide.
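Put together, the conditions above might look like the following in practice. This is a sketch that assumes Parallel Computing Toolbox is licensed and installed; the number of resamples is arbitrary.

```matlab
load lawdata

matlabpool open                            % prepare a group of workers

% Request parallel resampling; the default for UseParallel is 'never'.
opts = statset('UseParallel','always');

% Pass the options structure through the 'Options' argument.
rhos = bootstrp(1000,@corr,lsat,gpa,'Options',opts);

matlabpool close                           % release the workers when done
```

If any one of the conditions is not met (no pool open, or UseParallel left at its default), the resampling should be expected to run serially.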
Resampling methods employ the Parallel Computing Toolbox function parfor to perform parallel evaluations. parfor does not work in parallel when called from within another parfor loop. Parallelization occurs only at the outermost level if you combine parallel resampling methods with parallel functionality in your statistical function or in the code that calls the resampling methods.
Suppose, for example, you want to apply the jackknife to your function userfcn, which calls parfor, and you want to call jackknife in a loop. Suppose also that the conditions for parallel resampling of jackknife, as given in the section above, are satisfied. The following figure shows three cases:
The outermost loop is parfor. Only that loop runs in parallel.
The outermost parfor loop is in jackknife. Only jackknife runs in parallel.
The outermost parfor loop is in userfcn. userfcn can use parfor in parallel.
When parfor Runs In Parallel

© 1984-2009 The MathWorks, Inc.