Randomly sample from data, with or without replacement
y = datasample(data,k)
y = datasample(data,k,dim)
[y,idx] = datasample(data,k,...)
[y,...] = datasample(s,data,k,...)
[y,...] = datasample(data,k,Name,Value)
[y,...] = datasample(data,k,dim,Name,Value)
y = datasample(data,k) returns k observations sampled uniformly at random, with replacement, from the data in data.

y = datasample(data,k,dim) returns a sample taken along dimension dim of data.

[y,idx] = datasample(data,k,...) returns an index vector indicating which values datasample sampled from data.

[y,...] = datasample(s,data,k,...) uses the random number stream s to generate random numbers.

[y,...] = datasample(data,k,Name,Value) or [y,...] = datasample(data,k,dim,Name,Value) samples with additional options specified by one or more Name,Value pair arguments.
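
As a minimal sketch of the stream syntax, the snippet below draws from a dedicated RandStream object, resets it, and draws again. The generator type 'mt19937ar' and the seed 42 are arbitrary choices for illustration, not values prescribed by this page.

```matlab
% Dedicated stream; generator type and seed are illustrative assumptions
s = RandStream('mt19937ar','Seed',42);
y1 = datasample(s,1:100,5,'Replace',false);

% Return the stream to its initial state and repeat the same draw
reset(s);
y2 = datasample(s,1:100,5,'Replace',false);

% Both draws consume the same random numbers from s
sameDraw = isequal(y1,y2);
```

Because the draws come from s, this pattern should leave the global random number stream untouched.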

Vector, matrix, N-dimensional array, table,
or dataset array representing the data from which to sample. By default, 

Positive integer, the number of samples. 

Integer specifying the dimension on which to take samples. For
example, if Default: 

Random number stream. Create Default: The global random number stream 
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Select the sample with replacement if Default: 

Vector with the same number of elements as data elements in Default: 

When the sample is taken with replacement (default), 

Vector of indices indicating which elements

Draw five unique values from the integers 1:10.
y = datasample(1:10,5,'Replace',false)
y =
     6     3     7     8     5
Generate a random sequence of the characters ACGT, with replacement, according to specified probabilities.
seq = datasample('ACGT',48,'Weights',[0.15 0.35 0.35 0.15])
seq =
CTTCGACTGTGAGTGGGCGCGACAAGGCTACCGGCCCGGGCGGCACTC
Select a random subset of columns from a data matrix.
X = randn(10,1000);
Y = datasample(X,5,2,'Replace',false)
Y =
    0.7007    0.3382    2.1298    0.1891    0.5026
    0.6520    0.6693    0.1961    0.9915    1.9107
    0.1785    0.6640    2.3247    1.1735    1.0020
    1.6760    2.6102    0.8902    0.7735    1.8676
    0.3251    0.6415    0.2572    0.1629    1.0523
    0.1011    0.9323    1.3088    0.4477    0.8036
    0.5767    0.5778    0.8556    0.8672    0.0727
    0.0615    0.9084    0.9020    0.4185    1.9520
    0.7256    1.1228    0.7558    1.2691    2.4997
    1.2273    0.5754    0.8755    0.8224    1.2066
Resample observations from a dataset array to create a bootstrap replicate dataset.
load hospital
y = datasample(hospital,size(hospital,1));
Use the second output to sample "in parallel" from two data vectors.
x1 = randn(100,1);
x2 = randn(100,1);
[y1,idx] = datasample(x1,10);
y2 = x2(idx);
datasample uses randperm, rand, or randi to generate random values. Therefore, datasample changes the state of the MATLAB® global random number generator. Control the random number generator using rng.
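
A short sketch of making a draw repeatable by seeding the global generator with rng before each call. The seed value 1 is an arbitrary assumption.

```matlab
% Seed the global generator, then sample (seed value is arbitrary)
rng(1);
yA = datasample(1:10,5);

% Re-seeding with the same value reproduces the same sample
rng(1);
yB = datasample(1:10,5);

repeatable = isequal(yA,yB);
```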
For selecting weighted samples without replacement, datasample uses the algorithm of Wong and Easton [1].
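
The following sketch illustrates weighted sampling without replacement from the caller's side only; it does not reproduce the Wong and Easton algorithm itself. The data values and weights are hypothetical. Elements with zero weight are expected never to be drawn, so with exactly three positive weights and k = 3 the sample should contain those three elements in some order.

```matlab
% Hypothetical data and weights; the first two elements have zero weight
vals = 10:10:50;
w = [0 0 1 2 3];

% Draw three distinct elements, favoring those with larger weights
y = datasample(vals,3,'Replace',false,'Weights',w);

% Only the positively weighted elements should appear
onlyWeighted = isequal(sort(y),[30 40 50]);
```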
You can use randi or randperm to generate indices for random sampling with or without replacement, respectively. However, datasample can be more convenient because it samples directly from your data. datasample also allows weighted sampling.
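
For comparison, here is a minimal sketch of the index-based approach using randi and randperm. The variable names and the 20-element data vector are hypothetical.

```matlab
% Hypothetical column vector of 20 observations
data = (1:20)';
n = size(data,1);
k = 5;

% With replacement: k indices drawn uniformly from 1..n, repeats allowed
idxWith = randi(n,k,1);
yWith = data(idxWith,:);

% Without replacement: k unique indices drawn from 1..n
idxWithout = randperm(n,k);
yWithout = data(idxWithout,:);
```

With datasample, the indexing step is handled internally, and the idx output returns the selected indices if you need them.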
[1] Wong, C. K., and M. C. Easton. "An Efficient Method for Weighted Sampling Without Replacement." SIAM Journal on Computing 9(1), pp. 111–113, 1980.