Randomly sample from data, with or without replacement
y = datasample(data,k)
y = datasample(data,k,dim)
y = datasample(___,Name,Value)
y = datasample(s,___)
[y,idx] = datasample(___)
y = datasample(data,k) returns k observations sampled uniformly at random, with replacement, from the data in data.
y = datasample(data,k,dim) returns a sample taken along dimension dim of data.
y = datasample(___,Name,Value) uses any of the input arguments in the previous syntaxes, followed by one or more Name,Value pair arguments.
y = datasample(s,___) uses the random number stream s to generate random numbers. The option s can precede any of the input arguments in the previous syntaxes.
[y,idx] = datasample(___) also returns an index vector indicating which values datasample sampled from data, using any of the input arguments in the previous syntaxes.

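Vector, matrix, N-dimensional array, table, or dataset array representing the data from which to sample. By default, datasample samples along the first nonsingleton dimension of data; for example, it samples rows when data is a matrix.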

Positive integer, the number of samples. 

Integer specifying the dimension to sample. For example, if data is a matrix and dim is 2, the sample is a selection of columns of data.

Random number stream. Create s using RandStream. Default: the global stream.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Sample with replacement if 'Replace' is true, or without replacement if 'Replace' is false. Default: true

Vector of sampling weights ('Weights') with the same number of elements as the size of the dimension being sampled. The vector must have nonnegative elements and at least one positive value.
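As a small illustration of the weight constraints, the sketch below samples from the integers 1:4 with one weight set to zero. The weight values and the sample size are arbitrary choices made for this example, not values from this page. An element with zero weight should never appear in the sample.

```matlab
% Illustrative weights: nonnegative, at least one positive, one per element
w = [0 1 1 2];

% Draw 1000 values from 1:4 with replacement, weighted by w
y = datasample(1:4,1000,'Weights',w);

% The element 1 has zero weight, so it is never drawn
neverDrawn = ~any(y == 1)
```

The remaining elements appear with frequencies roughly proportional to their weights, so 4 should occur about twice as often as 2 or 3.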

When the sample is taken with replacement (default), y can contain repeated observations from data.

Vector of indices indicating which elements datasample sampled from data.
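The relationship between the dim input and the idx output can be checked with a short sketch. The matrix magic(4) and the sample sizes are arbitrary choices for illustration. Indexing data with idx along the sampled dimension should reproduce y.

```matlab
X = magic(4);    % small example matrix (illustrative choice)

% Sample 3 rows with replacement (default dimension for a matrix)
[Yrows,ir] = datasample(X,3);

% Sample 2 distinct columns by setting dim to 2 and 'Replace' to false
[Ycols,ic] = datasample(X,2,2,'Replace',false);

% Indexing with the returned indices recovers each sample
rowsMatch = isequal(Yrows,X(ir,:))
colsMatch = isequal(Ycols,X(:,ic))
```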

Draw five unique values from the integers 1:10.
    s = RandStream('mlfg6331_64'); % For reproducibility
    y = datasample(s,1:10,5,'Replace',false)

    y =
         9     8     3     6     2
Generate a random sequence of the characters ACGT, with replacement, according to specified probabilities.
    s = RandStream('mlfg6331_64'); % For reproducibility
    seq = datasample(s,'ACGT',48,'Weights',[0.15 0.35 0.35 0.15])

    seq =
        'GGCGGCGCAAGGCGCCGGACCTGGCTGCACGCCGTTCCCTGCTACTCG'
Select a random subset of columns from a data matrix.
    rng(10,'twister') % For reproducibility
    X = randn(10,1000);
    s = RandStream('mlfg6331_64'); % For reproducibility
    Y = datasample(s,X,5,2,'Replace',false)

    Y =
        0.4317    0.3327    0.9112    2.3244    0.9559
        0.6977    0.7422    0.4578    1.3745    0.8634
        0.8543    0.3105    0.9836    0.6434    0.4457
        0.1686    0.6609    0.0553    0.1202    1.3699
        1.7649    1.1607    0.3513    1.5533    0.0597
        0.3821    0.5696    1.6264    0.2104    1.5486
        1.6844    0.7148    0.6876    0.4447    1.4615
        0.4170    1.3696    1.1874    0.9901    0.5875
        0.2410    1.4703    2.5003    1.1321    1.8451
        0.6212    1.4118    0.4518    0.8697    0.8093
Resample observations from a dataset array to create a bootstrap replicate dataset. See Bootstrap Resampling for more information about bootstrapping.
    load hospital
    y = datasample(hospital,size(hospital,1));
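The same resampling idea can be sketched on a plain numeric vector to estimate the variability of a statistic. The data vector x, the number of replicates nBoot, and the choice of the mean as the statistic are all assumptions made for this illustration.

```matlab
x = randn(50,1);          % illustrative data; any column vector works
nBoot = 200;              % number of bootstrap replicates (arbitrary choice)
bootMeans = zeros(nBoot,1);

for b = 1:nBoot
    % Resample as many observations as the data holds, with replacement
    xb = datasample(x,numel(x));
    bootMeans(b) = mean(xb);
end

% The spread of the replicate means approximates the standard error of the mean
seBoot = std(bootMeans)
```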
Use the second output to sample "in parallel" from two data vectors.
    x1 = randn(100,1);
    x2 = randn(100,1);
    [y1,idx] = datasample(x1,10);
    y2 = x2(idx);
datasample uses randperm, rand, or randi to generate random values. Therefore, datasample changes the state of the MATLAB® global random number generator. Control the random number generator using rng.
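A minimal sketch of controlling the generator with rng: seeding the global generator with the same value before each call should reproduce the same sample. The seed value and the data vector are arbitrary choices for illustration.

```matlab
x = (1:100)';            % illustrative data

rng(1);                  % seed the global random number generator
y1 = datasample(x,10);

rng(1);                  % reset to the same seed
y2 = datasample(x,10);

sameSample = isequal(y1,y2)   % true: both calls used the same generator state
```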
For selecting weighted samples without replacement, datasample uses the algorithm of Wong and Easton [1].
You can use randi or randperm to generate indices for random sampling with or without replacement, respectively. However, datasample can be more convenient because it samples directly from your data. datasample also allows weighted sampling.
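For comparison, the index-based approach might look like the following sketch. The variable names and the example data are illustrative, and the code covers only unweighted sampling from a column vector.

```matlab
data = (1:20)';          % illustrative data
k = 5;                   % number of observations to draw
n = size(data,1);

% With replacement: randi draws k indices from 1..n, repeats allowed
idxWith = randi(n,k,1);
yWith = data(idxWith);

% Without replacement: randperm(n,k) returns k unique indices from 1..n
idxWithout = randperm(n,k);
yWithout = data(idxWithout);
```

With datasample, the indexing step is handled internally, and the indices remain available through the second output if they are needed.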
[1] Wong, C. K., and M. C. Easton. "An Efficient Method for Weighted Sampling Without Replacement." SIAM Journal on Computing 9(1), pp. 111–113, 1980.