Quantiles of a data set

`Y = quantile(X,p)` returns quantiles of the elements in data vector or array `X` for the cumulative probability or probabilities `p` in the interval [0,1].

- If `X` is a vector, then `Y` is a scalar or a vector having the same length as `p`.
- If `X` is a matrix, then `Y` is a row vector or a matrix where the number of rows of `Y` is equal to the length of `p`.
- For multidimensional arrays, `quantile` operates along the first nonsingleton dimension of `X`.

`Y = quantile(X,N)` returns quantiles for `N` evenly spaced cumulative probabilities (1/(`N` + 1), 2/(`N` + 1), ..., `N`/(`N` + 1)) for integer `N` > 1.

- If `X` is a vector, then `Y` is a scalar or a vector with length `N`.
- If `X` is a matrix, then `Y` is a matrix where the number of rows of `Y` is equal to `N`.
- For multidimensional arrays, `quantile` operates along the first nonsingleton dimension of `X`.
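For illustration, the probabilities used by `quantile(X,N)` can be built explicitly. This small sketch (the variable name `pN` is only illustrative) shows that for `N = 4` they are 0.2, 0.4, 0.6, and 0.8, the same vector that the examples pass as an alternative to `quantile(x,4)`.

```matlab
N = 4;                      % number of evenly spaced quantiles
pN = (1:N) / (N + 1)        % cumulative probabilities 1/(N+1), ..., N/(N+1)
```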

`Y = quantile(___,vecdim)` returns quantiles over the dimensions specified in the vector `vecdim` for either of the first two syntaxes. For example, if `X` is a matrix, then `quantile(X,0.5,[1 2])` returns the 0.5 quantile of all the elements of `X` because every element of a matrix is contained in the array slice defined by dimensions 1 and 2.

Calculate the quantiles of a data set for specified probabilities.

Generate a data set of size 10.

```
rng('default'); % For reproducibility
x = normrnd(0,1,1,10)
```

```
x = 1×10

    0.5377    1.8339   -2.2588    0.8622    0.3188   -1.3077   -0.4336    0.3426    3.5784    2.7694
```

Calculate the 0.3 quantile.

```
y = quantile(x,0.30)
```

y = -0.0574

Calculate the quantiles for the cumulative probabilities 0.025, 0.25, 0.5, 0.75, and 0.975.

```
y = quantile(x,[0.025 0.25 0.50 0.75 0.975])
```

```
y = 1×5

   -2.2588   -0.4336    0.4401    1.8339    3.5784
```

Calculate the quantiles of a data set for a given number of quantiles.

Generate a data set of size 10.

```
rng('default'); % For reproducibility
x = normrnd(0,1,1,10)
```

```
x = 1×10

    0.5377    1.8339   -2.2588    0.8622    0.3188   -1.3077   -0.4336    0.3426    3.5784    2.7694
```

Calculate four evenly spaced quantiles.

```
y = quantile(x,4)
```

```
y = 1×4

   -0.8706    0.3307    0.6999    2.3017
```

Using `y = quantile(x,[0.2,0.4,0.6,0.8])` is another way to return the four evenly spaced quantiles.

Calculate the quantiles along the columns and rows of a data matrix for specified probabilities.

Generate a 4-by-6 data matrix.

```
rng default % For reproducibility
X = normrnd(0,1,4,6)
```

```
X = 4×6

    0.5377    0.3188    3.5784    0.7254   -0.1241    0.6715
    1.8339   -1.3077    2.7694   -0.0631    1.4897   -1.2075
   -2.2588   -0.4336   -1.3499    0.7147    1.4090    0.7172
    0.8622    0.3426    3.0349   -0.2050    1.4172    1.6302
```

Calculate the 0.3 quantile for each column of `X` (`dim` = 1).

```
y = quantile(X,0.3,1)
```

```
y = 1×6

   -0.3013   -0.6958    1.5336   -0.1056    0.9491    0.1078
```

`quantile` returns a row vector `y` when calculating one quantile for each column of a matrix. For example, `-0.3013` is the 0.3 quantile of the first column of `X` with elements (0.5377, 1.8339, -2.2588, 0.8622). Because the default value of `dim` is 1, you can return the same result with `y = quantile(X,0.3)`.

Calculate the 0.3 quantile for each row of `X` (`dim` = 2).

```
y = quantile(X,0.3,2)
```

```
y = 4×1

    0.3844
   -0.8642
   -1.0750
    0.4985
```

`quantile` returns a column vector `y` when calculating one quantile for each row of a matrix. For example, `0.3844` is the 0.3 quantile of the first row of `X` with elements (0.5377, 0.3188, 3.5784, 0.7254, -0.1241, 0.6715).

Calculate `N` evenly spaced quantiles along the columns and rows of a data matrix.

Generate a 6-by-7 data matrix.

```
rng('default'); % For reproducibility
X = unidrnd(10,6,7)
```

```
X = 6×7

     9     3    10     8     7     8     7
    10     6     5    10     8     1     4
     2    10     9     7     8     3    10
    10    10     2     1     4     1     1
     7     2     5     9     7     1     5
     1    10    10    10     2     9     4
```

Calculate three evenly spaced quantiles for each column of `X` (`dim` = 1).

```
y = quantile(X,3,1)
```

```
y = 3×7

    2.0000    3.0000    5.0000    7.0000    4.0000    1.0000    4.0000
    8.0000    8.0000    7.0000    8.5000    7.0000    2.0000    4.5000
   10.0000   10.0000   10.0000   10.0000    8.0000    8.0000    7.0000
```

Each column of matrix `y` corresponds to the three evenly spaced quantiles of each column of matrix `X`. For example, the first column of `y` with elements (2, 8, 10) has the quantiles for the first column of `X` with elements (9, 10, 2, 10, 7, 1). `y = quantile(X,3)` returns the same answer because the default value of `dim` is 1.

Calculate three evenly spaced quantiles for each row of `X` (`dim` = 2).

```
y = quantile(X,3,2)
```

```
y = 6×3

    7.0000    8.0000    8.7500
    4.2500    6.0000    9.5000
    4.0000    8.0000    9.7500
    1.0000    2.0000    8.5000
    2.7500    5.0000    7.0000
    2.5000    9.0000   10.0000
```

Each row of matrix `y` corresponds to the three evenly spaced quantiles of each row of matrix `X`. For example, the first row of `y` with elements (7, 8, 8.75) has the quantiles for the first row of `X` with elements (9, 3, 10, 8, 7, 8, 7).

Calculate the quantiles of a multidimensional array for specified probabilities by using the `'all'` and `vecdim` input arguments.

Create a 3-by-5-by-2 array `X`. Specify the vector of probabilities `p`.

```
X = reshape(1:30,[3 5 2])
```

```
X = 
X(:,:,1) =

     1     4     7    10    13
     2     5     8    11    14
     3     6     9    12    15


X(:,:,2) =

    16    19    22    25    28
    17    20    23    26    29
    18    21    24    27    30
```

```
p = [0.25 0.75];
```

Calculate the 0.25 and 0.75 quantiles of all the elements in `X`.

```
Yall = quantile(X,p,'all')
```

```
Yall = 2×1

     8
    23
```

`Yall(1)` is the 0.25 quantile of `X`, and `Yall(2)` is the 0.75 quantile of `X`.

Calculate the 0.25 and 0.75 quantiles for each page of `X` by specifying dimensions 1 and 2 as the operating dimensions.

```
Ypage = quantile(X,p,[1 2])
```

```
Ypage = 
Ypage(:,:,1) =

    4.2500
   11.7500


Ypage(:,:,2) =

   19.2500
   26.7500
```

For example, `Ypage(1,1,1)` is the 0.25 quantile of the first page of `X`, and `Ypage(2,1,1)` is the 0.75 quantile of the first page of `X`.

Calculate the 0.25 and 0.75 quantiles of the elements in each `X(i,:,:)` slice by specifying dimensions 2 and 3 as the operating dimensions.

```
Yrow = quantile(X,p,[2 3])
```

```
Yrow = 3×2

     7    22
     8    23
     9    24
```

For example, `Yrow(3,1)` is the 0.25 quantile of the elements in `X(3,:,:)`, and `Yrow(3,2)` is the 0.75 quantile of the elements in `X(3,:,:)`.

Find the median and quartiles of a vector `x` with an even number of elements.

Enter the data.

```
x = [2 5 6 10 11 13]
```

```
x = 1×6

     2     5     6    10    11    13
```

Calculate the median of `x`.

```
y = quantile(x,0.50)
```

y = 8

Calculate the quartiles of `x`.

```
y = quantile(x,[0.25, 0.5, 0.75])
```

```
y = 1×3

     5     8    11
```

Using `y = quantile(x,3)` is another way to compute the quartiles of `x`.

These results might be different from the textbook definitions because `quantile` uses linear interpolation to find the median and quartiles.

Find the median and quartiles of a vector `x` with an odd number of elements.

Enter the data.

```
x = [2 4 6 8 10 12 14]
```

```
x = 1×7

     2     4     6     8    10    12    14
```

Find the median of `x`.

```
y = quantile(x,0.50)
```

y = 8

Find the quartiles of `x`.

```
y = quantile(x,[0.25, 0.5, 0.75])
```

```
y = 1×3

    4.5000    8.0000   11.5000
```

Using `y = quantile(x,3)` is another way to compute the quartiles of `x`.

These results might be different from the textbook definitions because `quantile` uses linear interpolation to find the median and quartiles.
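To see how such a difference arises, the following sketch computes quartiles with one common textbook convention: the median of the lower and upper halves of the sorted data, excluding the overall median when the number of elements is odd. This is only one of several textbook conventions, and the anonymous function `med` is a hypothetical helper written for this comparison. For the odd-length vector above it gives 4 and 12, whereas `quantile` returns 4.5 and 11.5.

```matlab
x  = [2 4 6 8 10 12 14];
xs = sort(x);
n  = numel(xs);

% Median of a sorted vector (hypothetical helper for this comparison)
med = @(v) (v(floor((numel(v)+1)/2)) + v(ceil((numel(v)+1)/2))) / 2;

% Lower and upper halves, excluding the middle element when n is odd
lowerHalf = xs(1:floor(n/2));
upperHalf = xs(ceil(n/2)+1:end);

q1Textbook = med(lowerHalf)   % first quartile under this convention
q3Textbook = med(upperHalf)   % third quartile under this convention
```

With the even-length vector `[2 5 6 10 11 13]`, the same convention gives 5 and 11, matching `quantile`, which is why the difference only shows up in the odd-length case here.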

Calculate exact and approximate quantiles of a tall column vector for a given probability.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the `mapreducer` function.

Create a datastore for the `airlinesmall` data set. Treat `'NA'` values as missing data so that `datastore` replaces them with `NaN` values. Specify that you want to work with only the `ArrTime` variable.

```
ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames','ArrTime');
```

Create a tall table on top of the datastore, and extract the data from the tall table into a tall vector.

```
t = tall(ds) % Tall table
```

```
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 12).

t =

  Mx1 tall table

    ArrTime
    _______

      735
     1124
     2218
     1431
      746
     1547
     1052
     1134
       :
       :
```

```
x = t{:,:} % Tall vector
```

```
x =

  Mx1 tall double column vector

         735
        1124
        2218
        1431
         746
        1547
        1052
        1134
          :
          :
```

Calculate the exact quantile of `x` for `p` = 0.5. Because `x` is a tall column vector and `p` is a scalar, `quantile` returns the exact quantile value by default.

```
p = 0.5; % Cumulative probability
yExact = quantile(x,p)
```

```
yExact =

  tall double

    ?
```

Calculate the approximate quantile of `x` for `p` = 0.5. Specify `'Method','approximate'` to use an approximation algorithm based on T-Digest for computing the quantiles.

```
yApprox = quantile(x,p,'Method','approximate')
```

```
yApprox =

  MxNx... tall double array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```

Evaluate the tall arrays and bring the results into memory by using `gather`.

```
[yExact,yApprox] = gather(yExact,yApprox)
```

```
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 4: Completed in 5.3 sec
- Pass 2 of 4: Completed in 0.96 sec
- Pass 3 of 4: Completed in 1.5 sec
- Pass 4 of 4: Completed in 1 sec
Evaluation completed in 13 sec
```

yExact = 1522

yApprox = 1.5220e+03

The values of the approximate quantile and the exact quantile are the same to the four digits shown.

Calculate exact and approximate quantiles of a tall matrix for specified cumulative probabilities along different dimensions.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the `mapreducer` function.

Create a tall matrix `X` containing a subset of variables from the `airlinesmall` data set. See Quantiles of Tall Vector for Given Probability for details about the steps to extract data from a tall array.

```
varnames = {'ArrDelay','ArrTime','DepTime','ActualElapsedTime'}; % Subset of variables in the data set
ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames',varnames); % Datastore
t = tall(ds); % Tall table
```

```
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
```

```
X = t{:,varnames} % Tall matrix
```

```
X =

  Mx4 tall double matrix

     8     735     642      53
     8    1124    1021      63
    21    2218    2055      83
    13    1431    1332      59
     4     746     629      77
    59    1547    1446      61
     3    1052     928      84
    11    1134     859     155
     :       :       :       :
     :       :       :       :
```

When operating along a dimension that is not 1, the `quantile` function calculates the exact quantiles only, so that it can perform the computation efficiently using a sorting-based algorithm (see Algorithms) instead of an approximation algorithm based on T-Digest.

Calculate the exact quantiles of `X` along the second dimension for the cumulative probabilities 0.25, 0.5, and 0.75.

```
p = [0.25 0.50 0.75]; % Vector of cumulative probabilities
Yexact = quantile(X,p,2)
```

```
Yexact =

  MxNx... tall double array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```

When the function operates along the first dimension and `p` is a vector of cumulative probabilities, you must use the approximation algorithm based on T-Digest to compute the quantiles. Using the sorting-based algorithm to find the quantiles along the first dimension of a tall array is computationally intensive.

Calculate the approximate quantiles of `X` along the first dimension for the cumulative probabilities 0.25, 0.5, and 0.75. Because the default dimension is 1, you do not need to specify a value for `dim`.

```
Yapprox = quantile(X,p,'Method','approximate')
```

```
Yapprox =

  MxNx... tall double array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```

Evaluate the tall arrays and bring the results into memory by using `gather`.

```
[Yexact,Yapprox] = gather(Yexact,Yapprox);
```

```
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 7.5 sec
Evaluation completed in 10 sec
```

Show the first five rows of the exact quantiles of `X` (along the second dimension) for the cumulative probabilities 0.25, 0.5, and 0.75.

```
Yexact(1:5,:)
```

```
ans = 5×3
10^3 ×

    0.0305    0.3475    0.6885
    0.0355    0.5420    1.0725
    0.0520    1.0690    2.1365
    0.0360    0.6955    1.3815
    0.0405    0.3530    0.6875
```

Each row of the matrix `Yexact` contains the three quantiles of the corresponding row in `X`. For example, `30.5`, `347.5`, and `688.5` are the 0.25, 0.5, and 0.75 quantiles, respectively, of the first row in `X`.

Show the approximate quantiles of `X` (along the first dimension) for the cumulative probabilities 0.25, 0.5, and 0.75.

```
Yapprox
```

```
Yapprox = 3×4
10^3 ×

   -0.0070    1.1149    0.9322    0.0700
         0    1.5220    1.3350    0.1020
    0.0110    1.9180    1.7400    0.1510
```

Each column of the matrix `Yapprox` corresponds to the three quantiles for each column of the matrix `X`. For example, the first column of `Yapprox` with elements (-7, 0, 11) contains the quantiles for the first column of `X`.

Calculate exact and approximate quantiles along different dimensions of a tall matrix for `N` evenly spaced cumulative probabilities.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the `mapreducer` function.

Create a tall matrix `X` containing a subset of variables from the `airlinesmall` data set. See Quantiles of Tall Vector for Given Probability for details about the steps to extract data from a tall array.

```
varnames = {'ArrDelay','ArrTime','DepTime','ActualElapsedTime'}; % Subset of variables in the data set
ds = datastore('airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames',varnames); % Datastore
t = tall(ds); % Tall table
```

```
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
```

```
X = t{:,varnames}
```

```
X =

  Mx4 tall double matrix

     8     735     642      53
     8    1124    1021      63
    21    2218    2055      83
    13    1431    1332      59
     4     746     629      77
    59    1547    1446      61
     3    1052     928      84
    11    1134     859     155
     :       :       :       :
     :       :       :       :
```

To find evenly spaced quantiles along the first dimension, you must use the approximation algorithm based on T-Digest. Using the sorting-based algorithm (see Algorithms) to find quantiles along the first dimension of a tall array is computationally intensive.

Calculate three evenly spaced quantiles along the first dimension of `X`. Because the default dimension is 1, you do not need to specify a value for `dim`. Specify `'Method','approximate'` to use the approximation algorithm.

```
N = 3; % Number of quantiles
Yapprox = quantile(X,N,'Method','approximate')
```

```
Yapprox =

  MxNx... tall double array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```

To find evenly spaced quantiles along any other dimension (`dim` is not `1`), `quantile` calculates the exact quantiles only, so that it can perform the computation efficiently by using the sorting-based algorithm.

Calculate three evenly spaced quantiles along the second dimension of `X`. Because `dim` is not 1, `quantile` returns the exact quantiles by default.

```
Yexact = quantile(X,N,2)
```

```
Yexact =

  MxNx... tall double array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :
```

Evaluate the tall arrays and bring the results into memory by using `gather`.

```
[Yapprox,Yexact] = gather(Yapprox,Yexact);
```

```
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 7 sec
Evaluation completed in 9.7 sec
```

Show the approximate quantiles of `X` (along the first dimension) for the three evenly spaced cumulative probabilities.

```
Yapprox
```

```
Yapprox = 3×4
10^3 ×

   -0.0070    1.1149    0.9322    0.0700
         0    1.5220    1.3350    0.1020
    0.0110    1.9180    1.7400    0.1510
```

Each column of the matrix `Yapprox` corresponds to the three evenly spaced quantiles for each column of the matrix `X`. For example, the first column of `Yapprox` with elements (-7, 0, 11) contains the quantiles for the first column of `X`.

Show the first five rows of the exact quantiles of `X` (along the second dimension) for the three evenly spaced cumulative probabilities.

```
Yexact(1:5,:)
```

```
ans = 5×3
10^3 ×

    0.0305    0.3475    0.6885
    0.0355    0.5420    1.0725
    0.0520    1.0690    2.1365
    0.0360    0.6955    1.3815
    0.0405    0.3530    0.6875
```

Each row of the matrix `Yexact` contains the three evenly spaced quantiles of the corresponding row in `X`. For example, `30.5`, `347.5`, and `688.5` are the 0.25, 0.5, and 0.75 quantiles, respectively, of the first row in `X`.

`X` – Input data: vector | array

Input data, specified as a vector or array.

**Data Types:** `double` | `single`

`p` – Cumulative probabilities: scalar | vector

Cumulative probabilities for which to compute the quantiles, specified as a scalar or vector of values in the interval [0,1].

**Example:** `0.3`

**Example:** `[0.25, 0.5, 0.75]`

**Example:** `(0:0.25:1)`

**Data Types:** `double` | `single`

`N` – Number of quantiles: positive integer

Number of quantiles to compute, specified as a positive integer. `quantile` returns `N` quantiles that divide the data set into `N` + 1 evenly distributed segments.

**Data Types:** `double` | `single`

`dim` – Dimension: positive integer

Dimension along which the quantiles of a matrix `X` are requested, specified as a positive integer. For example, for a matrix `X`, when `dim` = 1, `quantile` returns the quantile(s) of the columns of `X`; when `dim` = 2, `quantile` returns the quantile(s) of the rows of `X`. For a multidimensional array `X`, the length of the `dim`th dimension of `Y` is the same as the length of `p` (or `N`).

**Data Types:** `single` | `double`

`vecdim` – Vector of dimensions: positive integer vector

Vector of dimensions, specified as a positive integer vector. Each element of `vecdim` represents a dimension of the input array `X`. In the smallest specified operating dimension (that is, dimension `min(vecdim)`), the output `Y` has length equal to the number of quantiles requested (either `N` or `length(p)`). In each of the remaining operating dimensions, `Y` has length 1. The other dimension lengths are the same for `X` and `Y`.

For example, consider a 2-by-3-by-3 array `X` with `p = [0.2 0.4 0.6 0.8]`. In this case, `quantile(X,p,[1 2])` returns an array where each page contains the 0.2, 0.4, 0.6, and 0.8 quantiles of the elements on the corresponding page of `X`. Because 1 and 2 are the operating dimensions, with `min([1 2]) = 1` and `length(p) = 4`, the output is a 4-by-1-by-3 array.

**Data Types:** `single` | `double`
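The output-size rule for `vecdim` can be written out explicitly. This sketch (the variable names `szX`, `nq`, and `szY` are illustrative, not part of MATLAB) computes the expected size of `Y` for the 2-by-3-by-3 example, assuming every element of `vecdim` is at most `ndims(X)`.

```matlab
szX    = [2 3 3];   % size of X
vecdim = [1 2];     % operating dimensions
nq     = 4;         % number of quantiles requested, N or length(p)

szY = szX;
szY(vecdim) = 1;            % every operating dimension collapses to length 1 ...
szY(min(vecdim)) = nq       % ... except the smallest, which holds the quantiles
```

Changing the inputs to `szX = [3 5 2]`, `vecdim = [2 3]`, and `nq = 2` gives `[3 2 1]`, which MATLAB displays as a 3-by-2 result because trailing singleton dimensions are dropped.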

`method` – Method for calculating quantiles: `'exact'` (default) | `'approximate'`

Method for calculating quantiles, specified as `'exact'` or `'approximate'`. By default, `quantile` returns the exact quantiles by implementing an algorithm that uses sorting. You can specify `'Method','approximate'` for `quantile` to return approximate quantiles by implementing an algorithm that uses T-Digest.

**Data Types:** `char` | `string`

`Y` – Quantiles: scalar | array

Quantiles of a data vector or array, returned as a scalar or array for one or multiple values of cumulative probabilities.

- If `X` is a vector, then `Y` is a scalar or a vector with the same length as the number of quantiles requested (`N` or `length(p)`). `Y(i)` contains the `p(i)` quantile.
- If `X` is an array of dimension *d*, then `Y` is an array with the length of the smallest operating dimension equal to the number of quantiles requested (`N` or `length(p)`).

A *multidimensional array* is an array with more than two dimensions. For example, if `X` is a 1-by-3-by-4 array, then `X` is a 3-D array.

A *nonsingleton dimension* of an array is a dimension whose size is not equal to 1. The *first nonsingleton dimension* of an array is the first dimension that satisfies the nonsingleton condition. For example, if `X` is a 1-by-1-by-2-by-4 array, then the third dimension is the first nonsingleton dimension of `X`.
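In code, the first nonsingleton dimension can be located with `size` and `find`, as in this sketch for the 1-by-1-by-2-by-4 example (`firstNonsingleton` is an illustrative variable name).

```matlab
X = zeros(1,1,2,4);                           % a 1-by-1-by-2-by-4 array
firstNonsingleton = find(size(X) ~= 1, 1)     % index of the first dimension whose size is not 1
```

For a scalar input, every dimension has size 1, so `find` returns an empty result.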

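Linear interpolation uses linear polynomials to find $$y_i = f(x_i)$$, the values of the underlying function $$Y = f(X)$$ at the points in the vector or array $$x$$. Given the data points $$(x_1, y_1)$$ and $$(x_2, y_2)$$, where $$y_1 = f(x_1)$$ and $$y_2 = f(x_2)$$, linear interpolation finds $$y = f(x)$$ for a given $$x$$ between $$x_1$$ and $$x_2$$ as follows: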

$$y=f(x)={y}_{1}+\frac{\left(x-{x}_{1}\right)}{\left({x}_{2}-{x}_{1}\right)}\left({y}_{2}-{y}_{1}\right).$$

Similarly, if the 1.5/*n* quantile is $$y_{1.5/n}$$ and the 2.5/*n* quantile is $$y_{2.5/n}$$, then linear interpolation finds the 2.3/*n* quantile $$y_{2.3/n}$$ as

$${y}_{\frac{2.3}{n}}={y}_{\frac{1.5}{n}}+\frac{\left(\frac{2.3}{n}-\frac{1.5}{n}\right)}{\left(\frac{2.5}{n}-\frac{1.5}{n}\right)}\left({y}_{\frac{2.5}{n}}-{y}_{\frac{1.5}{n}}\right).$$
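As a worked instance of this formula, take the six-element data set {6, 3, 2, 10, 8, 1} from the Algorithms description, whose sorted values 2 and 3 sit at the 1.5/6 and 2.5/6 quantiles. The sketch evaluates the formula directly and cross-checks it with the built-in `interp1`; both give 2.8 for the 2.3/6 quantile.

```matlab
xs = sort([6 3 2 10 8 1]);          % sorted data: 1 2 3 6 8 10
n  = numel(xs);
pSorted = ((1:n) - 0.5) / n;        % probability assigned to each sorted element

% Formula applied between the 1.5/n and 2.5/n quantiles
y15 = xs(2);
y25 = xs(3);
y23 = y15 + ((2.3/n - 1.5/n) / (2.5/n - 1.5/n)) * (y25 - y15)

% Same value from linear interpolation with interp1
y23check = interp1(pSorted, xs, 2.3/n)
```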

T-digest [2] is a probabilistic data structure that is a sparse representation of the empirical cumulative distribution function (CDF) of a data set. T-digest is useful for computing approximations of rank-based statistics (such as percentiles and quantiles) from online or distributed data in a way that allows for controllable accuracy, particularly near the tails of the data distribution.

For data that is distributed in different partitions, t-digest computes quantile estimates
(and percentile estimates) for each data partition separately, and then combines the
estimates while maintaining a constant-memory bound and constant relative accuracy of
computation ($$q(1-q)$$ for the *q*th quantile). For these reasons, t-digest is
practical for working with tall arrays.

To estimate quantiles of an array that is distributed in different partitions, first build a t-digest in each partition of the data. A t-digest clusters the data in the partition and summarizes each cluster by a centroid value and an accumulated weight that represents the number of samples contributing to the cluster. T-digest uses large clusters (widely spaced centroids) to represent areas of the CDF that are near *q* = 0.5 and uses small clusters (tightly spaced centroids) to represent areas of the CDF that are near *q* = 0 or *q* = 1.

T-digest controls the cluster size by using a scaling function that maps a quantile
*q* to an index *k* with a compression parameter $$\delta $$. That is,

$$k(q,\delta )=\delta \cdot \left(\frac{{\mathrm{sin}}^{-1}(2q-1)}{\pi}+\frac{1}{2}\right),$$

where the mapping *k* is monotonic with minimum value *k*(0,*δ*) = 0 and maximum value *k*(1,*δ*) =
*δ*. The following figure shows the scaling function for *δ* = 10.
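The behavior of the scaling function can also be checked numerically. In this sketch, `k` implements the formula above and `qOfK` is its inverse, derived here for illustration; both names are hypothetical. Evaluating with *δ* = 10 shows that one unit step in *k* covers a narrow range of *q* near the tails and a wide range near the median.

```matlab
delta = 10;                                        % compression parameter
k    = @(q)  delta * (asin(2*q - 1) / pi + 1/2);   % scaling function k(q, delta)
qOfK = @(kk) (sin(pi * (kk/delta - 1/2)) + 1) / 2; % inverse mapping from k back to q

q  = [0 0.25 0.5 0.75 1];
kq = k(q)                                % approximately 0, 3.3333, 5, 6.6667, 10

stepNearEdge   = qOfK(1)   - qOfK(0)     % width in q of the first unit of k (about 0.0245)
stepNearCenter = qOfK(5.5) - qOfK(4.5)   % width in q of a unit of k around the median (about 0.1564)
```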

The scaling function translates the quantile *q* to the scaling factor *k* in order to give variable size steps in *q*. As a result, cluster sizes are unequal (larger around the center quantiles and smaller near *q* = 0 or *q* = 1). The smaller clusters allow for better accuracy near the edges of the data.

To update a t-digest with a new observation that has a weight and location, find the cluster closest to the new observation. Then, add the weight and update the centroid of the cluster based on the weighted average, provided that the updated weight of the cluster does not exceed the size limitation.
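A minimal sketch of this update step, assuming a t-digest stored as two vectors of centroid locations and weights. In t-digest the size limitation comes from the scaling function (roughly, a cluster may span at most one unit of *k*); the fixed cap `maxWeight` used here is a hypothetical simplification. When the cap would be exceeded, the sketch keeps the observation as a new cluster.

```matlab
centroids = [1 5 10];     % cluster centroid locations (sorted)
weights   = [2 3 1];      % accumulated weight of each cluster
maxWeight = 5;            % hypothetical size limit for this sketch only
x = 4;  w = 1;            % new observation: location and weight

[~, idx] = min(abs(centroids - x));          % closest cluster to the observation
if weights(idx) + w <= maxWeight
    % Move the centroid to the weighted average, then add the weight
    centroids(idx) = (centroids(idx)*weights(idx) + x*w) / (weights(idx) + w);
    weights(idx)   = weights(idx) + w;
else
    % Size limit reached: insert the observation as a new cluster, keeping order
    [centroids, order] = sort([centroids x]);
    weights = [weights w];
    weights = weights(order);
end
centroids, weights
```

Here the closest centroid is 5, so it moves to (5·3 + 4·1)/4 = 4.75 and its weight becomes 4.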

You can combine independent t-digests from each partition of the data by taking a union of the t-digests and merging their centroids. To combine t-digests, first sort the clusters from all the independent t-digests in decreasing order of cluster weights. Then, merge neighboring clusters, when they meet the size limitation, to form a new t-digest.

Once you form a t-digest that represents the complete data set, you can estimate the end-points (or boundaries) of each cluster in the t-digest and then use interpolation between the end-points of each cluster to find accurate quantile estimates.
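For illustration only, the following sketch uses a simplified estimator that is an assumption, not necessarily the one `quantile` uses: each centroid is placed at the midpoint of its cumulative weight, and the quantile is interpolated linearly between neighboring centroids, with probabilities beyond the outermost centroids clamped to them. With unit weights this placement reduces to the (i - 0.5)/*n* rule of the sorting-based algorithm, so for the data {1, 2, 3, 6, 10} it returns the exact median 3.

```matlab
centroids = [1 2 3 6 10];     % sorted centroid locations
weights   = [1 1 1 1 1];      % unit weights: every cluster holds one sample
q = 0.5;                      % cumulative probability to estimate

total = sum(weights);
% Cumulative probability at the center of each cluster
qCenter = (cumsum(weights) - weights/2) / total;
% Clamp to the outermost centroids, then interpolate linearly
qClamped = min(max(q, qCenter(1)), qCenter(end));
yEst = interp1(qCenter, centroids, qClamped)
```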

For an *n*-element vector `X`, `quantile` computes quantiles by using a sorting-based algorithm as follows:

- The sorted elements in `X` are taken as the (0.5/*n*), (1.5/*n*), ..., ([*n* - 0.5]/*n*) quantiles. For example:
  - For a data vector of five elements such as {6, 3, 2, 10, 1}, the sorted elements {1, 2, 3, 6, 10} respectively correspond to the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles.
  - For a data vector of six elements such as {6, 3, 2, 10, 8, 1}, the sorted elements {1, 2, 3, 6, 8, 10} respectively correspond to the (0.5/6), (1.5/6), (2.5/6), (3.5/6), (4.5/6), and (5.5/6) quantiles.
- `quantile` uses linear interpolation to compute quantiles for probabilities between (0.5/*n*) and ([*n* - 0.5]/*n*).
- For the quantiles corresponding to the probabilities outside that range, `quantile` assigns the minimum or maximum values of the elements in `X`.

`quantile` treats `NaN` values as missing values and removes them.
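The steps above can be sketched for a vector input in a few lines. This is a hypothetical illustration, not the shipped implementation: it removes `NaN` values, places the sorted elements at probabilities (i - 0.5)/*n*, interpolates linearly with `interp1`, and clamps probabilities outside [0.5/*n*, (*n* - 0.5)/*n*] so that they map to the minimum or maximum. It assumes at least two non-`NaN` elements. For the data from the even-length example with one `NaN` added, it reproduces the quartiles 5, 8, and 11.

```matlab
x = [2 5 6 NaN 10 11 13];        % data with a missing value
p = [0.25 0.5 0.75];             % cumulative probabilities

xs = sort(x(~isnan(x)));         % remove NaNs, then sort
n  = numel(xs);
pSorted = ((1:n) - 0.5) / n;     % sorted elements are the (i-0.5)/n quantiles

% Probabilities outside [0.5/n, (n-0.5)/n] map to the minimum or maximum
pClamped = min(max(p, pSorted(1)), pSorted(end));
y = interp1(pSorted, xs, pClamped)
```

Setting `p = 0.01` in this sketch returns 2, the minimum of the data, because 0.01 is below 0.5/6.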

[1] Langford, E. “Quartiles in Elementary Statistics.” *Journal of Statistics Education*. Vol. 14, No. 3, 2006.

[2] Dunning, T., and O. Ertl. “Computing Extremely Accurate Quantiles Using T-Digests.” August 2017.

Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

- `Y = quantile(X,p)` and `Y = quantile(X,N)` return the exact quantiles (using a sorting-based algorithm) only if `X` is a tall column vector.
- `Y = quantile(___,dim)` returns the exact quantiles only when *one* of these conditions exists:
  - `X` is a tall column vector.
  - `X` is a tall array and `dim` is not `1`. For example, `quantile(X,p,2)` returns the exact quantiles along the rows of the tall array `X`.

  If `X` is a tall array and `dim` is `1`, then you must specify `'Method','approximate'` to use an approximation algorithm based on T-Digest for computing the quantiles. For example, `quantile(X,p,1,'Method','approximate')` returns the approximate quantiles along the columns of the tall array `X`.
- `Y = quantile(___,vecdim)` returns the exact quantiles only when *one* of these conditions exists:
  - `X` is a tall column vector.
  - `X` is a tall array and `vecdim` does not include `1`. For example, if `X` is a 3-by-5-by-2 array, then `quantile(X,p,[2,3])` returns the exact quantiles of the elements in each `X(i,:,:)` slice.
  - `X` is a tall array and `vecdim` includes `1` and all the nonsingleton dimensions of `X`. For example, if `X` is a 10-by-1-by-4 array, then `quantile(X,p,[1 3])` returns the exact quantiles of the elements in `X(:,1,:)`.

  If `X` is a tall array and `vecdim` includes `1` but does not include all the nonsingleton dimensions of `X`, then you must specify `'Method','approximate'` to use the approximation algorithm. For example, if `X` is a 10-by-1-by-4 array, you can use `quantile(X,p,[1 2],'Method','approximate')` to find the approximate quantiles of each page of `X`.

For more information, see Tall Arrays (MATLAB).

Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

- The `'all'` and `vecdim` input arguments are not supported.
- The `'Method'` name-value pair argument is not supported.
- The `dim` input argument must be a compile-time constant.
- If you do not specify the `dim` input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see Automatic dimension restriction (MATLAB Coder).
- If the output `Y` is a vector, the orientation of `Y` differs from MATLAB® when all of the following are true:
  - You do not supply `dim`.
  - `X` is a variable-size array, and not a variable-size vector, at compile time, but `X` is a vector at run time.
  - The orientation of the vector `X` does not match the orientation of the vector `p`.

  In this case, the output `Y` matches the orientation of `X`, not the orientation of `p`.

For more information on code generation, see Introduction to Code Generation and General Code Generation Workflow.

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

- The `'all'` and `vecdim` input arguments are not supported.
- The `'Method'` name-value pair argument is not supported.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
