Note: Use only in the MuPAD Notebook Interface. This functionality does not run in MATLAB.
Measures of central tendency locate a distribution of data along an appropriate scale. There are several standard measures of central tendency. Knowing the properties of a particular data sample (such as the origin of the data sample and possible outliers and their values) can help you choose the most useful measure of central tendency for that data sample. MuPAD® provides the following functions for calculating the measures of central tendency:
The stats::modal function returns the most frequent value of a data sample and the number of occurrences of that value.
The stats::mean function calculates the arithmetic mean (x_{1} + x_{2} + ... + x_{n})/n of a data sample x_{1}, x_{2}, ..., x_{n}.
The stats::quadraticMean function calculates the quadratic mean sqrt((x_{1}^{2} + x_{2}^{2} + ... + x_{n}^{2})/n) of a data sample x_{1}, x_{2}, ..., x_{n}.
The stats::median function returns the element x_{n} of a sorted data sample x_{1}, x_{2}, ..., x_{2n}.
The stats::geometricMean function calculates the geometric mean (x_{1} x_{2} ... x_{n})^{1/n} of a data sample x_{1}, x_{2}, ..., x_{n}.
The stats::harmonicMean function calculates the harmonic mean n/(1/x_{1} + 1/x_{2} + ... + 1/x_{n}) of a data sample x_{1}, x_{2}, ..., x_{n}.
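The standard definitions of the four means can be written out by hand with basic list operations, which may help when checking what the stats functions return. The following is a minimal sketch, not the library's implementation: the sample x and the names n, am, qm, gm, and hm are hypothetical and chosen only for illustration.

```mupad
// Hypothetical sample of positive numbers, used only for illustration
x := [2.0, 4.0, 4.0, 8.0]:
n := nops(x):                                  // number of elements

am := _plus(op(x))/n;                          // arithmetic mean: sum divided by n
qm := sqrt(_plus(op(map(x, t -> t^2)))/n);     // quadratic mean: root of the mean square
gm := _mult(op(x))^(1/n);                      // geometric mean: n-th root of the product
hm := n/_plus(op(map(x, t -> 1/t)));           // harmonic mean: n over the sum of reciprocals
```

For this sample the values are 4.5, 5.0, 4.0, and approximately 3.556. For any sample of positive numbers, the harmonic mean does not exceed the geometric mean, which does not exceed the arithmetic mean, which does not exceed the quadratic mean.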
The arithmetic average is a simple and popular measure of central tendency. It serves best for data samples that do not have significant outliers. Unfortunately, outliers (for example, data-entry errors or glitches) exist in almost all real data. The arithmetic average and the quadratic mean are sensitive to these problems. One bad data value can move the average away from the center of the rest of the data by an arbitrarily large distance. For example, create the following two lists, each containing a single outlier. The outlier is equal to 100 in the first list and to 1 in the second list:
L := [1, 1, 1, 1, 1, 100.0]: S := [100, 100, 100, 100, 100, 1.0]:
The stats::modal function shows that the most frequent entry of the first list is 1 and the most frequent entry of the second list is 100. In each list, the most frequent entry appears five times:
modalL = stats::modal(L); modalS = stats::modal(S)
If the value of the outlier is large, the outlier can significantly move the mean and the quadratic mean away from the center:
meanL = stats::mean(L); quadraticMeanL = stats::quadraticMean(L)
Large outliers affect the geometric mean and the harmonic mean less than they affect the simple arithmetic average. Nevertheless, both geometric and harmonic means are also not completely resistant to outliers:
geometricMeanL = stats::geometricMean(L); harmonicMeanL = stats::harmonicMean(L)
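For reference, the four means of the list L can be worked out by hand from the standard definitions. The figures below are hand-rounded approximations, so the number of digits that MuPAD displays may differ:

```latex
\begin{align*}
\text{mean} &= \frac{5 \cdot 1 + 100}{6} = 17.5 \\
\text{quadratic mean} &= \sqrt{\frac{5 \cdot 1^{2} + 100^{2}}{6}} = \sqrt{1667.5} \approx 40.835 \\
\text{geometric mean} &= \left(1^{5} \cdot 100\right)^{1/6} = 100^{1/6} \approx 2.1544 \\
\text{harmonic mean} &= \frac{6}{5 \cdot \frac{1}{1} + \frac{1}{100}} = \frac{6}{5.01} \approx 1.1976
\end{align*}
```

Five of the six entries equal 1, so the mean and the quadratic mean are pulled far from the bulk of the data, while the geometric and harmonic means stay much closer to it.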
If the value of the outlier is small, the impact on the mean of a data set is less noticeable. The quadratic mean can effectively mitigate the impact of a few small outliers:
meanS = stats::mean(S); quadraticMeanS = stats::quadraticMean(S)
The small outlier significantly affects the geometric and harmonic means computed for the list S:
geometricMeanS = stats::geometricMean(S); harmonicMeanS = stats::harmonicMean(S)
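The same hand calculation for the list S, again from the standard definitions and with hand-rounded values, makes the contrast with L explicit:

```latex
\begin{align*}
\text{mean} &= \frac{5 \cdot 100 + 1}{6} = 83.5 \\
\text{quadratic mean} &= \sqrt{\frac{5 \cdot 100^{2} + 1^{2}}{6}} = \sqrt{8333.5} \approx 91.288 \\
\text{geometric mean} &= \left(100^{5} \cdot 1\right)^{1/6} = 10^{5/3} \approx 46.416 \\
\text{harmonic mean} &= \frac{6}{5 \cdot \frac{1}{100} + \frac{1}{1}} = \frac{6}{1.05} \approx 5.7143
\end{align*}
```

Here five of the six entries equal 100. The quadratic mean stays closest to that value, the mean is somewhat lower, and the geometric and harmonic means are dragged strongly toward the single small entry.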
The median is usually resistant to both large and small outliers:
medianL = stats::median(L); medianS = stats::median(S)
For data samples that contain an even number of elements, MuPAD can use two definitions of the median. By default, stats::median returns the n/2-th element of a sorted data sample, where n is the number of elements:
z := [1, 1, 1, 100, 100, 100]: medianZ = stats::median(z)
When you use the Averaged option, stats::median returns the arithmetic average of the two central elements of a sorted data sample:
z := [1, 1, 1, 100, 100, 100]: medianZ = stats::median(z, Averaged)
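The two definitions can also be reproduced by hand by sorting the sample and indexing into it. This is only a sketch that follows the description above and applies to samples with an even number of elements. The names s, n, lowerMedian, and averagedMedian are hypothetical and are not part of the stats library.

```mupad
// Sketch: both median definitions for a sample with an even number of elements
z := [1, 1, 1, 100, 100, 100]:
s := sort(z):        // sorted copy of the sample
n := nops(s):        // number of elements (6 here)

lowerMedian := s[n/2];                        // default definition: the n/2-th element
averagedMedian := (s[n/2] + s[n/2 + 1])/2;    // Averaged: mean of the two central elements
```

For this sample, the first expression evaluates to 1 and the second to 101/2, that is, 50.5. Neither value is typical of the data, because the sample splits evenly between 1 and 100.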
Nevertheless, the median is not always the best choice for measuring the central tendency of a data sample. For example, the distribution of the following data sample has a step in the middle:
z := [1, 1, 1, 2, 100, 100, 100]: medianZ = stats::median(z)