Measures of central tendency locate a distribution of data along an appropriate scale. There are several standard measures of central tendency. Knowing the properties of a particular data sample (such as its origin and the possible outliers and their values) can help you choose the most useful measure of central tendency for that data sample. MuPAD® provides the following functions for calculating the measures of central tendency:
The stats::modal function returns the most frequent value of a data sample and the number of occurrences of that value.
The stats::mean function calculates the arithmetic mean $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ of a data sample x_{1}, x_{2}, ..., x_{n}.
The stats::quadraticMean function calculates the quadratic mean $\sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$ of a data sample x_{1}, x_{2}, ..., x_{n}.
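The stats::median function returns the median of a data sample. For a sorted data sample with an odd number of elements x_{1}, x_{2}, ..., x_{2n+1}, the median is the middle element x_{n+1}. For a sorted data sample with an even number of elements x_{1}, x_{2}, ..., x_{2n}, stats::median returns the element x_{n} by default.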
The stats::geometricMean function calculates the geometric mean $\left( \prod_{i=1}^{n} x_i \right)^{1/n}$ of a data sample x_{1}, x_{2}, ..., x_{n}.
The stats::harmonicMean function calculates the harmonic mean $\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$ of a data sample x_{1}, x_{2}, ..., x_{n}.
The arithmetic average is a simple and popular measure of central tendency. It serves best for data samples that do not have significant outliers. Unfortunately, outliers (for example, data-entry errors or glitches) exist in almost all real data. The arithmetic average and the quadratic mean are sensitive to such outliers. One bad data value can move the average away from the center of the rest of the data by an arbitrarily large distance. For example, create the following two lists, each containing a single outlier. The outlier is equal to 100 in the first list and to 1 in the second list:
L := [1, 1, 1, 1, 1, 100.0]: S := [100, 100, 100, 100, 100, 1.0]:
The stats::modal function shows that the most frequent entry of the first list is 1 and the most frequent entry of the second list is 100. In each list, the most frequent entry appears five times:
modalL = stats::modal(L); modalS = stats::modal(S)
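As a cross-check outside MuPAD, the same counts can be reproduced with a short Python sketch. The helper name `modal` is hypothetical and only mimics the idea of returning the most frequent value together with its number of occurrences. Unlike a full implementation, this sketch reports only one value when several values are tied.

```python
from collections import Counter

L = [1, 1, 1, 1, 1, 100.0]
S = [100, 100, 100, 100, 100, 1.0]

def modal(data):
    """Return (most frequent value, number of occurrences).

    Hypothetical helper for illustration; ties are not handled.
    """
    value, count = Counter(data).most_common(1)[0]
    return value, count

modal_L = modal(L)
modal_S = modal(S)
print(modal_L, modal_S)
```

In both lists the single outlier has no influence on the most frequent value.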
If the value of the outlier is large, the outlier can significantly move the mean and the quadratic mean away from the center:
meanL = stats::mean(L); quadraticMeanL = stats::quadraticMean(L)
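The size of the shift can be checked by hand or with a few lines of Python. The following sketch uses Python's standard library rather than MuPAD, and the variable names are chosen for illustration only. It applies the definitions of the arithmetic mean and the quadratic mean to the list L.

```python
import math
import statistics

L = [1, 1, 1, 1, 1, 100.0]

# Arithmetic mean: (5 * 1 + 100) / 6
mean_L = statistics.mean(L)

# Quadratic mean: square root of the mean of the squares,
# sqrt((5 * 1 + 100**2) / 6)
quadratic_mean_L = math.sqrt(sum(x * x for x in L) / len(L))

print(mean_L, quadratic_mean_L)
```

Five of the six entries equal 1, yet the arithmetic mean is 17.5 and the quadratic mean is roughly 40.8, because squaring gives the outlier even more weight.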
Large outliers affect the geometric mean and the harmonic mean less than they affect the simple arithmetic average. Nevertheless, neither the geometric mean nor the harmonic mean is completely resistant to outliers:
geometricMeanL = stats::geometricMean(L); harmonicMeanL = stats::harmonicMean(L)
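For comparison, here is a minimal Python sketch of the same two measures for the list L. It relies on `statistics.geometric_mean` and `statistics.harmonic_mean` from the Python standard library (Python 3.8 or later), not on MuPAD.

```python
import statistics

L = [1, 1, 1, 1, 1, 100.0]

# Geometric mean: (1 * 1 * 1 * 1 * 1 * 100) ** (1/6) = 100 ** (1/6)
geometric_mean_L = statistics.geometric_mean(L)

# Harmonic mean: 6 / (5 * (1/1) + 1/100) = 6 / 5.01
harmonic_mean_L = statistics.harmonic_mean(L)

print(geometric_mean_L, harmonic_mean_L)
```

The geometric mean is about 2.15 and the harmonic mean is about 1.20. Both stay much closer to 1 than the arithmetic mean does, but neither is exactly 1.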
If the value of the outlier is small, the impact on the mean of a data set is less noticeable. The quadratic mean can effectively mitigate the impact of a few small outliers:
meanS = stats::mean(S); quadraticMeanS = stats::quadraticMean(S)
The small outlier significantly affects the geometric and harmonic means computed for the list S:
geometricMeanS = stats::geometricMean(S); harmonicMeanS = stats::harmonicMean(S)
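To see how differently the four means react to the small outlier in S, the sketch below computes all of them in Python and reports how far each one lies from 100, the value shared by the other five entries. This is an illustration with the Python standard library, not MuPAD output. The helper `quadratic_mean` and the dictionary `measures_S` are hypothetical names.

```python
import math
import statistics

S = [100, 100, 100, 100, 100, 1.0]

def quadratic_mean(data):
    """Square root of the mean of the squared entries."""
    return math.sqrt(sum(x * x for x in data) / len(data))

measures_S = {
    "mean": statistics.mean(S),
    "quadratic mean": quadratic_mean(S),
    "geometric mean": statistics.geometric_mean(S),
    "harmonic mean": statistics.harmonic_mean(S),
}

for name, value in measures_S.items():
    # Relative distance from 100, the value of the five non-outlier entries.
    shift = abs(value - 100) / 100
    print(f"{name:>15}: {value:8.3f}  (shift from 100: {shift:.1%})")
```

The quadratic mean (about 91.3) stays closest to 100, followed by the arithmetic mean (83.5). The geometric mean (about 46.4) and the harmonic mean (about 5.7) are pulled strongly toward the outlier.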
The median is usually resistant to both large and small outliers:
medianL = stats::median(L); medianS = stats::median(S)
For data samples that contain an even number of elements, MuPAD can use two definitions of the median. By default, stats::median returns the n/2-th element of a sorted data sample with n elements:
z := [1, 1, 1, 100, 100, 100]: medianZ = stats::median(z)
When you use the Averaged option, stats::median returns the arithmetic average of the two central elements of a sorted data sample:
z := [1, 1, 1, 100, 100, 100]: medianZ = stats::median(z, Averaged)
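The two definitions can be compared side by side in Python. In this sketch, `statistics.median_low` is used as an analogue of the default behavior described above (the lower of the two central elements), and `statistics.median` as an analogue of the Averaged option. The mapping between the MuPAD and Python functions is an assumption made for illustration.

```python
import statistics

z = [1, 1, 1, 100, 100, 100]

# Lower of the two central elements of the sorted sample:
# the 3rd of 6 elements.
median_default = statistics.median_low(z)

# Arithmetic average of the two central elements: (1 + 100) / 2.
median_averaged = statistics.median(z)

print(median_default, median_averaged)
```

For this sample the first definition gives 1 and the second gives 50.5, a value that does not occur in the data at all.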
Nevertheless, the median is not always the best choice for measuring the central tendency of a data sample. For example, the distribution of the following data sample has a step in the middle:
z := [1, 1, 1, 2, 100, 100, 100]: medianZ = stats::median(z)
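A short Python check (standard library only, not MuPAD) makes the issue concrete. With seven elements, the median is the fourth element of the sorted sample, which here is the lone entry sitting on the step.

```python
import statistics

z = [1, 1, 1, 2, 100, 100, 100]

# Middle (4th) element of the sorted 7-element sample.
median_z = statistics.median(z)

# Arithmetic mean for comparison: (3 * 1 + 2 + 3 * 100) / 7 = 305 / 7.
mean_z = statistics.mean(z)

print(median_z, mean_z)
```

The median is 2, even though three of the seven entries equal 100. The arithmetic mean, about 43.6, reflects both groups but matches neither of them.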