Example C: Tail Probabilities of Multivariate Normal Distribution.



Examples A and B make it clear that if we are trying to view uniform data over the hypercube $\Re^{10}$ most (spherical) neighborhoods will be empty! Let us examine what happens if the data follow the standard d-dimensional Normal distribution:

$$ f_d(\bf{x}) = (2\pi)^{-d/2}e^{-\bf{x}^T\bf{x}/2} $$

Clearly, the origin (mode) is the most IikeIy point and the equiprobable contours are spheres. Consider the spherical contour, $S_{0.01}(\bf{x})$, where the density value is only 1% of the value at the mode. Now

$$ \frac{f(\bf{x})}{f(\bf{0})}\ = e^{-\bf{x}^T\bf{x}/2} $$ and $$ -2 log
                                                \frac{f(\bf{x})}{f(\bf{0})}\ = \sum_{i=1}^d {x_i}^2 \sim \chi^2(d) $$

therefore, the probability that a point is within the 1% spherical contour may be computed by

$$ Pr(\frac{f(\bf{x})}{f(\bf{0})}\  \geq  1/100) = Pr(\chi^2(d)  \leq -2log(1/100))  $$

The last equation gives the probability a random point will not fall in the “tails” or, in other words, will fall in the medium- to high-density region. Around 5 or 6 dimensions, the probability mass of a multivariate Normal begins a rapid migration into the extreme tails. In fact, more than half of the probability mass is in a very low density region for 10-dimensional data! Silverman (1986) has dramatized this in 10 dimensions by noting that $$ Prob( \mid\mid x \mid\mid \geq 1.6) = 0.99 $$. In very high dimensions, virtually the entire sample will be in the tails!

% Workspace Initialization.
clc; clear; close all;

Monte Carlo Estimation of Multivariate Normal Tail/Body Probabilities

NumOfRuns = 1e4;       % Number of Monte Carlo runs.
Dimensions = 1:20;     % Problem dimensions.
LD = length(Dimensions);
counter_tails = zeros(1,LD);
counter_body = zeros(1,LD);
PDF_Values = zeros(NumOfRuns,LD);

for i=1:LD
     dim = Dimensions(i);
     % Multivariable normal numbers generation:
     M = zeros(dim,1);   % Mean vector.
     SIGMA = eye(dim);   % Covariance Matrix.
     NormalSamples = mvnrnd(M,SIGMA,NumOfRuns);   % Generate random points following the n-dimensional
                                                  % standard normal distribution.
     max_valuePDF = (2*pi)^(-dim/2);

     for k=1:NumOfRuns
         PDF_Values(k,i) = mvnpdf(NormalSamples(k,:).',M,SIGMA);
         % Every point x with pdf(x) < 0.01*max(pdf) is considered to belong to
         % the tail of the multivariate Normal distribution.
          if PDF_Values(k,i)<0.01*max_valuePDF
             counter_tails(i) = counter_tails(i) + 1;
             counter_body(i) = counter_body(i) + 1;

Plot the Results

Plot the fraction of datapoints in the tails found by Monte Carlo simulation:

figure('Position',[100 100 900 400]);
hold on;
ylabel('Fraction of Total Generated Samples in Tails');
grid on;
xlim([Dimensions(1) Dimensions(end)]);
title('Analytically Calculated (blue circles) and Monte Carlo Estimated (red stars)');

% Plot pdf values of randomly generated points:
[xx, yy] = meshgrid(Dimensions,1:NumOfRuns);
shading interp;
title('PDF Value at Randomly Generated Points');
zlabel('Normal N-Dim PDF Value');
ylim([1 NumOfRuns]);


% 1) The low multidimensional normal pdf values happen because the following relation holds:
%                                  n
%            N-dim_NormalPDF(X) = Prod(one-dim.normalpdf(x_i))
%                                 i=1
% This happens because the normal random variables x_i of this example are independent.
% And this is the case because the covariance matrix sigma is diagonal.

% 2) You may plot the fraction of datapoints in the body with red stars:
% figure(2);
% stem(Dimensions,counter_body/NumOfRuns,'r*');
% grid on;
% xlabel('Dimension');
% ylabel('Fraction of Total Generated Samples in Body');
% xlim([Dimensions(1) Dimensions(end)]);