HZmvntest

version 1.1.0.0 (5.4 KB) by Antonio Trujillo-Ortiz

Henze-Zirkler's Multivariate Normality Test.

Updated 01 Jan 2009

There are many tests for assessing multivariate normality in the statistical literature (Mecklin and Mundfrom, 2003). Unfortunately, there is no known uniformly most powerful test, and it is recommended to perform several tests when assessing it. The test introduced by Henze and Zirkler (1990) has been found to have good overall power against alternatives to normality.

The Henze-Zirkler test is based on a nonnegative functional that measures the distance between two characteristic functions: that of the multivariate normal distribution and the empirical characteristic function.

Under the null hypothesis, the Henze-Zirkler statistic is approximately lognormally distributed. The lognormal distribution is therefore used to compute the P-value.

According to Henze and Wagner (1997), this test has the following desirable properties:

--affine invariance
--consistency against each fixed nonnormal alternative distribution
--asymptotic power against contiguous alternatives of order n^(-1/2)
--feasibility for any dimension and any sample size

If the data are multivariate normal, the test statistic HZ is approximately lognormally distributed. The function calculates the smoothness parameter and the mean and variance of HZ. The mean and variance are then converted to the corresponding lognormal parameters, and the P-value is estimated.

For those interested, the lognormal critical value is also provided.
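
The steps described above can be sketched as follows. This is a minimal illustration and not the submission's own code: the formulas are written as I understand them from Henze and Zirkler (1990) and should be checked against the paper, the data matrix is made up, and the variable names (b, a, wb, pmu, psi, cv) are assumptions. To stay independent of any toolbox, the lognormal tail probability and quantile are computed with erfc and erfinv rather than logncdf and logninv.

```matlab
% Illustrative data only: n = 10 observations (rows), p = 2 variables (columns).
X = [5.1 3.5; 4.9 3.0; 4.7 3.2; 4.6 3.1; 5.0 3.6; ...
     5.4 3.9; 4.6 3.4; 5.0 3.4; 4.4 2.9; 4.9 3.1];
alpha  = 0.05;                       % significance level
[n, p] = size(X);

% Covariance normalized by n (the c = 1 default listed under Inputs).
Xc = X - repmat(mean(X), n, 1);      % centered data
S  = (Xc' * Xc) / n;

% Squared Mahalanobis distances: Dj to the mean, Djk between all pairs.
G   = Xc / S * Xc';                  % equals Xc*inv(S)*Xc'
Dj  = diag(G);
Djk = repmat(Dj, 1, n) + repmat(Dj', n, 1) - 2 * G;

% Smoothness parameter.
b = (1 / sqrt(2)) * ((2*p + 1) / 4)^(1 / (p + 4)) * n^(1 / (p + 4));

% Test statistic.
HZ = n * (sum(sum(exp(-(b^2 / 2) * Djk))) / n^2 ...
     - 2 * (1 + b^2)^(-p/2) * sum(exp(-(b^2 / (2 * (1 + b^2))) * Dj)) / n ...
     + (1 + 2*b^2)^(-p/2));

% Mean (mu) and variance (si2) of HZ under multivariate normality.
a   = 1 + 2*b^2;
wb  = (1 + b^2) * (1 + 3*b^2);
mu  = 1 - a^(-p/2) * (1 + p*b^2/a + p*(p + 2)*b^4 / (2*a^2));
si2 = 2*(1 + 4*b^2)^(-p/2) ...
    + 2*a^(-p) * (1 + 2*p*b^4/a^2 + 3*p*(p + 2)*b^8 / (4*a^4)) ...
    - 4*wb^(-p/2) * (1 + 3*p*b^4/(2*wb) + p*(p + 2)*b^8 / (2*wb^2));

% Lognormal parameters whose mean and variance equal mu and si2.
pmu = log(mu^2 / sqrt(si2 + mu^2));  % log-scale location
psi = sqrt(log((si2 + mu^2) / mu^2)); % log-scale spread

% Upper-tail P-value and critical value of the fitted lognormal.
P  = 0.5 * erfc((log(HZ) - pmu) / (psi * sqrt(2)));
cv = exp(pmu + psi * sqrt(2) * erfinv(1 - 2*alpha));

fprintf('HZ = %.4f, P = %.4f, critical value = %.4f\n', HZ, P, cv);
```

Normality is rejected at level alpha when HZ exceeds cv, which is the same decision as P < alpha.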

Inputs:
X - data matrix (must be n-by-p; observations = rows, independent variables = columns)
c - covariance normalized by n (c = 1, default) or by n-1 (c ~= 1)
alpha - significance level (default = 0.05)

Output:
- Henze-Zirkler's Multivariate Normality Test

Pei Yan

Antonio Trujillo-Ortiz

Dr. Henze's current email address is Henze@kit.edu

Prof. Antonio Trujillo-Ortiz

Michael Jimmel

Dear Prof. Antonio Trujillo-Ortiz,

Thanks for your quick reply. It was helpful. Now I understand. We need to get the LN-mean and LN-standard-deviation in order to get the mean ('mu') and standard deviation ('sqrt(si2)' ) from LNRVs.

Thank you very much/Muchas gracias,

Mike

Antonio Trujillo-Ortiz

Dear Michael,
Thanks for your interest in this m-file. As you know, we only implemented the Matlab computational algorithm from the original published paper. For any inquiry about the mathematical or statistical fundamentals, you should refer to the author(s). Dr. Norbert Henze's email addresses are
henze@stoch.uni-karlsruhe.de
and/or
N.Henze@math.uni-karlsruhe.de

However, we need neither the lognormal cdf (logncdf) nor the lognormal mean and variance (lognstat) functions for this. The mean (mu) and variance (si2) used here are the Henze-Zirkler mean and variance and, as the authors establish, they must be converted to the lognormal Henze-Zirkler mean and variance. As you can see using the provided Iris data example, the mean (mu) = 0.7635 and the variance (si2) = 0.0112, with a Henze-Zirkler lognormal mean of -0.279408 and a Henze-Zirkler lognormal variance of 0.1379069. But if you pass the mu and si2 values to the lognstat function, you get the incorrect lognormal values of 2.1459 and 5.7768e-004, respectively.

Yours,

Prof. Antonio Trujillo-Ortiz
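
The numbers quoted in the reply above can be checked with a few lines. This is only a sketch: it takes the rounded values mu = 0.7635 and si2 = 0.0112 from the reply, so it agrees with the quoted lognormal values to about three decimals rather than exactly. The conversion formulas are the usual moment-matching ones for a lognormal distribution, and the "naive" values are computed by hand with the formulas lognstat documents for its outputs (treating its inputs as log-scale location and spread), so no toolbox is required. The names pmu, psi, m_naive and v_naive are assumptions for illustration.

```matlab
% Rounded Henze-Zirkler mean and variance quoted in the reply.
mu  = 0.7635;
si2 = 0.0112;

% Moment matching: log-scale parameters of the lognormal distribution
% whose mean is mu and whose variance is si2.
pmu = log(mu^2 / sqrt(si2 + mu^2));      % about -0.279
psi = sqrt(log((si2 + mu^2) / mu^2));    % about  0.138 (log-scale spread)

% Round trip: the lognormal mean and variance implied by (pmu, psi)
% recover mu and si2.
m_back = exp(pmu + psi^2 / 2);
v_back = (exp(psi^2) - 1) * exp(2*pmu + psi^2);

% Naive use: feeding mu and si2 in as if they were already log-scale
% parameters goes in the opposite direction and gives unrelated values.
m_naive = exp(mu + si2^2 / 2);
v_naive = exp(2*mu + si2^2) * (exp(si2^2) - 1);

fprintf('pmu = %.4f, psi = %.4f\n', pmu, psi);
fprintf('round trip: mean = %.4f, variance = %.4f\n', m_back, v_back);
fprintf('naive:      mean = %.4f, variance = %.4e\n', m_naive, v_naive);
```

Note that the second converted value is the square root of a log-scale variance, that is, the spread parameter a lognormal cdf expects alongside the location.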

Michael Jimmel

Dear Sir Antonio,

Could you explain/describe the mean and standard deviation arguments to the lognormal cdf? Why not use 'mu' and 'sqrt(si2)' directly?

Thanks/Gracias,

Mike

Antonio Trujillo-Ortiz

Dear Johan,

Your valuable comment (15-12-08), which makes this m-file run much faster, was taken into account. Thank you.
Antonio

Johan

Nice implementation and sample data to test the file on. However, it runs a bit slowly. I recomputed the variable Djk without loops and it runs much faster:
Djk = - 2*Y' + diag(Y')*ones(1,n) + ones(n,1)*diag(Y')';
/J.D.
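
The vectorized line above can be compared against a plain double loop. This is a sketch under assumptions: Y is taken to be X*inv(S)*X' with S the covariance normalized by n, which the comment does not state, and the data matrix is made up for illustration.

```matlab
% Illustrative data only: n = 10 observations, p = 2 variables.
X = [5.1 3.5; 4.9 3.0; 4.7 3.2; 4.6 3.1; 5.0 3.6; ...
     5.4 3.9; 4.6 3.4; 5.0 3.4; 4.4 2.9; 4.9 3.1];
n = size(X, 1);

% Covariance normalized by n (assumed here).
Xc = X - repmat(mean(X), n, 1);
S  = (Xc' * Xc) / n;

% Assumed definition of Y: the Gram matrix of the rows in the S^-1 metric.
Y = X / S * X';                          % equals X*inv(S)*X'

% Vectorized form from the comment: Djk(j,k) = Y(j,j) + Y(k,k) - 2*Y(j,k).
Djk_vec = - 2*Y' + diag(Y')*ones(1,n) + ones(n,1)*diag(Y')';

% Reference: squared Mahalanobis distance for every pair of rows.
Djk_loop = zeros(n);
for j = 1:n
    for k = 1:n
        d = X(j,:) - X(k,:);
        Djk_loop(j,k) = d / S * d';
    end
end

maxdiff = max(abs(Djk_vec(:) - Djk_loop(:)));
fprintf('largest absolute difference: %g\n', maxdiff);
```

The two results should agree up to floating-point rounding; the vectorized form replaces n^2 small solves with a single matrix product.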