skewness and kurtosis of a weighted distribution
I cannot find an analytical formula for calculating the skewness of a weighted distribution. I know the formulae for the weighted mean and weighted variance, but I don't think simply plugging these into the formula for the non-weighted skewness would be equivalent to a weighted skewness.
And for that matter, the weighted kurtosis.
John D'Errico on 15 Jul 2021
Edited: John D'Errico on 15 Jul 2021
Sorry for being late to the party, but I had to answer this question once I saw an answer posted that does not actually help.
This is not as difficult as it may seem. We can work it out logically, even though I have not looked up the formulas online; I'm just working from basic principles here.
First, what is the formula for a weighted mean? What does a weighted mean tell us anyway? Assume we have a set of numbers, and a list of weights, one weight for each value in our set.
n = 1e7;
x = randn(1,n); % sorry for not being very creative.
w = 1 + (x > 0); % again, not that creative
I've given every number greater than 0 a weight of 2 here, and every other number a weight of 1. The choice is arbitrary, but it gives a problem I can solve in theory, so we can check at least some of the results.
% The basic, unweighted mean should be very close to 0, given a set
% this large. The other statistics should be what I would expect for an
% N(0,1) normally distributed set (variance 1, skewness 0, kurtosis 3).
format long g
[mean(x), var(x), skewness(x), kurtosis(x)]
Now, given the way I've formulated the weights, I expect the weighted mean in this case to be noticeably greater than 0. (I could probably compute the theoretical value for this problem, but I'm feeling too lazy right now.) But how do I compute the weighted mean? That part is simple: sum the products of the weights with the values they are associated with, then divide that sum by the plain sum of the weights.
wmean = @(x,w) sum(x.*w)/sum(w);
That formula is pretty simplistic. I did not do some important things, like check for negative weights, or verify the vectors are the same lengths. So sue me.
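As a tiny hand-checkable example (the numbers are made up purely for illustration), three values where the last one counts twice should give (1 + 2 + 2*3)/4 = 2.25, and equal weights should reduce to the ordinary mean:

```matlab
% Weighted mean: sum of weight*value, divided by the sum of the weights.
wmean = @(x,w) sum(x.*w)/sum(w);

% Illustrative data: the value 3 carries twice the weight of the others.
xs = [1 2 3];
ws = [1 1 2];
m_weighted = wmean(xs, ws)            % (1 + 2 + 6)/4 = 2.25

% With equal weights, the weighted mean reduces to the ordinary mean.
m_equal = wmean(xs, ones(size(xs)))   % same as mean(xs), i.e. 2
```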
Oh, what the heck. Since I know the set is normally distributed, we know what the mean of a half normal random variable is. (Well, I'll look that up...)
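A unit half-normal has mean sqrt(2)/sqrt(pi). In expectation, half of the samples are negative (weight 1, mean -sqrt(2)/sqrt(pi)) and half are positive (weight 2, mean +sqrt(2)/sqrt(pi)), so each such pair carries a total weight of 3, and the weighted mean in this case should be...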
sqrt(2)/sqrt(pi)*(-1 + 2)/3
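A minimal sketch of that comparison, rebuilt from the definitions above. The sample size n = 1e6 is my choice (smaller than the 1e7 used earlier, to keep it quick), and with no fixed seed the sample value changes from run to run:

```matlab
% Same test problem as above, with a smaller sample to keep it quick.
n = 1e6;
x = randn(1,n);
w = 1 + (x > 0);          % weight 2 for positive values, 1 otherwise

wmean = @(x,w) sum(x.*w)/sum(w);

sample_wmean = wmean(x,w)
theory_wmean = sqrt(2)/sqrt(pi)*(-1 + 2)/3   % = sqrt(2/pi)/3, about 0.266
abs_err = abs(sample_wmean - theory_wmean)    % typically around 1e-3 or smaller
```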
So the weighted mean is pretty close to my theoretical prediction. It tells us that I (probably) did it correctly. We can use the same idea to compute a weighted sample variance.
wvar = @(x,w) sum(w.*(x - wmean(x,w)).^2)/sum(w);
Logically, though, we need a factor of N/(N-1) in there, since, just as with the ordinary sample variance, the estimate is biased otherwise.
wvar = @(x,w) sum(w.*(x - wmean(x,w)).^2)/sum(w)*length(x)/(length(x) - 1);
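One hedge on that factor: N/(N-1) is the standard correction when all the weights are equal, but with unequal weights it is not unbiased in general. If the weights are "reliability" weights (relative importance rather than repeat counts), a commonly used unbiased correction divides by V1 - V2/V1, where V1 = sum(w) and V2 = sum(w.^2). The handle name wvar_rel below is hypothetical, and this is shown as an alternative, not as the formula used in this answer. With equal weights of any size it reduces to the ordinary sample variance:

```matlab
wmean = @(x,w) sum(x.*w)/sum(w);

% Hypothetical alternative for reliability weights:
% V1 = sum(w), V2 = sum(w.^2); the denominator is V1 - V2/V1.
wvar_rel = @(x,w) sum(w.*(x - wmean(x,w)).^2) / (sum(w) - sum(w.^2)/sum(w));

x = randn(1,1000);
% With equal weights it matches var(x), whatever the common weight is.
d1 = wvar_rel(x, ones(size(x))) - var(x)
d2 = wvar_rel(x, 5*ones(size(x))) - var(x)

% With unequal weights it generally differs from the N/(N-1) version.
w = 1 + (x > 0);
wvar_N = sum(w.*(x - wmean(x,w)).^2)/sum(w)*length(x)/(length(x) - 1);
[wvar_rel(x,w), wvar_N]
```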
A quick test to make sure I did nothing wrong is to check that the UNWEIGHTED sample variance (all weights equal to 1) is exactly what var(x) gave me before.
It agrees down to the last decimal place. (WHEW!) And now we can compute the weighted variance of our sample, wvar(x,w).
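A minimal sketch of that unit-weight check, followed by the weighted variance of a fresh test sample (n = 1e5 is my choice here, and the results vary from run to run):

```matlab
wmean = @(x,w) sum(x.*w)/sum(w);
wvar  = @(x,w) sum(w.*(x - wmean(x,w)).^2)/sum(w)*length(x)/(length(x) - 1);

x = randn(1,1e5);
w = 1 + (x > 0);

% With unit weights, wvar should reproduce var(x) up to round-off.
unit_diff = wvar(x, ones(size(x))) - var(x)

% The weighted variance of the test sample.
wv = wvar(x, w)
```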
Yes, I know, with a little effort, I could give you the theoretical value for this specific test problem, but this response is getting really long for a problem that is now 10 years old.
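For anyone who does want it, the population value for this test problem follows from the same half-normal argument. Here E[x^2; x<0] denotes the contribution to E[x^2] from the negative half of an N(0,1) variable, which is 1/2 by symmetry, and the weighted mean is the sqrt(2/pi)/3 found above. This is a population value that wvar(x,w) should approach as n grows:

```latex
E_w\!\left[x^2\right]
  = \frac{1\cdot E\!\left[x^2;\,x<0\right] + 2\cdot E\!\left[x^2;\,x>0\right]}
         {1\cdot P(x<0) + 2\cdot P(x>0)}
  = \frac{1\cdot\tfrac12 + 2\cdot\tfrac12}{1\cdot\tfrac12 + 2\cdot\tfrac12} = 1,
\qquad
\sigma_w^2 = E_w\!\left[x^2\right] - \bar{x}_w^{\,2}
  = 1 - \left(\frac{1}{3}\sqrt{\frac{2}{\pi}}\right)^{2}
  = 1 - \frac{2}{9\pi} \approx 0.929
```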
But the sample skewness and kurtosis are now simple enough, too. First, we compute the weighted central 3rd and 4th moments of the sample. Their formulas look just like the variance, with the power of 2 replaced by 3 or 4. The skewness and kurtosis then come from those central moments, by dividing by the variance raised to the 1.5 and 2 powers, respectively.
mom3 = @(x,w) sum(w.*(x - wmean(x,w)).^3)/sum(w);
mom4 = @(x,w) sum(w.*(x - wmean(x,w)).^4)/sum(w);
Yeah, I know I was being sloppy there, since these moments have no bias-correction factor like the one in wvar. Someone else can give a better formula, or maybe I will if I have some energy one day.
wskew = @(x,w) mom3(x,w)/wvar(x,w)^1.5;
wkurt = @(x,w) mom4(x,w)/wvar(x,w)^2;
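In equation form, writing x̄_w for the weighted mean from wmean and s_w^2 for the N/(N-1)-scaled weighted variance from wvar, the handles above compute:

```latex
s_w^2 = \frac{N}{N-1}\,\frac{\sum_{i=1}^{N} w_i\,(x_i - \bar{x}_w)^2}{\sum_{i=1}^{N} w_i},
\qquad
m_k = \frac{\sum_{i=1}^{N} w_i\,(x_i - \bar{x}_w)^k}{\sum_{i=1}^{N} w_i}\quad (k = 3, 4),
\qquad
\text{skewness} = \frac{m_3}{s_w^{3}},
\qquad
\text{kurtosis} = \frac{m_4}{s_w^{4}}
```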
First, let me check that I got the unweighted skewness and Kurtosis correct...
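They seem to be pretty good, but not quite exact. I think that factor of N/(N-1) is the problem: by default (flag = 1), MATLAB's skewness and kurtosis apply no bias correction, so the variance in their denominators is normalized by N, whereas my wvar divides by N-1 when all the weights are 1.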
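A minimal sketch of a self-consistent alternative, assuming the goal is to match MATLAB's default, bias-uncorrected skewness(x) and kurtosis(x): normalize the variance by sum(w), exactly like the 3rd and 4th moments. The handle names mom, wskew_plugin and wkurt_plugin are hypothetical. To avoid needing the Statistics and Machine Learning Toolbox, the check compares against the uncorrected formulas written out directly. With unit weights, wskew from this answer differs from the plug-in version by exactly the factor ((N-1)/N)^1.5 (and wkurt by ((N-1)/N)^2), a relative change of only about 1.5e-7 (2e-7) for N = 1e7:

```matlab
wmean = @(x,w) sum(x.*w)/sum(w);
mom   = @(x,w,k) sum(w.*(x - wmean(x,w)).^k)/sum(w);   % weighted central moment

% Hypothetical plug-in versions: every moment is normalized by sum(w).
wskew_plugin = @(x,w) mom(x,w,3)/mom(x,w,2)^1.5;
wkurt_plugin = @(x,w) mom(x,w,4)/mom(x,w,2)^2;

% The versions from this answer, with N/(N-1) only in the variance.
wvar  = @(x,w) mom(x,w,2)*length(x)/(length(x) - 1);
wskew = @(x,w) mom(x,w,3)/wvar(x,w)^1.5;
wkurt = @(x,w) mom(x,w,4)/wvar(x,w)^2;

x = randn(1,1000);
u = ones(size(x));
N = length(x);

% Uncorrected (flag = 1) skewness and kurtosis, written out directly.
xc = x - mean(x);
g1 = mean(xc.^3)/mean(xc.^2)^1.5;
g2 = mean(xc.^4)/mean(xc.^2)^2;
[wskew_plugin(x,u) - g1, wkurt_plugin(x,u) - g2]   % both at round-off level

% The two versions differ only by a power of (N-1)/N.
ratio_skew = wskew(x,u)/wskew_plugin(x,u)   % equals ((N-1)/N)^1.5
ratio_kurt = wkurt(x,u)/wkurt_plugin(x,u)   % equals ((N-1)/N)^2
```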
Note that some people subtract 3 from the kurtosis, so that a normal distribution has a value of 0; that quantity is called the excess kurtosis. Anyway, the weighted skewness and kurtosis for this set are given by wskew(x,w) and wkurt(x,w).
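As a rough sanity check, here is a sketch that carries the half-normal argument through the third and fourth moments for this test problem. The weighted distribution behaves like a half-normal with a random sign (+ with probability 2/3, - with probability 1/3), whose raw moments are 2c/3, 1, 4c/3 and 3 with c = 1/sqrt(2*pi). If that algebra is right, the population skewness is about -0.255 and the kurtosis about 3.293 (excess about 0.293). Note the sign: even though the positive side carries more weight, the skewness comes out negative. The sample size n = 1e6 is my choice, and the sample values vary from run to run:

```matlab
n = 1e6;
x = randn(1,n);
w = 1 + (x > 0);

% Estimators as defined in this answer.
wmean = @(x,w) sum(x.*w)/sum(w);
wvar  = @(x,w) sum(w.*(x - wmean(x,w)).^2)/sum(w)*length(x)/(length(x) - 1);
mom3  = @(x,w) sum(w.*(x - wmean(x,w)).^3)/sum(w);
mom4  = @(x,w) sum(w.*(x - wmean(x,w)).^4)/sum(w);
wskew = @(x,w) mom3(x,w)/wvar(x,w)^1.5;
wkurt = @(x,w) mom4(x,w)/wvar(x,w)^2;

% Closed-form population values, from the raw moments of a half-normal
% with a random sign (+ with probability 2/3, - with probability 1/3).
c  = 1/sqrt(2*pi);
mu = 2*c/3;                                 % weighted mean, sqrt(2/pi)/3
v  = 1 - mu^2;                              % weighted variance, 1 - 2/(9*pi)
m3 = 4*c/3 - 3*mu + 2*mu^3;                 % third central moment
m4 = 3 - 4*mu*(4*c/3) + 6*mu^2 - 3*mu^4;    % fourth central moment
theory = [m3/v^1.5, m4/v^2]                 % approx. [-0.255, 3.293]

sample = [wskew(x,w), wkurt(x,w)]
excess_kurtosis = wkurt(x,w) - 3
```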
After the fact, here are some links you might want to read. I hope I got it right.
Anyway, this should get you pretty close.