Convolution of log-transformed random variables

2 views (last 30 days)
BACKGROUND I have a random variable RV3 = RV1 + RV2 and i would like to determine p(RV3) by convolution, i.e. p(RV3) = conv(p(RV1), p(RV2)). Importantly, p(RV1) and p(RV2) are very skewed with extremely long tails. The set up is thus as follows:
X=linspace(0,1e5,1e6) % domain of Y1 and Y2
Y1=normhist(RV1,X) % pdf 1 (sum==1)
Y2=normhist(RV2,X) % pdf 2 (sum==1)
Y3=conv(Y1,Y2) % convolution, sum==1 if everything goes fine
PROBLEM Due to the skewness it is hard/impossible to choose an X that always works ("works" in this context means that sum(Y3)==1.00): i need both a fine spacing and a huge domain (for those interested: the RVs are likelihood ratios; about half the data is in [0,1] and the other half in [0,Inf)). For reasons that are not worth going into here, renormalizing after conv() is not an option.
QUESTION Since log(RV1) and log(RV2) are roughly normally distributed, I probably wouldn't get numerical problems if i could do the convolution on log-transformed RVs and then somehow "correct" for the log transform after the convolution, i.e.,
X_log=logspace(-10,10,1e5);
Y1_log=normhist(log(RV1),X_log);
Y2_log=normhist(log(RV2),X_log);
Y3_log=conv(Y1_log,Y2_log);
Y3_trans=do_some_magic(Y3_log)
So the essence of my question is: does a function do_some_magic() exist such that Y3_trans is identical to Y3 in the first snippet of code (but without the numerical problems)?
PS: Other solutions are welcome too of course PPS: I would also be happy knowing the distribution of log(RV3) instead

Answers (1)

Matt J
Matt J on 7 Sep 2015
Edited: Matt J on 7 Sep 2015
I think we need to scrutinize the premise of the question a little more closely. I don't really understand why you are not reliably getting sum(Y3)=1.00. Maybe it has something to do with the fact that I'm using FFT-based convolution, but even with 1e7 elements and uniformly randomized Y1, Y2 (can't get heavier tails than that), the convolution output is well-normalized inherently:
>> Y1=rand(1,1e7); Y1=Y1/sum(Y1); Y2=rand(1,1e7); Y2=Y2/sum(Y2);
>> Y3=ifft(fft(Y1,2e7).*fft(Y2,2e7),1e7,'symmetric');
>> sum(Y3)-1
ans =
-1.3656e-14
  2 Comments
Ronald van den Berg
Ronald van den Berg on 8 Sep 2015
Edited: Ronald van den Berg on 8 Sep 2015
Matt, thanks for your response. We seem to have different definitions of "heavy tails". In your example, Y1 and Y2 only take values in [0,1], but what i meant was a distribution in which most of the probability mass is at low values (e.g. range [0,1]) but also some at extremely high values (likelihood ratios can easily go to Inf). E.g., Y1=Y2=hist(lognrnd(3,7,1,1e5),X).
To clarify matters, what i really need is a function Y3 = do_conv(X,Y1,Y2) with the following properties:
  • Y1 and Y2 are probability distributions with long tails, discretized (binned) on X and normalized to 1
  • Y3 is the convolution of these distributions, normalized and on the same domain X
Even when using fft-based convolution, it seems impossible to choose a proper choice for X, because:
  • X needs to cover 0 to near-infinity
  • X needs to be finely spaced, because large part of the probability mass is in [0,1]
If it's possible at all, i think they only way would be by intermediately either working on log-transformed Y1 and Y2 or using log-spaced X instead of linearly, but i don't see how this can be done :(
Matt J
Matt J on 8 Sep 2015
Edited: Matt J on 8 Sep 2015
Could you attach a .mat file with an example of data that "doesn't work"?

Sign in to comment.

Categories

Find more on Creating and Concatenating Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!