How can I determine if my data follows a lognormal distribution?

My x-data includes arrival time for cells and my y-data includes their velocities. How can I determine if this data set follows a lognormal distribution?
I've already tried QQ-plots and histograms, but am utterly lost on how to approach this.
Thanks.

 Accepted Answer

If all your data are positive, that’s a good start. I’m not sure what you’re studying, but I always associate ‘arrival times’ with a Poisson process, where the waiting times between arrivals are exponentially distributed and right-skewed, so a histogram can ‘look’ a lot like the lognormal distribution. The velocities may well be lognormally distributed. I suggest using the histfit function for both. Another option is to use the chi2gof function to perform a chi-square goodness-of-fit test.
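A minimal sketch of both suggestions, assuming the Statistics Toolbox is installed. The vector `v` is synthetic placeholder data generated with `lognrnd`, not the asker's measurements, and the bin count of 30 is an arbitrary choice; substitute your own data.

```matlab
% Placeholder data (assumption): replace v with your own positive-valued vector.
rng(0);                                % reproducible example
v = lognrnd(3.5, 1.0, 500, 1);

% Histogram with a fitted lognormal density overlaid (a visual check only).
figure
histfit(v, 30, 'lognormal')

% Chi-square goodness-of-fit test against a lognormal fitted to the same data.
pd = fitdist(v, 'Lognormal');
[h, p] = chi2gof(v, 'CDF', pd);
fprintf('h = %d, p = %.3f\n', h, p);   % h = 0: lognormal not rejected at the 5% level
```

A result of h = 0 only means the test found no evidence against the lognormal model; it does not prove the data are lognormal.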

10 Comments

Yes, all my data is positive. Using the histfit function that you suggested, it looks like both my x-data and my y-data have lognormal distributions (independently of each other). However, that does not mean that they have the same normal distribution, correct? How can I determine this?
Thank you for your help.
My pleasure!
There are a few ways of determining if they have the same distribution. (You have already determined they are not normally distributed, so I don’t know what you mean by ‘the same normal distribution’. I assume you mean ‘the same distribution’.)
One way is to use fitdist and then paramci. If the respective parameter confidence intervals don’t overlap, that is good evidence the two data sets don’t share the same distribution parameters.
Another way is to do the Chi-squared goodness-of-fit test on the distributions of the two data sets (arrival time and velocity). That would be my choice.
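The fitdist and paramci approach might look something like the sketch below. The vectors `t` and `v` are synthetic stand-ins (an assumption) for the arrival times and velocities, and the overlap check is just one simple way to compare the intervals.

```matlab
% Placeholder data (assumption): replace t and v with your own vectors.
rng(1);
t = lognrnd(7.0, 0.9, 400, 1);   % stand-in for arrival times
v = lognrnd(3.5, 1.0, 400, 1);   % stand-in for velocities

pdT = fitdist(t, 'Lognormal');
pdV = fitdist(v, 'Lognormal');

% paramci returns a 2-by-2 matrix: row 1 = lower bounds, row 2 = upper bounds,
% column 1 = mu, column 2 = sigma (95% intervals by default).
ciT = paramci(pdT);
ciV = paramci(pdV);

% Two intervals overlap when the larger lower bound does not exceed
% the smaller upper bound.
overlap = max(ciT(1,:), ciV(1,:)) <= min(ciT(2,:), ciV(2,:));
fprintf('mu intervals overlap:    %d\n', overlap(1));
fprintf('sigma intervals overlap: %d\n', overlap(2));
```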
Can't you just take the log of your x data and the log of your y data? Then if the means and standard deviations of the log of the data are "the same" or pretty close then they have pretty much the same distribution.
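As a rough sketch of that log-transform idea, with synthetic placeholder vectors `x` and `y` (an assumption, not the asker's data). The `ttest2` and `vartest2` calls are an optional addition of mine to put a number on "pretty close"; they were not part of the suggestion above and need the Statistics Toolbox.

```matlab
% Placeholder data (assumption): replace x and y with your own positive vectors.
rng(2);
x = lognrnd(7.0, 0.9, 400, 1);
y = lognrnd(3.5, 1.0, 400, 1);

lx = log(x);
ly = log(y);

% If the data are lognormal, the mean and standard deviation of the logs
% estimate the lognormal parameters mu and sigma.
fprintf('x: mean(log) = %.3f, std(log) = %.3f\n', mean(lx), std(lx));
fprintf('y: mean(log) = %.3f, std(log) = %.3f\n', mean(ly), std(ly));

% Optional formal comparisons on the log scale:
[hMean, pMean] = ttest2(lx, ly, 'Vartype', 'unequal');  % are the means equal?
[hVar,  pVar]  = vartest2(lx, ly);                      % are the variances equal?
```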
@Image Analyst: That’s the idea behind my suggestion to use fitdist and paramci. I would certainly agree with your approach if more statistically robust methods weren’t so easily available in the Statistics Toolbox.
Yes, it's always better to use fully tested, debugged, and validated methods if you have them. I've added the Statistics Toolbox to the products list above.
The problem in log-transforming the data and then taking the mean and variance (and computing the standard errors of the estimates from them to use to determine if the two distributions are significantly different) is that it transforms the errors from additive to multiplicative. This skews the distribution of the errors.
The chi-squared test is easy enough to program, although the chi-squared distribution (to estimate the p-value) is somewhat more challenging.
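For illustration, here is one way the test could be programmed by hand. This is a sketch under assumptions: `x` is synthetic placeholder data, the choice of 10 equal-probability bins is arbitrary, and `chi2cdf` from the Statistics Toolbox is still used for the p-value. `histcounts` requires R2014b or later.

```matlab
% Placeholder data (assumption): replace x with your own positive-valued vector.
rng(3);
x = lognrnd(3.5, 1.0, 500, 1);
n = numel(x);
k = 10;                              % number of bins (arbitrary choice)

pd = fitdist(x, 'Lognormal');

% Equal-probability bins under the fitted distribution; edges run from 0 to Inf.
edges    = icdf(pd, (0:k)/k);
observed = histcounts(x, edges);     % observed count in each bin
expected = (n/k) * ones(1, k);       % expected count in each bin

chi2stat = sum((observed - expected).^2 ./ expected);
dof      = k - 1 - 2;                % two parameters (mu, sigma) were estimated
pValue   = 1 - chi2cdf(chi2stat, dof);
fprintf('chi2 = %.3f, dof = %d, p = %.3f\n', chi2stat, dof, pValue);
```

Equal-probability bins keep every expected count at n/k, which avoids sparsely populated bins in the long right tail.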
I overlooked adding the product tag. Thanks!
Thank you both for your help!
I gave the fitdist and then paramci method a go.
For the arrival times (my x-data), fitdist gave me:
Lognormal distribution
mu = 6.92077 [6.8758, 6.96574]
sigma = 0.87264 [0.841985, 0.905628]
For the velocities (my y-data), fitdist gave me:
Lognormal distribution
mu = 3.66801 [3.61411, 3.7219]
sigma = 1.04586 [1.00912, 1.08539]
Since the parameter confidence intervals do not overlap, I can assume that my x-data and y-data do not share the same distribution parameters, like Star Strider said, correct?
*paramci gave me the same values as fitdist did in the square brackets, so I did not repeat them here.
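The overlap check on the intervals reported above can be written out directly. This is only a small illustration; `intervalsOverlap` is a hypothetical helper name, not a toolbox function.

```matlab
% 95% confidence intervals copied from the fitdist output above: [lower, upper].
muT    = [6.8758   6.96574];    % arrival times, mu
muV    = [3.61411  3.7219];     % velocities,    mu
sigmaT = [0.841985 0.905628];   % arrival times, sigma
sigmaV = [1.00912  1.08539];    % velocities,    sigma

% Hypothetical helper: true when two [lower, upper] intervals overlap.
intervalsOverlap = @(a, b) max(a(1), b(1)) <= min(a(2), b(2));

muOverlap    = intervalsOverlap(muT, muV);         % false: the mu intervals are disjoint
sigmaOverlap = intervalsOverlap(sigmaT, sigmaV);   % false: the sigma intervals are disjoint
```

Neither pair of intervals overlaps, which is consistent with the two data sets having different lognormal parameters.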

Asked: 22 Jun 2014
Commented: 28 Jun 2014
