Significance of correlation in large datasets
Hey everyone,
I have a dataset of more than N=160k samples and would like to quantify the correlation between two of its variables. The correlation is r=0.554 (r^2=0.31). As for the significance test (t-test): since the t-statistic scales with sqrt(n), it will always be large for a dataset this size, and the p-value correspondingly small. Does that mean the correlation coefficient is always significant for large datasets? Or does the t-test not make sense here at all?
Thanks for your help!
Dennis
2 Comments
The relationship with n isn't linear; it is asymptotic. More data are generally a good thing: a larger sample just makes the test more robust to individual outliers.
Von Duesenberg
on 20 Dec 2016
In practice, because of the very nature of statistical inference, you easily get significant p-values with large datasets. What matters most is whether, as an expert in your own field, you think the magnitude of the correlation coefficient is sufficient for your goals.
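To put rough numbers on that, here is a minimal sketch in plain Python (standard library only), using the r=0.554 and N=160k from the question. It assumes the usual t-test for a Pearson correlation, t = r*sqrt(n-2)/sqrt(1-r^2), a Fisher z-transform for an approximate 95% confidence interval, and a normal approximation to the critical t value (reasonable at this sample size). The function names are made up for illustration and are not from any toolbox.

```python
import math

Z_CRIT_95 = 1.959964  # two-sided 95% critical value of the standard normal


def t_statistic(r, n):
    """t-statistic for testing H0: rho = 0 given a sample Pearson r."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def fisher_ci(r, n, z_crit=Z_CRIT_95):
    """Approximate confidence interval for rho via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def min_significant_r(n, z_crit=Z_CRIT_95):
    """Smallest |r| that reaches significance at the given critical value.

    Solves t = r*sqrt(n-2)/sqrt(1-r^2) for r, using the normal critical
    value in place of the t critical value (an assumption, fine for large n).
    """
    return z_crit / math.sqrt(z_crit ** 2 + n - 2)


# Values taken from the question
r = 0.554
n = 160_000

t = t_statistic(r, n)
ci_low, ci_high = fisher_ci(r, n)
r_min = min_significant_r(n)

print(f"t-statistic:            {t:.1f}")
print(f"approx. 95% CI for rho: ({ci_low:.4f}, {ci_high:.4f})")
print(f"smallest significant r: {r_min:.4f}")
```

With these inputs the t-statistic comes out around 266, and at this sample size any |r| above roughly 0.005 would already pass the 5% threshold. So the test itself is valid, it just tells you very little here; the confidence interval and the effect size (r^2=0.31) are the more informative quantities.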
Answers (0)