Significance of correlation in large datasets

Hey everyone,
I have a dataset of more than N = 160k samples and would like to quantify the correlation between two of its variables. The correlation is r = 0.554 (r^2 = 0.31). As for the significance test (t-test): since the t-statistic scales with sqrt(n), it will always be large, so the p-value will always be tiny. Does that mean that the correlation coefficient is always significant for large datasets? Or does the t-test make sense here at all?
Thanks for your help!
Dennis

2 Comments

It doesn't scale linearly. It is asymptotic. More data are generally good. It just means that the test is more robust to individual outliers.
In practice, because of the very nature of statistical inference, you easily get significant p-values with large datasets. What matters most is whether, as an expert in your own field, you think the magnitude of the correlation coefficient is sufficient for your goals.
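To put rough numbers on this, here is a minimal sketch in Python (not MATLAB, and not taken from the original posts). It assumes the usual t-test for a Pearson correlation, t = r*sqrt(n-2)/sqrt(1-r^2), and, because n is so large, approximates the t distribution by a standard normal to get the p-value. The function names are made up for illustration.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def approx_two_sided_p(t):
    """Two-sided p-value via the normal approximation (an assumption; fine for large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def smallest_significant_r(n, z_crit=1.96):
    """Smallest |r| that is significant at roughly the 5% level (normal approximation)."""
    # Solve z_crit = r * sqrt(n - 2) / sqrt(1 - r^2) for r.
    return z_crit / math.sqrt(n - 2 + z_crit ** 2)


if __name__ == "__main__":
    n = 160_000
    for r in (0.554, 0.05, 0.005):
        t = t_statistic(r, n)
        print(f"r = {r}: t = {t:.2f}, approx. p = {approx_two_sided_p(t):.3g}")
    print(f"smallest significant |r| at n = {n}: {smallest_significant_r(n):.4f}")
```

With n = 160k, the t statistic for r = 0.554 comes out at about 266, and any |r| above roughly 0.005 already passes the 5% threshold. So the test is still valid, but at this sample size it says almost nothing about whether the correlation is large enough to matter, which is the point made above.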

Answers (0)

Asked: 20 Dec 2016

Commented: 20 Dec 2016