# Bootstrap correlation of two arrays

Marianna on 18 Jul 2018
Commented: Adam Danz on 18 Jul 2018
I want to correlate two arrays (A and B) from a medical image. I expect a high correlation since they come from the same patient (acquired twice in the same session).
[rho, p] = corr(A(:), B(:)) gives me rho = 0.8321 but p = 0.1255 so the correlation is not significant.
I have read that an approach could be a bootstrap analysis and did something like:
rho_boot = bootstrp(1000, @corr, A(:), B(:)), which gives a distribution of 1000 rho values.
The question is: can I consider mean(rho_boot(:)) my new rho value? I have also read in the MathWorks documentation that "(...)this evidence does not require any strong assumptions about the probability distribution of the correlation coefficient."
In fact, I have lost track of my p value.

Adam Danz on 18 Jul 2018
Edited: Adam Danz on 18 Jul 2018
This smells like p-hacking, which is bad science. IMHO you are approaching the problem backwards. Statistical tests are chosen based on the questions you're asking and the type of data available to address those questions. Statistical tests should not be chosen based on the p-value they produce -- that's wrong, wrong, wrong.
To address your question: variables A and B are a tiny sample that represents the real-world variables A_real and B_real. A_real and B_real have an exact correlation which we'll likely never know, because we'll almost never collect all the data from A_real and B_real. All we've got is a tiny slice, A & B. Unless you have good reason to believe an element of A or B is the result of a bad measurement, you should use all of your data to calculate the correlation.
The bootstrapping method resamples the data (with replacement) many times, and this is helpful for producing confidence intervals on your correlation value, so you not only have rho but a measure of its uncertainty as well. This, in addition to your p value, will help you interpret the significance of the results. The mean of rho_boot might differ slightly from the correlation of the entire A and B data, but it shouldn't differ much. If it does, it's likely that p is > 0.05 and your confidence intervals are large (i.e., your data are noisy).
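One way to get those confidence intervals in MATLAB is bootci (Statistics and Machine Learning Toolbox). A minimal sketch, assuming A and B are two same-sized image arrays:

```matlab
% Flatten the images so corr sees two column vectors
a = A(:);
b = B(:);

% Point estimate of the correlation on the full data
rho = corr(a, b);

% 95% bootstrap confidence interval on rho (1000 resamples);
% the cell array passes both data vectors to corr on each resample
ci = bootci(1000, {@corr, a, b});

fprintf('rho = %.4f, 95%% CI = [%.4f, %.4f]\n', rho, ci(1), ci(2));
```

If the interval is wide (or crosses zero), that tells you more about the reliability of rho than the point estimate alone.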
To answer your question: use all of your data to calculate the correlation, if that's the statistic you've chosen. If the correlation value is not what you expected, then either 1) your expectations were wrong, 2) something's wrong in how you collected or analyzed your data, or 3) you don't have enough data.
Lastly, regarding "(...)this evidence does not require any strong assumptions about the probability distribution of the correlation coefficient.": bootstrapping is a nonparametric method and does not require that your data follow a normal distribution (as does the interpretation of standard deviations and other similar parametric tests). This is due to the central limit theorem (CLT).
Adam Danz on 18 Jul 2018
My pleasure.
Unlike p values, there isn't a set of guidelines to decide if CIs are too big. Instead, you have to look at the data, interpret it with a critical eye, and decide how much uncertainty you're comfortable with.
More importantly, you can use the CIs to answer the questions you're asking. For example, is the mean of A significantly different from B? To answer that with CIs, you just need to determine if the CIs of A and B overlap. If they don't overlap, you can conclude they significantly differ. Another example: is the median of C significantly different from 0? That is answered by determining if the CI of C overlaps with 0.
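The overlap checks above can be sketched with bootci as well; a hedged example, assuming A, B, and C are data vectors you already have:

```matlab
% 95% bootstrap CIs on the means of A and B (1000 resamples each)
ciA = bootci(1000, @mean, A(:));
ciB = bootci(1000, @mean, B(:));

% Non-overlapping CIs suggest a significant difference in the means
noOverlap = ciA(2) < ciB(1) || ciB(2) < ciA(1);

% Is the median of C significantly different from 0?
% Check whether the CI of the median contains 0.
ciC = bootci(1000, @median, C(:));
includesZero = ciC(1) <= 0 && ciC(2) >= 0;
```

bootci returns the interval as a 2-by-1 vector [lower; upper], which is why the endpoints are indexed with ci(1) and ci(2) here.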
It's very likely that your CIs (and p value) will change as you collect more data (they could increase or decrease). More importantly, you will approach the actual values of the true correlation as you get more reliable data. That being said, it's bad science to keep collecting data until you get the statistical results you hope for.