Resample with replacement - bootstrap Kolmogorov–Smirnov test

Good afternoon,
I am trying to apply the bootstrap Kolmogorov–Smirnov test developed by Abadie (2002) to test if two series of results are from the same continuous distribution.
The first step of the methodology is the application of the ks test to the original series of results:
[ks_h,ks_p] = kstest2(results_1, results_50);
Then I need to resample n observations (in results_1 and in results_50) with replacement. I took a look at some MATLAB built-in functions, but it seems to me that none of them does exactly what is proposed by Abadie (2002). The bootstrp function returns the result of some calculation (the bootfun argument) made with the resampled series (and I just want the resampled series). The datasample function returns a subsample (and that is not what I need).
How can I resample n observations with replacement without using this kind of built-in function? Please find attached the paper with the described methodology.
Thanks in advance!

 Accepted Answer

"How can I resample n observations with replacement without using this kind of built-in function?"
Those functions can be useful for your needs, but to resample your data with replacement all you need is randi().
nResamp = 1000; % number of observations to draw; use numel(results_1) to match the original sample size.
results_1_resamp = results_1(randi(numel(results_1),1,nResamp));
results_50_resamp = results_50(randi(numel(results_50),1,nResamp));
*Assuming results_1 and _50 are vectors.

6 Comments

This code returns just one vector for each line of resampling. In fact, what I need is to resample the results several times (the number of bootstraps) and obtain, for each bootstrap, one vector of the same size as the original vector of results. By the same logic, I may need a loop to do that.
Sure, you can put that in a loop and ensure the population size doesn't change.
nBootstraps = 1000; % number of bootstraps
n1 = numel(results_1);
n50 = numel(results_50);
ks_h = nan(nBootstraps,1);
ks_p = ks_h;
for i = 1:nBootstraps
    % draw indices with replacement; each resample keeps the original size
    results_1_resamp = results_1(randi(n1,n1,1));
    results_50_resamp = results_50(randi(n50,n50,1));
    [ks_h(i),ks_p(i)] = kstest2(results_1_resamp, results_50_resamp);
end
I wrote similar code and it works as well:
nbootstrap = 10;
ks_h_resamp = zeros(nbootstrap,1);
ks_p_resamp = zeros(nbootstrap,1);
for i = 1:nbootstrap
    select = randi(numel(results_1),numel(results_1),1); % select can be used for both results since their size is the same
    results_1_resamp = results_1(select);
    results_50_resamp = results_50(select);
    [ks_h_resamp(i),ks_p_resamp(i)] = kstest2(results_1_resamp, results_50_resamp);
end
Thank you for your help.
This isn't the correct method.
Using the same random index for both variables introduces a correlation that you definitely do not want to have.
The idea behind bootstrapping is that you're testing many different combinations of data based on your real data. But if you're using the same random index for both variables, the data will always be paired and correlated. Use different random indices for the two variables.
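To make the point about shared indices concrete, here is a minimal sketch on made-up data. The vectors x and y are hypothetical stand-ins for results_1 and results_50, deliberately constructed so that they are strongly related row by row; nothing here comes from the original data or from the Abadie (2002) paper. It only compares what a shared index and two independent indices do to that row-wise relationship.

```matlab
% Hypothetical data for illustration only: y is tied to x row by row.
rng(0);                        % fix the seed so the run is repeatable
x = randn(500,1);              % stand-in for results_1
y = x + 0.1*randn(500,1);      % stand-in for results_50
n = numel(x);

% One shared index: row k of the x resample and row k of the y resample
% always come from the same original row, so the pairing is preserved.
shared = randi(n, n, 1);
c_shared = corrcoef(x(shared), y(shared));

% Two independent indices: the two resamples are drawn separately,
% so the row-wise pairing is broken.
idx_x = randi(n, n, 1);
idx_y = randi(n, n, 1);
c_indep = corrcoef(x(idx_x), y(idx_y));

fprintf('shared index:        r = %.3f\n', c_shared(1,2));
fprintf('independent indices: r = %.3f\n', c_indep(1,2));
```

With data built this way, the shared-index correlation should stay close to the correlation of the original pairs, while the independent-index correlation should sit near zero. If your two series have no row-wise relationship to begin with, the difference will be less visible, but drawing separate indices is still the safer choice.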
I have already changed that.
Thank you!
