How to use bootstrap for something other than mean
I am trying to use the bootstrap function for counts as opposed to the mean. The following is an example of my data:
10 blue
13 red
9 green
20 yellow
2 orange
total: 54
As you can see, I don't have just one variable with different values, but multiple counts of different variables. I'm tasked with trying to find the statistical significance of this data. For example, yellow is the most common color, but is it statistically significant? Can I use the bootstrap function for this type of problem?
I appreciate any help anyone can give me.
2 Comments
Walter Roberson
on 31 May 2011
Everything is statistically significant -- to a sufficiently low confidence level.
Joe
on 2 Jun 2011
Answers (2)
Walter Roberson
on 2 Jun 2011
2 votes
You don't tell us whether the counts are IID (independent and identically distributed). If they are, then a Poisson model would seem the most suitable. Take the raw counts as a list and estimate the mean and standard deviation under the Poisson assumption. You can then calculate how many standard deviations a given count lies from the mean (above it for yellow, below it for orange) and use the Gaussian cumulative distribution to find the corresponding tail probability.
This is, of course, subject to change if the values are not IID or if Poisson is for some reason not suitable.
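A minimal sketch of the calculation described above, assuming the five counts are IID Poisson draws, so that the standard deviation is estimated as the square root of the mean. The variable names are illustrative, and the normal approximation is rough with only five values. The tail probability uses erfc from base MATLAB, so no toolbox is needed.

```matlab
% Counts in the order given in the question: blue, red, green, yellow, orange
counts = [10 13 9 20 2];

% Under a Poisson model the variance equals the mean
lambda = mean(counts);      % estimated Poisson mean
sigma  = sqrt(lambda);      % estimated standard deviation

% Number of standard deviations each count lies from the mean
z = (counts - lambda) / sigma;

% Normal-approximation tail probabilities
pUpper = 0.5 * erfc( z / sqrt(2));   % P(Z >= z), relevant for yellow
pLower = 0.5 * erfc(-z / sqrt(2));   % P(Z <= z), relevant for orange
```

Here z(4) and pUpper(4) refer to yellow, and z(5) and pLower(5) to orange.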
1 Comment
Walter Roberson
on 2 Jun 2011
For a fixed confidence, calculate the corresponding number of standard deviations. Then, having estimated the mean and standard deviation, you can calculate the upper and lower tail boundaries. Any count above the upper tail or below the lower tail meets that significance.
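The boundary calculation can be sketched as follows, again assuming a Poisson model with a normal approximation. The two-sided level alpha = 0.05 is an assumed value chosen only for illustration; erfcinv is part of base MATLAB.

```matlab
counts = [10 13 9 20 2];    % blue, red, green, yellow, orange
lambda = mean(counts);      % estimated Poisson mean
sigma  = sqrt(lambda);      % Poisson: standard deviation = sqrt(mean)

alpha = 0.05;               % assumed two-sided significance level

% Critical number of standard deviations: P(|Z| > zCrit) = alpha
zCrit = sqrt(2) * erfcinv(alpha);

% Lower and upper tail boundaries for a count
lower = lambda - zCrit * sigma;
upper = lambda + zCrit * sigma;

% Logical flag per color: true if the count falls in either tail
isOutside = counts < lower | counts > upper;
```

With a different alpha the boundaries move, which is the point of the earlier remark that significance depends on the confidence level chosen.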
Yaman
on 2 Jun 2011
0 votes
You probably should do a hypothesis test. As Walter said, everything can be significant depending on your confidence level.
Bootstrap can be used if you want, but you don't have to. You can just use a random number generator to simulate the data, drawing each color through the inverse CDF of the category probabilities.
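One way to read the simulation suggestion is as a Monte Carlo test. The sketch below assumes a null hypothesis that all five colors are equally likely (the thread does not state a null), draws colors by inverting the cumulative probabilities, and asks how often the largest simulated count reaches the observed maximum of 20. The names pNull, nSim and maxSim are illustrative.

```matlab
counts = [10 13 9 20 2];        % blue, red, green, yellow, orange
n = sum(counts);                % total number of observations (54)
k = numel(counts);              % number of colors

pNull   = ones(1, k) / k;       % assumed null: all colors equally likely
cdfNull = cumsum(pNull);        % cumulative probabilities

nSim = 10000;                   % number of simulated data sets
rng(0);                         % fixed seed for repeatability
maxSim = zeros(nSim, 1);

for s = 1:nSim
    u = rand(n, 1);
    % Inverse CDF: category = 1 + number of CDF steps that u exceeds
    draws = 1 + sum(bsxfun(@gt, u, cdfNull(1:end-1)), 2);
    simCounts = accumarray(draws, 1, [k 1]);
    maxSim(s) = max(simCounts);
end

% Share of simulations whose largest count is at least the observed one
pValue = mean(maxSim >= max(counts));
```

A small pValue would suggest that a count as large as yellow's is unusual under the equal-probability assumption; a different null would need a different pNull.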
4 Comments
Yaman
on 2 Jun 2011
Well, what I can think of right off the bat for bootstrap would be
matrix = [ones(1,10), 2*ones(1,13), 3*ones(...), ...]
then sample from the matrix
randsample(matrix,n,1)
where n is the number of values you want to sample, and the 1 means sampling with replacement.
Now don't histogram this directly; first turn the sampled values into counts per category.
so numel(find(output,1)) etc.
then plot and do the hypothesis test. This is your assignment, so that is all I can provide.
Walter Roberson
on 2 Jun 2011
Yaman, did you mean the find to locate the first non-zero, or did you mean find(output==1)? Doing numel(find(output==1)) and so on for each class would be like using
histc(output,1:NumberOfClasses)
but then I get confused because you suggested _not_ to histogram it?
Yaman
on 2 Jun 2011
yea I just realized it too, so it should be like:
output = randsample(matrix,n,1)
result(1) = numel(find(output==1))
I believe histogramming as is would result in a continuous histogram, but this problem is not continuous, since 1.35 wouldn't refer to any color.
That's why plotting should be done such as:
bar(1:5,result) for raw counts, or, if you want proportions instead of counts, then
bar(1:5,result/n)
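Putting the pieces of this comment together, a single resample might look like the sketch below. The coding 1 = blue through 5 = orange follows the order in the question and is an assumption, as is the choice of n equal to the original sample size. randsample belongs to the Statistics Toolbox.

```matlab
% One entry per observation: 1=blue, 2=red, 3=green, 4=yellow, 5=orange
data = [ones(1,10), 2*ones(1,13), 3*ones(1,9), 4*ones(1,20), 5*ones(1,2)];
n = numel(data);                     % resample size = original size (54)

rng(0);                              % fixed seed for repeatability
output = randsample(data, n, true);  % draw n values with replacement

% Count how often each color code appears in the resample
result = zeros(1, 5);
for c = 1:5
    result(c) = sum(output == c);
end

% Discrete bar chart of proportions, one bar per color
bar(1:5, result / n);
set(gca, 'XTickLabel', {'blue', 'red', 'green', 'yellow', 'orange'});
```

Here sum(output == c) gives the same count as numel(find(output == c)) without building the index list.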
Walter Roberson
on 2 Jun 2011
histc() does not draw the histogram, and is better suited for defining hard edges like in this case. The result of histc() can be bar()'d if one wants.
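As a sketch of how histc could serve as the counting step inside a full bootstrap loop: the code below repeats the resampling B times and reads a rough 95% percentile interval for yellow's share. B, the seed and the interval level are assumed values, and the percentile method is one choice among several. Newer MATLAB releases recommend histcounts over histc, but histc is kept here to match the thread.

```matlab
counts = [10 13 9 20 2];         % blue, red, green, yellow, orange
k = numel(counts);
n = sum(counts);

% Expand the counts into one color code per observation
data = [];
for c = 1:k
    data = [data, c * ones(1, counts(c))]; %#ok<AGROW>
end

B = 2000;                        % assumed number of bootstrap replicates
rng(1);                          % fixed seed for repeatability
bootCounts = zeros(B, k);
for b = 1:B
    output = randsample(data, n, true);     % resample with replacement
    bootCounts(b, :) = histc(output, 1:k);  % counts per color code
end

% Percentile interval for yellow's share of the sample
yellowShare = sort(bootCounts(:, 4) / n);
ci = yellowShare([round(0.025 * B), round(0.975 * B)]);

% Mean resampled count per color; can be passed to bar() if desired
meanCounts = mean(bootCounts, 1);
```

Comparing ci with the shares of the other colors gives an informal sense of whether yellow's lead is stable under resampling.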