How to use bootstrap for something other than the mean

I am trying to use the bootstrap function for counts as opposed to the mean. The following is an example of my data:
10 blue
13 red
9 green
20 yellow
2 orange
total: 54
As you can see, I don't have just one variable with different values, but multiple counts of different variables. I'm tasked with trying to find the statistical significance of this data. For example, yellow is the most common color, but is it statistically significant? Can I use the bootstrap function for this type of problem?
I appreciate any help anyone can give me.

2 Comments

Everything is statistically significant -- to a sufficiently low confidence level.
Say I want it at the 95% confidence level.


Answers (2)

You don't tell us whether the counts are IID (Independent and Identically Distributed). If they are, then a Poisson model would seem the most suitable. Take the raw numbers as a list and do a standard estimation of the mean and standard deviation under the assumption of Poisson (for a Poisson distribution the variance equals the mean). You can then calculate how many standard deviations above the mean the count for yellow is, and use the Gaussian (normal) cumulative distribution to find the tail probability.
This is, of course, subject to change if the values are not IID or if Poisson is for some reason not suitable.
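A minimal sketch of that calculation, using the counts from the question. It assumes the counts are IID Poisson, so the standard deviation is taken as the square root of the mean, and it uses a normal approximation for the tail, which is rough for counts this small. The variable names are only illustrative.

```matlab
% Counts from the question: blue, red, green, yellow, orange
counts = [10 13 9 20 2];

% Poisson assumption: the variance equals the mean
mu    = mean(counts);       % estimated Poisson rate
sigma = sqrt(mu);           % standard deviation implied by Poisson

% Number of standard deviations each count lies from the mean
z = (counts - mu) ./ sigma;

% Upper-tail probability under a normal approximation,
% P(Z > z) = 0.5*erfc(z/sqrt(2)), which needs no toolbox
pUpper = 0.5 * erfc(z ./ sqrt(2));

fprintf('yellow: z = %.2f, upper-tail p = %.4f\n', z(4), pUpper(4));
```

Yellow comes out about 2.8 standard deviations above the mean. Note that this looks at one color in isolation and does not account for having picked the largest of five counts.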

1 Comment

For a fixed confidence level, calculate the corresponding number of standard deviations. Then, having estimated the mean and standard deviation, you can calculate the upper and lower tail boundaries. Any count above the upper boundary or below the lower boundary is significant at that level.
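As a sketch of the boundary calculation for the 95% level asked about above: this assumes a two-sided test, the same Poisson estimate of the standard deviation (square root of the mean), and a normal approximation. The names alpha, zCrit, lower and upper are just for illustration.

```matlab
counts = [10 13 9 20 2];    % blue, red, green, yellow, orange
alpha  = 0.05;              % 95% confidence level, two-sided (assumed)

mu    = mean(counts);
sigma = sqrt(mu);           % Poisson assumption: variance equals the mean

% Two-sided critical value: P(|Z| > zCrit) = alpha
zCrit = sqrt(2) * erfcinv(alpha);   % about 1.96 for alpha = 0.05

lower = mu - zCrit * sigma;
upper = mu + zCrit * sigma;

% Counts outside [lower, upper] meet the chosen significance level
isSignificant = counts > upper | counts < lower;
disp([lower upper]);
disp(isSignificant);
```

Under these assumptions yellow (20) falls above the upper boundary and orange (2) falls below the lower one, while the other three counts lie inside.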


You should probably do a hypothesis test. As Walter said, everything can be significant depending on your confidence level.
Bootstrap can be used if you want, but you don't have to. You can just use a random number generator to simulate the counts, sampling through an inverse CDF built from the probabilities.
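One way this simulation could look is sketched below. The null hypothesis used here, that all five colors are equally likely, is an assumption that the thread does not state, and the test statistic (the largest of the five counts) is also just one possible choice. Uniform random numbers are mapped to categories through the cumulative probabilities, which is inverse-CDF sampling for a discrete distribution.

```matlab
counts = [10 13 9 20 2];        % blue, red, green, yellow, orange
N = sum(counts);
K = numel(counts);

p0    = ones(1, K) / K;         % ASSUMED null: all colors equally likely
edges = [0 cumsum(p0)];         % CDF breakpoints for inverse-CDF sampling
edges(end) = 1;                 % guard against round-off in cumsum

nSim = 10000;                   % number of simulated data sets (assumed)
rng(0);                         % fixed seed for repeatability
maxSim = zeros(nSim, 1);
for s = 1:nSim
    u = rand(N, 1);                     % uniform draws on (0,1)
    simCounts = histcounts(u, edges);   % category counts via the CDF bins
    maxSim(s) = max(simCounts);         % statistic: largest count
end

% Fraction of simulated data sets whose largest count is at least
% as big as the observed largest count (yellow = 20)
pValue = mean(maxSim >= max(counts));
fprintf('simulated p-value: %.4f\n', pValue);
```

Comparing pValue with 0.05 gives the test at the 95% level. Using the maximum as the statistic accounts for the fact that yellow was singled out only after seeing that it was the largest.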

4 Comments

Well, what I can think of right off the bat for bootstrap would be
matrix = [ones(1,10), 2*ones(1,13), 3*ones(...), ...]
then sample from the matrix
randsample(matrix,n,1)
where n is the number of samples you want to draw and the 1 means sampling with replacement.
Now don't histogram this before actually converting the values to counts,
so numel(find(output,1)) etc.
Then plot and do the hypothesis test. This is your assignment, so that is all I can provide.
Yaman, did you mean the find to locate the first non-zero, or did you mean find(output==1) ? Doing the numel(find==1) and so on would be like using
histc(output,1:NumberOfClasses)
but then I get confused because you suggested _not_ to histogram it?
Yeah, I just realized that too, so it should be like:
output = randsample(matrix,n,1)
result(1) = numel(find(output==1))
I believe histogramming as is would result in a continuous histogram, but this problem is not continuous, since 1.35 wouldn't refer to any color.
That's why the plotting should be done like this:
bar(1:5,result), or, if you are interested in relative frequencies rather than counts,
bar(1:5,result/n)
histc() does not draw the histogram, and is better suited for defining hard edges like in this case. The result of histc() can be bar()'d if one wants.
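Putting the pieces of this thread together, a bootstrap over the color labels might be sketched as follows. The label vector is built from the counts in the question; the number of resamples, the percentile interval, and the "yellow is strictly the largest" summary are assumptions added for illustration. randsample needs the Statistics Toolbox, and accumarray plays the role that histc(output,1:NumberOfClasses) has in the comments above.

```matlab
counts = [10 13 9 20 2];        % blue, red, green, yellow, orange
K = numel(counts);
N = sum(counts);

% One label per observation, like the "matrix" vector in the thread
labels = [ones(1,10), 2*ones(1,13), 3*ones(1,9), 4*ones(1,20), 5*ones(1,2)];

nBoot = 2000;                   % number of bootstrap resamples (assumed)
rng(0);                         % fixed seed for repeatability
boot = zeros(nBoot, K);
for b = 1:nBoot
    output = randsample(labels, N, true);           % resample with replacement
    boot(b,:) = accumarray(output(:), 1, [K 1]).';  % counts per class
end

% 95% percentile interval for each color's count
sorted = sort(boot, 1);
ciLow  = sorted(round(0.025 * nBoot), :);
ciHigh = sorted(round(0.975 * nBoot), :);

% Fraction of resamples in which yellow is strictly the largest count
yellowIsTop = mean(boot(:,4) > max(boot(:,[1 2 3 5]), [], 2));

disp([ciLow; ciHigh]);
fprintf('yellow is the top color in %.1f%% of resamples\n', 100*yellowIsTop);
```

The rows of boot can be passed to bar() as described above, for example bar(1:5, mean(boot)/N) for the average resampled proportions.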


Asked: Joe on 31 May 2011
