Indexing rows from table that contain specific proportions of values across 2 variables

I have a series of tables that I am generating based on people's ratings of a number of items. The resulting tables are 400-500 rows long. There are two variables (PBIN and VBIN; see image), whose values can range from 1 to 4.
What I need to do is see whether, within this table, there exist 100 rows where all unique values of PBIN and VBIN (1-4) are represented with equal frequency. More specifically, is there a way to index the rows of these tables to find a combination of 100 rows where all 4 unique values of table.PBIN occur 25 times each, AND all 4 unique values of table.VBIN occur 25 times each?
I don't think it will always be possible to draw 100 rows that meet these criteria. Thus, if there's a way that the above could be programmed, how could I get the largest possible table where these criteria are met? For example, it would first attempt to find 100 rows; if that doesn't work, it would try the next multiple of 4 down from 100 (96), then try again, and so on.
Any help would be greatly appreciated! Please let me know if there's anything I could clear up!

Answers (1)

Peter Perkins on 30 Nov 2016
This isn't really a question about tables; it's a question about data sampling. It surely will not be possible in general, and it seems like you'll just have to write a bunch of complicated logic to do what you want.
All you've mentioned is two variables in the table. You haven't said anything about other variables, nor have you said anything about unique combinations of values from the two variables. Given that, it seems like rather than trying to select rows of the table, you'd be better off randomly generating data. Why not just create a table that has exactly 100 rows (25 with 1,1, 25 with 2,2, etc.) and then permute the two variables independently?
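A minimal sketch of that suggestion, written in Python for illustration rather than MATLAB. The function name `balanced_synthetic` and its parameters are hypothetical and do not come from this thread; the idea is only to build two columns with 25 copies of each level and shuffle each column independently.

```python
import random

def balanced_synthetic(n=100, levels=(1, 2, 3, 4), seed=None):
    """Return n (PBIN, VBIN) pairs with every level equally frequent in both columns."""
    if n % len(levels):
        raise ValueError("n must be a multiple of the number of levels")
    rng = random.Random(seed)
    per_level = n // len(levels)
    # Start from 25 rows of (1,1), 25 of (2,2), and so on.
    pbin = [v for v in levels for _ in range(per_level)]
    vbin = list(pbin)
    # Permute each column independently so the pairings are scrambled
    # while each column's level frequencies stay exactly balanced.
    rng.shuffle(pbin)
    rng.shuffle(vbin)
    return list(zip(pbin, vbin))

rows = balanced_synthetic(100, seed=0)
```

Shuffling only reorders values within a column, so each level still appears exactly 25 times in both columns; only the pairing between the columns changes.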
  1 Comment
Pablo Morales on 30 Nov 2016
Why do you say this won't be possible? I feel that it absolutely is; I'm just stuck on figuring out how.
The other variables do not matter; they have already been accounted for in a series of prior processing steps. All that matters is whether or not I can sample from this table to meet the criteria I described above based on these two variables alone.
Randomly generating data will not work. These variables reflect differences in ratings that human participants have attributed to a set of sample stimuli. The goal of my initial question is to sample a balanced set of rows in the table (trials associated with stimuli) for use in a subsequent experiment. The stimuli that these values are attached to will be different from person to person.
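One possible way to approach the sampling described in this thread, offered as a sketch and not as a method anyone here proposed: count the rows in each of the 16 (PBIN, VBIN) cells, then ask how many rows to take from each cell so that every level appears k times in both columns. That is a small max-flow problem (source to each PBIN level with capacity k, each PBIN level to each VBIN level with capacity equal to the cell count, each VBIN level to sink with capacity k), and a balanced subset of 4k rows exists exactly when the maximum flow is 4k. The Python below is illustrative only; `cell_quota`, `largest_balanced_subset`, and the assumption that each table row is supplied as a `(PBIN, VBIN)` tuple are all hypothetical names and choices.

```python
import random
from collections import defaultdict, deque

LEVELS = (1, 2, 3, 4)

def cell_quota(counts, k):
    """Edmonds-Karp max flow. Returns rows to take per (PBIN, VBIN) cell, or None."""
    src, sink = "s", "t"
    cap = defaultdict(int)   # residual capacities
    adj = defaultdict(set)
    def add(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)
    for l in LEVELS:
        add(src, ("P", l), k)       # each PBIN level must supply k rows
        add(("V", l), sink, k)      # each VBIN level must receive k rows
        for m in LEVELS:
            add(("P", l), ("V", m), counts.get((l, m), 0))
    flow = 0
    while True:
        parent = {src: None}        # breadth-first search for an augmenting path
        queue = deque([src])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        flow += push
    if flow < k * len(LEVELS):
        return None
    # Flow on a P->V edge equals the residual capacity of its reverse edge.
    return {(l, m): cap[(("V", m), ("P", l))] for l in LEVELS for m in LEVELS}

def largest_balanced_subset(pairs, max_per_level=25, seed=None):
    """pairs: one (PBIN, VBIN) tuple per table row. Returns sorted row indices."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for idx, pair in enumerate(pairs):
        cells[pair].append(idx)
    counts = {cell: len(ix) for cell, ix in cells.items()}
    for k in range(max_per_level, 0, -1):   # try 100 rows, then 96, 92, ...
        quota = cell_quota(counts, k)
        if quota is not None:
            chosen = []
            for cell, q in quota.items():
                chosen += rng.sample(cells[cell], q)   # random rows within a cell
            return sorted(chosen)
    return []   # no balanced subset of any size exists

# Illustrative data only: 450 random rows standing in for one participant's table.
demo_rng = random.Random(1)
pairs = [(demo_rng.randint(1, 4), demo_rng.randint(1, 4)) for _ in range(450)]
idx = largest_balanced_subset(pairs, seed=1)
```

The returned indices are zero-based positions into `pairs`. Which rows are drawn within a cell is random, so different seeds give different but equally balanced subsets, and an empty result means that no balanced subset exists at any size.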
