How do I split a table into multiple sub-tables based on the frequency of a variable?

I have an existing table and would like to make subsets of this table conditionally based on how often entries appear in one of the variables. How do I do this?
For instance, suppose I have a table T in which the entries A, B, C, and D of variable Var1 appear with frequencies 1, 1, 2, and 3, respectively:
>> Var1 = ["A";"B";"C";"C";"D";"D";"D"];
>> Var2 = ["data1";"data2";"data3";"data4";"data5";"data6";"data7"];
>> T = table(Var1,Var2)
T =
  7×2 table
    Var1     Var2  
    ____    _______
     A      "data1"
     B      "data2"
     C      "data3"
     C      "data4"
     D      "data5"
     D      "data6"
     D      "data7"
I would like to obtain one table containing the rows of T whose entries have a frequency of 1 (A and B), another table with the entries that have a frequency of 2 (C), and a third table with the entries that have a frequency of 3 (D).

 Accepted Answer

One approach is to use categorical variables, which let you find the frequency of each entry, and then use logical indexing to create the subset tables. This workflow is demonstrated in the code below:
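% Convert Var1 to categorical data
T = convertvars(T, "Var1", "categorical");

% Set up variables for table generation
counts = countcats(T.Var1);      % number of rows in each category
cats = categories(T.Var1);       % category names, in the same order as counts
uniqueCounts = unique(counts);   % distinct frequencies, sorted ascending

% Create one table per distinct frequency
splitTables = cell(length(uniqueCounts), 1);
for i = 1:length(uniqueCounts)
    % Categories that occur exactly uniqueCounts(i) times
    currentCountCats = cats(counts == uniqueCounts(i));
    splitTables{i} = T(ismember(T.Var1, currentCountCats), :);
end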
This results in the following tables:
>> splitTables{1,1}
ans =
  2×2 table
    Var1     Var2  
    ____    _______
     A      "data1"
     B      "data2"
>> splitTables{2,1}
ans =
  2×2 table
    Var1     Var2  
    ____    _______
     C      "data3"
     C      "data4"
>> splitTables{3,1}
ans =
  3×2 table
    Var1     Var2  
    ____    _______
     D      "data5"
     D      "data6"
     D      "data7"
First, make the variable categorical: either use 'categorical' when creating the initial table or, for an existing table, convert the variable with 'convertvars'.
To open the release-specific documentation for these functions, run the following commands in the Command Window of an installed MATLAB R2019a:
>> web(fullfile(docroot, 'matlab/ref/categorical.html'))
>> web(fullfile(docroot, 'matlab/ref/convertvars.html'))
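A small sketch of both options, using the example data from the question (the names T1 and T2 are only for illustration):

```matlab
% Example data from the question
Var1 = ["A";"B";"C";"C";"D";"D";"D"];
Var2 = ["data1";"data2";"data3";"data4";"data5";"data6";"data7"];

% Option 1: make Var1 categorical when the table is created
T1 = table(categorical(Var1), Var2, 'VariableNames', {'Var1', 'Var2'});

% Option 2: convert the variable of an existing table
T2 = table(Var1, Var2);
T2 = convertvars(T2, 'Var1', 'categorical');

% Both tables now store Var1 as a categorical array
disp(class(T1.Var1))   % categorical
disp(class(T2.Var1))   % categorical
```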
Once the entries are in a categorical variable, use 'categories' to get the list of categories and 'countcats' to get the frequency of each category. To open the release-specific documentation for these functions, run the following commands in the Command Window of an installed MATLAB R2019a:
>> web(fullfile(docroot, 'matlab/ref/categorical.categories.html'))
>> web(fullfile(docroot, 'matlab/ref/categorical.countcats.html'))
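For the example data from the question, a minimal sketch of these two calls is shown below. The counts come back in the same order as the category names, which is what allows the two outputs to be indexed together later:

```matlab
% Example Var1 from the question, already categorical
Var1 = categorical(["A";"B";"C";"C";"D";"D";"D"]);

cats   = categories(Var1);   % {'A';'B';'C';'D'}, a cell array of character vectors
counts = countcats(Var1);    % [1;1;2;3], one count per entry of cats

% Side-by-side view of each category and its frequency
freqTable = table(cats, counts);
```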
Finally, use logical indexing to select the categories with a given frequency, and use 'ismember' to pull the matching rows from the original table. The documentation on logical indexing and 'ismember' can be opened by running the following commands in the Command Window of an installed MATLAB R2019a:
>> web(fullfile(docroot, 'matlab/math/array-indexing.html'))
>> web(fullfile(docroot, 'matlab/ref/ismember.html'))
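A sketch of this step for a single frequency, assuming the example table from the question and a hypothetical target frequency of 2 (targetCount is an illustrative name, not part of the original answer):

```matlab
% Example table from the question with Var1 as categorical
Var1 = categorical(["A";"B";"C";"C";"D";"D";"D"]);
Var2 = ["data1";"data2";"data3";"data4";"data5";"data6";"data7"];
T = table(Var1, Var2);

counts = countcats(T.Var1);
cats   = categories(T.Var1);

% Logical indexing into cats: keep only the categories that occur targetCount times
targetCount  = 2;                              % hypothetical frequency of interest
selectedCats = cats(counts == targetCount);    % {'C'}

% ismember marks the rows whose Var1 value is one of the selected categories
rowMask = ismember(T.Var1, selectedCats);      % [0;0;1;1;0;0;0]
Tsub = T(rowMask, :);                          % the two rows with Var1 == C
```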
Note that the same process can be followed for numeric variables as well, since 'categorical' and 'convertvars' also convert numeric arrays to categorical.
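A sketch with hypothetical numeric data (x and y are made up for illustration). Option A converts the numeric variable to categorical and reuses 'countcats', as described above. Option B is a different technique that is not part of the accepted answer: it keeps the variable numeric, counts occurrences with 'unique' and 'accumarray', and then splits the table by logical indexing on each row's frequency.

```matlab
% Hypothetical numeric data: 7 appears once, 5 twice, 9 three times
x = [7; 5; 5; 9; 9; 9];
y = (1:6)';
T = table(x, y);

% Option A: convert to categorical and count as before
Tc = convertvars(T, 'x', 'categorical');
counts = countcats(Tc.x);        % [2;1;3]
cats   = categories(Tc.x);       % {'5';'7';'9'}, the values as text

% Option B: stay numeric and count with unique/accumarray
[vals, ~, idx] = unique(T.x);    % vals = [5;7;9]; vals(idx) reproduces T.x
numCounts = accumarray(idx, 1);  % occurrences of each entry of vals: [2;1;3]
rowFreq = numCounts(idx);        % frequency of each row's value: [1;2;2;3;3;3]

% One sub-table per distinct frequency, with x still numeric
uniqueCounts = unique(rowFreq);
splitTables = cell(numel(uniqueCounts), 1);
for k = 1:numel(uniqueCounts)
    splitTables{k} = T(rowFreq == uniqueCounts(k), :);
end
```

Option B is mainly useful when the sub-tables should keep x as a numeric column; with Option A the converted column stays categorical.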

Release

R2019a