How can I include absent group names in table grouping results?

3 views (last 30 days)
Aaron Thrasher
Aaron Thrasher on 15 Oct 2021
Commented: Aaron Thrasher on 15 Oct 2021
I am using groupsummary on table information and am finding it overly complicated to show additional data in the results. Here is an example of my situation:
teamMembers = {'Al','Betty','Charles','Denice','Edward'};
%Example Data
Name = {'Denice';'Denice';'Al';'Charles';'Al';'Denice'};
Month = [1;1;2;2;2;2];
Hours = [1;3;3;2;4;2];
T = table(Name,Month,Hours);
%Example summary (current result)
weekHours = groupsummary(T,{'Month','Name'},'sum','Hours')
weekHours = 4×4 table
Month Name GroupCount sum_Hours _____ ___________ __________ _________ 1 {'Denice' } 2 4 2 {'Al' } 2 7 2 {'Charles'} 1 2 2 {'Denice' } 1 2
%Example summary (expected result)
Month Name GroupCount sum_Hours
1 {'Al' } 0 0
1 {'Betty' } 0 0
1 {'Charles'} 0 0
1 {'Denice' } 2 4
1 {'Edward'} 0 0
2 {'Al' } 2 7
2 {'Betty' } 0 0
2 {'Charles'} 1 2
2 {'Denice' } 1 2
2 {'Edward'} 0 0
But I would like to include empty groups of the team as in the example for categorical information using 'IncludeEmptyGroups'
In the end I'd like all team members to show up in the summary which would show group count and sum of 0.
So far I have tried just inlcuding these names as an extra row in the table, but this is much more complicated in a real example with a growing number of table columns and data types to append the table with empty data just to include the names. I could always add the empty names at the end summary with 0's, I'm just looking for a more convenient way to do this. I haven't explored categorical types so I am not sure if there is a better solution there either.

Answers (2)

dpb on 15 Oct 2021
Well, unless the data are in the table, there's not going to be anything for groupsummary to operate over to tell it those other elements even exist. Don't see much of any way around that if you want groupsummary to do all the work.
Otherwise you could just take the results you get and augment the summary table with the missing names, but that function will have to know which variables are in the original in order to add the proper variable(s) to the summary table.
I don't see it as being that hard to create an empty record of the data type as the table and making the entries.

Image Analyst
Image Analyst on 15 Oct 2021
Try this:
teamMembers = {'Al','Betty','Charles','Denice','Edward'}
%Example Data
Name = {'Denice';'Denice';'Al';'Charles';'Al';'Denice'}
Hours = [1;3;3;2;4;2];
% Add in missing names
missingNames = setdiff(teamMembers, Name)'
% Add zeros for them to the lists
Name = [Name; missingNames]
Hours = [Hours; zeros(numel(missingNames), 1)]
T = table(Name,Hours);
%Example summary (current result)
weekHours = groupsummary(T,'Name','sum','Hours')
%Example summary (expected result)
weekHours =
5×3 table
Name GroupCount sum_Hours
___________ __________ _________
{'Al' } 2 7
{'Betty' } 1 0
{'Charles'} 1 2
{'Denice' } 3 6
{'Edward' } 1 0
Aaron Thrasher
Aaron Thrasher on 15 Oct 2021
I explored a little bit of using types to add new data like you described, however it got pretty complicated when trying to set the variable type, but this was without the setvartype, so this could be very useful. In the end my current solution is to have a cell array displaying the data I need with all blank answers, then using intersect from the group summary to place in the data. With having database queries of useful tables with a width more than 20 columns and multiple datapoints and up to 3 level groupings, it gets complicated quick. Thanks for all your help everyone.

Sign in to comment.




Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!