groupsummary
Group summary computations
Syntax
Description
returns a table containing the computed groups and the number of elements in
each group for data in a table or timetable G
= groupsummary(T
,groupvars
)T
. A group
contains the unique combinations of grouping variables in
groupvars
. For example, G =
groupsummary(T,'Gender')
returns the number of
Male
elements and the number of Female
elements in the variable Gender
.
specifies additional grouping properties using one or more namevalue pairs for
any of the previous syntaxes. For example, G
= groupsummary(___,Name,Value
)G =
groupsummary(T,'Category1','IncludeMissingGroups',false)
excludes
the group made from missing categorical
data indicated by
<undefined>
.
specifies additional grouping properties using one or more namevalue pairs for
either of the previous array syntaxes.B
= groupsummary(___,Name,Value
)
Examples
Summary Statistics
Compute summary statistics on table variables.
Create a table T
that contains information about five individuals.
Gender = ["male";"female";"female";"male";"male"]; Age = [38;43;38;40;49]; Height = [71;69;64;67;64]; Weight = [176;163;131;133;119]; T = table(Gender,Age,Height,Weight)
T=5×4 table
Gender Age Height Weight
________ ___ ______ ______
"male" 38 71 176
"female" 43 69 163
"female" 38 64 131
"male" 40 67 133
"male" 49 64 119
Compute the counts of males and females by specifying Gender
as the grouping variable.
G = groupsummary(T,"Gender")
G=2×2 table
Gender GroupCount
________ __________
"female" 2
"male" 3
Compute the mean age, height, and weight of females and males separately.
G = groupsummary(T,"Gender","mean")
G=2×5 table
Gender GroupCount mean_Age mean_Height mean_Weight
________ __________ ________ ___________ ___________
"female" 2 40.5 66.5 147
"male" 3 42.333 67.333 142.67
Still grouping by gender, compute the median height only.
G = groupsummary(T,"Gender","median","Height")
G=2×3 table
Gender GroupCount median_Height
________ __________ _____________
"female" 2 66.5
"male" 3 67
Multiple Grouping Variables
Group table data using two grouping variables.
Create a table T
that contains information about five individuals.
Gender = ["male";"female";"male";"female";"male"]; Smoker = logical([1;0;1;0;1]); Weight = [176;163;131;133;119]; T = table(Gender,Smoker,Weight)
T=5×3 table
Gender Smoker Weight
________ ______ ______
"male" true 176
"female" false 163
"male" true 131
"female" false 133
"male" true 119
Compute the mean weight, grouped by gender and smoking status. By default, two combinations of gender and smoking status are not represented in the output because they are empty groups.
G = groupsummary(T,{'Gender','Smoker'},'mean','Weight')
G=2×4 table
Gender Smoker GroupCount mean_Weight
________ ______ __________ ___________
"female" false 2 148
"male" true 3 142
Set the 'IncludeEmptyGroups'
parameter value to true
in order to see all group combinations, including the empty ones.
G = groupsummary(T,{'Gender','Smoker'},'mean','Weight','IncludeEmptyGroups',true)
G=4×4 table
Gender Smoker GroupCount mean_Weight
________ ______ __________ ___________
"female" false 2 148
"female" true 0 NaN
"male" false 0 NaN
"male" true 3 142
Specify Group Bins
Group data according to specified bins.
Create a timetable containing sales information for days within a single month.
TimeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10;... 2017 3 14; 2017 3 31; 2017 3 25;... 2017 3 29; 2017 3 21; 2017 3 18]); Profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]'; TotalItemsSold = [14 13 8 5 10 16 8 6 7 11]'; TT = timetable(TimeStamps,Profit,TotalItemsSold)
TT=10×2 timetable
TimeStamps Profit TotalItemsSold
___________ ______ ______________
04Mar2017 2032 14
02Mar2017 3071 13
15Mar2017 1185 8
10Mar2017 2587 5
14Mar2017 1998 10
31Mar2017 2899 16
25Mar2017 3112 8
29Mar2017 909 6
21Mar2017 2619 7
18Mar2017 3085 11
Compute the mean profit grouped by the total items sold, binning the groups into intervals of item numbers.
format shorte G = groupsummary(TT,'TotalItemsSold',[0 4 8 12 16],'mean','Profit')
G=3×3 table
disc_TotalItemsSold GroupCount mean_Profit
___________________ __________ ___________
[4, 8) 3.0000e+00 2.0383e+03
[8, 12) 4.0000e+00 2.3450e+03
[12, 16] 3.0000e+00 2.6673e+03
Compute the mean profit grouped by day of the week.
G = groupsummary(TT,'TimeStamps','dayname','mean','Profit')
G=5×3 table
dayname_TimeStamps GroupCount mean_Profit
__________________ __________ ___________
Tuesday 2.0000e+00 2.3085e+03
Wednesday 2.0000e+00 1.0470e+03
Thursday 1.0000e+00 3.0710e+03
Friday 2.0000e+00 2.7430e+03
Saturday 3.0000e+00 2.7430e+03
Group Operations with Vector Data
Create a vector of dates and a vector of corresponding profit values.
timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
Compute the mean profit by day of the week. Display the means, the group names, and the number of members in each group.
format shorte [meanDailyProfit,dayOfWeek,dailyCounts] = groupsummary(profit,timeStamps,'dayname','mean')
meanDailyProfit = 5×1
2.3085
1.0470
3.0710
2.7430
2.7430
dayOfWeek = 5x1 categorical
Tuesday
Wednesday
Thursday
Friday
Saturday
dailyCounts = 5×1
2
2
1
2
3
Multiple Grouping Vectors for Vector Input
Compute the mean weights for four groups based on their gender and smoker status.
Store patient information as three vectors of different types.
Gender = ["male";"female";"male";"female";"male"]; Smoker = logical([1;0;1;0;1]); Weight = [176;163;131;133;119];
Grouping by gender and smoker status, compute the mean weights. B
contains the mean for each group (NaN
for empty groups). BG
is a cell array containing two vectors that describe the groups as you look at their elements rowwise. For instance, the first row of BG{1}
says that the patients in the first group are female, and the first row of BG{2}
says that they are nonsmokers. Finally, BC
contains the number of members in each group for the corresponding groups in BG
.
[B,BG,BC] = groupsummary(Weight,{Gender,Smoker},'mean','IncludeEmptyGroups',true); B
B = 4×1
148
NaN
NaN
142
BG{1}
ans = 4x1 string
"female"
"female"
"male"
"male"
BG{2}
ans = 4x1 logical array
0
1
0
1
BC
BC = 4×1
2
0
0
3
Method Function Handle with Multiple Inputs
Load data containing patient information and create a table describing each patient's gender, systolic and diastolic blood pressure, height, and weight.
load patients
T = table(Gender,Systolic,Diastolic,Height,Weight)
T=100×5 table
Gender Systolic Diastolic Height Weight
__________ ________ _________ ______ ______
{'Male' } 124 93 71 176
{'Male' } 109 77 69 163
{'Female'} 125 83 64 131
{'Female'} 117 75 67 133
{'Female'} 122 80 64 119
{'Female'} 121 70 68 142
{'Female'} 130 88 64 142
{'Male' } 115 82 68 180
{'Male' } 115 78 68 183
{'Female'} 118 86 66 132
{'Female'} 114 77 68 128
{'Female'} 115 68 66 137
{'Male' } 127 74 71 174
{'Male' } 130 95 72 202
{'Female'} 114 79 65 129
{'Male' } 130 92 71 181
⋮
Grouping by gender, compute the correlation between patient height and weight and the correlation between systolic and diastolic blood pressure. Use the xcov
function as the method to compute the correlation. The first two input arguments to xcov
describe the data to correlate, the third argument describes the lag size, and the fourth argument describes the type of normalization. For each group computation, the x
and y
arguments passed into xcov
are specified pairwise by variable from the two cell elements ["Height","Systolic"]
and ["Weight","Diastolic"]
.
G = groupsummary(T,"Gender",@(x,y)xcov(x,y,0,'coeff'),{["Height","Systolic"],["Weight","Diastolic"]})
G=2×4 table
Gender GroupCount fun1_Height_Weight fun1_Systolic_Diastolic
__________ __________ __________________ _______________________
{'Female'} 53 0.071278 0.48731
{'Male' } 47 0.047571 0.50254
Alternatively, if your data is in vector or matrix form instead of in a table, you can provide the data to correlate as the first input argument of groupsummary
.
[G,GR,GC] = groupsummary({[Height,Systolic],[Weight,Diastolic]},Gender,@(x,y)xcov(x,y,0,'coeff'))
G = 2×2
0.0713 0.4873
0.0476 0.5025
GR = 2x1 cell
{'Female'}
{'Male' }
GC = 2×1
53
47
Input Arguments
T
— Input data
table  timetable
Input data, specified as a table or timetable.
A
— Input array
vector  matrix  cell array
Input array, specified as a vector, matrix, or cell array of vectors or matrices.
When you specify a function handle for method
that
takes more than one input argument, the input data A
must
be a cell array of vectors or matrices. In each call to the function by
group, the input arguments are the corresponding columns of each element in
the cell array. For example:
groupsummary({x1, y1},groupvars,@(x,y) myFun(x,y))
calculatesmyFun(x1,y1)
for each group.groupsummary({[x1 x2], [y1 y2]},groupvars,@(x,y) myFun(x,y))
first calculatesmyFun(x1,y1)
for each group, and then calculatesmyFun(x2,y2)
for each group.
groupvars
— Grouping variables or vectors
scalar  vector  matrix  cell array  function handle  table vartype
subscript
Grouping variables or vectors, specified as one of these options:
For array input,
groupvars
can be either a column vector with the same number of rows asA
or a group of column vectors arranged in a matrix or cell array.For table or timetable inputs,
groupvars
indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.Option Description Examples Scalar variable name A character vector or scalar string specifying a single table variable name.
'Var1'
"Var1"
Vector of variable names A cell array of character vectors or string array where each element is a table variable name.
{'Var1' 'Var2'}
["Var1" "Var2"]
Scalar or vector of variable indices A scalar or vector of table variable indices.
1
[1 3 5]
Logical scalar or vector A logical vector whose elements each correspond to a table variable, where
true
includes the corresponding variable andfalse
excludes it.[true false true]
Function handle A function handle that takes a table variable as input and returns a logical scalar.
@isnumeric
vartype
subscriptA table subscript generated by the
vartype
function.vartype('numeric')
Example: groupsummary(T,"Var3")
method
— Computation method
'sum'
 'mean'
 'median'
 'mode'
 'var'
 'std'
 'min'
 'max'
 'range'
 'nummissing'
 'nnz'
 'all'
 function handle  cell array
Computation method, specified as one of the following:
'sum'
— sum'mean'
— mean'median'
— median'mode'
— mode'var'
— variance'std'
— standard deviation'min'
— minimum'max'
— maximum'range'
— maximum minus minimum'nummissing'
— number of missing elements'nnz'
— number of nonzero and nonNaN
elements'all'
— all computations previously listed
You also can specify method
as a function handle that
returns one entity per group whose first dimension has length 1. For table
input data, the function operates on each table variable separately.
When the input data is a table T
and you specify a
function handle for method
that takes more than one input
argument, you must specify datavars
. The
datavars
argument must be a cell array whose elements
indicate the table variables to use for each input into the method. In each
call to the function by group, the input arguments are the corresponding
table variables of the cell array elements. For example:
groupsummary(T,groupvars,@(x,y) myFun(x,y),{"x1","y1"})
calculatesmyFun(T.x1,T.y1)
for each group.groupsummary(T,groupvars,@(x,y) myFun(x,y),{["x1" "x2"],["y1" "y2"]})
first calculatesmyfun(T.x1,T.y1)
for each group, and then calculatesmyfun(T.x2,T.y2)
for each group.
When the input data is in vector or matrix form and you specify a function
handle for method
that takes more than one input
argument, the input data A
must be a cell array of
vectors or matrices. In each call to the function, the input arguments are
the corresponding columns of each element in the cell array. For example:
groupsummary({x1,y1},groupvars,@(x,y) myFun(x,y))
calculatesmyFun(x1,y1)
for each group.groupsummary({[x1 x2],[y1 y2]},groupvars,@(x,y) myFun(x,y))
first calculatesmyFun(x1,y1)
for each group, and then calculatesmyFun(x2,y2)
for each group.
To specify multiple computations at a time, list the options in a cell
array, such as {'mean','median'}
or
{myFun1,myFun2}
.
NaN
values in the input data are automatically omitted
when using the method names described here, with the exception of
'nummissing'
. To include NaN
values, consider using a function handle for the method, such as
@sum
instead of 'sum'
.
Data Types: char
 string
 cell
 function_handle
datavars
— Table variables to operate on
scalar  vector  cell array  function handle  table vartype
subscript
Table variables to operate on, specified as one of the options in this
table. datavars
indicates which variables of the input
table or timetable to apply the methods to. Other variables not specified by
datavars
are not operated on and do not pass through
to the output. When datavars
is not specified,
groupsummary
operates on each nongrouping
variable.
Option  Description  Examples 

Variable name  A character vector or scalar string specifying a single table variable name 

Vector of variable names  A cell array of character vectors or string array where each element is a table variable name 

Scalar or vector of variable indices  A scalar or vector of table variable indices 

Logical vector  A logical vector whose elements each correspond to a table variable, where


Function handle  A function handle that takes a table variable as input and returns a logical scalar 

vartype subscript  A table subscript generated by the 

When the input data is a table T
and you specify a
function handle for method
that takes more than one input
argument, you must specify datavars
. The
datavars
argument must be a cell array whose elements
are any of the options in the table. The cell array elements indicate the
table variables to use for each input into the method. In each call to the
function by group, the input arguments are the corresponding table variables
of the cell array elements. For example:
groupsummary(T,groupvars,@(x,y) myFun(x,y),{"x1", "y1"})
calculatesmyFun(T.x1,T.y1)
for each group.groupsummary(T,groupvars,@(x,y) myFun(x,y),{["x1" "x2"],["y1" "y2"]})
first calculatesmyfun(T.x1,T.y1)
for each group, and then calculatesmyfun(T.x2,T.y2)
for each group.
Example: groupsummary(T,groupvars,method,["Var1" "Var2"
"Var4"])
groupbins
— Binning scheme
'none'
(default)  vector  scalar  cell array
Binning scheme, specified as one of the following options:
'none'
, indicating no binningA list of bin edges, specified as a numeric vector, or a
datetime
vector fordatetime
grouping variables or vectorsA number of bins, specified as an integer scalar
A time duration, specified as a scalar of type
duration
orcalendarDuration
indicating bin widths (fordatetime
orduration
grouping variables or vectors only)A time bin for
datetime
andduration
grouping variables or vectors only, specified as one of the following character vectors:Value Description Data Type 'second'
Each bin is 1 second.
datetime
andduration
'minute'
Each bin is 1 minute.
datetime
andduration
'hour'
Each bin is 1 hour.
datetime
andduration
'day'
Each bin is 1 calendar day. This value accounts for Daylight Saving Time shifts.
datetime
andduration
'week'
Each bin is 1 calendar week. datetime
only'month'
Each bin is 1 calendar month. datetime
only'quarter'
Each bin is 1 calendar quarter. datetime
only'year'
Each bin is 1 calendar year. This value accounts for leap days.
datetime
andduration
'decade'
Each bin is 1 decade (10 calendar years). datetime
only'century'
Each bin is 1 century (100 calendar years). datetime
only'secondofminute'
Bins are seconds from 0 to 59.
datetime
only'minuteofhour'
Bins are minutes from 0 to 59.
datetime
only'hourofday'
Bins are hours from 0 to 23.
datetime
only'dayofweek'
Bins are days from 1 to 7. The first day of the week is Sunday.
datetime
only'dayname'
Bins are full day names such as 'Sunday'
.datetime
only'dayofmonth'
Bins are days from 1 to 31. datetime
only'dayofyear'
Bins are days from 1 to 366. datetime
only'weekofmonth'
Bins are weeks from 1 to 6. datetime
only'weekofyear'
Bins are weeks from 1 to 54. datetime
only'monthname'
Bins are full month names such as 'January'
.datetime
only'monthofyear'
Bins are months from 1 to 12.
datetime
only'quarterofyear'
Bins are quarters from 1 to 4. datetime
onlyA cell array listing binning rules for each grouping variable or vector
When multiple grouping variables are specified, you can provide a single
binning rule that is applied to all grouping variables, or a cell array
containing a binning method for each grouping variable such as
{'none',[0 2 4 Inf]}
.
NameValue Arguments
Specify optional
commaseparated pairs of Name,Value
arguments. Name
is
the argument name and Value
is the corresponding value.
Name
must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN
.
G =
groupsummary(T,groupvars,groupbins,'IncludedEdge','right')
IncludedEdge
— Included bin edge
'left'
(default)  'right'
Included bin edge, specified as either 'left'
or
'right'
, indicating which end of the bin interval
is inclusive.
This namevalue pair can only be specified when
groupbins
is specified, and the value is applied
to all binning schemes for all grouping variables or vectors.
IncludeMissingGroups
— Missing groups indicator
true
(default)  false
Missing groups indicator, specified as true
or
false
. When the parameter value is
true
, groupsummary
displays
groups made up of missing values, such as NaN
. When
the parameter value is false
,
groupsummary
does not display the missing
groups.
Data Types: logical
IncludeEmptyGroups
— Empty groups indicator
false
(default)  true
Empty groups indicator, specified as true
or
false
. When the parameter value is
false
, groupsummary
does not
display groups with zero elements. When the parameter value is
true
, groupsummary
displays
the empty groups.
Data Types: logical
Output Arguments
G
— Output table
table
Output table, returned as a table containing the specified computations for each group.
B
— Output array
vector  matrix
Output array, returned as a vector or matrix containing the group
computations. When you specify multiple methods,
groupsummary
horizontally concatenates the
computations in the order that they were listed.
BG
— Groups
column vector  cell array of column vectors
Groups for array input data, returned as a column vector or cell array of column vectors each corresponding to a grouping vector.
When you provide more than one grouping vector, BG
is a
cell array containing column vectors of equal length. The group information
can be found by looking at the elements rowwise across all vectors in
BG
. Each group maps to the corresponding row of the
output array B
.
BC
— Group counts
column vector
Group counts for array input data, returned as a column vector containing
the number of elements in each group. The length of BC
is
the same as the length of the group column vectors returned in
BG
.
Tips
When making many calls to
groupsummary
, consider converting grouping variables to typecategorical
orlogical
when possible for improved performance. For example, if you have a grouping variable of typechar
(such asGender
with elements'Male'
and'Female'
), you can convert it to a categorical value using the commandcategorical(Gender)
.
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
If
A
andgroupvars
are both tall matrices, then they must have the same number of rows.If the first input is a tall matrix, then
groupvars
can be a cell array containing tall grouping vectors.The
groupvars
anddatavars
arguments do not support function handles.The
'IncludeEmptyGroups'
namevalue pair is not supported.The
'median'
and'mode'
methods are not supported.For tall datetime arrays, the
'std'
method is not supported.If the
method
argument is a function handle, then it must be a valid input forsplitapply
operating on a tall array. If the function handle takes multiple inputs, then the first input togroupsummary
must be a tall table.The order of the groups might be different compared to inmemory
groupsummary
calculations.When grouping by discretized datetime arrays, the categorical group names are different compared to inmemory
groupsummary
calculations.
For more information, see Tall Arrays.
ThreadBased Environment
Run code in the background using MATLAB® backgroundPool
or accelerate code with Parallel Computing Toolbox™ ThreadPool
.
This function fully supports threadbased environments. For more information, see Run MATLAB Functions in ThreadBased Environment.
See Also
grouptransform
 groupfilter
 groupcounts
 findgroups
 splitapply
 discretize
 varfun
 rowfun
 convertvars
 vartype
Open Example
You have a modified version of this example. Do you want to open this example with your edits?
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
 América Latina (Español)
 Canada (English)
 United States (English)
Europe
 Belgium (English)
 Denmark (English)
 Deutschland (Deutsch)
 España (Español)
 Finland (English)
 France (Français)
 Ireland (English)
 Italia (Italiano)
 Luxembourg (English)
 Netherlands (English)
 Norway (English)
 Österreich (Deutsch)
 Portugal (English)
 Sweden (English)
 Switzerland
 United Kingdom (English)