grpstats - Summary statistics by group for dataset arrays

Class

@dataset

Syntax

B = grpstats(A,groupvars)
B = grpstats(A,groupvars,whichstats)
B = grpstats(A,groupvars,whichstats,...,'DataVars',vars)
B = grpstats(A,groupvars,whichstats,...,'VarNames',names)

Description

B = grpstats(A,groupvars) returns a dataset array B that contains the means, computed by group, for variables in the dataset array A. The optional input groupvars specifies the variables in A that define the groups. groupvars can be a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector. groupvars can also be [] or omitted to compute the means of the variables in A without grouping. Grouping variables can be vectors of categorical, logical, or numeric values, a character array of strings, or a cell vector of strings. (See Grouped Data.)

B contains the grouping variables, plus a variable giving the number of observations in A for each group, plus a variable for each of the remaining variables in A. B contains one observation for each group of observations in A.
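The layout of B described above can be sketched on a small made-up dataset array. The names A, G, and X are illustrative only and are not part of any shipped data set. The sketch assumes a release in which the dataset class is available, and it passes groupvars in three of the accepted forms.

```matlab
% Hypothetical toy data: G is a numeric grouping variable, X is the data.
A = dataset({[1;1;2;2],'G'}, {[10;20;30;40],'X'});

% Group by variable name.
B1 = grpstats(A,'G');

% Group by variable index (G is the first variable in A).
B2 = grpstats(A,1);

% Group by logical vector (true marks the grouping variable).
B3 = grpstats(A,[true false]);

% Each result has one observation per group and holds the grouping
% variable, the group count, and the group means of X.
disp(B1)
```

All three calls select the same grouping variable, so they are expected to produce the same group means.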

grpstats treats NaNs as missing values, and removes them.

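B = grpstats(A,groupvars,whichstats) returns a dataset array B with variables for each of the statistics specified in whichstats, applied to each of the nongrouping variables in A. whichstats can be a single function handle or name, or a cell array containing multiple function handles or names. Recognized names include 'mean', 'sem' (standard error of the mean), 'numel' (count of non-NaN elements), 'std', 'var', 'meanci' (95% confidence interval for the mean), and 'predci' (95% prediction interval for a new observation).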

Each function included in whichstats must accept a subset of the rows of a dataset variable, and compute column-wise descriptive statistics for it. A function should typically return a value that has one row but is otherwise the same size as its input data. For example, @median and @skewness are suitable functions to apply to a numeric dataset variable.

A summary statistic function may also return values with more than one row, provided the return values have the same number of rows each time grpstats applies the function to a different subset of data from a given dataset variable. For a dataset variable that is nobs-by-m-by-..., if a summary statistic function returns values that are nvals-by-m-by-..., then the corresponding summary statistic variable in B is ngroups-by-m-by-...-by-nvals, where ngroups is the number of groups in A.
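As a rough sketch of the multi-row case, the following uses a hypothetical anonymous function, minmax, that returns two rows (column-wise minimum and maximum) for each group. The dataset array A and its variables G and X are made up for illustration. The name grpstats gives to the resulting variable is not assumed here, so the variable is looked up by position.

```matlab
% Hypothetical toy data: X is a 6-by-2 variable, G defines two groups.
A = dataset({[1;1;1;2;2;2],'G'}, ...
            {[1 10; 2 20; 3 30; 4 40; 5 50; 6 60],'X'});

% Returns a 2-by-m value (nvals = 2) for every subset of rows it is given.
minmax = @(x) [min(x,[],1); max(x,[],1)];

B = grpstats(A,'G',minmax);

% B holds G, the group count, and one summary variable, in that order.
names = B.Properties.VarNames;
S = B.(names{3});

% By the sizing rule stated above, S is expected to be
% ngroups-by-m-by-nvals, here 2-by-2-by-2.
disp(size(S))
```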

B = grpstats(A,groupvars,whichstats,...,'DataVars',vars) specifies the variables in A to which the functions in whichstats should be applied. The output dataset array contains one summary statistic variable for each combination of a specified variable and a function in whichstats. vars is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector.

B = grpstats(A,groupvars,whichstats,...,'VarNames',names) specifies the names of the variables in B. By default, grpstats uses the names from A for the grouping variables, and constructs names for the summary statistic variables based on the function name and the data variable names. The number of variables in B is ngroupvars + 1 + ndatavars*nfuns, where ngroupvars is the number of variables specified in groupvars, ndatavars is the number of variables specified in vars, and nfuns is the number of summary statistics specified in whichstats.
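The variable count above determines how many names to supply. A minimal sketch, using a made-up dataset array A with one grouping variable (G), one data variable (X), and two statistics, so that names needs 1 + 1 + 1*2 = 4 entries. All names in the sketch are illustrative.

```matlab
% Hypothetical toy data.
A = dataset({[1;1;2;2],'G'}, {[10;20;30;40],'X'});

% ngroupvars = 1, ndatavars = 1, nfuns = 2, so B has 1 + 1 + 1*2 = 4
% variables: the grouping variable, the group count, and one variable
% per statistic, listed in the order given in whichstats.
names = {'G','N','MeanX','MedianX'};

B = grpstats(A,'G',{'mean',@median}, ...
             'DataVars','X', ...
             'VarNames',names);

disp(B.Properties.VarNames)
```

Supplying the names explicitly avoids depending on the default names that grpstats constructs.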

Example

Compute blood pressure statistics for the data in hospital.mat, by sex and smoker status:

load hospital
grpstats(hospital,...
         {'Sex','Smoker'},...
         {@median,@iqr},...
         'DataVars','BloodPressure')
ans = 
             Sex       Smoker    GroupCount
 Female_0    Female    false     40        
 Female_1    Female    true      13        
 Male_0      Male      false     26        
 Male_1      Male      true      21        

             median_BloodPressure
 Female_0    119.5            79 
 Female_1      129            91 
 Male_0        119            79 
 Male_1        129            92 

             iqr_BloodPressure
 Female_0     6.5          5.5
 Female_1       8          5.5
 Male_0         7            6
 Male_1      10.5          4.5

See Also

grpstats, summary

  


© 1984-2008 The MathWorks, Inc.