means = grpstats(X)
means = grpstats(X,group)
grpstats(X,group,alpha)
dsstats = grpstats(ds,groupvars)
[A,B,...] = grpstats(X,group,whichstats)
[...] = grpstats(...,whichstats,'Param1',VAL1,'Param2',VAL2,...)
means = grpstats(X) computes the mean of each column of X over the entire sample, without grouping, where X is a matrix of observations.
means = grpstats(X,group) returns the means of each column of X by group. The array group defines the grouping: two rows of X are in the same group if their corresponding values in group are the same. (See Grouped Data.) The grouping variable group can be a categorical variable, vector, string array, or cell array of strings. It can also be a cell array containing several grouping variables (such as {g1 g2 g3}) to group the values in X by each unique combination of grouping variable values.
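To make the grouping rule concrete, the sketch below computes the same quantities as grpstats(X) and grpstats(X,group) using only base MATLAB (unique and accumarray), so it runs without the Statistics Toolbox. The data and the variable names X, group, overall, and means are hypothetical; this is an illustration of the semantics, not grpstats' own implementation.

```matlab
% Hypothetical data: 6 observations of 2 variables, split into 2 groups.
X = [1 10; 2 20; 3 30; 4 40; 5 50; 6 60];
group = [1; 2; 1; 2; 1; 2];

% Whole-sample column means, the quantity grpstats(X) returns
overall = mean(X, 1);

% Per-group column means, the quantity grpstats(X, group) returns
[gvals, ~, gidx] = unique(group);   % gidx(i) is the group index of row i
ngroups = numel(gvals);
means = zeros(ngroups, size(X, 2));
for j = 1:size(X, 2)
    means(:, j) = accumarray(gidx, X(:, j), [ngroups 1], @mean);
end
disp(means)   % [3 30; 4 40]: row 1 is group 1, row 2 is group 2
```

Rows 1, 3, and 5 form group 1, so its means are mean([1 3 5]) = 3 and mean([10 30 50]) = 30.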
grpstats(X,group,alpha) displays a plot of the means versus index with 100(1-alpha)% confidence intervals around each mean.
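As a hedged illustration: assuming grpstats builds these intervals from the usual Student t quantile (the formula is not stated on this page), the interval drawn around the mean of group g, with n_g non-NaN observations, sample mean \bar{x}_g, and sample standard deviation s_g, would be

$$
\bar{x}_g \pm t_{1-\alpha/2,\; n_g-1}\,\frac{s_g}{\sqrt{n_g}}
$$

Here s_g/\sqrt{n_g} is the quantity grpstats reports as 'sem', and a smaller alpha gives a larger t quantile and therefore a wider interval.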
dsstats = grpstats(ds,groupvars), when ds is a dataset array, returns a dataset array dsstats that contains the mean, computed by group, for variables in ds. groupvars specifies the grouping variables in ds that define the groups, and is a positive integer, a vector of positive integers, the name of a dataset variable, a cell array containing one or more dataset variable names, or a logical vector. A grouping variable may be a vector of categorical, logical, or numeric values, a character array of strings, or a cell vector of strings. dsstats contains those grouping variables, one variable giving the number of observations in ds for each group, and one variable for each of the remaining dataset variables in ds; these remaining variables must be numeric or logical. dsstats contains one observation for each group of observations in ds. groupvars can be [] or omitted to compute the mean of each variable across the entire dataset without grouping.
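A minimal sketch of the dataset form, assuming the Statistics Toolbox (for grpstats and the dataset class) and the carsmall sample data used in the example at the end of this page. The choice of Weight and MPG as data variables is only for illustration.

```matlab
% Requires the Statistics Toolbox (grpstats and the dataset class).
load carsmall                            % provides Model_Year, Weight, MPG, ...
ds = dataset(Model_Year, Weight, MPG);   % dataset variable names come from the workspace names

% One observation per model year: the grouping variable, a count of
% observations in each group, and the mean of each remaining variable.
dsstats = grpstats(ds, 'Model_Year')
```

With one grouping variable and two remaining numeric variables, dsstats should hold 1 + 1 + 2 = 4 variables.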
grpstats treats NaNs as missing values and removes them.
grpstats ignores empty group names.
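The two rules above can be mimicked in base MATLAB by filtering observations before grouping. This is a sketch under stated assumptions: the data x and the group names g are hypothetical, and unique sorts the group names, whereas grpstats may order groups differently.

```matlab
% Hypothetical data with one NaN observation and one empty group name.
x = [1; 2; 100; NaN; 4; 7];
g = {'a'; 'b'; ''; 'a'; 'b'; 'a'};

% Drop NaN values and observations whose group name is empty,
% mirroring the behavior described above.
keep = ~isnan(x) & ~cellfun(@isempty, g);

[gnames, ~, gidx] = unique(g(keep));   % sorted names; grpstats may order them differently
ngroups = numel(gnames);
m = accumarray(gidx, x(keep), [ngroups 1], @mean);   % per-group mean of the kept values
n = accumarray(gidx, 1, [ngroups 1]);                % 'numel': count of kept values
```

The value 100 never contributes because its group name is empty, and the NaN in group 'a' is dropped, so group 'a' averages only 1 and 7.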
[A,B,...] = grpstats(X,group,whichstats) returns the statistics specified in whichstats. The input whichstats can be a single function handle or name, or a cell array containing multiple function handles or names. The number of outputs (A, B, ...) must match the number of function handles and names in whichstats. Acceptable names are as follows:
'mean' — mean
'sem' — standard error of the mean
'numel' — count, or number of non-NaN elements
'gname' — group name
'std' — standard deviation
'var' — variance
'min' — minimum
'max' — maximum
'range' — maximum - minimum
'meanci' — 95% confidence interval for the mean
'predci' — 95% prediction interval for a new observation
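For reference, most of the named statistics above reduce to simple per-group computations. The sketch below reproduces 'mean', 'numel', 'std', 'sem', 'var', 'min', 'max', and 'range' in base MATLAB with accumarray; the data and variable names are hypothetical, and the commented grpstats call shows the toolbox equivalent for three of them.

```matlab
% With the Statistics Toolbox, three of these come from one call:
%   [m, se, n] = grpstats(x, group, {'mean', 'sem', 'numel'})

% Hypothetical data: one variable, two groups of unequal size.
x = [2; 4; 6; 1; 3; 5; 7];
group = [1; 1; 1; 2; 2; 2; 2];
[~, ~, gidx] = unique(group);
ngroups = max(gidx);
sz = [ngroups 1];

m  = accumarray(gidx, x, sz, @mean);   % 'mean'
n  = accumarray(gidx, 1, sz);          % 'numel'
sd = accumarray(gidx, x, sz, @std);    % 'std'
se = sd ./ sqrt(n);                    % 'sem': std divided by sqrt(numel)
v  = accumarray(gidx, x, sz, @var);    % 'var'
lo = accumarray(gidx, x, sz, @min);    % 'min'
hi = accumarray(gidx, x, sz, @max);    % 'max'
rg = hi - lo;                          % 'range'
```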
Each function included in whichstats must accept a column vector of data and compute a descriptive statistic for it. For example, @median and @skewness are suitable functions to apply to a numeric input. A function must return the same size output each time grpstats calls it, even if the input for some groups is empty. The function typically returns a scalar value, but may return an nvals-by-1 column vector if the descriptive statistic is not a scalar (a confidence interval, for example). The size of each output A, B, ... is ngroups-by-ncols-by-nvals, where ngroups is the number of groups, ncols is the number of columns in the data X, and nvals is the number of values returned by the function for data from a single group in one column of X. If X is a vector of data, then the size of each output A, B, ... is ngroups-by-nvals.
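The output-size rule can be seen by applying a vector-valued statistic group by group and column by column. In this base-MATLAB sketch, stat is a hypothetical statistic returning nvals = 2 values ([min; max]); the loop fills an ngroups-by-ncols-by-nvals array the way the paragraph above describes, and is not grpstats' own code.

```matlab
X = [1 10; 2 20; 3 30; 4 40; 5 50; 6 60];   % hypothetical data
group = [1; 2; 1; 2; 1; 2];

% Hypothetical statistic returning nvals = 2 values for one column: [min; max].
% On an empty input it returns an empty array rather than a 2-by-1 vector, so it
% would not satisfy the same-size requirement if a group could be empty.
stat = @(v) [min(v); max(v)];

[~, ~, gidx] = unique(group);
ngroups = max(gidx);
ncols = size(X, 2);
nvals = 2;
A = zeros(ngroups, ncols, nvals);   % ngroups-by-ncols-by-nvals
for gi = 1:ngroups
    for j = 1:ncols
        A(gi, j, :) = reshape(stat(X(gidx == gi, j)), 1, 1, nvals);
    end
end
```

A(:,:,1) then holds the per-group minimums and A(:,:,2) the maximums.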
For the case when data are contained in a numeric matrix X, a function specified in whichstats may also be written to accept a matrix of data and compute a descriptive statistic for each column. The function should return either a row vector, or an nvals-by-ncols matrix if the descriptive statistic is not a scalar.
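A column-wise statistic processes all columns of a group in one call. In this sketch, colstat is a hypothetical function returning an nvals-by-ncols matrix; each group's result is transposed and reshaped into the matching slice of the ngroups-by-ncols-by-nvals output. The explicit dimension argument to min and max matters: without it, a group containing a single row would be reduced to a scalar instead of a row.

```matlab
% Hypothetical data: two columns, two groups.
X = [1 10; 2 20; 3 30; 4 40; 5 50; 6 60];
group = [1; 2; 1; 2; 1; 2];

% Hypothetical column-wise statistic: returns a 2-by-ncols matrix [min; max].
colstat = @(M) [min(M, [], 1); max(M, [], 1)];

[~, ~, gidx] = unique(group);
ngroups = max(gidx);
[nvals, ncols] = size(colstat(X));
A = zeros(ngroups, ncols, nvals);
for gi = 1:ngroups
    S = colstat(X(gidx == gi, :));               % nvals-by-ncols for this group
    A(gi, :, :) = reshape(S.', 1, ncols, nvals);  % place values along the third dimension
end
```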
[...] = grpstats(...,whichstats,'Param1',VAL1,'Param2',VAL2,...) specifies additional parameter name/value pairs chosen from the following:
| Parameter | Description |
|---|---|
| 'Alpha' | A value from 0 to 1 that specifies the confidence level as 100(1-alpha)% for the 'meanci' and 'predci' options. Default is 0.05. |
| 'DataVars' | The names of the variables in ds to which the functions in whichstats should be applied. dsstats contains one summary statistic variable for each of these data variables. datavars is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector. |
| 'VarNames' | The names of the variables in dsstats. By default, grpstats uses the names from ds for the grouping variable names, and constructs names for the summary statistic variables based on the function name and the data variable names from ds. |
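A short sketch of the 'Alpha' parameter, assuming the Statistics Toolbox. The data are hypothetical; the point is that lowering alpha from the default 0.05 to 0.01 widens every 'meanci' interval, and each interval still contains its group mean.

```matlab
% Requires the Statistics Toolbox. Hypothetical data: one variable, two groups.
x = [2; 4; 6; 1; 3; 5; 7];
group = [1; 1; 1; 2; 2; 2; 2];

[m, ci95] = grpstats(x, group, {'mean', 'meanci'});                % default Alpha = 0.05
[~, ci99] = grpstats(x, group, {'mean', 'meanci'}, 'Alpha', 0.01); % 99% intervals

% Because x is a vector, each interval output is ngroups-by-2 (lower, upper).
w95 = ci95(:, 2) - ci95(:, 1);
w99 = ci99(:, 2) - ci99(:, 1);
disp([w95 w99])   % each 99% interval is wider than the matching 95% interval
```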
dsstats contains ngroupvars + 1 + ndatavars*nfuns variables, where ngroupvars is the number of variables specified in groupvars, ndatavars is the number of variables specified in datavars, and nfuns is the number of summary statistics specified in whichstats.
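The count formula can be checked directly. This sketch assumes the Statistics Toolbox and the carsmall sample data; Horsepower is included in ds only so that 'DataVars' has something to exclude. With one grouping variable, two data variables, and two statistics, dsstats should contain 1 + 1 + 2*2 = 6 variables.

```matlab
% Requires the Statistics Toolbox (grpstats and the dataset class).
load carsmall
ds = dataset(Model_Year, Weight, MPG, Horsepower);

% Summarize only Weight and MPG; Horsepower is left out of dsstats.
dsstats = grpstats(ds, 'Model_Year', {'mean', 'std'}, ...
    'DataVars', {'Weight', 'MPG'});

% ngroupvars + 1 + ndatavars*nfuns = 1 + 1 + 2*2
nvars = size(dsstats, 2)
```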
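load carsmall                          % sample data, including Weight and Model_Year
% Per-year mean weight, 95% prediction interval, and group name
[m,p,g] = grpstats(Weight,Model_Year,...
    {'mean','predci','gname'})
n = length(m)                          % number of model years (groups)
% p(:,2)-m is the half-width of each symmetric prediction interval
errorbar((1:n)',m,p(:,2)-m)
set(gca,'xtick',1:n,'xticklabel',g)    % label the ticks with the years
title('95% prediction intervals for weight by year')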

See Also: dataset.grpstats | grp2idx | gscatter
© 1984-2012 The MathWorks, Inc.