Split data into groups and apply function
Y = splitapply(func,X,G)
Y = splitapply(func,X1,...,XN,G)
Y = splitapply(func,T,G)
[Y1,...,YM] = splitapply(___)
Y = splitapply(func,X,G) splits X into groups specified by G and applies the function func to each group. splitapply returns Y as an array that contains the concatenated outputs from func for the groups split out of X. The input argument G is a vector of positive integers that specifies the groups to which corresponding elements of X belong. If G contains NaN values, splitapply omits the corresponding values in X when it splits X into groups. To create G, you can use the findgroups function.
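As a minimal sketch of the basic syntax, the following uses made-up data (X and G are hypothetical values, not taken from the patients data set) to show how group numbers select elements and how a NaN group number leaves an element out of every group.

```matlab
% Hypothetical data: six measurements and their group numbers.
X = [10; 20; 30; 40; 50; 60];
G = [1; 2; NaN; 1; 2; 2];   % the third element belongs to no group

% Sum each group. The value 30 is omitted because its group number is NaN.
% Group 1 holds 10 and 40; group 2 holds 20, 50, and 60.
Y = splitapply(@sum, X, G);
disp(Y)
```

Y has one row per group, in order of group number.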
splitapply combines two steps in the Split-Apply-Combine Workflow.
[Y1,...,YM] = splitapply(___) splits variables into groups and applies func to each group. Use this syntax when func returns multiple output arguments. Y1,...,YM contain the concatenated outputs from func for the groups split out of the input data variables. func can return output arguments that belong to different classes, but the class of each output must be the same each time func is called. You can use this syntax with any of the input arguments of the previous syntaxes.
The number of output arguments from func need not be the same as the number of input arguments specified by X1,...,XN.
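A short sketch of the multiple-output syntax, assuming made-up data: the built-in min function takes one input here but returns two outputs (the minimum and its position), so the number of outputs differs from the number of inputs.

```matlab
% Hypothetical data and group numbers.
X = [5; 3; 8; 1; 9; 2];
G = [1; 1; 2; 2; 2; 1];

% min is called once per group with two requested outputs.
% lo holds each group's minimum; idx holds the position of that
% minimum within the group's own elements, not within X.
[lo, idx] = splitapply(@min, X, G);
disp([lo idx])
```

Group 1 contains 5, 3, and 2, and group 2 contains 8, 1, and 9, so each output has two rows.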
Calculate the mean heights by gender for groups of patients and display the results.
Load patient heights and genders from the data file patients.mat.
    load patients
    whos Gender Height

      Name          Size            Bytes  Class     Attributes

      Gender      100x1             12212  cell
      Height      100x1               800  double
Specify groups by gender with findgroups.
G = findgroups(Gender);
Split Height into groups specified by G. Calculate the mean height by gender. The first row of the output argument is the mean height of the female patients, and the second row is the mean height of the male patients.
    splitapply(@mean,Height,G)

    ans = 2×1

       65.1509
       69.2340
Calculate the variances of the differences in blood pressure readings for groups of patients, and display the results. The blood pressure readings are contained in two data variables. To calculate the differences, use a function that takes two input arguments.
Load blood pressure readings and smoking data for 100 patients from the data file patients.mat.
    load patients
    whos Systolic Diastolic Smoker

      Name             Size            Bytes  Class      Attributes

      Diastolic      100x1               800  double
      Smoker         100x1               100  logical
      Systolic       100x1               800  double
Define func as a function that calculates the variances of the differences between systolic and diastolic blood pressure readings for smokers and nonsmokers. func requires two input arguments.
func = @(x,y) var(x-y);
Use findgroups and splitapply to split the patient data into groups and calculate the variances of the differences. findgroups also returns group identifiers in smokers. The splitapply function calls func once per group, with Systolic and Diastolic as the two input arguments.
    [G,smokers] = findgroups(Smoker);
    varBP = splitapply(func,Systolic,Diastolic,G)

    varBP = 2×1

       44.4459
       48.6783
Create a table that contains the variances of the differences, with the number of patients in each group.
    numPatients = splitapply(@numel,Smoker,G);
    T = table(smokers,numPatients,varBP)

    T=2×3 table
        smokers    numPatients    varBP
        _______    ___________    ______
        false          66         44.446
        true           34         48.678
Calculate the minimum, median, and maximum weights for groups of patients and return these results as arrays for each group.
splitapply concatenates the output arguments so that you can distinguish output for each group from output for the other groups.
Define a function that returns the minimum, median, and maximum as a row vector.
mystats = @(x)[min(x) median(x) max(x)];
Load patient weights, genders, and status as smokers from patients.mat.
    load patients
    whos Weight Gender Smoker

      Name          Size            Bytes  Class      Attributes

      Gender      100x1             12212  cell
      Smoker      100x1               100  logical
      Weight      100x1               800  double
Use findgroups and splitapply to split the patient weights into groups and calculate statistics for each group.
    G = findgroups(Gender,Smoker);
    Y = splitapply(mystats,Weight,G)

    Y = 4×3

      111.0000  131.0000  147.0000
      115.0000  131.0000  146.0000
      158.0000  181.5000  194.0000
      164.0000  181.0000  202.0000
In this example, you can return nonscalar output as row vectors because the data and grouping variables are column vectors. Each row of
Y contains statistics for a different group of patients.
Calculate the mean body-mass-index (BMI) from tables of patient data. Group the patients by gender and status as smokers or nonsmokers.
Load patient data and grouping variables into tables.
    load patients
    DT = table(Height,Weight);
    GT = table(Gender,Smoker);
Define a function that calculates mean BMI from the weights and heights of groups of patients.
meanBMIFcn = @(h,w)mean((w ./ (h.^2)) * 703);
Create a table that contains the mean BMI for each group.
    [G,results] = findgroups(GT);
    meanBMI = splitapply(meanBMIFcn,DT,G);
    results.meanBMI = meanBMI

    results=4×3 table
         Gender     Smoker    meanBMI
        ________    ______    _______
        'Female'    false     21.672
        'Female'    true      21.669
        'Male'      false     26.578
        'Male'      true      26.458
Calculate the minimum, mean, and maximum heights for groups of patients and return results in a table.
Define a function in a file named
multiStats.m that accepts an input vector and returns the minimum, mean, and maximum values of the vector.
    % Copyright 2015 The MathWorks, Inc.
    function [lo,avg,hi] = multiStats(x)
    lo = min(x);
    avg = mean(x);
    hi = max(x);
    end
Load patient data into a table.
    load patients
    T = table(Gender,Height);
    summary(T)

    Variables:
        Gender: 100x1 cell array of character vectors
        Height: 100x1 double
            Values:
                Min       60
                Median    67
                Max       72
Group patient heights by gender. Create a table that contains the outputs from
multiStats for each group.
    [G,gender] = findgroups(T.Gender);
    [minHeight,meanHeight,maxHeight] = splitapply(@multiStats,T.Height,G);
    result = table(gender,minHeight,meanHeight,maxHeight)

    result = 2x4 table
         gender     minHeight    meanHeight    maxHeight
        ________    _________    __________    _________
        'Female'       60          65.151         70
        'Male'         66          69.234         72
func— Function to apply to groups of data
Function to apply to groups of data, specified as a function handle.
If func returns a nonscalar output argument, then the argument must be oriented so that splitapply can concatenate the output arguments from successive calls to func. For example, if the input data variables are column vectors, then func must return either a scalar or a row vector as an output argument.
Example: Y = splitapply(@sum,X,G) returns the sums of the groups of data in X.
X— Data variable
Data variable, specified as a vector, matrix, or cell array. The elements of X belong to groups specified by the corresponding elements of G.
If X is a matrix, splitapply treats each column or row as a separate data variable. The orientation of G determines whether splitapply treats the columns or rows of X as data variables.
G— Group numbers
Group numbers, specified as a vector of positive integers.
If X is a vector or cell array, then G must be the same length as X.
If X is a matrix, then the length of G must be equal to the number of columns or rows of X, depending on the orientation of G.
If the input argument is table T, then G must be a column vector. The length of G must be equal to the number of rows of T.
T— Data variables
Data variables, specified as a table. splitapply treats each table variable as a separate data variable.
The Split-Apply-Combine workflow
is common in data analysis. In this workflow, the analyst splits the
data into groups, applies a function to each group, and combines the
results. The diagram shows a typical example of the workflow and the
parts of the workflow implemented by splitapply.
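The three steps of the workflow can be sketched on made-up data as follows. The variable names (labels, Yloop) and values are hypothetical, and the explicit loop is only an illustration of the apply and combine steps for a scalar-valued function, not a description of how splitapply is implemented.

```matlab
% Hypothetical data with a text label per observation.
X = [4; 7; 1; 9; 4];
labels = {'a'; 'b'; 'a'; 'b'; 'a'};

% Split: convert the labels into group numbers 1..K.
[G, names] = findgroups(labels);

% Apply and combine, written out as an explicit loop.
K = max(G);
Yloop = zeros(K, 1);
for k = 1:K
    Yloop(k) = mean(X(G == k));   % apply the function to one group
end

% The same apply and combine steps in a single call.
Y = splitapply(@mean, X, G);
disp(table(names, Y, Yloop))
```

Both approaches yield one mean per group; the loop version needs the output preallocated and only handles scalar results, which is the bookkeeping that the single call removes.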
Usage notes and limitations:
The specified function must not rely on any state, such as persistent variables or random number functions like rand.
For more information, see Tall Arrays.