Product Documentation

stack - Class: dataset

Stack data from multiple variables into a single variable

Syntax

tall = stack(wide,datavars)
[tall,iwide] = stack(wide,datavars)
tall = stack(wide,datavars,Parameter,value)

Description

tall = stack(wide,datavars) converts a wide-format dataset array into a tall-format array, by stacking multiple variables in wide into a single variable in tall. In general, tall contains fewer variables but more observations than wide.

datavars specifies a group of m data variables in wide. stack creates a single data variable in tall by interleaving their values, so if wide has n observations, then tall has m*n observations. In other words, stack takes the m data values from each observation in wide and stacks them up to create m observations in tall. datavars is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector. stack also creates a grouping variable in tall to indicate which of the m data variables in wide each observation in tall corresponds to.
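As a minimal sketch of the interleaving described above, the following stacks m = 2 data variables from a toy dataset array with n = 3 observations. The data and the variable names ID, A, B, Value, and Source are made up for illustration and are not part of any shipped data set.

```matlab
% Hypothetical toy data: n = 3 observations, m = 2 data variables (A and B).
wide = dataset({[101;102;103],'ID'}, {[10;20;30],'A'}, {[11;21;31],'B'});

% Stack A and B into one data variable. The output names are given
% explicitly so the sketch does not rely on the default names.
tall = stack(wide, {'A','B'}, 'NewDataVarName','Value', 'IndVarName','Source');

nWide = size(wide,1);   % n = 3 observations in wide
nTall = size(tall,1);   % m*n = 6 observations in tall

% The values are interleaved: A and B from observation 1, then A and B
% from observation 2, and so on.
disp(tall)
```

Here Source is the grouping variable that records whether each stacked value came from A or from B.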

stack assigns values for the per-variable properties (e.g., Units and VarDescription) for the new data variable in tall from the corresponding property values for the first variable listed in datavars.

stack copies the remaining variables from wide to tall without stacking, by replicating each of their values m times. These variables are typically grouping variables. Because their values are constant across each group of m observations in tall, they identify which observation in wide an observation in tall came from.

[tall,iwide] = stack(wide,datavars) returns an index vector iwide indicating the correspondence between observations in tall and those in wide. stack creates tall(j,:) using wide(iwide(j),datavars).
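The index vector can be used to trace observations in tall back to wide. The sketch below uses made-up toy data; the names ID, A, B, Value, and Source are assumptions chosen for illustration.

```matlab
% Hypothetical toy data: 3 observations, 2 data variables to stack.
wide = dataset({[101;102;103],'ID'}, {[10;20;30],'A'}, {[11;21;31],'B'});

[tall,iwide] = stack(wide, {'A','B'}, ...
    'NewDataVarName','Value', 'IndVarName','Source');

% iwide has one entry per observation in tall, giving the observation
% number in wide that the tall observation came from.
fromSecond = tall(iwide==2,:);   % the observations stacked from wide(2,:)

% The unstacked variable ID is replicated once per stacked value, so it
% matches wide.ID indexed by iwide.
sameID = isequal(tall.ID, wide.ID(iwide));
```

Because ID is constant within each group of stacked observations, it plays the same identifying role as iwide, but in terms of the data rather than observation numbers.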

For more information on grouping variables, see Grouping Variables.

Input Arguments

tall = stack(wide,datavars,Parameter,value) uses the following parameter name/value pairs to control how stack converts variables in wide to variables in tall:

'ConstVars'
    Variables in wide to copy to tall without stacking. ConstVars is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector. The default is all variables in wide not specified in datavars.
'NewDataVarName'
    A name for the data variable to be created in tall. The default is a concatenation of the names of the m variables that are stacked up.
'IndVarName'
    A name for the grouping variable to create in tall to indicate the source of each value in the new data variable. The default is based on the 'NewDataVarName' parameter.
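The following sketch shows how 'ConstVars' might be used to carry only some of the unstacked variables into tall. The toy data and the names ID, Site, A, B, Value, and Source are hypothetical.

```matlab
% Hypothetical toy data with two candidate constant variables (ID, Site).
wide = dataset({[101;102;103],'ID'}, {[7;8;9],'Site'}, ...
               {[10;20;30],'A'}, {[11;21;31],'B'});

% Copy only ID to tall. Site is neither stacked nor listed in
% 'ConstVars', so it is not carried over.
tall = stack(wide, {'A','B'}, 'ConstVars','ID', ...
    'NewDataVarName','Value', 'IndVarName','Source');

% List the variable names in tall.
tallNames = tall.Properties.VarNames;
```

Without 'ConstVars', both ID and Site would be copied, since the default is every variable not listed in datavars.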

You can also specify multiple groups of data variables in wide, each of which becomes a variable in tall. All groups must contain the same number of variables. To do so, specify datavars as a cell array with one element per group, and specify 'NewDataVarName' as a cell array of strings with one name per group.
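A sketch of stacking two groups at once, assuming each group is given as a vector of variable indices inside a cell array. The toy data and the names ID, H1, H2, W1, W2, H, W, and Visit are made up for illustration.

```matlab
% Hypothetical toy data: two measurements (H and W), each recorded twice.
wide = dataset({[1;2],'ID'}, ...
               {[10;20],'H1'},   {[11;21],'H2'}, ...
               {[100;200],'W1'}, {[101;201],'W2'});

% Group 1 is variables 2:3 (H1, H2) and group 2 is variables 4:5 (W1, W2).
% Each group becomes one data variable in tall, named by the matching
% entry of 'NewDataVarName'.
tall = stack(wide, {[2 3],[4 5]}, ...
    'NewDataVarName',{'H','W'}, 'IndVarName','Visit');

disp(tall)
```

Both groups contain two variables, as required, so each observation in wide contributes two observations to tall.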

Examples

Convert a wide format data set to tall format, and then back to a different wide format:

load flu
 
% FLU has a 'Date' variable, and 10 variables for estimated
% influenza rates (in 9 different regions, estimated from
% Google searches, plus a nationwide estimate from the
% CDC). Combine those 10 variables into a "tall" array that
% has a single data variable, 'FluRate', and an indicator
% variable, 'Region', that says which region each estimate
% is from.
[flu2,iflu] = stack(flu, 2:11, 'NewDataVarName','FluRate', ...
    'IndVarName','Region')
 
% The second observation in FLU is for 10/16/2005.  Find the
% observations in FLU2 that correspond to that date.
flu(2,:)
flu2(iflu==2,:)
 
% Use the 'Date' variable from that tall array to split
% 'FluRate' into 52 separate variables, each containing the
% estimated influenza rates for each unique date.  The new
% "wide" array has one observation for each region.  In
% effect, this is the original array FLU "on its side".
dateNames = cellstr(datestr(flu.Date,'mmm_DD_YYYY'));
[flu3,iflu2] = unstack(flu2, 'FluRate', 'Date', ...
    'NewDataVarNames',dateNames)
 
% Since observations in FLU3 represent regions, IFLU2
% indicates the first occurrence in FLU2 of each region.
flu2(iflu2,:)

See Also

dataset.grpstats | dataset.join | dataset.unstack

© 1984-2012 The MathWorks, Inc.