Quantcast

Documentation Center

  • Trial Software
  • Product Updates

unstack

Class: dataset

Unstack data from single variable into multiple variables

The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax

wide = unstack(tall,datavar,indvar)
[wide,itall] = unstack(tall,datavar,indvar)
wide = unstack(tall,datavar,indvar,'Parameter',value)

Description

wide = unstack(tall,datavar,indvar) converts a dataset array tall to an equivalent dataset array wide that is in wide format, by unstacking a single variable in tall into multiple variables in wide. In general wide contains more variables, but fewer observations, than tall.

datavar specifies the data variable in tall to unstack. indvar specifies an indicator variable in tall that determines which variable in wide each value in datavar is unstacked into. unstack treats the remaining variables in tall as grouping variables. Each unique combination of their values defines a group of observations in tall that will be unstacked into a single observation in wide.

unstack creates m data variables in wide, where m is the number of group levels in indvar. The values in indvar indicate which of those m variables receive which values from datavar. The j-th data variable in wide contains the values from datavar that correspond to observations whose indvar value was the j-th of the m possible levels. Elements of those m variables for which no corresponding data value in tall exists contain a default value.

datavar is a positive integer, a variable name, or a logical vector containing a single true value. indvar is a positive integer, a variable name, or a logical vector containing a single true value.

[wide,itall] = unstack(tall,datavar,indvar) returns an index vector itall indicating the correspondence between observations in wide and those in tall. For each observation in wide, itall contains the index of the first in the corresponding group of observations in tall.

For more information on grouping variables, see Grouping Variables.

Input Arguments

wide = unstack(tall,datavar,indvar,'Parameter',value) uses the following parameter name/value pairs to control how unstack converts variables in tall to variables in wide:

'GroupVars'Grouping variables in tall that define groups of observations. groupvars is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector. The default is all variables in tall not listed in datavar or indvar.
'NewDataVarNames'A cell array of strings containing names for the data variables unstack should create in wide. Default is the group names of the grouping variable specified in indvar.
'AggregationFun'A function handle that accepts a subset of values from datavar and returns a single value. stack applies this function to observations from the same group that have the same value of indvar. The function must aggregate the data values into a single value, and in such cases it is not possible to recover tall from wide using stack. The default is @sum for numeric data variables. For non-numeric variables, there is no default, and you must specify 'AggregationFun' if multiple observations in the same group have the same values of indvar.
'ConstVars'Variables in tall to copy to wide without unstacking. The values for these variables in wide are taken from the first observation in each group in tall, so these variables should typically be constant within each group. ConstVars is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector. The default is no variables.

You can also specify more than one data variable in tall, each of which becomes a set of m variables in wide. In this case, specify datavar as a vector of positive integers, a cell array containing variable names, or a logical vector. You may specify only one variable with indvar. The names of each set of data variables in wide are the name of the corresponding data variable in tall concatenated with the names specified in 'NewDataVarNames'. The function specified in 'AggregationFun' must return a value with a single row.

Examples

Convert a "wide format" data set to "tall format", and then back to a different "wide format":

load flu
 
% FLU has a 'Date' variable, and 10 variables for estimated
% influenza rates (in 9 different regions, estimated from
% Google searches, plus a nationwide extimate from the
% CDC). Combine those 10 variables into a "tall" array that
% has a single data variable, 'FluRate', and an indicator
% variable, 'Region', that says which region each estimate
% is from.
[flu2,iflu] = stack(flu, 2:11, 'NewDataVarName','FluRate', ...
    'IndVarName','Region')
 
% The second observation in FLU is for 10/16/2005.  Find the
% observations in FLU2 that correspond to that date.
flu(2,:)
flu2(iflu==2,:)
 
% Use the 'Date' variable from that tall array to split
% 'FluRate' into 52 separate variables, each containing the
% estimated influenza rates for each unique date.  The new
% "wide" array has one observation for each region.  In
% effect, this is the original array FLU "on its side".
dateNames = cellstr(datestr(flu.Date,'mmm_DD_YYYY'));
[flu3,iflu2] = unstack(flu2, 'FluRate', 'Date', ...
    'NewDataVarNames',dateNames)
 
% Since observations in FLU3 represent regions, IFLU2
% indicates the first occurrence in FLU2 of each region.
flu2(iflu2,:)

See Also

|

How To

Was this topic helpful?