Arrays for statistical data
Dataset arrays collect heterogeneous data and metadata, including variable and observation names, into a single container variable. They are suited to column-oriented or tabular data of the kind often stored as columns in a text file or spreadsheet, and they can accommodate variables of different types, sizes, and units.
Dataset arrays can contain different kinds of variables, including numeric, logical, character, categorical, and cell. However, a dataset array is a different class from the variables that it contains. For example, even a dataset array that contains only double arrays cannot be operated on as if it were itself a double array. Using dot subscripting, though, you can operate on a variable in a dataset array as if it were a workspace variable.
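The distinction between the container and its variables can be sketched as follows. The variable names (`Height`, `Weight`) and the values are made up purely for illustration.

```matlab
% Hypothetical data: both variables are double column vectors.
Height = [1.60; 1.75; 1.82];
Weight = [55; 70; 81];
ds = dataset(Height, Weight);   % variable names come from the workspace names

% The container is of class dataset, not double, even though every
% variable it holds is a double array.
c = class(ds);

% Dot subscripting returns the underlying double array, which can be
% used like any workspace variable.
bmi = ds.Weight ./ ds.Height.^2;

% The double method converts the dataset variables to a numeric matrix.
M = double(ds);
```

Here `bmi` and `M` are ordinary double arrays, so the usual numeric functions apply to them directly.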
You can subscript dataset arrays using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use variable and observation names as indices.
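A minimal sketch of these indexing forms, assuming a small made-up dataset array whose observation names (`Smith`, `Jones`, `Brown`) are hypothetical:

```matlab
% Hypothetical data with observation (row) names.
Age    = [38; 43; 29];
Smoker = [true; false; false];
ds = dataset(Age, Smoker, 'ObsNames', {'Smith', 'Jones', 'Brown'});

r1 = ds(1:2, :);                          % numeric row indices, all variables
r2 = ds(ds.Age > 30, 'Age');              % logical row index, variable name
r3 = ds('Jones', :);                      % observation name as row index
r4 = ds({'Smith', 'Brown'}, {'Smoker'});  % cell arrays of names
```

Each result of parenthesis subscripting is itself a dataset array; use dot subscripting (for example `r3.Age`) to get at the stored values.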
Use the dataset constructor to create a dataset array from variables in the MATLAB workspace. You can also create a dataset array by reading data from a text or spreadsheet file. You can access each variable in a dataset array much like fields in a structure, using dot subscripting. See the following section for a list of operations available for dataset arrays.
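As a rough sketch of both construction routes, the following builds a dataset array from workspace variables, writes it to a temporary comma-delimited file with `export`, and reads it back with the constructor's `'File'` option. The variable names (`Count`, `Label`) and the temporary file are assumptions made for the example, not part of the original text.

```matlab
% From workspace variables, naming the dataset variables explicitly.
x = [1; 2; 3];
y = {'a'; 'b'; 'c'};
ds1 = dataset({x, 'Count'}, {y, 'Label'});

% Round trip through a text file (the file name is a temporary placeholder).
fname = [tempname '.csv'];
export(ds1, 'File', fname, 'Delimiter', ',');
ds2 = dataset('File', fname, 'Delimiter', ',');
delete(fname);

% Dot subscripting works like field access on a structure.
total = sum(ds1.Count);
```

For spreadsheet files, the constructor also accepts an `'XLSFile'` option in place of `'File'`.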
| dataset | Construct dataset array |
| cat | Concatenate dataset arrays |
| cellstr | Create cell array of strings from dataset array |
| dataset2cell | Convert dataset array to cell array |
| datasetfun | Apply function to dataset array variables |
| disp | Display dataset array |
| display | Display dataset array |
| double | Convert dataset variables to double array |
| end | Last index in indexing expression for dataset array |
| export | Write dataset array to file |
| get | Access dataset array properties |
| grpstats | Summary statistics by group for dataset arrays |
| horzcat | Horizontal concatenation for dataset arrays |
| isempty | True for empty dataset array |
| join | Merge observations |
| length | Length of dataset array |
| ndims | Number of dimensions of dataset array |
| numel | Number of elements in dataset array |
| replacedata | Replace dataset variables |
| set | Set and display properties |
| single | Convert dataset variables to single array |
| size | Size of dataset array |
| sortrows | Sort rows of dataset array |
| stack | Stack data from multiple variables into single variable |
| subsasgn | Subscripted assignment to dataset array |
| subsref | Subscripted reference for dataset array |
| summary | Print summary of dataset array |
| unique | Unique observations in dataset array |
| unstack | Unstack data from single variable into multiple variables |
| vertcat | Vertical concatenation for dataset arrays |
A dataset array D has properties that store metadata (information about your data). Access or assign to a property using P = D.Properties.PropName or D.Properties.PropName = P, where PropName is one of the following:
| Description | String describing data set |
| DimNames | Two-element cell array of strings giving names of dimensions of data set |
| ObsNames | Cell array of nonempty, distinct strings giving names of observations in data set |
| Units | Units of variables in data set |
| UserData | Variable containing additional information associated with data set |
| VarDescription | Cell array of strings giving descriptions of variables in data set |
| VarNames | Cell array giving names of variables in data set |
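Reading and assigning these properties might look like the following sketch. The data, units, and observation names are hypothetical.

```matlab
% Hypothetical data.
Age    = [38; 43; 29];
Weight = [176; 163; 131];
ds = dataset(Age, Weight);

% Assign metadata through the Properties structure.
ds.Properties.Description = 'Hypothetical patient measurements';
ds.Properties.Units       = {'yr', 'lb'};        % one entry per variable
ds.Properties.ObsNames    = {'P1'; 'P2'; 'P3'};  % one entry per observation

% Read a property, then rename the second variable.
names = ds.Properties.VarNames;
ds.Properties.VarNames{2} = 'Wgt';
w = ds.Wgt;   % the data are now reached under the new name
```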
Copy semantics: value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB Object-Oriented Programming documentation.
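In practice, value behavior means that assigning a dataset array to a new variable produces an independent copy, so changes to the copy do not affect the original. A minimal sketch with made-up data:

```matlab
% Hypothetical one-variable dataset array.
a = dataset({[1; 2; 3], 'x'});

b = a;           % b is an independent copy of a
b.x(1) = 100;    % modify the copy only

orig_first = a.x(1);   % unchanged in the original
copy_first = b.x(1);   % changed in the copy
```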
Load a dataset array from a .mat file and create some simple subsets:
load hospital
h1 = hospital(1:10,:)
h2 = hospital(:,{'LastName' 'Age' 'Sex' 'Smoker'})
% Access and modify metadata
hospital.Properties.Description
hospital.Properties.VarNames{4} = 'Wgt'
% Create a new dataset variable from an existing one
hospital.AtRisk = hospital.Smoker | (hospital.Age > 40)
% Use individual variables to explore the data
boxplot(hospital.Age,hospital.Sex)
h3 = hospital(hospital.Age<30,...
{'LastName' 'Age' 'Sex' 'Smoker'})
% Sort the observations based on two variables
h4 = sortrows(hospital,{'Sex','Age'})
See also: genvarname | tdfread | textscan | xlsread
© 1984-2012 The MathWorks, Inc.