dataset class

Arrays for statistical data

The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Description

Dataset arrays collect heterogeneous data and metadata, including variable and observation names, into a single container variable. They are suitable for storing column-oriented or tabular data of the kind often stored as columns in a text file or spreadsheet, and they can accommodate variables of different types, sizes, units, and so on.

Dataset arrays can contain different kinds of variables, including numeric, logical, character, categorical, and cell. However, a dataset array is a different class than the variables that it contains. For example, even a dataset array that contains only variables that are double arrays cannot be operated on as if it were itself a double array. Using dot subscripting, though, you can operate on a variable in a dataset array as if it were a workspace variable.
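
The distinction between the container and its variables can be sketched as follows. The variable names and values are hypothetical, invented only for illustration:

```matlab
% Hypothetical values, invented for illustration
Height = [1.70; 1.82; 1.65];
Weight = [61; 82; 70];
ds = dataset(Height, Weight);

class(ds)           % 'dataset', even though every variable is double
class(ds.Weight)    % 'double'

% Operate on individual variables through dot subscripting
ds.BMI = ds.Weight ./ ds.Height.^2;

% Convert explicitly when a plain numeric matrix is needed
X = double(ds);
```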

You can subscript dataset arrays using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use variable and observation names as indices.
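
As a minimal sketch of these indexing forms, assuming a small dataset array with made-up observation names and values:

```matlab
% Hypothetical data, invented for illustration
Age    = [34; 51; 27];
Weight = [61; 82; 70];
ds = dataset(Age, Weight, 'ObsNames', {'Ann'; 'Bob'; 'Cal'});

a = ds(1:2, :);                  % numeric row indices, all variables
b = ds(ds.Age > 30, 'Weight');   % logical row index, variable name
c = ds({'Ann', 'Cal'}, 'Age');   % observation names, variable name
```

Each result is itself a dataset array; use dot subscripting (for example, `c.Age`) to get at the underlying values.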

Construction

Use the dataset constructor to create a dataset array from variables in the MATLAB workspace. You can also create a dataset array by reading data from a text or spreadsheet file. You can access each variable in a dataset array much like fields in a structure, using dot subscripting. See the following section for a list of operations available for dataset arrays.

dataset    Construct dataset array
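
The construction routes described above might look like the following sketch. The workspace variables are hypothetical, and the file name `mydata.csv` is a placeholder, so that call is left commented out:

```matlab
% Hypothetical workspace variables, invented for illustration
Name   = {'Ann'; 'Bob'; 'Cal'};
Age    = [34; 51; 27];
Smoker = [false; true; false];

% From workspace variables: variable names default to the workspace names
ds1 = dataset(Age, Smoker, 'ObsNames', Name);

% The same data with explicitly chosen variable names
ds2 = dataset({Age, 'Years'}, {Smoker, 'Smokes'});

% From a comma-delimited text file (placeholder file name)
% ds3 = dataset('File', 'mydata.csv', 'Delimiter', ',');
```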

Methods

cat                 Concatenate dataset arrays
cellstr             Create cell array of strings from dataset array
dataset2cell        Convert dataset array to cell array
dataset2struct      Convert dataset array to structure
datasetfun          Apply function to dataset array variables
disp                Display dataset array
display             Display dataset array
double              Convert dataset variables to double array
end                 Last index in indexing expression for dataset array
export              Write dataset array to file
get                 Access dataset array properties
horzcat             Horizontal concatenation for dataset arrays
intersect           Set intersection for dataset array observations
isempty             True for empty dataset array
ismember            Dataset array elements that are members of set
ismissing           Find dataset array elements with missing values
join                Merge observations
length              Length of dataset array
ndims               Number of dimensions of dataset array
numel               Number of elements in dataset array
replacedata         Replace dataset variables
replaceWithMissing  Insert missing data indicators into a dataset array
set                 Set and display properties
setdiff             Set difference for dataset array observations
setxor              Set exclusive or for dataset array observations
single              Convert dataset variables to single array
size                Size of dataset array
sortrows            Sort rows of dataset array
stack               Stack data from multiple variables into single variable
subsasgn            Subscripted assignment to dataset array
subsref             Subscripted reference for dataset array
summary             Print summary of dataset array
union               Set union for dataset array observations
unique              Unique observations in dataset array
unstack             Unstack data from single variable into multiple variables
vertcat             Vertical concatenation for dataset arrays
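
A few of these methods in combination, as a rough sketch with made-up data (only `vertcat`, `sortrows`, `size`, and `double` are shown):

```matlab
% Two small dataset arrays with the same variables (hypothetical values)
A = dataset({[34; 51], 'Age'}, {[61; 82], 'Weight'});
B = dataset({[27; 45], 'Age'}, {[70; 77], 'Weight'});

C = [A; B];                % vertcat: stack the observations
S = sortrows(C, 'Age');    % sort observations by the Age variable
n = size(S, 1);            % number of observations
X = double(S);             % numeric matrix, one column per variable
```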

Properties

A dataset array D has properties that store metadata (information about your data). Access or assign to a property using P = D.Properties.PropName or D.Properties.PropName = P, where PropName is one of the following:

Description     String describing data set
DimNames        Two-element cell array of strings giving names of dimensions of data set
ObsNames        Cell array of nonempty, distinct strings giving names of observations in data set
Units           Units of variables in data set
UserData        Variable containing additional information associated with data set
VarDescription  Cell array of strings giving descriptions of variables in data set
VarNames        Cell array giving names of variables in data set
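
Reading and assigning properties could look like this sketch. The data, description text, and unit strings are assumptions chosen for illustration:

```matlab
% Hypothetical data, invented for illustration
Age    = [34; 51; 27];
Weight = [61; 82; 70];
ds = dataset(Age, Weight);

% Assign metadata through the Properties structure
ds.Properties.Description = 'Example patient data';
ds.Properties.Units       = {'yr', 'kg'};
ds.Properties.ObsNames    = {'Ann'; 'Bob'; 'Cal'};

% Read a property back
names = ds.Properties.VarNames;   % cell array of variable names
```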

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB Object-Oriented Programming documentation.

Examples

Load a dataset array from a .mat file and create some simple subsets:

load hospital
h1 = hospital(1:10,:)
h2 = hospital(:,{'LastName' 'Age' 'Sex' 'Smoker'})

% Access and modify metadata
hospital.Properties.Description
hospital.Properties.VarNames{4} = 'Wgt'

% Create a new dataset variable from an existing one
hospital.AtRisk = hospital.Smoker | (hospital.Age > 40)

% Use individual variables to explore the data
boxplot(hospital.Age,hospital.Sex)
h3 = hospital(hospital.Age<30,...
   {'LastName' 'Age' 'Sex' 'Smoker'})

% Sort the observations based on two variables
h4 = sortrows(hospital,{'Sex','Age'})