dataset class

(Not Recommended) Arrays for statistical data

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
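For code that still holds dataset arrays, conversion is one possible migration path. The sketch below assumes the Statistics and Machine Learning Toolbox functions dataset2table and table2dataset are available; the variables x and y are made-up illustration data, not part of this page.

```matlab
% Hypothetical toy data, for illustration only
ds = dataset({[1;2;3],'x'},{[10;20;30],'y'});

% Convert the dataset array to a table (the recommended container)
T = dataset2table(ds);

% Convert back, for example when calling legacy code that expects a dataset
ds2 = table2dataset(T);
```

Dot subscripting works the same way on both containers, so T.x and ds.x return the same double column.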

Description

Dataset arrays collect heterogeneous data and metadata, including variable and observation names, into a single container variable. They are suitable for storing column-oriented or tabular data of the kind often stored as columns in a text file or spreadsheet, and they can accommodate variables of different types, sizes, and units.

Dataset arrays can contain different kinds of variables, including numeric, logical, character, string, categorical, and cell. However, a dataset array is a different class than the variables that it contains. For example, even a dataset array that contains only variables that are double arrays cannot be operated on as if it were itself a double array. Using dot subscripting, though, you can operate on a variable in a dataset array as if it were a workspace variable.
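As a minimal sketch of the distinction described above, the following uses a made-up dataset array with two double variables (the names x and y are assumptions for illustration). Arithmetic on the container itself is expected to fail, while arithmetic on a dot-subscripted variable behaves like ordinary double arithmetic.

```matlab
% Hypothetical toy data: two double variables named x and y
ds = dataset({[1;2;3],'x'},{[10;20;30],'y'});

% Arithmetic directly on the dataset array is expected to error,
% because ds is of class dataset, not double
try
    bad = ds + 1; %#ok<NASGU>
    arithmeticWorked = true;
catch
    arithmeticWorked = false;
end

% Dot subscripting returns the underlying double arrays
total = ds.x + ds.y;   % element-wise sum of the two columns
avgX  = mean(ds.x);    % mean of the x column
```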

You can subscript dataset arrays using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use variable and observation names as indices.
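The indexing forms mentioned above can be sketched as follows. The data, the variable names x and y, and the observation names a, b, and c are made up for illustration.

```matlab
% Hypothetical toy data with named observations
ds = dataset({[1;2;3],'x'},{[10;20;30],'y'},'ObsNames',{'a';'b';'c'});

% Numeric row indices, all variables
sub1 = ds(1:2,:);

% Observation names for rows, a variable name for columns
sub2 = ds({'a','c'},'y');

% Logical row index combined with variable names
sub3 = ds(ds.x > 1,{'x','y'});
```

Each result is itself a dataset array, so parentheses subscripting preserves the container, in contrast to dot subscripting, which extracts the variable.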

Construction

Use the dataset constructor to create a dataset array from variables in the MATLAB workspace. You can also create a dataset array by reading data from a text or spreadsheet file. You can access each variable in a dataset array much like fields in a structure, using dot subscripting. See Methods for a list of operations available for dataset arrays.

dataset  (Not Recommended) Construct dataset array
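A minimal sketch of both construction routes follows. The variables Name and Height and the temporary file are made up for illustration; the file is written with export only so that the example has something to read back.

```matlab
% Hypothetical workspace variables
Name   = {'Ann';'Bob';'Cy'};
Height = [160;175;182];

% Construct from workspace variables; the variable names become VarNames
fromVars = dataset(Name,Height);

% Write a comma-delimited text file, then read it back as a dataset array
fname = [tempname '.csv'];
export(fromVars,'File',fname,'Delimiter',',');
fromFile = dataset('File',fname,'Delimiter',',');
delete(fname);

% Access a variable with dot subscripting, like a structure field
heights = fromFile.Height;
```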

Methods

cat                 (Not Recommended) Concatenate dataset arrays
cellstr             (Not Recommended) Create cell array of character vectors from dataset array
dataset2cell        (Not Recommended) Convert dataset array to cell array
dataset2struct      (Not Recommended) Convert dataset array to structure
datasetfun          (Not Recommended) Apply function to dataset array variables
disp                (Not Recommended) Display dataset array
display             (Not Recommended) Display dataset array
double              (Not Recommended) Convert dataset variables to double array
end                 (Not Recommended) Last index in indexing expression for dataset array
export              (Not Recommended) Write dataset array to file
get                 (Not Recommended) Access dataset array properties
horzcat             (Not Recommended) Horizontal concatenation for dataset arrays
intersect           (Not Recommended) Set intersection for dataset array observations
isempty             (Not Recommended) True for empty dataset array
ismember            (Not Recommended) Dataset array elements that are members of set
ismissing           (Not Recommended) Find dataset array elements with missing values
join                (Not Recommended) Merge observations
length              (Not Recommended) Length of dataset array
ndims               (Not Recommended) Number of dimensions of dataset array
numel               (Not Recommended) Number of elements in dataset array
replaceWithMissing  (Not Recommended) Insert missing data indicators into a dataset array
replacedata         (Not Recommended) Replace dataset variables
set                 (Not Recommended) Set and display properties
setdiff             (Not Recommended) Set difference for dataset array observations
setxor              (Not Recommended) Set exclusive or for dataset array observations
single              (Not Recommended) Convert dataset variables to single array
size                (Not Recommended) Size of dataset array
sortrows            (Not Recommended) Sort rows of dataset array
stack               (Not Recommended) Stack data from multiple variables into single variable
subsasgn            (Not Recommended) Subscripted assignment to dataset array
subsref             (Not Recommended) Subscripted reference for dataset array
summary             (Not Recommended) Print summary of dataset array
union               (Not Recommended) Set union for dataset array observations
unique              (Not Recommended) Unique observations in dataset array
unstack             (Not Recommended) Unstack data from single variable into multiple variables
vertcat             (Not Recommended) Vertical concatenation for dataset arrays
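A few of the methods listed above can be sketched on made-up data as follows. The variable names x and y are assumptions for illustration, and only size, double, sortrows, and vertcat are shown.

```matlab
% Hypothetical toy data
ds = dataset({[3;1;2],'x'},{[30;10;20],'y'});

% size returns [number of observations, number of variables]
sz = size(ds);

% double converts the (all-numeric) variables to one double matrix
M = double(ds);

% sortrows orders observations by the named variable
sorted = sortrows(ds,'x');

% vertcat (square brackets with a semicolon) stacks observations
stacked = [ds; ds];
```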

Properties

A dataset array D has properties that store metadata (information about your data). Access or assign to a property using P = D.Properties.PropName or D.Properties.PropName = P, where PropName is one of the following:

Description     (Not Recommended) Character vector describing data set
DimNames        (Not Recommended) Two-element cell array of character vectors giving names of dimensions of data set
ObsNames        (Not Recommended) Cell array of nonempty, distinct character vectors giving names of observations in data set
Units           (Not Recommended) Units of variables in data set
UserData        (Not Recommended) Variable containing additional information associated with data set
VarDescription  (Not Recommended) Cell array of character vectors giving descriptions of variables in data set
VarNames        (Not Recommended) Cell array giving names of variables in data set
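Reading and assigning properties through D.Properties.PropName can be sketched as follows. The data, description text, units, and observation names are made up for illustration.

```matlab
% Hypothetical toy data
D = dataset({[1;2;3],'x'},{[10;20;30],'y'});

% Assign metadata: D.Properties.PropName = P
D.Properties.Description = 'Toy measurements';
D.Properties.Units       = {'m','s'};        % one entry per variable
D.Properties.ObsNames    = {'a';'b';'c'};    % one entry per observation

% Read metadata: P = D.Properties.PropName
names = D.Properties.VarNames;
desc  = D.Properties.Description;
```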

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes (MATLAB) in the MATLAB Object-Oriented Programming documentation.
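A minimal sketch of what value semantics means in practice, using made-up data: assigning a dataset array to a second variable creates an independent copy, so changing the copy leaves the original untouched.

```matlab
% Hypothetical toy data
a = dataset({[1;2;3],'x'});

% Assignment copies the value; b is independent of a
b = a;
b.x(1) = 100;

originalFirst = a.x(1);   % unchanged by the edit to b
copyFirst     = b.x(1);   % reflects the edit
```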

Examples

Load a dataset array from a .mat file and create some simple subsets:

load hospital
h1 = hospital(1:10,:)
h2 = hospital(:,{'LastName' 'Age' 'Sex' 'Smoker'})

% Access and modify metadata
hospital.Properties.Description
hospital.Properties.VarNames{4} = 'Wgt'

% Create a new dataset variable from an existing one
hospital.AtRisk = hospital.Smoker | (hospital.Age > 40)

% Use individual variables to explore the data
boxplot(hospital.Age,hospital.Sex)
h3 = hospital(hospital.Age<30,...
   {'LastName' 'Age' 'Sex' 'Smoker'})

% Sort the observations based on two variables
h4 = sortrows(hospital,{'Sex','Age'})