dataset class

(Not Recommended) Arrays for statistical data

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
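As a rough illustration of the recommendation above, an existing dataset array can be converted to a table with the dataset2table function from Statistics and Machine Learning Toolbox. The variable names and values below are hypothetical and only serve the sketch:

```matlab
% Hypothetical data: one numeric and one logical variable of equal length.
x = [1; 2; 3];
flag = [true; false; true];

% Build a dataset array; variable names come from the workspace names.
ds = dataset(x, flag);

% Convert the dataset array to a table, the recommended container.
T = dataset2table(ds);
```

After the conversion, T.x and T.flag hold the same values as ds.x and ds.flag.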

Description

Dataset arrays collect heterogeneous data and metadata, including variable and observation names, into a single container variable. They are suited to column-oriented or tabular data of the kind often stored as columns in a text file or spreadsheet, and can accommodate variables of different types, sizes, units, etc.

Dataset arrays can contain different kinds of variables, including numeric, logical, character, string, categorical, and cell. However, a dataset array is a different class than the variables that it contains. For example, even a dataset array that contains only variables that are double arrays cannot be operated on as if it were itself a double array. Using dot subscripting, however, you can operate on a variable in a dataset array as if it were a workspace variable.
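A minimal sketch of this distinction, using hypothetical variables x and y that are not part of the original page:

```matlab
% Hypothetical data: two double variables of equal length.
x = [1; 2; 3];
y = [10; 20; 30];
ds = dataset(x, y);     % variable names are taken from the workspace names

% ds is of class dataset, not double, so arithmetic on ds itself
% (for example ds + 1) is expected to raise an error.
cls = class(ds);

% Dot subscripting returns the underlying double vectors,
% which behave like ordinary workspace variables.
s = ds.x + ds.y;

% A new variable can be assigned the same way.
ds.z = 2 * ds.x;
```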

You can subscript dataset arrays using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use variable and observation names as indices.
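The following sketch shows the indexing styles just described. The variable names, observation names, and values are made up for illustration:

```matlab
% Hypothetical data with named observations.
Height = [170; 165; 180];
Weight = [70; 60; 85];
ds = dataset(Height, Weight, 'ObsNames', {'Ann', 'Bob', 'Cy'});

a = ds(2, :);                        % numeric row index, all variables
b = ds(ds.Height > 168, :);          % logical row index
c = ds('Bob', 'Weight');             % observation name and variable name
d = ds({'Ann', 'Cy'}, {'Height'});   % cell arrays of names
```

Each result of parenthesis subscripting is itself a dataset array; use dot subscripting (for example c.Weight) to extract the contained values.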

Construction

Use the dataset constructor to create a dataset array from variables in the MATLAB workspace. You can also create a dataset array by reading data from a text or spreadsheet file. You can access each variable in a dataset array much like fields in a structure, using dot subscripting. See the following section for a list of operations available for dataset arrays.

dataset (Not Recommended) Construct dataset array
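Both construction routes can be sketched as follows. The variable names and file contents are hypothetical, and the text file is written to a temporary location only so that the example is self-contained:

```matlab
% From workspace data, with explicit names given as {variable, 'name'} pairs.
ds1 = dataset({[1; 2; 3], 'ID'}, {[3.5; 4.0; 2.5], 'Score'});

% Write a small comma-delimited file to read back (hypothetical contents).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'ID,Score\n1,3.5\n2,4.0\n3,2.5\n');
fclose(fid);

% From a text file: the first line supplies the variable names.
ds2 = dataset('File', fname, 'Delimiter', ',');
delete(fname);

% Access a variable much like a structure field.
scores = ds2.Score;
```

For spreadsheets, the constructor accepts an 'XLSFile' argument in place of 'File'.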

Methods

cat (Not Recommended) Concatenate dataset arrays
cellstr (Not Recommended) Create cell array of character vectors from dataset array
dataset2cell (Not Recommended) Convert dataset array to cell array
dataset2struct (Not Recommended) Convert dataset array to structure
datasetfun (Not Recommended) Apply function to dataset array variables
disp (Not Recommended) Display dataset array
display (Not Recommended) Display dataset array
double (Not Recommended) Convert dataset variables to double array
end (Not Recommended) Last index in indexing expression for dataset array
export (Not Recommended) Write dataset array to file
get (Not Recommended) Access dataset array properties
horzcat (Not Recommended) Horizontal concatenation for dataset arrays
intersect (Not Recommended) Set intersection for dataset array observations
isempty (Not Recommended) True for empty dataset array
ismember (Not Recommended) Dataset array elements that are members of set
ismissing (Not Recommended) Find dataset array elements with missing values
join (Not Recommended) Merge dataset array observations
length (Not Recommended) Length of dataset array
ndims (Not Recommended) Number of dimensions of dataset array
numel (Not Recommended) Number of elements in dataset array
replaceWithMissing (Not Recommended) Insert missing data indicators into a dataset array
replacedata (Not Recommended) Replace dataset variables
set (Not Recommended) Set and display dataset array properties
setdiff (Not Recommended) Set difference for dataset array observations
setxor (Not Recommended) Set exclusive or for dataset array observations
single (Not Recommended) Convert dataset variables to single array
size (Not Recommended) Size of dataset array
sortrows (Not Recommended) Sort rows of dataset array
stack (Not Recommended) Stack dataset array from multiple variables into single variable
subsasgn (Not Recommended) Subscripted assignment to dataset array
subsref (Not Recommended) Subscripted reference for dataset array
summary (Not Recommended) Print summary of dataset array
union (Not Recommended) Set union for dataset array observations
unique (Not Recommended) Unique observations in dataset array
unstack (Not Recommended) Unstack dataset array from single variable into multiple variables
vertcat (Not Recommended) Vertical concatenation for dataset arrays
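A short sketch of how a few of these methods combine, using two small hypothetical dataset arrays that share a key variable named Key:

```matlab
% Hypothetical inputs: A and B share the variable Key.
A = dataset({[1; 2; 3], 'Key'}, {[10; 20; 30], 'Val'});
B = dataset({[3; 1; 2], 'Key'}, {[0.3; 0.1; 0.2], 'Wt'});

% join merges observations using the common variable Key;
% rows of the result follow the order of A.
C = join(A, B);

% sortrows orders observations by a variable, here in descending order.
S = sortrows(C, 'Val', 'descend');

% double converts the (all-numeric) variables to a plain matrix.
M = double(C);
```

Here join looks up each Key value of A in B, so every key in A must appear exactly once in B.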

Properties

A dataset array D has properties that store metadata (information about your data). Access or assign to a property using P = D.Properties.PropName or D.Properties.PropName = P, where PropName is one of the following:

Description

Description is a character vector describing the dataset array. The default is an empty character vector.

DimNames

A two-element cell array of character vectors giving the names of the two dimensions of the dataset array. The default is {'Observations' 'Variables'}.

ObsNames

A cell array of nonempty, distinct character vectors giving the names of the observations in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of observations.

Units

A cell array of character vectors giving the units of the variables in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of variables. Any individual character vector may be empty for a variable that does not have units defined. The default is an empty cell array.

UserData

Any variable containing additional information to be associated with the dataset array. The default is an empty array.

VarDescription

A cell array of character vectors giving the descriptions of the variables in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of variables. Any individual character vector may be empty for a variable that does not have a description defined. The default is an empty cell array.

VarNames

A cell array of nonempty, distinct character vectors giving the names of the variables in the dataset array. The number of character vectors must equal the number of variables. The default is the cell array of names for the variables used to create the dataset array.
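The property access pattern described in this section can be sketched as follows. The data, units, and names are hypothetical:

```matlab
% Hypothetical dataset array with two variables and three observations.
x = [1; 2; 3];
y = [10; 20; 30];
D = dataset(x, y);

% Assign metadata through D.Properties.PropName = P.
D.Properties.Description = 'Hypothetical example data';
D.Properties.Units = {'m', 's'};              % one entry per variable
D.Properties.ObsNames = {'r1'; 'r2'; 'r3'};   % one entry per observation
D.Properties.VarNames{2} = 'Time';            % rename the second variable

% Read metadata back through P = D.Properties.PropName.
names = D.Properties.VarNames;
units = D.Properties.Units;
```

Renaming a variable through VarNames changes the name used in dot subscripting, so the second variable is now reached as D.Time.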

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB Object-Oriented Programming documentation.

Examples

Load a dataset array from a .mat file and create some simple subsets:

load hospital
h1 = hospital(1:10,:)
h2 = hospital(:,{'LastName' 'Age' 'Sex' 'Smoker'})

% Access and modify metadata
hospital.Properties.Description
hospital.Properties.VarNames{4} = 'Wgt'

% Create a new dataset variable from an existing one
hospital.AtRisk = hospital.Smoker | (hospital.Age > 40)

% Use individual variables to explore the data
boxplot(hospital.Age,hospital.Sex)
h3 = hospital(hospital.Age<30,...
   {'LastName' 'Age' 'Sex' 'Smoker'})

% Sort the observations based on two variables
h4 = sortrows(hospital,{'Sex','Age'})