dataset class
(Not Recommended) Arrays for statistical data
The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See the MATLAB table documentation for more information.
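As a minimal sketch of the recommended migration, the following converts a small dataset array to a table with dataset2table and also builds the same table directly. The sample values and variable names (Age, Smoker) are hypothetical, and the code assumes Statistics and Machine Learning Toolbox is installed.

```matlab
% Hypothetical sample data (not taken from any shipped data set)
Age = [38; 43; 29];
Smoker = [true; false; false];

% Legacy container: a dataset array built from workspace variables
ds = dataset(Age, Smoker);

% Convert the existing dataset array to a table
T = dataset2table(ds);

% Or skip the dataset array and build the table directly
T2 = table(Age, Smoker);
```

Existing code that still produces dataset arrays can be converted at the boundary in this way, so that new code works with tables only.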
Description
Dataset arrays collect heterogeneous data and metadata, including variable and observation names, into a single container variable. They are suitable for column-oriented or tabular data of the kind often stored as columns in a text file or spreadsheet, and can accommodate variables of different types, sizes, units, etc.
Dataset arrays can contain different kinds of variables, including numeric, logical, character, string, categorical, and cell. However, a dataset array is a different class than the variables that it contains. For example, even a dataset array that contains only variables that are double arrays cannot be operated on as if it were itself a double array. Using dot subscripting, however, you can operate on a variable in a dataset array as if it were a workspace variable.
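The distinction between the container and its variables can be sketched as follows. The data and the names Height, Weight, and BMI are hypothetical, and Statistics and Machine Learning Toolbox is assumed.

```matlab
% Hypothetical measurements
Height = [170; 182; 165];   % cm
Weight = [65; 90; 58];      % kg
ds = dataset(Height, Weight);

% The container itself is not a double array, so numeric functions
% applied to ds as a whole (for example mean(ds)) are expected to fail.
% Dot subscripting returns the underlying double vector instead:
meanHeight = mean(ds.Height);

% A variable extracted this way behaves like a workspace variable,
% so it can be used to compute and attach a new variable
ds.BMI = ds.Weight ./ (ds.Height / 100).^2;
```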
You can subscript dataset arrays using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use variable and observation names as indices.
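The indexing styles described above might look like this in practice. The observation names and values are made up for illustration, and Statistics and Machine Learning Toolbox is assumed.

```matlab
% Hypothetical data with named observations
Age = [38; 43; 29];
Weight = [80; 74; 61];
ds = dataset(Age, Weight, 'ObsNames', {'Smith', 'Jones', 'Lee'});

% Numeric indices: first two observations, all variables
sub1 = ds(1:2, :);

% Logical row index combined with a variable name
sub2 = ds(ds.Age > 30, 'Weight');

% Observation names and variable names as indices; the result
% follows the order in which the names are listed
sub3 = ds({'Lee', 'Smith'}, {'Weight', 'Age'});
```

Parenthesis indexing always returns another dataset array, whereas dot subscripting returns the variable itself.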
Construction
Use the dataset constructor to create a dataset array from variables in the MATLAB workspace. You can also create a dataset array by reading data from a text or spreadsheet file. You can access each variable in a dataset array much like fields in a structure, using dot subscripting. See the following section for a list of operations available for dataset arrays.
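Both construction routes can be sketched as below. The variable names Count and Label and the temporary comma-delimited file are assumptions made for this illustration, and Statistics and Machine Learning Toolbox is assumed.

```matlab
% From workspace variables, giving each variable an explicit name
x = [1; 2; 3];
y = {'a'; 'b'; 'c'};
ds1 = dataset({x, 'Count'}, {y, 'Label'});

% From a text file: write a small comma-delimited file to a
% temporary location, then read it back as a dataset array
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Count,Label\n1,a\n2,b\n3,c\n');
fclose(fid);
ds2 = dataset('File', fname, 'Delimiter', ',');
delete(fname);

% Access variables with dot subscripting, like fields of a structure
counts = ds2.Count;
labels = ds1.Label;
```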
dataset | (Not Recommended) Construct dataset array |
Methods
cat | (Not Recommended) Concatenate dataset arrays |
cellstr | (Not Recommended) Create cell array of character vectors from dataset array |
dataset2cell | (Not Recommended) Convert dataset array to cell array |
dataset2struct | (Not Recommended) Convert dataset array to structure |
datasetfun | (Not Recommended) Apply function to dataset array variables |
disp | (Not Recommended) Display dataset array |
display | (Not Recommended) Display dataset array |
double | (Not Recommended) Convert dataset variables to double array |
end | (Not Recommended) Last index in indexing expression for dataset array |
export | (Not Recommended) Write dataset array to file |
get | (Not Recommended) Access dataset array properties |
horzcat | (Not Recommended) Horizontal concatenation for dataset arrays |
intersect | (Not Recommended) Set intersection for dataset array observations |
isempty | (Not Recommended) True for empty dataset array |
ismember | (Not Recommended) Dataset array elements that are members of set |
ismissing | (Not Recommended) Find dataset array elements with missing values |
join | (Not Recommended) Merge dataset array observations |
length | (Not Recommended) Length of dataset array |
ndims | (Not Recommended) Number of dimensions of dataset array |
numel | (Not Recommended) Number of elements in dataset array |
replaceWithMissing | (Not Recommended) Insert missing data indicators into a dataset array |
replacedata | (Not Recommended) Replace dataset variables |
set | (Not Recommended) Set and display dataset array properties |
setdiff | (Not Recommended) Set difference for dataset array observations |
setxor | (Not Recommended) Set exclusive or for dataset array observations |
single | (Not Recommended) Convert dataset variables to single array |
size | (Not Recommended) Size of dataset array |
sortrows | (Not Recommended) Sort rows of dataset array |
stack | (Not Recommended) Stack dataset array from multiple variables into single variable |
subsasgn | (Not Recommended) Subscripted assignment to dataset array |
subsref | (Not Recommended) Subscripted reference for dataset array |
summary | (Not Recommended) Print summary of dataset array |
union | (Not Recommended) Set union for dataset array observations |
unique | (Not Recommended) Unique observations in dataset array |
unstack | (Not Recommended) Unstack dataset array from single variable into multiple variables |
vertcat | (Not Recommended) Vertical concatenation for dataset arrays |
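A few of the methods listed above are sketched here on small, made-up dataset arrays. The variable names ID, Score, and Group are hypothetical, and Statistics and Machine Learning Toolbox is assumed.

```matlab
% Hypothetical data
ID = [3; 1; 2];
Score = [0.7; 0.9; 0.4];
A = dataset(ID, Score);

% sortrows: order observations by the values of one variable
sorted = sortrows(A, 'ID');

% double: convert the (all-numeric) dataset variables to a matrix
M = double(A);

% join: merge observations from a second dataset array, using the
% variable the two arrays share by name (ID) as the key
Group = {'x'; 'y'; 'z'};
B = dataset({[1; 2; 3], 'ID'}, {Group, 'Group'});
J = join(A, B);
```

Here join keeps the observation order of A and looks up the matching Group value in B for each ID.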
Properties
A dataset array D has properties that store metadata (information about your data). Access or assign to a property using P = D.Properties.PropName or D.Properties.PropName = P, where PropName is one of the following:
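Description | A character vector describing the dataset array. The default is an empty character vector. |
DimNames | A two-element cell array of character vectors giving the names of the two dimensions of the dataset array. The default is {'Observations' 'Variables'}. |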
ObsNames | A cell array of nonempty, distinct character vectors giving the names of the observations in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of observations. |
Units | A cell array of character vectors giving the units of the variables in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of variables. Any individual character vector may be empty for a variable that does not have units defined. The default is an empty cell array. |
UserData | Any variable containing additional information to be associated with the dataset array. The default is an empty array. |
VarDescription | A cell array of character vectors giving the descriptions of the variables in the dataset array. This property may be empty, but if not empty, the number of character vectors must equal the number of variables. Any individual character vector may be empty for a variable that does not have a description defined. The default is an empty cell array. |
VarNames | A cell array of nonempty, distinct character vectors giving the names of the variables in the dataset array. The number of character vectors must equal the number of variables. The default is the cell array of names for the variables used to create the data set. |
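Reading and assigning metadata through D.Properties could look like the sketch below. The data, description text, units, and observation names are invented for illustration. The property names Description, Units, ObsNames, and VarNames are those defined by the dataset class, and Statistics and Machine Learning Toolbox is assumed.

```matlab
% Hypothetical data
Age = [38; 43; 29];
Weight = [80; 74; 61];
ds = dataset(Age, Weight);

% Assign metadata: D.Properties.PropName = P
ds.Properties.Description = 'Hypothetical patient sample';
ds.Properties.Units = {'yr', 'kg'};             % one entry per variable
ds.Properties.ObsNames = {'P1', 'P2', 'P3'};    % one entry per observation

% Read metadata: P = D.Properties.PropName
names = ds.Properties.VarNames;
units = ds.Properties.Units;
```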
Copy Semantics
Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB Object-Oriented Programming documentation.
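In practical terms, value semantics mean that assigning a dataset array to a new variable creates an independent copy. A minimal sketch with made-up data, assuming Statistics and Machine Learning Toolbox:

```matlab
x = [1; 2; 3];
a = dataset(x);

% Assignment copies the dataset array rather than sharing it
b = a;
b.x(1) = 100;

% The original is unaffected by the change to the copy
originalFirst = a.x(1);   % still 1
copyFirst = b.x(1);       % now 100
```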
Examples
Load a dataset array from a .mat file and create some simple subsets:
load hospital
h1 = hospital(1:10,:)
h2 = hospital(:,{'LastName' 'Age' 'Sex' 'Smoker'})

% Access and modify metadata
hospital.Properties.Description
hospital.Properties.VarNames{4} = 'Wgt'

% Create a new dataset variable from an existing one
hospital.AtRisk = hospital.Smoker | (hospital.Age > 40)

% Use individual variables to explore the data
boxplot(hospital.Age,hospital.Sex)
h3 = hospital(hospital.Age<30,...
    {'LastName' 'Age' 'Sex' 'Smoker'})

% Sort the observations based on two variables
h4 = sortrows(hospital,{'Sex','Age'})