Class: dataset
(Not Recommended) Construct dataset array
The dataset data type is not recommended. To work with heterogeneous data,
use the MATLAB®
table data type instead. See MATLAB
table documentation for more information.
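Because table is the recommended replacement, it may help during migration to see the same small data set built both ways. This is a minimal sketch with hypothetical sample values and names. table is part of core MATLAB. dataset, table2dataset, and dataset2table require Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical sample data: three observations of two variables
Age    = [38; 43; 38];
Weight = [176; 163; 131];
names  = {'Smith'; 'Johnson'; 'Williams'};

% Preferred: a table, where 'RowNames' plays the role of the dataset 'ObsNames'
T = table(Age, Weight, 'RowNames', names);

% Legacy: the equivalent dataset array
D = dataset(Age, Weight, 'ObsNames', names);

% Convert between the two types while migrating older code
D_fromT = table2dataset(T);   % table   -> dataset
T_fromD = dataset2table(D);   % dataset -> table
```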
A = dataset(varspec,'ParamName',Value)
A = dataset('File',filename,'ParamName',Value)
A = dataset('XLSFile',filename,'ParamName',Value)
A = dataset('XPTFile',xptfilename,'ParamName',Value)
A = dataset(varspec,'ParamName',Value) creates dataset array A using the workspace variable input method
varspec and one or more optional name/value pairs (see
Parameter Name/Value Pairs).
The input method varspec can be one or more of the following:
VAR — a workspace variable.
dataset uses the workspace name for the variable name in
A. To include multiple variables, specify
VAR_1,VAR_2,...,VAR_N.
Variables can be arrays of any size, but all variables must have the same number
of rows. VAR can also be an expression. In this case,
dataset creates a default name automatically.
{VAR,name} — a
workspace variable VAR and a variable name
name. dataset uses
name as the variable name. To include multiple
variables and names, specify
{VAR_1,name_1},
{VAR_2,name_2},...,
{VAR_N,name_N}.
{VAR,name_1,...,name_m}
— an m-columned workspace variable,
VAR. dataset uses the names
name_1, ...,
name_m as variable names. You must include a name
for every column in VAR. Each column becomes a separate
variable in A.
You can combine these input methods to include as many variables and names as needed. Names must be valid, unique MATLAB identifiers. For example input combinations, see Examples. For optional name/value pairs, see Parameter Name/Value Pairs.
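A minimal sketch combining all three input methods in one call. The data values and the workspace names Age, w, and bp are hypothetical. All variables have the same number of rows, as dataset requires. Running it assumes Statistics and Machine Learning Toolbox is installed.

```matlab
% Hypothetical data: every variable has the same number of rows (4)
Age = [38; 43; 38; 40];
w   = [176; 163; 131; 133];
bp  = [124 93; 109 77; 125 83; 117 75];   % 4-by-2: systolic, diastolic

% VAR                     : named 'Age' after the workspace variable
% {VAR,name}              : w is stored under the name 'Weight'
% {VAR,name_1,...,name_m} : each column of bp becomes a separate variable
A = dataset(Age, {w,'Weight'}, {bp,'Systolic','Diastolic'});

A.Properties.VarNames   % the four variable names, in input order
```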
To convert numeric arrays, cell arrays, structure arrays, or tables to dataset arrays, you can also use mat2dataset, cell2dataset, struct2dataset, or table2dataset, respectively.
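The following sketch calls each conversion function once on small hypothetical inputs. It assumes Statistics and Machine Learning Toolbox. It also relies on cell2dataset treating the first row of the cell array as variable names, which is its default behavior.

```matlab
% Numeric matrix -> dataset; 'VarNames' names the columns
X = [1 2; 3 4; 5 6];
dsMat = mat2dataset(X, 'VarNames', {'a','b'});

% Scalar structure whose fields are equal-length column vectors
s.id  = [1; 2; 3];
s.val = [10; 20; 30];
dsStruct = struct2dataset(s);

% Cell array whose first row holds the variable names
C = {'id','val'; 1, 10; 2, 20};
dsCell = cell2dataset(C);

% Table -> dataset
T = table([1; 2], [3; 4], 'VariableNames', {'x','y'});
dsTable = table2dataset(T);
```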
Note
Dataset arrays may contain built-in types or array objects as variables. Array objects must implement each of the following:
Standard MATLAB parenthesis indexing of the form var(i,...),
where i is a numeric or logical vector corresponding to rows of
the variable
A size method with a dim
argument
A vertcat method
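As an illustration, nominal (the Statistics and Machine Learning Toolbox class used in the smoking-history example below) is an array class that should satisfy these requirements. This sketch stores one as a variable, then exercises row indexing and vertical concatenation. The data values are hypothetical.

```matlab
% nominal supports row indexing, size(x,dim), and vertcat,
% so it can be stored as a dataset variable
grp = nominal({'low'; 'high'; 'low'});
x   = [1.5; 2.0; 3.5];
ds  = dataset(grp, x);

sub  = ds([1 3], :);   % selecting rows indexes grp and x by row
both = [ds; ds];       % vertical concatenation relies on vertcat for each variable

class(sub.grp)         % still 'nominal'
```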
A = dataset('File',filename,'ParamName',Value) creates dataset array A from column-oriented data in the text file
specified by filename. Variables in A are of type
double if data in the corresponding column of the file, following the
column header, are entirely numeric; otherwise the variables in A are
cell arrays of character vectors. dataset converts empty fields to
either NaN (for a numeric variable) or the empty
character vector (for a character-valued variable). dataset ignores
insignificant white space in the file. You cannot specify both a file and workspace
variables as input. See Name/Value Pairs for more information.
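A minimal sketch of the type rules described above. It writes a small hypothetical comma-delimited file to a temporary location (using tempname), reads it back, and deletes it. It assumes write access to the temporary folder and Statistics and Machine Learning Toolbox.

```matlab
% Write a small comma-delimited file with a header row (hypothetical data)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Name,Age,Score\n');
fprintf(fid, 'Ann,34,88.5\n');
fprintf(fid, 'Bob,,92\n');     % Age is empty for this row
fclose(fid);

% Column headers become variable names
A = dataset('File', fname, 'Delimiter', ',');
delete(fname);

class(A.Age)    % numeric column: double, with the empty field read as NaN
class(A.Name)   % text column: cell array of character vectors
```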
A = dataset('XLSFile',filename,'ParamName',Value) creates dataset array A from column-oriented data in the Excel® spreadsheet specified by filename. Variables in
A are of type double if data in the corresponding
column of the spreadsheet, following the column header, are entirely numeric; otherwise the
variables in A are cell arrays of character vectors. See Name/Value
Pairs for more information.
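A similar sketch for spreadsheets, assuming a MATLAB release in which writetable can create .xlsx files on your platform. The workbook is written to a temporary location and deleted afterward. The table contents are hypothetical.

```matlab
% Create a small spreadsheet with writetable (hypothetical data)
fname = [tempname '.xlsx'];
T = table({'P1'; 'P2'}, [34; 45], [88.5; 92], ...
    'VariableNames', {'ID', 'Age', 'Score'});
writetable(T, fname);

% Read it back; the header row supplies the variable names
A = dataset('XLSFile', fname);
delete(fname);

class(A.Age)   % numeric column: double
class(A.ID)    % text column: cell array of character vectors
```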
A = dataset('XPTFile',xptfilename,'ParamName',Value) creates a dataset array from a SAS® XPORT format file. Variable names from the XPORT format file are preserved.
Numeric data types in the XPORT format file are preserved, but all other data types are
converted to cell arrays of character vectors. The XPORT format allows for 28 missing data
types, which are represented in the file by an uppercase letter, '.', or '_'. dataset converts
all missing data to NaN values in A. See Name/Value
Pairs for more information.
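Parameter Name/Value Pairs
Specify one or more of the following name/value pairs when constructing a dataset:
'VarNames': A string array or cell array of character vectors naming the variables in A. The names must be valid, unique MATLAB identifiers, and their number must equal the number of variables in A.
'ObsNames': A string array or cell array of character vectors naming the observations in A. The names need not be valid MATLAB identifiers, but they must be unique.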
Name/value pairs available when using text files as inputs:
'Delimiter': A character vector or string scalar indicating the character separating columns in the file. Values are '\t' (tab, the default), ' ' (space), ',' (comma), ';' (semicolon), and '|' (bar).
'Format': A format character vector or string scalar, as accepted by textscan.
'HeaderLines': Numeric value indicating the number of lines to skip at the beginning of a file. Default: 0.
'TreatAsEmpty': Specifies characters to treat as the empty character vector in a numeric column. Values may be a character array, a string array, or a cell array of character vectors. The parameter applies only to numeric columns in the file.
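A minimal sketch combining the text-file parameters above on a hypothetical file. The file has one line to skip, a header row, and the token 'NA' marking a missing numeric value. An explicit Format is passed so the column types do not depend on automatic detection. The file is written to a temporary location and then deleted.

```matlab
% Hypothetical file: one line to skip, a header row, and 'NA' for a missing value
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'line to skip\n');
fprintf(fid, 'Site,Temp,Depth\n');
fprintf(fid, 'A,12.5,3\n');
fprintf(fid, 'B,NA,7\n');
fclose(fid);

A = dataset('File', fname, 'Delimiter', ',', ...
    'HeaderLines', 1, ...          % skip the first line of the file
    'Format', '%s%f%f', ...        % textscan format: text, then two numbers
    'TreatAsEmpty', 'NA');         % read 'NA' in numeric columns as empty (NaN)
delete(fname);
```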
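Name/value pairs available when using text files or Excel spreadsheets as inputs:
'ReadVarNames': A logical value indicating whether (true, default) or not (false) to read variable names from the first row of the file.
'ReadObsNames': A logical value indicating whether (true) or not (false, default) to read observation names from the first column of the file.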
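In the sketch below, the first row of a hypothetical comma-delimited file supplies the variable names and the first column supplies the observation names. Those names can then be used as row subscripts. The file name comes from tempname, and the file is deleted after reading.

```matlab
% Hypothetical file whose first column holds row labels
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Patient,Age,Weight\n');
fprintf(fid, 'P1,38,176\n');
fprintf(fid, 'P2,43,163\n');
fclose(fid);

A = dataset('File', fname, 'Delimiter', ',', ...
    'ReadVarNames', true, ...   % first row supplies variable names (the default)
    'ReadObsNames', true);      % first column supplies observation names
delete(fname);

row = A('P2', :);   % observation names can serve as row subscripts
```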
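Name/value pairs available when using Excel spreadsheets as input:
'Sheet': A positive scalar value of type double indicating the sheet number, or a character vector or string scalar containing the sheet name.
'Range': A character vector or string scalar of the form 'C1:C2', where C1 and C2 are two opposing corners that define the region to read.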
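A minimal sketch that reads only part of a worksheet using the Sheet and Range parameters. It assumes writetable can create .xlsx files on your platform. The workbook and its contents are hypothetical and are deleted after reading.

```matlab
% Hypothetical workbook with data on the first sheet
fname = [tempname '.xlsx'];
T = table({'P1'; 'P2'}, [34; 45], [88.5; 92], ...
    'VariableNames', {'ID', 'Age', 'Score'});
writetable(T, fname, 'Sheet', 1);   % occupies cells A1:C3

% Read only the first two columns (header row plus two data rows)
A = dataset('XLSFile', fname, 'Sheet', 1, 'Range', 'A1:B3');
delete(fname);
```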
Examples
Create a dataset array from workspace variables, including observation names:
load cereal
cereal = dataset(Calories,Protein,Fat,Sodium,Fiber,Carbo,...
Sugars,'ObsNames',Name)
cereal.Properties.VarDescription = Variables(4:10,2);
Create a dataset array from a single, multi-columned workspace variable, designating variable names for each column:
load cities
categories = cellstr(categories);
cities = dataset({ratings,categories{:}},...
'ObsNames',cellstr(names))
Load data from a text or spreadsheet file:
patients = dataset('File','hospital.dat',...
'Delimiter',',','ReadObsNames',true)
patients2 = dataset('XLSFile','hospital.xls',...
'ReadObsNames',true)
Load patient data from the CSV file hospital.dat and store
the information in a dataset array with observation names given
by the first column in the data (patient identification):
patients = dataset('file','hospital.dat', ...
'format','%s%s%s%f%f%f%f%f%f%f%f%f', ...
'Delimiter',',','ReadObsNames',true);
You can also load the data without specifying a format.
dataset will automatically create dataset
variables that are either double arrays or cell arrays of
character vectors, depending on the contents of the
file:
patients = dataset('file','hospital.dat',...
'delimiter',',',...
'ReadObsNames',true);
Make the {0,1}-valued variable smoke nominal and change the
labels to 'No' and 'Yes':
patients.smoke = nominal(patients.smoke,{'No','Yes'});
Add new levels to smoke as placeholders for more detailed
histories of smokers:
patients.smoke = addlevels(patients.smoke,...
{'0-5 Years','5-10 Years','LongTerm'});
Assuming the nonsmokers have never smoked, relabel the 'No'
level:
patients.smoke = setlabels(patients.smoke,'Never','No');
Drop the undifferentiated 'Yes' level from
smoke:
patients.smoke = droplevels(patients.smoke,'Yes');
MATLAB displays the following warning:
Warning: OLDLEVELS contains categorical levels that were present in A, caused some array elements to have undefined levels.
Note that smokers now have an undefined level.
Set each smoker to one of the new levels, by observation name:
patients.smoke('YPL-320') = '5-10 Years';
See Also
cell2dataset | mat2dataset | struct2dataset | tdfread | textscan | xlsread