Construct dataset array
A = dataset(VAR1,VAR2,...)
A = dataset(...,{VAR,name},...)
A = dataset(...,{VAR,name_1,...,name_m},...)
A = dataset(...,'VarNames',{name_1,...,name_m},...)
A = dataset(...,'ObsNames',{name_1,...,name_n},...)
A = dataset('File',filename,param1,val1,param2,val2,...)
A = dataset('XLSFile',filename,param1,val1,param2,val2,...)
A = dataset('XPTFile',xptfilename, ...)
A = dataset(VAR1,VAR2,...) creates dataset array A from workspace variables VAR1, VAR2, ... using the workspace variable names for the names of the variables in A. Variables can be arrays of any size, but all variables must be the same size along dimension 1 (rows).
A = dataset(...,{VAR,name},...) creates a variable in dataset A from the workspace variable VAR and assigns it the name name in A. Names must be valid, unique MATLAB identifier strings.
A = dataset(...,{VAR,name_1,...,name_m},...), where VAR is an array with size m along dimension 2 (columns), creates m variables in dataset A from the columns of the workspace variable VAR and assigns them the names name_1, ..., name_m in A.
A = dataset(...,'VarNames',{name_1,...,name_m},...) names the m variables in A with the specified variable names. Names must be valid, unique MATLAB identifier strings. The number of names must equal the number of variables in A. You cannot use the 'VarNames' parameter if you provide names for individual variables using {VAR,name} pairs.
A = dataset(...,'ObsNames',{name_1,...,name_n},...) names the n observations in A with the specified observation names. The names need not be valid MATLAB identifier strings, but must be unique. The number of names must equal the number of observations (rows) in A.
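The naming syntaxes above can be compared side by side. The following is a minimal sketch, not taken from the original page: the variables height, bp, sys, and dia and the observation names 'P1' through 'P3' are made-up sample data.

```matlab
% Hypothetical sample data: three observations, so every variable has 3 rows.
height = [1.62; 1.75; 1.80];
bp     = [120 80; 130 85; 125 82];   % two columns, to be split into two variables

% Name variables individually with {VAR,name,...} groups.
% The two columns of bp become the variables Systolic and Diastolic.
A = dataset({height,'Height'}, {bp,'Systolic','Diastolic'}, ...
            'ObsNames', {'P1','P2','P3'});

% Name all variables at once with 'VarNames'.
% This cannot be combined with {VAR,name} groups in the same call.
sys = bp(:,1);
dia = bp(:,2);
B = dataset(height, sys, dia, ...
            'VarNames', {'Height','Systolic','Diastolic'}, ...
            'ObsNames', {'P1','P2','P3'});

% Both arrays expose the same variable names.
disp(A.Properties.VarNames)
disp(B.Properties.VarNames)
```

Under these assumptions, A and B should hold the same three variables; the {VAR,name_1,...,name_m} form avoids having to split bp by hand.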
Note: Dataset arrays may contain built-in types or array objects as variables. Array objects must implement each of the following:
A = dataset('File',filename,param1,val1,param2,val2,...) creates dataset array A from column-oriented data in the text file specified by the string filename. Variables in A are of type double if data in the corresponding column of the file, following the column header, are entirely numeric; otherwise the variables in A are cell arrays of strings. dataset converts empty fields to either NaN (for a numeric variable) or the empty string (for a string-valued variable). dataset ignores insignificant white space in the file.
The following optional parameter name/value pairs are available:
| 'Delimiter' | A string indicating the character separating columns in the file. Values are
|
| 'Format' | A format string, as accepted by textscan. dataset reads the file using textscan, and creates variables in A according to the conversion specifiers in the format string. You may also provide any parameter/value pairs accepted by textscan. Using the 'Format' parameter is much faster for large files. |
| 'ReadVarNames' | A logical value indicating whether (true) or not (false) to read variable names from the first row of the file. The default is true. If 'ReadVarNames' is true, variable names in the column headers of the file cannot be empty. |
| 'ReadObsNames' | A logical value indicating whether (true) or not (false) to read observation names from the first column of the file. The default is false. If 'ReadObsNames' and 'ReadVarNames' are both true, dataset saves the header of the first column in the file as the name of the first dimension in A.Properties.DimNames. |
| 'TreatAsEmpty' | Specifies strings to treat as the empty string in a numeric column. Values may be a character string or a cell array of strings. The parameter applies only to numeric columns in the file; dataset does not accept numeric literals such as '-99'. |
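As an illustration of how these parameters combine, the following sketch writes a small comma-delimited file and reads it back. The file contents, the column names id, age, and wgt, and the 'NA' placeholder are all invented for this example; they do not come from the original page.

```matlab
% Write a small comma-delimited sample file to a temporary location.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'id,age,wgt\n');      % header row supplies the variable names
fprintf(fid,'A01,38,176\n');
fprintf(fid,'A02,NA,163\n');      % 'NA' marks a missing numeric value
fprintf(fid,'A03,45,150\n');
fclose(fid);

% Read the file: the first column becomes the observation names, and
% 'NA' in a numeric column is treated as empty, so it is read as NaN.
D = dataset('File',fname, ...
            'Delimiter',',', ...
            'ReadObsNames',true, ...
            'TreatAsEmpty','NA');
delete(fname);                    % clean up the temporary file

disp(D)
disp(D.Properties.DimNames{1})    % header of the first column
```

Because 'ReadObsNames' and 'ReadVarNames' (true by default) are both in effect here, the header of the first column should be stored as the name of the first dimension rather than as a variable name.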
A = dataset('XLSFile',filename,param1,val1,param2,val2,...) creates dataset array A from column-oriented data in the Excel® spreadsheet specified by the string filename. Variables in A are of type double if data in the corresponding column of the spreadsheet, following the column header, are entirely numeric; otherwise the variables in A are cell arrays of strings. Optional parameter name/value pairs are as follows:
| 'Sheet' | A positive scalar value of type double indicating the sheet number, or a quoted string indicating the sheet name. |
| 'Range' | A string of the form 'C1:C2' where C1 and C2 are the names of cells at opposing corners of a rectangular region to be read, as for xlsread. By default, the rectangular region extends to the right-most column containing data. If the spreadsheet contains empty columns between columns of data, or if the spreadsheet contains figures or other non-tabular information, specify a range that contains only data. |
| 'ReadVarNames' | A logical value indicating whether (true) or not (false) to read variable names from the first row of the range. The default is true. If 'ReadVarNames' is true, variable names in the column headers of the range cannot be empty. |
| 'ReadObsNames' | A logical value indicating whether (true) or not (false) to read observation names from the first column of the range. The default is false. If 'ReadObsNames' and 'ReadVarNames' are both true, the header of the first column in the range is saved as the name of the first dimension in A.Properties.DimNames. |
A = dataset('XPTFile',xptfilename, ...) creates a dataset array from a SAS XPORT format file. Variable names from the XPORT format file are preserved. Numeric data types in the XPORT format file are preserved, but all other data types are converted to cell arrays of strings. The XPORT format allows for 28 missing data types, which are represented in the file by an uppercase letter, '.', or '_'. dataset converts all missing data to NaN values in A. If you need the specific missing types, use the xptread function to recover that information.
When reading from an XPT format file, the 'ReadObsNames' parameter name/value pair determines whether or not to try to use the first variable in the file as observation names. Specify as a logical value (default false). If the contents of the first variable are not valid observation names then the variable will be read into a variable of the dataset array and observation names will not be set.
Create a dataset array to contain Fisher's iris data:
load fisheriris
NumObs = size(meas,1);
NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d'));
iris = dataset({nominal(species),'species'},...
{meas,'SL','SW','PL','PW'},...
'ObsNames',NameObs);
iris(1:5,:)
ans =
            species    SL     SW     PL     PW
    Obs1    setosa     5.1    3.5    1.4    0.2
    Obs2    setosa     4.9      3    1.4    0.2
    Obs3    setosa     4.7    3.2    1.3    0.2
    Obs4    setosa     4.6    3.1    1.5    0.2
    Obs5    setosa       5    3.6    1.4    0.2
Load patient data from the CSV file hospital.dat and store the information in a dataset array with observation names given by the first column in the data (patient identification):
patients = dataset('file','hospital.dat',...
'delimiter',',',...
'ReadObsNames',true);
Make the {0,1}-valued variable smoke nominal, and change the labels to 'No' and 'Yes':
patients.smoke = nominal(patients.smoke,{'No','Yes'});
Add new levels to smoke as placeholders for more detailed histories of smokers:
patients.smoke = addlevels(patients.smoke,...
{'0-5 Years','5-10 Years','LongTerm'});
Assuming the nonsmokers have never smoked, relabel the 'No' level:
patients.smoke = setlabels(patients.smoke,'Never','No');
Drop the undifferentiated 'Yes' level from smoke:
patients.smoke = droplevels(patients.smoke,'Yes');
Warning: OLDLEVELS contains categorical levels that were present in A, caused some array elements to have undefined levels.
Note that smokers now have an undefined level.
Set each smoker to one of the new levels, by observation name:
patients.smoke('YPL-320') = '5-10 Years';

© 1984-2009 The MathWorks, Inc.