Note: The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
Statistics Toolbox™ has dataset arrays for storing variables with heterogeneous data types. For example, you can combine numeric data, logical data, cell arrays of strings, and categorical arrays in one dataset array variable.
Within a dataset array, each variable (column) must be one homogeneous data type, but the different variables can be of heterogeneous data types. A dataset array is usually interpreted as a set of variables measured on many units of observation. That is, each row in a dataset array corresponds to an observation, and each column to a variable. In this sense, a dataset array organizes data like a typical spreadsheet.
Dataset arrays are a unique data type, with a corresponding set of valid operations. Even if a dataset array contains only numeric variables, you cannot operate on the dataset array like a numeric variable. The valid operations for dataset arrays are the methods of the dataset class.
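As a minimal sketch of this restriction, the following example builds a small all-numeric dataset array and shows that arithmetic on the array itself fails, while arithmetic on its individual variables works. The variable names x and y are hypothetical, and the example assumes that plus is not among the dataset methods, as the paragraph above implies.

```matlab
% Hypothetical numeric variables in the workspace
x = [1; 2; 3];
y = [4; 5; 6];
ds = dataset(x, y);          % variable names are taken from the workspace names

% Arithmetic directly on the dataset array is expected to raise an error
failed = false;
try
    bad = ds + 1;            % plus is assumed not to be a dataset method
catch err
    failed = true;
    disp(err.message)
end

% Operate on individual variables instead ...
total = ds.x + ds.y;

% ... or convert the numeric variables to an ordinary matrix first
A = double(ds);
```

Extracting a variable with dot indexing returns the underlying array, so all the usual numeric operations apply to it.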
You can create a dataset array by combining variables that exist in the MATLAB workspace, or directly importing data from a file, such as a text file or spreadsheet. This table summarizes the functions you can use to create dataset arrays.
| Data Source | Conversion to Dataset Array |
| Data from a file | dataset |
| Heterogeneous collection of workspace variables | dataset |
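The two uses of dataset in the table can be sketched as follows. All variable names and values are hypothetical, and the file is a temporary comma-delimited file written by the example itself so that the import step has something to read.

```matlab
% Hypothetical measurements on three patients
Age    = [38; 43; 29];                     % numeric
Smoker = [true; false; false];             % logical
Gender = {'Male'; 'Female'; 'Female'};     % cell array of strings
Group  = nominal({'A'; 'B'; 'A'});         % categorical (nominal) array

% Combine workspace variables. Plain inputs take their workspace names;
% the {var, 'name'} form assigns a variable name explicitly.
ds = dataset(Age, Smoker, {Gender, 'Gender'}, Group);

% Import from a text file. The file is created here only for illustration.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Age,Smoker\n38,1\n43,0\n');
fclose(fid);
dsFile = dataset('File', fname, 'Delimiter', ',');
delete(fname);
```

Each input becomes one variable (column), and all inputs must have the same number of rows.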
You can export dataset arrays to text or spreadsheet files using export. To convert a dataset array to a cell array or structure array, use dataset2cell or dataset2struct. To convert a dataset array to a table, use dataset2table.
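A short sketch of these conversions, using a hypothetical two-variable dataset array and a temporary output file name:

```matlab
% Hypothetical dataset array with three observations
x    = [1; 2; 3];
flag = [true; false; true];
ds = dataset(x, flag);

c = dataset2cell(ds);      % cell array; the first row holds the variable names
s = dataset2struct(ds);    % structure array with one element per observation
t = dataset2table(ds);     % table (needs a release that includes the table type)

% Export to a comma-delimited text file (temporary file, for illustration only)
fname = [tempname '.csv'];
export(ds, 'File', fname, 'Delimiter', ',');
wasWritten = exist(fname, 'file') == 2;
delete(fname);
```

Because dataset2cell adds a header row of variable names, the resulting cell array has one more row than the dataset array has observations.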
In addition to storing data in a dataset array, you can store metadata such as:
Variable and observation names
Units of measurement
This information is stored as dataset array properties. For a dataset array named ds, you can view the metadata by entering ds.Properties at the command line. To access a specific property, such as the variable names stored in the VarNames property, use ds.Properties.VarNames. You can both retrieve and modify property values using this syntax.
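For illustration, the following sketch reads and modifies several properties of a small dataset array. The data, names, and units are hypothetical.

```matlab
% Hypothetical dataset array with two numeric variables
Age    = [38; 43; 29];
Weight = [71; 64; 80];
ds = dataset(Age, Weight);

% Retrieve a property
names = ds.Properties.VarNames;            % {'Age', 'Weight'}

% Modify properties using the same syntax
ds.Properties.VarNames    = {'AgeYears', 'WeightKg'};
ds.Properties.Units       = {'years', 'kg'};
ds.Properties.ObsNames    = {'P1'; 'P2'; 'P3'};
ds.Properties.Description = 'Hypothetical patient measurements';

% View all metadata at once
ds.Properties
```

After the variables are renamed, dot indexing uses the new names, for example ds.AgeYears.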
Variable and observation names are included in the display of a dataset array. Variable names display across the top row, and observation names, if present, appear in the first column. Note that variable and observation names do not affect the size of a dataset array.
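A quick way to confirm the last point is to compare the size of a dataset array before and after adding observation names. The names used here are hypothetical.

```matlab
% Hypothetical dataset array: three observations, two variables
x = [1; 2; 3];
y = [4; 5; 6];
ds = dataset(x, y);
sizeBefore = size(ds);

% Adding observation names changes the display, not the dimensions
ds.Properties.ObsNames = {'obs1'; 'obs2'; 'obs3'};
sizeAfter = size(ds);

sameSize = isequal(sizeBefore, sizeAfter);
```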