
Create a Dataset Array from a Numeric Array

This example shows how to create a dataset array from a numeric array existing in the MATLAB® workspace.


Load sample data.

load fisheriris

Two variables load into the workspace: meas, a 150-by-4 numeric array, and species, a 150-by-1 cell array of species labels.
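As a quick sanity check, the following sketch confirms the sizes and classes described above. It assumes Statistics and Machine Learning Toolbox is installed, because fisheriris.mat and the dataset functions used in this example ship with it.

```matlab
% Assumes Statistics and Machine Learning Toolbox, which provides fisheriris.mat
load fisheriris

% Check the dimensions and types of the two loaded variables
size(meas)        % 150 4
class(meas)       % 'double'
size(species)     % 150 1
class(species)    % 'cell'
species(1:3)      % the first labels are all 'setosa'
```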

Create a dataset array.

Use mat2dataset to convert the numeric array, meas, into a dataset array.

ds = mat2dataset(meas);
ds(1:10,:)
ans = 

    meas1    meas2    meas3    meas4
    5.1      3.5      1.4      0.2  
    4.9        3      1.4      0.2  
    4.7      3.2      1.3      0.2  
    4.6      3.1      1.5      0.2  
      5      3.6      1.4      0.2  
    5.4      3.9      1.7      0.4  
    4.6      3.4      1.4      0.3  
      5      3.4      1.5      0.2  
    4.4      2.9      1.4      0.2  
    4.9      3.1      1.5      0.1  

The array, meas, has four columns, so the dataset array, ds, has four variables. The default variable names are the array name, meas, with column numbers appended.

You can specify your own variable or observation names using the name-value pair arguments VarNames and ObsNames, respectively.
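A minimal sketch that uses both name-value pairs in a single mat2dataset call. The name ds2 and the labels 'Obs1', 'Obs2', ... are illustrative choices, not defaults; the sketch assumes observation names must be unique, with one entry per row.

```matlab
load fisheriris

% Hypothetical observation labels 'Obs1', 'Obs2', ..., one unique name per row
obsNames = arrayfun(@(k) sprintf('Obs%d', k), (1:size(meas,1))', ...
    'UniformOutput', false);

% Supply descriptive variable names and observation names in one call
ds2 = mat2dataset(meas, ...
    'VarNames', {'SLength','SWidth','PLength','PWidth'}, ...
    'ObsNames', obsNames);

ds2(1:3,:)        % first three rows, labeled Obs1 to Obs3
ds2('Obs2',:)     % observation names can also be used as row subscripts
```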

If you instead use dataset to convert a numeric array, the resulting dataset array by default has a single variable that holds the whole array, rather than a separate variable for each column.
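The difference between dataset and mat2dataset can be checked directly. This sketch assumes the {array, name1, ..., nameM} input form of dataset, which splits the columns of the array into separately named variables; dsOne and dsSplit are illustrative names.

```matlab
load fisheriris

% dataset stores the whole 150-by-4 matrix as ONE variable named 'meas'
dsOne = dataset(meas);
size(dsOne)          % 150 1
size(dsOne.meas)     % 150 4

% Passing {array, name1, ..., nameM} splits the columns into M variables
dsSplit = dataset({meas,'SLength','SWidth','PLength','PWidth'});
size(dsSplit)        % 150 4
```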

Examine the dataset array.

Return the size of the dataset array, ds.

size(ds)
ans =

   150     4

The dataset array, ds, is the same size as the numeric array, meas. Variable names and observation names do not factor into the size of a dataset array.
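To see that names are metadata only, this sketch attaches hypothetical observation names ('Obs1', 'Obs2', ...) and calls size again. The names nObs, nVars, and obsNames are illustrative.

```matlab
load fisheriris
ds = mat2dataset(meas);

% Unpack the two dimensions: observations (rows) and variables (columns)
[nObs, nVars] = size(ds)    % nObs = 150, nVars = 4

% Attaching observation names changes metadata only, not the size
obsNames = arrayfun(@(k) sprintf('Obs%d', k), (1:nObs)', 'UniformOutput', false);
ds.Properties.ObsNames = obsNames;
size(ds)                    % still 150 4
```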

Explore dataset array metadata.

Return the metadata properties of the dataset array, ds.

ds.Properties
ans = 

  struct with fields:

       Description: ''
    VarDescription: {}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {'meas1'  'meas2'  'meas3'  'meas4'}

You can also access the properties individually. For example, you can retrieve the variable names using ds.Properties.VarNames.
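Individual properties read back like struct fields. A short sketch that recreates ds with mat2dataset so it runs on its own; names and dims are illustrative variable names.

```matlab
load fisheriris
ds = mat2dataset(meas);

% Read individual metadata properties with dot indexing on Properties
names = ds.Properties.VarNames     % {'meas1','meas2','meas3','meas4'}
dims  = ds.Properties.DimNames     % {'Observations','Variables'}
isempty(ds.Properties.ObsNames)    % 1 (true): no observation names yet
```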

Access data in a dataset array variable.

You can use variable names with dot indexing to access the data in a dataset array. For example, find the minimum value in the first variable, meas1.

min(ds.meas1)
ans =

    4.3000
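Dot indexing also accepts a variable name held in a character vector, and the double method converts an all-numeric dataset array back to a matrix. A sketch under those assumptions; varName, m1, m2, and allMins are illustrative names, and the expected minimums are those of the Fisher iris measurements.

```matlab
load fisheriris
ds = mat2dataset(meas);

% Dot indexing with a literal name returns the variable's data (150-by-1 double)
m1 = min(ds.meas1)               % 4.3000

% Dynamic field syntax works when the variable name is stored in a variable
varName = 'meas2';
m2 = min(ds.(varName))           % 2

% double converts the all-numeric dataset array to a 150-by-4 matrix,
% so min returns one minimum per variable
allMins = min(double(ds))        % 4.3000 2.0000 1.0000 0.1000
```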

Change variable names.

The four variables in ds are actually measurements of sepal length, sepal width, petal length, and petal width. Modify the variable names to be more descriptive.

ds.Properties.VarNames = {'SLength','SWidth','PLength','PWidth'};

Add a description.

You can also add a description to the dataset array.

ds.Properties.Description = 'Fisher iris data';
ds.Properties
ans = 

  struct with fields:

       Description: 'Fisher iris data'
    VarDescription: {}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {'SLength'  'SWidth'  'PLength'  'PWidth'}

The dataset array properties are updated with the new variable names and description.
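The Units and VarDescription properties shown in the output can be filled in the same way, with one entry per variable in VarNames order. In this sketch the centimeter units reflect the Fisher iris measurements; the description strings are illustrative.

```matlab
load fisheriris
ds = mat2dataset(meas);
ds.Properties.VarNames = {'SLength','SWidth','PLength','PWidth'};
ds.Properties.Description = 'Fisher iris data';

% Units and VarDescription take one entry per variable, in VarNames order
ds.Properties.Units = {'cm','cm','cm','cm'};
ds.Properties.VarDescription = {'Sepal length','Sepal width', ...
    'Petal length','Petal width'};

ds.Properties
```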

Add a variable to the dataset array.

The variable species is a cell array of species labels. Add species to the dataset array, ds, as a nominal array named Species. Display the first five observations in the dataset array.

ds.Species = nominal(species);
ds(1:5,:)
ans = 

    SLength    SWidth    PLength    PWidth    Species
    5.1        3.5       1.4        0.2       setosa 
    4.9          3       1.4        0.2       setosa 
    4.7        3.2       1.3        0.2       setosa 
    4.6        3.1       1.5        0.2       setosa 
      5        3.6       1.4        0.2       setosa 

The dataset array, ds, now has a fifth variable, Species.
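Because Species is a nominal array, you can use it to inspect categories and select observations. A minimal sketch, assuming the getlabels method of nominal arrays and comparison against a character label; setosa is an illustrative variable name.

```matlab
load fisheriris
ds = mat2dataset(meas, 'VarNames', {'SLength','SWidth','PLength','PWidth'});
ds.Species = nominal(species);

% List the distinct category labels of the nominal variable
getlabels(ds.Species)              % 'setosa' 'versicolor' 'virginica'

% Select all observations in one category with a logical comparison
setosa = ds(ds.Species == 'setosa', :);
size(setosa)                       % 50 5
```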