Note
The dataset data type is not recommended. To
work with heterogeneous data, use the MATLAB®
table data type instead. See MATLAB
table documentation for more information.
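Because table is the recommended type, you can migrate existing dataset code gradually by converting arrays where the two kinds of code meet. The following minimal sketch assumes a release that includes dataset2table and table2dataset. The three variables are copied from the first rows of hospitalSmall.txt.

```matlab
% Small dataset array built from workspace variables; dataset takes the
% variable names (name, sex, age) from the workspace variable names.
name = {'SMITH';'JOHNSON'};
sex  = {'m';'m'};
age  = [38;43];
ds = dataset(name,sex,age);

% Convert to the recommended table type, and back again if older code needs it.
t   = dataset2table(ds);
ds2 = table2dataset(t);
```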
This example shows how to create a dataset array from the contents of a tab-delimited text file.
Create a dataset array using default settings.
Import the text file hospitalSmall.txt as a dataset array using the default
settings.
ds = dataset('File',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.txt'))
ds =
name sex age wgt smoke
'SMITH' 'm' 38 176 1
'JOHNSON' 'm' 43 163 0
'WILLIAMS' 'f' 38 131 0
'JONES' 'f' 40 133 0
'BROWN' 'f' 49 119 0
'DAVIS' 'f' 46 142 0
'MILLER' 'f' 33 142 1
'WILSON' 'm' 40 180 0
'MOORE' 'm' 28 183 0
'TAYLOR' 'f' 31 132 0
'ANDERSON' 'f' 45 128 0
'THOMAS' 'f' 42 137 0
'JACKSON' 'm' 25 174 0
'WHITE' 'm' 39 202 1
By default, dataset uses the first row of
the text file for variable names. If the first row does not contain
variable names, you can specify the optional name-value pair argument 'ReadVarNames',false to
change the default behavior.
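Suppose, for example, that a hypothetical tab-delimited file holds the same columns as hospitalSmall.txt but has no header row. This sketch writes such a file to a temporary location and imports it with 'ReadVarNames',false, so that dataset reads the first line as data. The file contents are illustrative only, and the sketch does not depend on the exact default variable names that dataset generates.

```matlab
% Hypothetical headerless file: two data rows in the same column order
% as hospitalSmall.txt (name, sex, age, wgt, smoke), tab-delimited.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'SMITH\tm\t38\t176\t1\n');
fprintf(fid,'JOHNSON\tm\t43\t163\t0\n');
fclose(fid);

% 'ReadVarNames',false makes dataset treat the first line as data
% and generate default variable names instead.
dsNoHdr = dataset('File',fname,'ReadVarNames',false)

delete(fname);   % remove the temporary file
```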
The dataset array contains heterogeneous variables. The variables name
and sex are cell arrays of character vectors, and
the other variables are numeric.
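To check the type of each variable programmatically instead of reading the display, you can apply class to every variable listed in the VarNames property. The following minimal sketch works on a small dataset array built from workspace variables. The values are copied from the first rows of the file.

```matlab
name = {'SMITH';'JOHNSON'};
sex  = {'m';'m'};
age  = [38;43];
ds = dataset(name,sex,age);

% Look up each variable by name with dynamic field syntax and report its class.
varClasses = cellfun(@(v) class(ds.(v)), ds.Properties.VarNames, ...
    'UniformOutput', false)
```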
Summarize the dataset array.
Use summary to display the data type and descriptive statistics
of each variable in the dataset array.
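For each numeric variable, summary reports the minimum, 1st quartile, median, 3rd quartile, and maximum. As a cross-check, this sketch recomputes those five numbers for age with min, prctile, and max. The ages are copied in file order from the display of ds above, and in a live session you could pass ds.age instead. It is an assumption that prctile uses the same quartile definition as summary, but for this data it reproduces the values that summary reports for age.

```matlab
% Ages of the 14 patients in hospitalSmall.txt, in file order.
age = [38;43;38;40;49;46;33;40;28;31;45;42;25;39];

% Minimum, 25th/50th/75th percentiles, and maximum, as one row vector.
q = prctile(age,[25 50 75]);
ageStats = [min(age) q(:)' max(age)]
```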
summary(ds)
name: [14x1 cell array of character vectors]
sex: [14x1 cell array of character vectors]
age: [14x1 double]
min 1st quartile median 3rd quartile max
25 33 39.5 43 49
wgt: [14x1 double]
min 1st quartile median 3rd quartile max
119 132 142 176 202
smoke: [14x1 double]
min 1st quartile median 3rd quartile max
0 0 0 0 1
Import observation names.
Import the text file again, this time specifying that the first column contains observation names.
ds = dataset('File',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.txt'),'ReadObsNames',true)
ds =
sex age wgt smoke
SMITH 'm' 38 176 1
JOHNSON 'm' 43 163 0
WILLIAMS 'f' 38 131 0
JONES 'f' 40 133 0
BROWN 'f' 49 119 0
DAVIS 'f' 46 142 0
MILLER 'f' 33 142 1
WILSON 'm' 40 180 0
MOORE 'm' 28 183 0
TAYLOR 'f' 31 132 0
ANDERSON 'f' 45 128 0
THOMAS 'f' 42 137 0
JACKSON 'm' 25 174 0
WHITE 'm' 39 202 1 The elements of the first column in the text file, last names,
are now observation names. Observation names and row names are dataset
array properties. You can always add or change the observation names
of an existing dataset array by modifying the property ObsNames.
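For example, this sketch assigns observation names to a small dataset array that has none, and then replaces them with lowercase versions. Any cell array of unique character vectors with one element per observation should work. The lowercase convention is purely illustrative.

```matlab
age = [38;43];
wgt = [176;163];
ds = dataset(age,wgt);          % no observation names yet

% Add observation names: one unique name per observation.
ds.Properties.ObsNames = {'SMITH';'JOHNSON'};

% Change them later by assigning a new cell array.
ds.Properties.ObsNames = lower(ds.Properties.ObsNames)
```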
Change dataset array properties.
By default, the DimNames property of the
dataset array has name as the descriptor of the
observation (row) dimension. dataset takes this name
from the first row of the first column in the text file.
Change the first element of DimNames to LastName.
ds.Properties.DimNames{1} = 'LastName';
ds.Properties
ans =
Description: ''
VarDescription: {}
Units: {}
DimNames: {'LastName' 'Variables'}
UserData: []
ObsNames: {14x1 cell}
VarNames: {'sex' 'age' 'wgt' 'smoke'}
Index into dataset array.
You can use observation names to index into a dataset array.
For example, return the data for the patient with last name BROWN.
ds('BROWN',:)
ans =
sex age wgt smoke
BROWN 'f' 49 119 0
Note that observation names must be unique.
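Observation names also work as a cell array of several names, and they can be combined with variable names. This minimal sketch uses a three-row dataset array built with the 'ObsNames' name-value pair. ds('BROWN','age') returns a 1-by-1 dataset array rather than a number, so the sketch extracts the plain numeric value with strcmp.

```matlab
sex = {'m';'f';'f'};
age = [38;40;49];
ds = dataset(sex,age,'ObsNames',{'SMITH';'JONES';'BROWN'});

% Several observations at once, selected by name (all variables).
sub = ds({'BROWN','SMITH'},:)

% One observation and one variable: the result is still a dataset array.
one = ds('BROWN','age')

% The numeric value itself, found by matching the observation name.
brownAge = ds.age(strcmp(ds.Properties.ObsNames,'BROWN'))
```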
This example shows how to create a dataset array from the contents of a comma-separated text file.
Create a dataset array.
Import the file hospitalSmall.csv as a dataset array,
specifying the comma-delimited format.
ds = dataset('File',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.csv'),'Delimiter',',')
ds =
id name sex age wgt smoke
'YPL-320' 'SMITH' 'm' 38 176 1
'GLI-532' 'JOHNSON' 'm' 43 163 0
'PNI-258' 'WILLIAMS' 'f' 38 131 0
'MIJ-579' 'JONES' 'f' 40 133 0
'XLK-030' 'BROWN' 'f' 49 119 0
'TFP-518' 'DAVIS' 'f' 46 142 0
'LPD-746' 'MILLER' 'f' 33 142 1
'ATA-945' 'WILSON' 'm' 40 180 0
'VNL-702' 'MOORE' 'm' 28 183 0
'LQW-768' 'TAYLOR' 'f' 31 132 0
'QFY-472' 'ANDERSON' 'f' 45 128 0
'UJG-627' 'THOMAS' 'f' 42 137 0
'XUE-826' 'JACKSON' 'm' 25 174 0
'TRW-072' 'WHITE' 'm' 39 202 1
By default, dataset uses the first row in the text file
as variable names.
Add observation names.
Use the unique identifiers in the variable id as
observation names. Then, delete the variable id from
the dataset array.
ds.Properties.ObsNames = ds.id; ds.id = []
ds =
name sex age wgt smoke
YPL-320 'SMITH' 'm' 38 176 1
GLI-532 'JOHNSON' 'm' 43 163 0
PNI-258 'WILLIAMS' 'f' 38 131 0
MIJ-579 'JONES' 'f' 40 133 0
XLK-030 'BROWN' 'f' 49 119 0
TFP-518 'DAVIS' 'f' 46 142 0
LPD-746 'MILLER' 'f' 33 142 1
ATA-945 'WILSON' 'm' 40 180 0
VNL-702 'MOORE' 'm' 28 183 0
LQW-768 'TAYLOR' 'f' 31 132 0
QFY-472 'ANDERSON' 'f' 45 128 0
UJG-627 'THOMAS' 'f' 42 137 0
XUE-826 'JACKSON' 'm' 25 174 0
TRW-072 'WHITE' 'm' 39 202 1
Delete observations.
Delete any patients with the last name BROWN.
You can use strcmp to match 'BROWN' against the elements
of name, the variable that contains last names.
toDelete = strcmp(ds.name,'BROWN');
ds(toDelete,:) = []
ds =
name sex age wgt smoke
YPL-320 'SMITH' 'm' 38 176 1
GLI-532 'JOHNSON' 'm' 43 163 0
PNI-258 'WILLIAMS' 'f' 38 131 0
MIJ-579 'JONES' 'f' 40 133 0
TFP-518 'DAVIS' 'f' 46 142 0
LPD-746 'MILLER' 'f' 33 142 1
ATA-945 'WILSON' 'm' 40 180 0
VNL-702 'MOORE' 'm' 28 183 0
LQW-768 'TAYLOR' 'f' 31 132 0
QFY-472 'ANDERSON' 'f' 45 128 0
UJG-627 'THOMAS' 'f' 42 137 0
XUE-826 'JACKSON' 'm' 25 174 0
TRW-072 'WHITE' 'm' 39 202 1
The patient with the last name BROWN is deleted
from the dataset array.
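To delete several patients at once, you can use ismember in place of strcmp, because ismember tests each element of name against a list. The following minimal sketch works on a small dataset array, and the list of names to delete is hypothetical.

```matlab
name = {'SMITH';'BROWN';'WHITE';'JONES'};
age  = [38;49;39;40];
ds = dataset(name,age);

% ismember is true for every element of ds.name that appears in the list.
toDelete = ismember(ds.name,{'BROWN','WHITE'});
ds(toDelete,:) = []
```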
Return size of dataset array.
The array now has 13 observations.
size(ds)
ans =
13 5
Note that the row of variable names and the column of observation
names are not included in the size of a dataset array.
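To get one dimension at a time, pass the dimension number to size. This minimal sketch uses a small dataset array with observation names and shows that the names add neither a row nor a column to the reported size.

```matlab
age = [38;40;49];
wgt = [176;133;119];
ds = dataset(age,wgt,'ObsNames',{'SMITH';'JONES';'BROWN'});

nObs  = size(ds,1)   % number of observations (rows)
nVars = size(ds,2)   % number of variables (columns)
```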
This example shows how to create a dataset array from the contents of an Excel® spreadsheet file.
Create a dataset array.
Import the data from the first worksheet in the file
hospitalSmall.xlsx, specifying that the data file is
an Excel spreadsheet.
ds = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'))
ds =
id name sex age wgt smoke
'YPL-320' 'SMITH' 'm' 38 176 1
'GLI-532' 'JOHNSON' 'm' 43 163 0
'PNI-258' 'WILLIAMS' 'f' 38 131 0
'MIJ-579' 'JONES' 'f' 40 133 0
'XLK-030' 'BROWN' 'f' 49 119 0
'TFP-518' 'DAVIS' 'f' 46 142 0
'LPD-746' 'MILLER' 'f' 33 142 1
'ATA-945' 'WILSON' 'm' 40 180 0
'VNL-702' 'MOORE' 'm' 28 183 0
'LQW-768' 'TAYLOR' 'f' 31 132 0
'QFY-472' 'ANDERSON' 'f' 45 128 0
'UJG-627' 'THOMAS' 'f' 42 137 0
'XUE-826' 'JACKSON' 'm' 25 174 0
'TRW-072' 'WHITE' 'm' 39 202 1
By default, dataset creates variable names using the
contents of the first row in the spreadsheet.
Specify which worksheet to import.
Import the data from the second worksheet into a new dataset array.
ds2 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet',2)
ds2 =
id name sex age wgt smoke
'TRW-072' 'WHITE' 'm' 39 202 1
'ELG-976' 'HARRIS' 'f' 36 129 0
'KOQ-996' 'MARTIN' 'm' 48 181 1
'YUZ-646' 'THOMPSON' 'm' 32 191 1
'XBR-291' 'GARCIA' 'f' 27 131 1
'KPW-846' 'MARTINEZ' 'm' 37 179 0
'XBA-581' 'ROBINSON' 'm' 50 172 0
'BKD-785' 'CLARK' 'f' 48 133 0
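The first row of the second worksheet (TRW-072, WHITE) repeats the last row of the first worksheet. If you want to stack both worksheets into one dataset array, one approach is vertical concatenation followed by removal of repeated id values. The following minimal sketch uses two-row stand-ins for ds and ds2 and assumes that id uniquely identifies a patient.

```matlab
% Two-row stand-ins for the two worksheets (values copied from the displays).
id = {'XUE-826';'TRW-072'}; age = [25;39];
ds1 = dataset(id,age);
id = {'TRW-072';'ELG-976'}; age = [39;36];
ds2 = dataset(id,age);

% Vertical concatenation requires matching variable names.
combined = [ds1; ds2];

% Keep the first occurrence of each id, preserving the original order.
[~,keep] = unique(combined.id,'stable');
combined = combined(keep,:)
```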