This example shows how to merge dataset arrays
using join.
Import the data from the first worksheet in
hospitalSmall.xlsx into a dataset array, then keep only a
few of the variables.
ds1 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx')); ds1 = ds1(:,{'id','name','sex','age'})
ds1 =
id name sex age
'YPL-320' 'SMITH' 'm' 38
'GLI-532' 'JOHNSON' 'm' 43
'PNI-258' 'WILLIAMS' 'f' 38
'MIJ-579' 'JONES' 'f' 40
'XLK-030' 'BROWN' 'f' 49
'TFP-518' 'DAVIS' 'f' 46
'LPD-746' 'MILLER' 'f' 33
'ATA-945' 'WILSON' 'm' 40
'VNL-702' 'MOORE' 'm' 28
'LQW-768' 'TAYLOR' 'f' 31
'QFY-472' 'ANDERSON' 'f' 45
'UJG-627' 'THOMAS' 'f' 42
'XUE-826' 'JACKSON' 'm' 25
'TRW-072' 'WHITE' 'm' 39 The dataset array, ds1, has 14 observations (rows) and 4
variables (columns).
Import the data from the worksheet Heights2 in hospitalSmall.xlsx.
ds2 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet','Heights2')
ds2 =
id hgt
'LPD-746' 61
'PNI-258' 62
'XUE-826' 71
'ATA-945' 72
'XLK-030' 63 ds2 has height measurements for a subset
of five individuals from the first dataset array, ds1.
Use join to merge the two dataset arrays, ds1 and ds2,
keeping only the subset of observations that are in ds2.
JoinSmall = join(ds2,ds1)
JoinSmall =
id hgt name sex age
'LPD-746' 61 'MILLER' 'f' 33
'PNI-258' 62 'WILLIAMS' 'f' 38
'XUE-826' 71 'JACKSON' 'm' 25
'ATA-945' 72 'WILSON' 'm' 40
'XLK-030' 63 'BROWN' 'f' 49
In JoinSmall, the variable id only
appears once. This is because it is the key variable—the variable
that links observations between the two dataset arrays—and
has the same variable name in both ds1 and ds2.
Merge ds1 and ds2 keeping
all observations in the larger ds1.
joinAll = join(ds2,ds1,'type','rightouter','mergekeys',true)
joinAll =
id hgt name sex age
'ATA-945' 72 'WILSON' 'm' 40
'GLI-532' NaN 'JOHNSON' 'm' 43
'LPD-746' 61 'MILLER' 'f' 33
'LQW-768' NaN 'TAYLOR' 'f' 31
'MIJ-579' NaN 'JONES' 'f' 40
'PNI-258' 62 'WILLIAMS' 'f' 38
'QFY-472' NaN 'ANDERSON' 'f' 45
'TFP-518' NaN 'DAVIS' 'f' 46
'TRW-072' NaN 'WHITE' 'm' 39
'UJG-627' NaN 'THOMAS' 'f' 42
'VNL-702' NaN 'MOORE' 'm' 28
'XLK-030' 63 'BROWN' 'f' 49
'XUE-826' 71 'JACKSON' 'm' 25
'YPL-320' NaN 'SMITH' 'm' 38 ds1 without corresponding height
measurements in ds2 has height value NaN.
Also, because there is no id value in ds2 for
each observation in ds1, you need to merge the
keys using the option 'MergeKeys',true. This merges
the key variable, id.When using join, it is not necessary for
the key variable to have the same name in the dataset arrays to be
merged. Import the data from the worksheet named Heights3 in hospitalSmall.xlsx.
ds3 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet','Heights3')
ds3 =
identifier hgt
'GLI-532' 69
'QFY-472' 62
'MIJ-579' 61
'VNL-702' 68
'XLK-030' 63
'LPD-746' 61
'TFP-518' 60
'YPL-320' 71
'ATA-945' 72
'LQW-768' 64
'PNI-258' 62
'UJG-627' 61
'XUE-826' 71
'TRW-072' 69 ds3 has height measurements for each observation
in ds1. This dataset array has the same patient
identifiers as ds1, but they are under the variable
name identifier, instead of id (and
in a different order).
You can easily change the variable name of the key variable
in ds3 by setting d3.Properties.VarNames or
using the Variables editor, but it is not required to perform a merge.
Instead, you can specify the name of the key variable in each dataset
array using LeftKeys and RightKeys.
joinDiff = join(ds3,ds1,'LeftKeys','identifier','RightKeys','id')
joinDiff =
identifier hgt name sex age
'GLI-532' 69 'JOHNSON' 'm' 43
'QFY-472' 62 'ANDERSON' 'f' 45
'MIJ-579' 61 'JONES' 'f' 40
'VNL-702' 68 'MOORE' 'm' 28
'XLK-030' 63 'BROWN' 'f' 49
'LPD-746' 61 'MILLER' 'f' 33
'TFP-518' 60 'DAVIS' 'f' 46
'YPL-320' 71 'SMITH' 'm' 38
'ATA-945' 72 'WILSON' 'm' 40
'LQW-768' 64 'TAYLOR' 'f' 31
'PNI-258' 62 'WILLIAMS' 'f' 38
'UJG-627' 61 'THOMAS' 'f' 42
'XUE-826' 71 'JACKSON' 'm' 25
'TRW-072' 69 'WHITE' 'm' 39 The merged dataset array, joinDiff, has the
same key variable order and name as the first dataset array input
to join, ds3.