Main Content

**Class: **dataset

(Not Recommended) Merge dataset array observations

**The dataset data type is not recommended. To work with heterogeneous data,
use the MATLAB ^{®}
table data type instead. See MATLAB
table documentation for more information.**

`C = join(A,B)`

C = join(A,B,keys)

C = join(A,B,* param1*,

`val1`

`param2`

`val2`

[C,IB] = join(...)

C = join(A,B,'Type',TYPE,...)

C = join(A,B,'Type',TYPE,'MergeKeys',true,...)

[C,IA,IB] = join(A,B,'Type',TYPE,...)

`C = join(A,B)`

creates a dataset array
`C`

by merging observations from the two dataset arrays
`A`

and `B`

. `join`

performs
the merge by first finding *key variables*, that is, pairs of dataset
variables, one in `A`

and one in `B`

, that share the
same name. Each observation in `B`

must contain a unique combination of
values in the key variables, and must contain all combinations of values that are
present in the keys from `A`

. `join`

then uses these
key variables to define a many-to-one correspondence between observations in
`A`

and those in `B`

. `join`

uses this correspondence to replicate the observations in `B`

and
combine them with the observations in `A`

to create
`C`

.

`C = join(A,B,keys)`

performs the merge using
the variables specified by `keys`

as the key variables in both
`A`

and `B`

. `keys`

is a positive
integer, a vector of positive integers, a character vector, a string array, a cell array
of character vectors, or a logical vector.

`C`

contains one observation for each observation in
`A`

. Variables in `C`

include all of the variables
from `A`

, as well as one variable corresponding to each variable in
`B`

(except for the keys from `B`

). If
`A`

and `B`

contain variables with identical
names, `join`

adds the suffix `'_left'`

and
`'_right'`

to the corresponding variables in
`C`

.

`C = join(A,B,`

specifies optional parameter name/value pairs to control how the dataset variables in
* param1*,

`val1`

`param2`

`val2`

`A`

and `B`

are used in the merge. Parameters
are:`'Keys'`

— Specifies the variables to use as keys in both`A`

and`B`

.`'LeftKeys'`

— Specifies the variables to use as keys in`A`

.`'RightKeys'`

— Specifies the variables to use as keys in`B`

.

You may provide either the `'Keys'`

parameter, or both the
`'LeftKeys'`

and `'RightKeys'`

parameters. The
value for these parameters is a positive integer, a vector of positive integers, a
character vector, a string array, a cell array of character vectors, or a logical
vector. `'LeftKeys'`

or `'RightKeys'`

must both
specify the same number of key variables, and `join`

pairs the left and
right keys in the order specified.

`'LeftVars'`

— Specifies which variables from`A`

to include in`C`

. By default,`join`

includes all variables from`A`

.`'RightVars'`

— Specifies which variables from`B`

to include in`C`

. By default,`join`

includes all variables from`B`

except the key variables.

You can use `'LeftVars'`

or `'RightVars'`

to include
or exclude key variables as well as data variables. The value for these parameters is a
positive integer, a vector of positive integers, a character vector, a string array, a
cell array of character vectors, or a logical vector.

`[C,IB] = join(...)`

returns an index vector
`IB`

, where `join`

constructs `C`

by horizontally concatenating `A(:,LeftVars)`

and
`B(IB,RightVars)`

. `join`

can also perform more
complicated inner and outer join operations that allow a many-to-many correspondence
between `A`

and `B`

, and allow unmatched observations
in either `A`

or `B`

.

`C = join(A,B,'Type',TYPE,...)`

performs the
join operation specified by `TYPE`

. `TYPE`

is one of
`'inner'`

, `'leftouter'`

,
`'rightouter'`

, `'fullouter'`

, or
`'outer'`

(which is a synonym for `'fullouter'`

).
For an inner join, `C`

only contains observations corresponding to a
combination of key values that occurred in both `A`

and
`B`

. For a left (or right) outer join, `C`

also
contains observations corresponding to keys in `A`

(or
`B`

) that did not match any in `B`

(or
`A`

). Variables in `C`

taken from
`A`

(or `B`

) contain null values in those
observations. A full outer join is equivalent to a left and right outer join.
`C`

contains variables corresponding to the key variables from both
`A`

and `B`

, and `join`

sorts the
observations in `C`

by the key values.

For inner and outer joins, `C`

contains variables corresponding to
the key variables from both `A`

and `B`

by default, as
well as all the remaining variables. `join`

sorts the observations in
the result `C`

by the key values.

`C = join(A,B,'Type',TYPE,'MergeKeys',true,...)`

includes a single
variable in `C`

for each key variable pair from `A`

and `B`

, rather than including two separate variables. For outer joins,
`join`

creates the single variable by merging the key values from
`A`

and `B`

, taking values from
`A`

where a corresponding observation exists in
`A`

, and from `B`

otherwise. Setting the
`'MergeKeys'`

parameter to `true`

overrides
inclusion or exclusion of any key variables specified via the
`'LeftVars'`

or `'RightVars'`

parameter. Setting
the `'MergeKeys'`

parameter to `false`

is equivalent
to not passing in the `'MergeKeys'`

parameter.

`[C,IA,IB] = join(A,B,'Type',TYPE,...)`

returns
index vectors `IA`

and `IB`

indicating the
correspondence between observations in `C`

and those in
`A`

and `B`

. For an inner join,
`join`

constructs `C`

by horizontally
concatenating `A(IA,LeftVars)`

and `B(IB,RightVars)`

.
For an outer join, `IA`

or `IB`

may also contain
zeros, indicating the observations in `C`

that do not correspond to
observations in `A`

or `B`

, respectively.

Create a dataset array from Fisher's iris data:

load fisheriris NumObs = size(meas,1); NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d')); iris = dataset({nominal(species),'species'},... {meas,'SL','SW','PL','PW'},... 'ObsNames',NameObs);

Create a separate dataset array with the diploid chromosome counts for each species of iris:

snames = nominal({'setosa';'versicolor';'virginica'}); CC = dataset({snames,'species'},{[38;108;70],'cc'}) CC = species cc setosa 38 versicolor 108 virginica 70

Broadcast the data in `CC`

to the rows of `iris`

using the key variable `species`

in each dataset:

iris2 = join(iris,CC); iris2([1 2 51 52 101 102],:) ans = species SL SW PL PW cc Obs1 setosa 5.1 3.5 1.4 0.2 38 Obs2 setosa 4.9 3 1.4 0.2 38 Obs51 versicolor 7 3.2 4.7 1.4 108 Obs52 versicolor 6.4 3.2 4.5 1.5 108 Obs101 virginica 6.3 3.3 6 2.5 70 Obs102 virginica 5.8 2.7 5.1 1.9 70

Create two datasets and join them using the `'MergeKeys'`

flag:

% Create two data sets that both contain the key variable % 'Key1'. The two arrays contain observations with common % values of Key1, but each array also contains observations % with values of Key1 not present in the other. a = dataset({'a' 'b' 'c' 'e' 'h'}',[1 2 3 11 17]',... 'VarNames',{'Key1' 'Var1'}) b = dataset({'a' 'b' 'd' 'e'}',[4 5 6 7]',... 'VarNames',{'Key1' 'Var2'}) % Combine a and b with an outer join, which matches up % observations with common key values, but also retains % observations whose key values don't have a match. % Keep the key values as separate variables in the result. couter = join(a,b,'key','Key1','Type','outer') % Join a and b, merging the key values as a single variable % in the result. coutermerge = join(a,b,'key','Key1','Type','outer',... 'MergeKeys',true) % Join a and b, retaining only observations whose key % values match. cinner = join(a,b,'key','Key1','Type','inner',... 'MergeKeys',true) a = Key1 Var1 'a' 1 'b' 2 'c' 3 'e' 11 'h' 17 b = Key1 Var2 'a' 4 'b' 5 'd' 6 'e' 7 couter = Key1_left Var1 Key1_right Var2 'a' 1 'a' 4 'b' 2 'b' 5 'c' 3 '' NaN '' NaN 'd' 6 'e' 11 'e' 7 'h' 17 '' NaN coutermerge = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'c' 3 NaN 'd' NaN 6 'e' 11 7 'h' 17 NaN cinner = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'e' 11 7