**The dataset data type might be removed in
a future release. To work with heterogeneous data, use the MATLAB ^{®} table data
type instead. See MATLAB table documentation
for more information.**

`C = join(A,B)`

C = join(A,B,keys)

C = join(A,B,* param1*,

`val1`

`param2`

`val2`

[C,IB] = join(...)

C = join(A,B,'Type',TYPE,...)

C = join(A,B,'Type',TYPE,'MergeKeys',true,...)

[C,IA,IB] = join(A,B,'Type',TYPE,...)

`C = join(A,B)`

creates a
dataset array `C`

by merging observations from the
two dataset arrays `A`

and `B`

. `join`

performs
the merge by first finding *key variables*, that
is, pairs of dataset variables, one in `A`

and one
in `B`

, that share the same name. Each observation
in `B`

must contain a unique combination of values
in the key variables, and must contain all combinations of values
that are present in the keys from `A`

. `join`

then
uses these key variables to define a many-to-one correspondence between
observations in `A`

and those in `B`

. `join`

uses
this correspondence to replicate the observations in `B`

and
combine them with the observations in `A`

to create `C`

.

`C = join(A,B,keys)`

performs
the merge using the variables specified by `keys`

as
the key variables in both `A`

and `B`

. `keys`

is
a positive integer, a vector of positive integers, a variable name,
a cell array of variable names, or a logical vector.

`C`

contains one observation for each observation
in `A`

. Variables in `C`

include
all of the variables from `A`

, as well as one variable
corresponding to each variable in `B`

(except for
the keys from `B`

). If `A`

and `B`

contain
variables with identical names, `join`

adds the suffix `'_left'`

and `'_right'`

to
the corresponding variables in `C`

.

`C = join(A,B,`

specifies
optional parameter name/value pairs to control how the dataset variables
in * param1*,

`val1`

`param2`

`val2`

`A`

and `B`

are used in the merge.
Parameters are:`'Keys'`

— Specifies the variables to use as keys in both`A`

and`B`

.`'LeftKeys'`

— Specifies the variables to use as keys in`A`

.`'RightKeys'`

— Specifies the variables to use as keys in`B`

.

You may provide either the `'Keys'`

parameter,
or both the `'LeftKeys'`

and `'RightKeys'`

parameters.
The value for these parameters is a positive integer, a vector of
positive integers, a variable name, a cell array containing variable
names, or a logical vector. `'LeftKeys'`

or `'RightKeys'`

must
both specify the same number of key variables, and `join`

pairs
the left and right keys in the order specified.

`'LeftVars'`

— Specifies which variables from`A`

to include in`C`

. By default,`join`

includes all variables from`A`

.`'RightVars'`

— Specifies which variables from`B`

to include in`C`

. By default,`join`

includes all variables from`B`

except the key variables.

You can use `'LeftVars'`

or `'RightVars'`

to
include or exclude key variables as well as data variables. The value
for these parameters is a positive integer, a vector of positive integers,
a variable name, a cell array containing one or more variable names,
or a logical vector.

`[C,IB] = join(...)`

returns
an index vector `IB`

, where `join`

constructs `C`

by
horizontally concatenating `A(:,LeftVars)`

and `B(IB,RightVars)`

. `join`

can
also perform more complicated inner and outer join operations that
allow a many-to-many correspondence between `A`

and `B`

,
and allow unmatched observations in either `A`

or `B`

.

`C = join(A,B,'Type',TYPE,...)`

performs
the join operation specified by `TYPE`

. `TYPE`

is
one of `'inner'`

, `'leftouter'`

, `'rightouter'`

, `'fullouter'`

,
or `'outer'`

(which is a synonym for `'fullouter'`

).
For an inner join, `C`

only contains observations
corresponding to a combination of key values that occurred in both `A`

and `B`

.
For a left (or right) outer join, `C`

also contains
observations corresponding to keys in `A`

(or `B`

)
that did not match any in `B`

(or `A`

).
Variables in `C`

taken from `A`

(or `B`

)
contain null values in those observations. A full outer join is equivalent
to a left and right outer join. `C`

contains variables
corresponding to the key variables from both `A`

and `B`

,
and `join`

sorts the observations in `C`

by
the key values.

For inner and outer joins, `C`

contains variables
corresponding to the key variables from both `A`

and `B`

by
default, as well as all the remaining variables. `join`

sorts
the observations in the result `C`

by the key values.

`C = join(A,B,'Type',TYPE,'MergeKeys',true,...)`

includes
a single variable in `C`

for each key variable pair
from `A`

and `B`

, rather than including
two separate variables. For outer joins, `join`

creates
the single variable by merging the key values from `A`

and `B`

,
taking values from `A`

where a corresponding observation
exists in `A`

, and from `B`

otherwise.
Setting the `'MergeKeys'`

parameter to `true`

overrides
inclusion or exclusion of any key variables specified via the `'LeftVars'`

or `'RightVars'`

parameter.
Setting the `'MergeKeys'`

parameter to `false`

is
equivalent to not passing in the `'MergeKeys'`

parameter.

`[C,IA,IB] = join(A,B,'Type',TYPE,...)`

returns
index vectors `IA`

and `IB`

indicating
the correspondence between observations in `C`

and
those in `A`

and `B`

. For an inner
join, `join`

constructs `C`

by horizontally
concatenating `A(IA,LeftVars)`

and `B(IB,RightVars)`

.
For an outer join, `IA`

or `IB`

may
also contain zeros, indicating the observations in `C`

that
do not correspond to observations in `A`

or `B`

,
respectively.

Create a dataset array from Fisher's iris data:

load fisheriris NumObs = size(meas,1); NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d')); iris = dataset({nominal(species),'species'},... {meas,'SL','SW','PL','PW'},... 'ObsNames',NameObs);

Create a separate dataset array with the diploid chromosome counts for each species of iris:

snames = nominal({'setosa';'versicolor';'virginica'}); CC = dataset({snames,'species'},{[38;108;70],'cc'}) CC = species cc setosa 38 versicolor 108 virginica 70

Broadcast the data in `CC`

to the rows of `iris`

using
the key variable `species`

in each dataset:

iris2 = join(iris,CC); iris2([1 2 51 52 101 102],:) ans = species SL SW PL PW cc Obs1 setosa 5.1 3.5 1.4 0.2 38 Obs2 setosa 4.9 3 1.4 0.2 38 Obs51 versicolor 7 3.2 4.7 1.4 108 Obs52 versicolor 6.4 3.2 4.5 1.5 108 Obs101 virginica 6.3 3.3 6 2.5 70 Obs102 virginica 5.8 2.7 5.1 1.9 70

Create two datasets and join them using the `'MergeKeys'`

flag:

% Create two data sets that both contain the key variable % 'Key1'. The two arrays contain observations with common % values of Key1, but each array also contains observations % with values of Key1 not present in the other. a = dataset({'a' 'b' 'c' 'e' 'h'}',[1 2 3 11 17]',... 'VarNames',{'Key1' 'Var1'}) b = dataset({'a' 'b' 'd' 'e'}',[4 5 6 7]',... 'VarNames',{'Key1' 'Var2'}) % Combine a and b with an outer join, which matches up % observations with common key values, but also retains % observations whose key values don't have a match. % Keep the key values as separate variables in the result. couter = join(a,b,'key','Key1','Type','outer') % Join a and b, merging the key values as a single variable % in the result. coutermerge = join(a,b,'key','Key1','Type','outer',... 'MergeKeys',true) % Join a and b, retaining only observations whose key % values match. cinner = join(a,b,'key','Key1','Type','inner',... 'MergeKeys',true) a = Key1 Var1 'a' 1 'b' 2 'c' 3 'e' 11 'h' 17 b = Key1 Var2 'a' 4 'b' 5 'd' 6 'e' 7 couter = Key1_left Var1 Key1_right Var2 'a' 1 'a' 4 'b' 2 'b' 5 'c' 3 '' NaN '' NaN 'd' 6 'e' 11 'e' 7 'h' 17 '' NaN coutermerge = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'c' 3 NaN 'd' NaN 6 'e' 11 7 'h' 17 NaN cinner = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'e' 11 7

Was this topic helpful?