Accelerating the pace of engineering and science

Documentation Center

• Trial Software

join

Class: dataset

Merge observations

The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax

C = join(A,B)
C = join(A,B,keys)
C = join(A,B,param1,val1,param2,val2,...)
[C,IB] = join(...)
C = join(A,B,'Type',TYPE,...)
C = join(A,B,'Type',TYPE,'MergeKeys',true,...)
[C,IA,IB] = join(A,B,'Type',TYPE,...)

Description

C = join(A,B) creates a dataset array C by merging observations from the two dataset arrays A and B. join performs the merge by first finding key variables, that is, pairs of dataset variables, one in A and one in B, that share the same name. Each observation in B must contain a unique combination of values in the key variables, and must contain all combinations of values that are present in the keys from A. join then uses these key variables to define a many-to-one correspondence between observations in A and those in B. join uses this correspondence to replicate the observations in B and combine them with the observations in A to create C.

C = join(A,B,keys) performs the merge using the variables specified by keys as the key variables in both A and B. keys is a positive integer, a vector of positive integers, a variable name, a cell array of variable names, or a logical vector.

C contains one observation for each observation in A. Variables in C include all of the variables from A, as well as one variable corresponding to each variable in B (except for the keys from B). If A and B contain variables with identical names, join adds the suffix '_left' and '_right' to the corresponding variables in C.

C = join(A,B,param1,val1,param2,val2,...) specifies optional parameter name/value pairs to control how the dataset variables in A and B are used in the merge. Parameters are:

• 'Keys' — Specifies the variables to use as keys in both A and B.

• 'LeftKeys' — Specifies the variables to use as keys in A.

• 'RightKeys' — Specifies the variables to use as keys in B.

You may provide either the 'Keys' parameter, or both the 'LeftKeys' and 'RightKeys' parameters. The value for these parameters is a positive integer, a vector of positive integers, a variable name, a cell array containing variable names, or a logical vector. 'LeftKeys' or 'RightKeys' must both specify the same number of key variables, and join pairs the left and right keys in the order specified.

• 'LeftVars' — Specifies which variables from A to include in C. By default, join includes all variables from A.

• 'RightVars' — Specifies which variables from B to include in C. By default, join includes all variables from B except the key variables.

You can use 'LeftVars' or 'RightVars' to include or exclude key variables as well as data variables. The value for these parameters is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector.

[C,IB] = join(...) returns an index vector IB, where join constructs C by horizontally concatenating A(:,LeftVars) and B(IB,RightVars). join can also perform more complicated inner and outer join operations that allow a many-to-many correspondence between A and B, and allow unmatched observations in either A or B.

C = join(A,B,'Type',TYPE,...) performs the join operation specified by TYPE. TYPE is one of 'inner', 'leftouter', 'rightouter', 'fullouter', or 'outer' (which is a synonym for 'fullouter'). For an inner join, C only contains observations corresponding to a combination of key values that occurred in both A and B. For a left (or right) outer join, C also contains observations corresponding to keys in A (or B) that did not match any in B (or A). Variables in C taken from A (or B) contain null values in those observations. A full outer join is equivalent to a left and right outer join. C contains variables corresponding to the key variables from both A and B, and join sorts the observations in C by the key values.

For inner and outer joins, C contains variables corresponding to the key variables from both A and B by default, as well as all the remaining variables. join sorts the observations in the result C by the key values.

C = join(A,B,'Type',TYPE,'MergeKeys',true,...) includes a single variable in C for each key variable pair from A and B, rather than including two separate variables. For outer joins, join creates the single variable by merging the key values from A and B, taking values from A where a corresponding observation exists in A, and from B otherwise. Setting the 'MergeKeys' parameter to true overrides inclusion or exclusion of any key variables specified via the 'LeftVars' or 'RightVars' parameter. Setting the 'MergeKeys' parameter to false is equivalent to not passing in the 'MergeKeys' parameter.

[C,IA,IB] = join(A,B,'Type',TYPE,...) returns index vectors IA and IB indicating the correspondence between observations in C and those in A and B. For an inner join, join constructs C by horizontally concatenating A(IA,LeftVars) and B(IB,RightVars). For an outer join, IA or IB may also contain zeros, indicating the observations in C that do not correspond to observations in A or B, respectively.

Examples

Create a dataset array from Fisher's iris data:

```load fisheriris
NumObs = size(meas,1);
NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d'));
iris = dataset({nominal(species),'species'},...
{meas,'SL','SW','PL','PW'},...
'ObsNames',NameObs);
```

Create a separate dataset array with the diploid chromosome counts for each species of iris:

```snames = nominal({'setosa';'versicolor';'virginica'});
CC = dataset({snames,'species'},{[38;108;70],'cc'})
CC =
species       cc
setosa         38
versicolor    108
virginica      70```

Broadcast the data in CC to the rows of iris using the key variable species in each dataset:

```iris2 = join(iris,CC);
iris2([1 2 51 52 101 102],:)
ans =
species       SL     SW     PL     PW     cc
Obs1      setosa        5.1    3.5    1.4    0.2     38
Obs2      setosa        4.9      3    1.4    0.2     38
Obs51     versicolor      7    3.2    4.7    1.4    108
Obs52     versicolor    6.4    3.2    4.5    1.5    108
Obs101    virginica     6.3    3.3      6    2.5     70
Obs102    virginica     5.8    2.7    5.1    1.9     70```

Create two datasets and join them using the 'MergeKeys' flag:

```% Create two data sets that both contain the key variable
% 'Key1'. The two arrays contain observations with common
% values of Key1, but each array also contains observations
% with values of Key1 not present in the other.
a = dataset({'a' 'b' 'c' 'e' 'h'}',[1 2 3 11 17]',...
'VarNames',{'Key1' 'Var1'})
b = dataset({'a' 'b' 'd' 'e'}',[4 5 6 7]',...
'VarNames',{'Key1' 'Var2'})

% Combine a and b with an outer join, which matches up
% observations with common key values, but also retains
% observations whose key values don't have a match.
% Keep the key values as separate variables in the result.
couter = join(a,b,'key','Key1','Type','outer')

% Join a and b, merging the key values as a single variable
% in the result.
coutermerge = join(a,b,'key','Key1','Type','outer',...
'MergeKeys',true)

% Join a and b, retaining only observations whose key
% values match.
cinner = join(a,b,'key','Key1','Type','inner',...
'MergeKeys',true)

a =

Key1       Var1
'a'         1
'b'         2
'c'         3
'e'        11
'h'        17

b =

Key1       Var2
'a'        4
'b'        5
'd'        6
'e'        7

couter =

Key1_left    Var1    Key1_right    Var2
'a'            1     'a'             4
'b'            2     'b'             5
'c'            3     ''            NaN
''           NaN     'd'             6
'e'           11     'e'             7
'h'           17     ''            NaN

coutermerge =

Key1       Var1    Var2
'a'          1       4
'b'          2       5
'c'          3     NaN
'd'        NaN       6
'e'         11       7
'h'         17     NaN

cinner =

Key1       Var1    Var2
'a'         1      4
'b'         2      5
'e'        11      7  ```