join - Merge observations

Class

@dataset

Syntax

C = join(A,B)
C = join(A,B,key)
C = join(A,B,param1,val1,param2,val2,...)
[C,idx] = join(...)

Description

C = join(A,B) creates a dataset array C by merging observations from the two dataset arrays A and B. join performs the merge by first finding key variables, that is, a pair of dataset variables, one in A and one in B, that share the same name. The key from B must contain unique values, and must contain all the values that are present in the key from A. join then uses these key variables to define a many-to-one correspondence between observations in A and those in B. join uses this correspondence to replicate the observations in B and combine them with the observations in A to create C.

C contains one observation for each observation in A. Variables in C include all of the variables from A, as well as one variable corresponding to each variable in B (except for the key from B).
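The many-to-one correspondence described above can be pictured with ismember on plain arrays. This is only a sketch of the idea under assumed data, not a description of how join is implemented internally; keyA, keyB, valB, and valC are hypothetical names.

```matlab
% Hypothetical key values: keys in A may repeat, keys in B must be unique.
keyA = [3; 1; 3; 2];
keyB = [1; 2; 3];
valB = [10; 20; 30];      % one value per observation in B

% For each key in A, find the matching row of B.
[found, idx] = ismember(keyA, keyB);

% Every key value in A must be present in B.
assert(all(found), 'Some keys in A are missing from B.');

% Replicate the rows of B so that they line up with the rows of A.
valC = valB(idx);
```

Here valC has one entry per element of keyA, which mirrors the statement that C contains one observation for each observation in A.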

C = join(A,B,key) performs the merge using the variable specified by key as the key variable in both A and B. key is a positive integer, a variable name, a cell array containing a variable name, or a logical vector with one true entry.
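As a minimal illustration of this syntax, the following sketch names the key variable explicitly. The dataset arrays and the variable names id, x, and y are made up for the example.

```matlab
% Hypothetical data: A has repeated key values, B has one row per key value.
A = dataset({[1;2;2;3],'id'},{[0.5;0.7;0.9;1.1],'x'});
B = dataset({[1;2;3],'id'},{[100;200;300],'y'});

% Give the key as a variable name.
C1 = join(A,B,'id');

% Give the key as a cell array containing a variable name.
C2 = join(A,B,{'id'});
```

Both calls should produce a dataset array with the variables id, x, and y, with one observation per observation in A.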

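C = join(A,B,param1,val1,param2,val2,...) specifies optional parameter name/value pairs to control how the dataset variables in A and B are used in the merge. Parameters are:

'Key' - Specifies the name of the variable to be used as the key variable in both A and B.
'LeftKey' - Specifies the name of the key variable in A.
'RightKey' - Specifies the name of the key variable in B.

You may provide either the 'Key' parameter, or both the 'LeftKey' and 'RightKey' parameters. The value for these parameters is a positive integer, a variable name, a cell array containing a variable name, or a logical vector with one true entry.

'LeftVars' - Specifies which variables from A to include in C. By default, join includes all variables from A.
'RightVars' - Specifies which variables from B to include in C. By default, join includes all variables from B except the key variable.

The value for the 'LeftVars' and 'RightVars' parameters is a positive integer, a vector of positive integers, a variable name, a cell array containing one or more variable names, or a logical vector.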
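The following sketch shows how the name/value syntax might be used when the key variables have different names in the two dataset arrays. The data and the variable names id, code, x, y, and z are hypothetical. 'RightVars' is used here on the assumption that it is the parameter that selects which variables from B are included.

```matlab
% Hypothetical data: the key is called id in A and code in B.
A = dataset({[1;2;2;3],'id'},{[0.5;0.7;0.9;1.1],'x'});
B = dataset({[1;2;3],'code'},{[100;200;300],'y'},{[7;8;9],'z'});

% Match A.id against B.code.
C = join(A,B,'LeftKey','id','RightKey','code');

% Same merge, but take only the variable y from B.
D = join(A,B,'LeftKey','id','RightKey','code','RightVars','y');
```

Restricting the variables taken from B can be useful when B carries many columns and only a few are needed in the merged result.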

[C,idx] = join(...) returns an index vector idx, where the observations in C are constructed by horizontally concatenating A(:,leftvars) and B(idx,rightvars). Here leftvars and rightvars denote the variables taken from A and from B, respectively.
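The role of idx can be checked by rebuilding the merged array by hand. This is a sketch with hypothetical data; it assumes the default variable selection, so the only variable taken from B is y.

```matlab
% Hypothetical data: A has repeated key values, B has one row per key value.
A = dataset({[1;2;2;3],'id'},{[0.5;0.7;0.9;1.1],'x'});
B = dataset({[1;2;3],'id'},{[100;200;300],'y'});

[C,idx] = join(A,B);

% idx(i) is the row of B that was merged into row i of A.
% Rebuild the result by horizontal concatenation and compare.
manual = [A, B(idx,'y')];
same = isequal(C.y, manual.y);
```

If same is true, the observations taken from B in C are exactly the rows B(idx,:) in the order given by A.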

Example

Create a dataset array from Fisher's iris data:

load fisheriris
NumObs = size(meas,1);
NameObs = strcat({'Obs'},num2str((1:NumObs)','%d'));
iris = dataset({nominal(species),'species'},...
               {meas,'SL','SW','PL','PW'},...
               'ObsNames',NameObs);

Create a separate dataset array with the diploid chromosome counts for each species of iris:

snames = nominal({'setosa';'versicolor';'virginica'});
CC = dataset({snames,'species'},{[38;108;70],'cc'})
CC = 
    species       cc 
    setosa         38
    versicolor    108
    virginica      70

Broadcast the data in CC to the rows of iris using the key variable species in each dataset:

iris2 = join(iris,CC);
iris2([1 2 51 52 101 102],:)
ans = 
           species       SL     SW     PL     PW     cc 
 Obs1      setosa        5.1    3.5    1.4    0.2     38
 Obs2      setosa        4.9      3    1.4    0.2     38
 Obs51     versicolor      7    3.2    4.7    1.4    108
 Obs52     versicolor    6.4    3.2    4.5    1.5    108
 Obs101    virginica     6.3    3.3      6    2.5     70
 Obs102    virginica     5.8    2.7    5.1    1.9     70

See Also

sortrows

  


© 1984-2008 The MathWorks, Inc.