Align Two Data Sets by Date and Location Where Both Data Sets Are Missing Data Points at Random Dates and Locations?

Question

Dylan on 26 Mar 2016

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/275504-align-two-data-sets-by-date-and-location-where-both-data-sets-are-missing-data-points-at-random-date

Commented: Dylan on 26 Mar 2016

Howdy. I have two data sets which I've read into two different arrays and sorted by the date and station of sampling. Dates are in column 1, are arranged in ascending order, and are formatted as a double that looks like yyyymmdd (in other words, May 5 2005 looks like 20050505). Column 2 contains the station at which that sample was taken. There are 7 stations, and at each time point the stations are arranged in ascending order from 1-7. The problem is at certain times/stations I have data in one data set but not in the other. This has resulted in one array with 517 data points (rows) and the second with 551 data points. I should also mention that the two arrays contain different numbers of columns (variables) and that my final goal is to look at correlations between the variables in each data set. In other words both matrices have the following form, but are missing certain (and different) dates/stations: [20050505, 1, data; 20050505, 2, data; 20050505, 3, data; 20050505, 4, data; 20050505, 5, data; 20060615, 1, data; 20060615, 2, data; 20060615, 3, data; 20060615, 4, data;], etc.

I've been trying to find ways to find the indices where the first two columns of the two arrays differ and then do one of two things: either 1. insert a row of NaNs where one array is missing a data point or 2. delete the data points in one array where there is no data for that date and station in the other. I have searched extensively for ways to solve this problem and have not found anything. I have also played around quite a bit with the "union" and "ismember" functions but have had no luck.

I would greatly appreciate any advice on other functions or code which may help accomplish this task.

Thanks!!

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Peter Perkins on 26 Mar 2016

1
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/275504-align-two-data-sets-by-date-and-location-where-both-data-sets-are-missing-data-points-at-random-date#answer_215142

Dylan, if I understand your description correctly, this is a pretty classic example of an outer join:

First cook up some fake data like (I think) yours:

>> date = [1 1 1 2 2 2 3 3 3]';
>> station = [1 2 3 1 2 3 1 2 3]';
>> x = randn(size(date));
>> y = randn(size(date));
>> t1 = table(date,station,x,y,z);
>> z = randn(size(date));
>> u = randn(size(date));
>> v = randn(size(date));
>> t2 = table(date,station,u,v);
>> t1 = t1([1:4 6:end],:)
t1 = 
    date    station       x            y           z    
    ____    _______    ________    _________    ________
    1       1           0.03348     -0.83137      1.7119
    1       2           -1.3337     -0.97921    -0.19412
    1       3            1.1275      -1.1564     -2.1384
    2       1           0.35018     -0.53356    -0.83959
    2       3           0.02289      0.96423     -1.0722
    3       1            -0.262      0.52006     0.96095
    3       2           -1.7502    -0.020028     0.12405
    3       3          -0.28565    -0.034771      1.4367
>> t2 = t2([1:6 8:end],:)
t2 = 
    date    station       u           v    
    ____    _______    ________    ________
    1       1           -1.1201     -2.3299
    1       2             2.526     -1.4491
    1       3            1.6555     0.33351
    2       1           0.30754     0.39135
    2       2           -1.2571     0.45168
    2       3          -0.86547    -0.13028
    3       2           0.79142    -0.47615
    3       3            -1.332     0.86202

t1 and t2 are missing one row each, in different places. Now join on the station and time:

>> t = outerjoin(t1,t2,'Keys',{'date' 'station'},'MergeKeys',true)
t = 
    date    station       x            y           z           u           v    
    ____    _______    ________    _________    ________    ________    ________
    1       1           0.03348     -0.83137      1.7119     -1.1201     -2.3299
    1       2           -1.3337     -0.97921    -0.19412       2.526     -1.4491
    1       3            1.1275      -1.1564     -2.1384      1.6555     0.33351
    2       1           0.35018     -0.53356    -0.83959     0.30754     0.39135
    2       2               NaN          NaN         NaN     -1.2571     0.45168
    2       3           0.02289      0.96423     -1.0722    -0.86547    -0.13028
    3       1            -0.262      0.52006     0.96095         NaN         NaN
    3       2           -1.7502    -0.020028     0.12405     0.79142    -0.47615
    3       3          -0.28565    -0.034771      1.4367      -1.332     0.86202

Hope this helps.

1 Comment
Show -1 older commentsHide -1 older comments

Dylan on 26 Mar 2016

oh my goodness this is fantastic. so simple! thank you so much!!

Sign in to comment.

Answer 2

Kelly Kearney on 26 Mar 2016

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/275504-align-two-data-sets-by-date-and-location-where-both-data-sets-are-missing-data-points-at-random-date#answer_215081

The ismember and unique functions, with the 'rows' option in both, should help:

data1 = [...
    20050505           1  1
    20050505           2  2
    20060615           1  3
    20060615           2  4
    20060615           3  5
    20060615           4  6];
data2 = [...
    20050505           1  7
    20050505           2  8
    20060615           1  9
    20060615           2  10
    20060615           3  11
    20060616           4  12];
unq = unique([data1(:,1:2); data2(:,1:2)], 'rows');
[~, idx1] = ismember(data1(:,1:2), unq, 'rows');
[~, idx2] = ismember(data2(:,1:2), unq, 'rows');
newdata = [unq nan(size(unq,1),2)];
newdata(idx1,3) = data1(:,3);
newdata(idx2,4) = data2(:,3);

Result:

newdata =
    20050505           1           1           7
    20050505           2           2           8
    20060615           1           3           9
    20060615           2           4          10
    20060615           3           5          11
    20060615           4           6         NaN
    20060616           4         NaN          12

1 Comment
Show -1 older commentsHide -1 older comments

Dylan on 26 Mar 2016

Ah, I see. You've helped me find my mistake, I was doing something similar, but the unique function was not working for me. Apparently it does not recognize NaN's as equivalent and so was not eliminating identical rows. Thanks for the help!

Sign in to comment.

Align Two Data Sets by Date and Location Where Both Data Sets Are Missing Data Points at Random Dates and Locations?

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment
Show -1 older commentsHide -1 older comments

More Answers (1)

1 Comment
Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

Align Two Data Sets by Date and Location Where Both Data Sets Are Missing Data Points at Random Dates and Locations?

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment Show -1 older commentsHide -1 older comments

More Answers (1)

1 Comment Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

1 Comment
Show -1 older commentsHide -1 older comments

1 Comment
Show -1 older commentsHide -1 older comments