Align two data sets

Question


Hi,

I am trying to align two data sets, a log matrix and a data matrix (example data sets of trials are attached below). The files have different sizes: the log file is in milliseconds and the data file is in frames. I need to align them and downsample the log file to the size of the data file.

The columns in the log file record the stimulus presentation over time (an example is attached below). The rows are time and the columns are events. The third column logs the trial ID for the duration of each trial: 1 for Trial 1, 2 for Trial 2, and so on. The trials in the log file start at the same time as in the data file, but they are a bit longer. I need to cut the end of each trial to the length of the corresponding trial in the data file, downsample it to the size of the data file, and concatenate the trials back together. The trial length in the data file is size(datafile,1)./number of trials. At the moment I am doing this with the simple code attached below, which is limited to 10 trials. I would like to put it in a loop so that it runs automatically on files with a large number of trials, independent of the number of trials. Could anyone help with this?

1 Comment

MarKf on 5 Aug 2023

Neither the code nor the explanation is well written, and the code doesn't work. The unnecessarily long logfile.xlsx seems to log every millisecond together with the trial ID (or the lack of one for the first 7 ms, it seems). The trial time alone would be enough (like the variable Trial_IDS), but I guess you need something like that and you need to downsample it.

On the failing line 9 (tp2=size(datafile,1)./10;% length of the data for one trial), I assume that is where your pseudocode starts, and that you mean to extract something from the variable Data: not the size of the variable, as your code does, but the trial length. That part is not clear, so I'm not sure how much of each trial you need to cut, or whether the duration differs between trials based on the information in datafile.xlsx.

As for downsampling, just take x(1:50:end,3:4) to get a matrix sampled every 1 s/20 = 50 ms, containing the trial ID and the time vector. However, it won't capture the trial beginning at 7 ms.
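
To make the 7 ms point concrete, here is a small sketch on a synthetic log (the data is made up, not taken from logfile.xlsx). It assumes one row per millisecond, with the trial ID in column 3 and the time in column 4, as in the outputs shown further down the thread. Starting the stride at the first trial row keeps the samples on that trial's frame grid; later trials only stay aligned if every trial length is a multiple of n, which is why splitting the log by trial first is safer.

```matlab
% Hypothetical synthetic log: 7 rows with no trial (ID 0), then two trials
% of 200 ms each; column 3 = trial ID, column 4 = time in ms (1 row per ms).
ids = [zeros(7,1); ones(200,1); 2*ones(200,1)];
x = [zeros(numel(ids),2), ids, (1:numel(ids))'];

n = 50;                          % 1000 ms / 20 fps = 50 ms per frame

% Stride from row 1: samples are 7 ms off the trial's frame grid,
% and the first one falls before the trial starts.
naive = x(1:n:end, 3:4);

% Stride from the first row of trial 1 instead (row 8 here).
onset = find(x(:,3) ~= 0, 1);
aligned = x(onset:n:end, 3:4);

disp(naive(1:3,:))               % rows 1, 51, 101 of columns 3:4
disp(aligned(1:3,:))             % rows 8, 58, 108 of columns 3:4
```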


Accepted Answer

Voss on 5 Aug 2023

Edited: Voss on 5 Aug 2023


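x = readmatrix('logfile.xlsx');       % log file: one row per millisecond
Data = readmatrix('datafile.xlsx');   % data file: one row per frame
frame_rate = 20;                      % frames per second of the data file
n = 1000/frame_rate;                  % ms per frame (downsampling factor)
x(x(:,3) == 0,:) = [];                % drop rows with no trial ID (before trial 1)
[g,g_id] = findgroups(x(:,3));        % group rows by trial ID (column 3)
trial_length = size(Data,1)/numel(g_id)*n; % trial length in ms, taken from the data file
Trials = splitapply(@(t)cut_and_downsample(t,trial_length,n),x,g);
Logfile_concatinated = vertcat(Trials{:}); % concatenate the trials back together
function out = cut_and_downsample(in,len,n)
% Cut one trial to its first len rows and keep every n-th row.
% out = {downsample(in(1:len,:),n)};  % equivalent; downsample needs Signal Processing Toolbox
out = {in(1:n:len,:)};
end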
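
The question asked for a loop. As a sketch under the same assumptions as the answer above (one log row per millisecond, trial ID in column 3, an equal number of frames per trial in Data), the findgroups/splitapply call can be written as an explicit for loop over the trial IDs. The small synthetic x and Data below are made up so that the block runs without the .xlsx files; for real data, replace them with the readmatrix calls.

```matlab
% Hypothetical synthetic inputs: 3 trials of 230 ms in the log (1 row per ms,
% ID 0 for the first 7 rows), and a data file with 4 frames per trial.
frame_rate = 20;
n = 1000/frame_rate;                          % 50 ms per frame
ids = [zeros(7,1); kron((1:3)', ones(230,1))];
x = [zeros(numel(ids),2), ids, (1:numel(ids))'];
Data = zeros(12, 5);                          % 12 frames = 3 trials x 4 frames (only the row count is used)

x(x(:,3) == 0,:) = [];                        % drop rows before trial 1
trial_ids = unique(x(:,3));                   % sorted trial IDs, any number of trials
frames_per_trial = size(Data,1)/numel(trial_ids);
trial_length = frames_per_trial*n;            % trial length in ms (200 here)

Trials = cell(numel(trial_ids),1);
for k = 1:numel(trial_ids)
    tr = x(x(:,3) == trial_ids(k),:);         % all log rows of trial k, in original order
    Trials{k} = tr(1:n:trial_length,:);       % cut to trial_length, keep every n-th row
end
Logfile_concatinated = vertcat(Trials{:});
```

Because unique returns the IDs sorted, as findgroups does, the trials should come out in the same order as with splitapply.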

10 Comments

Voss on 5 Aug 2023


In your script create_log, I think that this part:

tr1 = x(id1:id2,:);% split logfile into single trials
tr2 = x(id2:id3,:);
tr3 = x(id3:id4,:);
tr4 = x(id4:id5,:);
tr5 = x(id5:end,:);

Should be like this:

tr1 = x(id1+1:id2,:);% split logfile into single trials
tr2 = x(id2+1:id3,:);
tr3 = x(id3+1:id4,:);
tr4 = x(id4+1:id5,:);
tr5 = x(id5+1:end,:);

That is because tr1, for instance, as you construct it, includes as its first row the last row of the "0th" trial (the rows before trial 1 starts, where the ID is 0):

x = readmatrix('logfile.xlsx');
frame_rate = 20;
n = 1000/frame_rate; % 1000 ms / sampling rate
ind = x(:,3); % epoch ID column
trial_IDS = find(diff(ind)); % last row before each change of trial ID
id1 = trial_IDS(1,:) % indices of trial boundaries in logfile
% id1 = 7
id2 = trial_IDS(2,:)
% id2 = 58001
id3 = trial_IDS(3,:)
% id3 = 116001
id4 = trial_IDS(4,:)
% id4 = 174001
id5 = trial_IDS(5,:)
% id5 = 232001
tr1 = x(id1:id2,:); % split logfile into single trials
tr2 = x(id2:id3,:);
tr3 = x(id3:id4,:);
tr4 = x(id4:id5,:);
tr5 = x(id5:end,:);
tr1
% tr1 = 57995×4
%      0     0     0     7
%      0     0     1     8
%      0     0     1     9
%      0     0     1    10
%      0     0     1    11
%      0     0     1    12
%      0     0     1    13
%      0     0     1    14
%      0     0     1    15
%      0     0     1    16

Similarly, tr2 includes the last row of trial "1" on its first row:

tr2
% tr2 = 58001×4
%            0           0           1       58001
%            0           0           2       58002
%            0           0           2       58003
%            0           0           2       58004
%            0           0           2       58005
%            0           0           2       58006
%            0           0           2       58007
%            0           0           2       58008
%            0           0           2       58009
%            0           0           2       58010

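The same happens for all the trials, because find(diff(ind)) returns the index of the last row before each change of trial ID, not the first row of the new trial. Adding one to the first index when you split x into the separate trials fixes that.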

If you include the +1 to correct that, the results are the same as those from the code in my answer, which uses findgroups and splitapply and works for any number of trials.
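
The +1 fix can also be written once, for any number of trials, by computing all the start and end rows from find(diff(ind)). A minimal sketch on a made-up ID column, assuming (as in logfile.xlsx) that the log begins with rows whose ID is 0, so that every trial, including the first, is preceded by an ID change:

```matlab
% Hypothetical ID column: 3 rows before the first trial (ID 0),
% then trials 1, 2 and 3 with 4, 3 and 2 rows.
ind = [0 0 0 1 1 1 1 2 2 2 3 3]';
x = [ind, (1:numel(ind))'];                 % column 2 = time (ms)

% find(diff(ind)) gives the LAST row before each change of ID,
% so the first row of each trial is that index + 1.
starts = find(diff(ind)) + 1;
ends = [starts(2:end) - 1; numel(ind)];     % each trial ends just before the next one starts

% One cell per trial, for any number of trials.
trials = arrayfun(@(s,e) x(s:e,:), starts, ends, 'UniformOutput', false);
```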

To show that the two methods give the same results on this data set:

% run your script, but with the +1 modification described above:
create_log_modified
% save the variable Logfile_concatinated with another name:
Logfile_concatinated_save = Logfile_concatinated;
% run my version:
create_log_voss
% check whether Logfile_concatinated from my version is the same as
% Logfile_concatinated_save calculated from your (modified) version:
isequal(Logfile_concatinated_save,Logfile_concatinated)
% ans = logical
%    1

Voss on 5 Aug 2023

Edited: Voss on 5 Aug 2023


You can try the code below with your other data sets and frame rates.

If it doesn't work (i.e., you get an error message or the results are not as expected), upload a data set whose frame rate is not an integer (or does not divide 1000 evenly), and I'll see what has to be done to accommodate it.

x = readmatrix('logfile.xlsx');
Data = readmatrix('datafile.xlsx');
frame_rate = 20.058913;
n = floor(1000/frame_rate);          % whole ms per frame (rounded down)
x(x(:,3) == 0,:) = [];
[g,g_id] = findgroups(x(:,3));
trial_length = floor(size(Data,1)/numel(g_id)*n); % whole ms per trial
Trials = splitapply(@(t)cut_and_downsample(t,trial_length,n),x,g);
Logfile_concatinated = vertcat(Trials{:})
% Logfile_concatinated = 5700×4
%      0     0     1     8
%      0     0     1    57
%      0     0     1   106
%      0     0     1   155
%      0     0     1   204
%      0     0     1   253
%      0     0     1   302
%      0     0     1   351
%      0     0     1   400
%      0     0     1   449
function out = cut_and_downsample(in,len,n)
out = {downsample(in(1:len,:),n)};  % downsample requires Signal Processing Toolbox
% out = {in(1:n:len,:)};            % equivalent without the toolbox
end
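
A rough estimate of why rounding the frame period matters: with n = floor(1000/frame_rate), each downsampled row lands slightly earlier than the frame it should match, and the error grows with the frame index k (counted from the trial onset). Assuming one log row per millisecond and a frame rate f in frames per second, the offset of frame k is approximately:

```latex
\Delta_k = k\left(\frac{1000}{f} - \left\lfloor \frac{1000}{f} \right\rfloor\right)\ \text{ms}
```

For f = 20.058913, 1000/f ≈ 49.853 ms and n = 49, so the offset grows by about 0.853 ms per frame, roughly 853 ms after 1000 frames. For f = 17.5, 1000/f ≈ 57.143 ms and n = 57, about 0.143 ms per frame, or roughly 143 ms after 1000 frames. For f = 20 the offset is zero.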

Voss on 6 Aug 2023

Doesn't frame_rate have to be the actual frame rate that the data was captured at, if you want the two to line up?

EK on 6 Aug 2023

Yes, the frame rate should be the actual one; otherwise the alignment won't be precise. The example I sent is not the best one. If the frame rate is 20.04 and I round it to 20, I do not see much difference on small data sets. There may be a shift of a few milliseconds, which is not that critical. But if the frame rate is, say, 17.5 or so, it becomes a problem.
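
A sketch of one way to handle a non-integer frame rate such as 17.5 fps without accumulated drift, assuming the log has one row per millisecond and every trial in the data file has the same number of frames: compute the time of each frame from the true frame rate and take the nearest log row, so the error stays within half a millisecond for every frame instead of growing over the trial. The synthetic log and frames_per_trial are hypothetical stand-ins for logfile.xlsx and size(Data,1)/numel(g_id).

```matlab
% Hypothetical synthetic log: 2 trials of 3000 ms, 1 row per ms,
% column 3 = trial ID, column 4 = time (ms) since the start of the log.
frame_rate = 17.5;                           % assumed true frame rate of the data file
ids = kron((1:2)', ones(3000,1));
x = [zeros(numel(ids),2), ids, (1:numel(ids))'];
frames_per_trial = 50;                       % hypothetical: size(Data,1)/number of trials

trial_ids = unique(x(:,3));
Trials = cell(numel(trial_ids),1);
for k = 1:numel(trial_ids)
    tr = x(x(:,3) == trial_ids(k),:);
    % Frame j (j = 0,1,...) occurs j*1000/frame_rate ms after the trial onset;
    % take the nearest log row instead of a fixed integer stride.
    % (The trial must be long enough: idx(end) <= size(tr,1).)
    idx = round((0:frames_per_trial-1)' * 1000/frame_rate) + 1;
    Trials{k} = tr(idx,:);
end
Logfile_concatinated = vertcat(Trials{:});
```

This picks the nearest sample rather than averaging over each frame interval, which matches what the stride-based code does for integer frame periods.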
