Align two data sets

Hi,
I am trying to align two data sets, a log matrix and a data matrix (example files containing a few trials are attached below). The files have different sizes: the log file is sampled in ms and the data file in frames. I need to align them and downsample the log file to the size of the data file.
The columns in the log file record the stimuli over time (an example is attached below). The rows are time and the columns are events. The 3rd column logs the ID and the duration of the trials: Trial 1 = 1, Trial 2 = 2, etc. The trials in the log file start at the same time as in the data file, but they are a bit longer than in the data file. I need to cut the end of each trial to the length of the trial in the data file, downsample the trials to the size of the data file, and concatenate them back together. The trial length in the data file = size(datafile,1)./number of trials. At the moment, I am doing this with the simple code attached below, which is limited to 10 trials. I would like to put it in a loop so that it runs automatically on files with any number of trials. Could anyone help with this?

1 Comment

Neither the code nor the explanation is well written, and the code does not run. The unnecessarily long logfile.xlsx seems to have every millisecond logged, each associated with a trial ID (or the lack of one for the first 7 ms, it seems?). The trial times alone would be enough (like the variable Trial_IDS), but I guess you need the full log and need to downsample it.
The failing line is line 9: tp2=size(datafile,1)./10; % length of the data for one trial. I assume that is where your pseudocode starts, and that you mean to extract the trial length from the variable Data rather than the size of the variable, as the code does. That part is not clear. So I am not sure how much of each trial you need to cut, or whether the duration differs between trials based on the info in datafile.xlsx.
As for downsampling, just take x(1:50:end,3:4) to get a matrix with one row per 50 ms (1 s / 20 = 50 ms) containing the trial ID and the time vector, but it won't capture the trial beginning at 7 ms.


 Accepted Answer

Voss on 5 Aug 2023
Edited: Voss on 5 Aug 2023
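x = readmatrix('logfile.xlsx');      % log file: one row per ms, column 3 = trial ID
Data = readmatrix('datafile.xlsx');  % data file: one row per frame
frame_rate = 20;                     % frames per second
n = 1000/frame_rate;                 % log rows (ms) per data frame
x(x(:,3) == 0,:) = [];               % remove rows that belong to no trial
[g,g_id] = findgroups(x(:,3));       % group the rows by trial ID
trial_length = size(Data,1)/numel(g_id)*n;  % log rows to keep per trial
% cut and downsample each trial, then concatenate the trials
Trials = splitapply(@(t)cut_and_downsample(t,trial_length,n),x,g);
Logfile_concatinated = vertcat(Trials{:});
function out = cut_and_downsample(in,len,n)
% Keep the first len rows of one trial and take every n-th row.
% The result is wrapped in a cell so splitapply can collect one matrix per trial.
% out = {downsample(in(1:len,:),n)};
out = {in(1:n:len,:)};
end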
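For readers who prefer an explicit loop, here is a minimal sketch of the same cut-downsample-concatenate idea on a small synthetic log. It is not the code from the answer: the synthetic matrix, the 120 ms trial duration, and the values of n and trial_length are assumptions chosen only to make the example easy to check by hand.

```matlab
% Synthetic log (assumed layout): column 3 = trial ID (0 = no trial),
% column 4 = time in ms. 7 leading rows with ID 0, then 3 trials of 120 ms.
ids = [zeros(7,1); repelem((1:3)',120)];
x = [zeros(numel(ids),2), ids, (1:numel(ids))'];

n = 50;              % log rows per data frame (1000 ms / 20 frames per second)
trial_length = 100;  % log rows to keep per trial (here 2 frames * 50 ms)

x(x(:,3) == 0,:) = [];              % drop rows outside any trial
trial_ids = unique(x(:,3));         % sorted list of trial IDs
Trials = cell(numel(trial_ids),1);
for k = 1:numel(trial_ids)
    t = x(x(:,3) == trial_ids(k),:);    % all rows of trial k
    Trials{k} = t(1:n:trial_length,:);  % cut to length, keep every n-th row
end
Logfile_concatinated = vertcat(Trials{:});
```

The loop visits the trial IDs in sorted order, which is also the order in which findgroups numbers its groups for a numeric input, so both forms concatenate the trials in the same sequence.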

10 Comments

EK on 5 Aug 2023
Thanks a lot!
In your script create_log, I think that this part:
tr1 = x(id1:id2,:);% split logfile in to single trials
tr2 = x(id2:id3,:);
tr3 = x(id3:id4,:);
tr4 = x(id4:id5,:);
tr5 = x(id5:end,:);
Should be like this:
tr1 = x(id1+1:id2,:);% split logfile in to single trials
tr2 = x(id2+1:id3,:);
tr3 = x(id3+1:id4,:);
tr4 = x(id4+1:id5,:);
tr5 = x(id5+1:end,:);
Because tr1, for instance, the way you are constructing it, includes as its first row the last row of the "0th" trial:
x = readmatrix('logfile.xlsx');
frame_rate = 20;
n = 1000/frame_rate; % 1000ms/sampling rate
ind = x (:,3);% Epoch ID column
trial_IDS = find(diff(ind));%findtrials
id1=trial_IDS (1,:)% ids of trials in logfile
id1 = 7
id2=trial_IDS (2,:)
id2 = 58001
id3=trial_IDS (3,:)
id3 = 116001
id4=trial_IDS (4,:)
id4 = 174001
id5=trial_IDS (5,:)
id5 = 232001
tr1 = x(id1:id2,:);% split logfile in to single trials
tr2 = x(id2:id3,:);
tr3 = x(id3:id4,:);
tr4 = x(id4:id5,:);
tr5 = x(id5:end,:);
tr1
tr1 = 57995×4
     0     0     0     7
     0     0     1     8
     0     0     1     9
     0     0     1    10
     0     0     1    11
     0     0     1    12
     0     0     1    13
     0     0     1    14
     0     0     1    15
     0     0     1    16
Similarly, tr2 includes the last row of trial "1" on its first row:
tr2
tr2 = 58001×4
     0     0     1 58001
     0     0     2 58002
     0     0     2 58003
     0     0     2 58004
     0     0     2 58005
     0     0     2 58006
     0     0     2 58007
     0     0     2 58008
     0     0     2 58009
     0     0     2 58010
etc., for all the trials. Adding one to the first index when you split x into the separate trials fixes that.
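The off-by-one comes from how find(diff(...)) reports a change: it returns the index of the last row before the value changes, not the first row after it. A toy sketch (the ind vector here is made up for illustration, not taken from logfile.xlsx):

```matlab
ind = [0 0 0 1 1 1 1 2 2 2]';   % toy epoch-ID column
trial_IDS = find(diff(ind));     % indices of the last row BEFORE each change
% trial_IDS is [3; 7]: rows 3 and 7 are the final rows of epochs 0 and 1
first_rows = trial_IDS + 1;      % first row of each new epoch: [4; 8]
```

Indexing with trial_IDS therefore picks up rows that still carry the previous ID, while trial_IDS + 1 lands on the first row of the new trial.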
If you include the +1 to correct that, the results are the same as you get by the code in my answer, which uses findgroups and splitapply and will work for any number of trials.
To show that the results are the same on this data set between the two methods:
% run your script, but with the +1 modification described above:
create_log_modified
% save the variable Logfile_concatinated with another name:
Logfile_concatinated_save = Logfile_concatinated;
% run my version:
create_log_voss
% check whether Logfile_concatinated from my version is the same as
% Logfile_concatinated_save calculated from your (modified) version:
isequal(Logfile_concatinated_save,Logfile_concatinated)
ans = logical
1
Voss on 5 Aug 2023
You're welcome! Let me know if you have any questions.
EK on 5 Aug 2023
Yes, you are right, x(id1+1:id2,:) is correct! I have another question: in this data set the frame rate is 20, but that is not always the case. Sometimes my frame rate is 20.5 or 17.6, etc., and I have a problem with downsampling. Is there an easy solution for that?
You can try the code below, with your other data sets and frame-rates.
If it doesn't work (i.e., you get an error message or the results are not as expected), then upload a data set with a non-integer (or not a factor of 1000) frame-rate, and I'll see what has to be done to accommodate it.
x = readmatrix('logfile.xlsx');
Data = readmatrix('datafile.xlsx');
frame_rate = 20.058913;
n = floor(1000/frame_rate);
x(x(:,3) == 0,:) = [];
[g,g_id] = findgroups(x(:,3));
trial_length = floor(size(Data,1)/numel(g_id)*n);
Trials = splitapply(@(t)cut_and_downsample(t,trial_length,n),x,g);
Logfile_concatinated = vertcat(Trials{:})
Logfile_concatinated = 5700×4
     0     0     1     8
     0     0     1    57
     0     0     1   106
     0     0     1   155
     0     0     1   204
     0     0     1   253
     0     0     1   302
     0     0     1   351
     0     0     1   400
     0     0     1   449
function out = cut_and_downsample(in,len,n)
out = {downsample(in(1:len,:),n)};
% out = {in(1:n:len,:)};
end
EK on 5 Aug 2023
Hi, thanks a lot! I have tried to run it on the same files using the exact frame rate, 20.058913.
I am getting an error:
Error using splitapply
Applying the function '@(t)cut_and_downsample(t,trial_length,n)' to the 1st group of data generated the following error:
Expected N to be integer-valued.
Voss on 5 Aug 2023
I have edited the code in my previous comment to make n an integer. Does that give the expected result?
EK on 6 Aug 2023
I tried another data set with a frame rate of 20.126029. I did not get errors, but the alignment is not exact anymore. If I plot the first column of the log file (stimuli) and compare it with the corresponding second column of the data file (response), from the 3rd to the 10th trial they are not aligned well. The stimuli in each trial shift in time by about 100-500 ms. I do not see that if I round the frame rate to 20.
The floor call rounds n (n = 49.68) down to 49. Maybe try interpolation instead?
Voss on 6 Aug 2023
Doesn't frame_rate have to be the actual frame rate that the data was captured at, if you want the two to line up?
EK on 6 Aug 2023
Yes, the frame rate should be the actual one, otherwise the alignment won't be precise. The example I sent is not the best one. If the frame rate is 20.04 and I round it to 20, I do not see much difference on small data sets. There may be a shift of a few milliseconds, which is not that critical. But if the frame rate is, say, 17.5 or so, it becomes a problem.
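One possible way to handle a non-integer frame rate, sketched here as an assumption rather than as a solution confirmed in this thread: instead of stepping through the log with a fixed integer n, compute the log row for each frame directly by rounding (k-1)*1000/frame_rate. Each index is then off by at most half a millisecond and the error does not accumulate. The frame count of 1140 and the stand-in log t are hypothetical values for illustration.

```matlab
frame_rate = 17.5;                % frames per second (example value from the thread)
ms_per_frame = 1000/frame_rate;   % about 57.14 log rows per frame, not an integer
n_frames = 1140;                  % hypothetical number of data frames in one trial

t = (1:70000)';                   % stand-in for one trial of the log, one row per ms

% Row for frame k is the one nearest to (k-1)*ms_per_frame ms after trial start.
idx = round((0:n_frames-1)' * ms_per_frame) + 1;
idx = idx(idx <= size(t,1));      % guard against running past the end of the trial
out = t(idx,:);                   % one log row per data frame

% For comparison: a fixed integer step drifts a little more with every frame.
idx_floor = (0:n_frames-1)' * floor(ms_per_frame) + 1;
drift_ms = idx(end) - idx_floor(end);   % accumulated offset at the last frame
```

With these assumed numbers the fixed step of 57 ends up 163 rows (ms) behind by the last frame, whereas the rounded indices alternate between steps of 57 and 58 and stay on target. Inside cut_and_downsample, the same idx could replace the 1:n:len indexing, provided the trial is long enough.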


Release: R2022a

Asked: EK on 5 Aug 2023

Commented: EK on 6 Aug 2023
