MATLAB Answers

How to loop fopen/fread/fseek to analyze large binary files in shorter segments?

Asked by LO
on 13 Sep 2018 at 20:29
Latest activity Edited by dpb
on 16 Sep 2018 at 16:30
Accepted Answer by dpb

I have a large binary file (10 GB) that contains data from 16 channels in a single array, plus one value for the time variable (so a single sample is made up of 17 values). Each value should be 8 bytes in size.

I am already able to extract a segment of this file and analyze it channel by channel. I achieved this by loading the whole file, reading it, and creating a properly sized sub-array.

[logfname, pathname] = uigetfile('*.mat','Pick a log file');
logfullpathname = [pathname logfname];
load(logfullpathname);
datafullpathname = [pathname datafilename];
FID = fopen(datafullpathname,'r');
fwrite(FID,[(start_seg*60*Fs):(end_seg*60*Fs-1)]);  %Fs = sampling freq = 20Khz (20000)
%% load selected data segment and select channel to analyze
currData = fread(FID,'double');  %% <- THIS IS THE COMMAND THAT SLOWS EVERYTHING DOWN
time = currData((1:(n_channels+1):end), 1);
SIGNAL = currData(((activechannel+1):(n_channels+1):end), 1);
trace = [time SIGNAL];
crop = trace((start_seg*60*Fs):(end_seg*60*Fs-1),:); % select data range of active channel to analyze
seg = transpose(crop(:, 2)); % should output column 2 values (SIGNAL) within the time range defined by "crop" 

But this takes a minute or so for each segment (each segment is 1 minute long).

What I would like to do is load just one segment at a time and loop the analysis (the analysis itself does not take long).

However, I am stuck: I am using ftell(FID) to get the position in my binary file, just for testing purposes, but it always gives me the same number and the code does not loop.

This is my code for a 1-hour recording file in which I select the segment from the 45th to the 46th minute. I have tried both a "while" and a "for" loop:

This is the WHILE loop:

%% WHILE CYCLE
file=('C:\Users\Admin\Documents\MATLAB\EOD examples\Examples 16ch\copy.mat');
fileID = fopen(file,'r');
feof(fileID)
index = 0;
n_channels = 16;
Fs = 20000; %sampling frequency
seg = Fs*60*(n_channels+1); %in the bin file data are organized in a vector [t(i),Ch(1)..Ch(i)], with t = time
size_of_double = 8;
while ~feof(fileID)
fseek(fileID,index*size_of_double,'bof'); %this should look for the first data point in the file
position = ftell(fileID); %this should report the current index 
position     
currData = fread(fileID, seg,'double');
currData
index=index+seg;
end

This is the FOR loop:

%% FOR CYCLE
file=('C:\Users\Admin\Documents\MATLAB\EOD examples\Examples 16ch\copy.mat');
Fs = 20000;
n_channels = 16;                        %number of active channels
index=2;                                %index of the first data point
segsize = 60*Fs*(n_channels+1);         % this is the length of 1 min segment, 20.000 samples per second
% get filesize
fileID = fopen(file,'r');
fseek(fileID, 0, 1);  % move to end
file_length_in_byte = ftell(fileID); % read end position
size_of_double = 8;
file_length_in_double_elements = file_length_in_byte / size_of_double;
feof(fileID);
%step = 1;  % in elements
for i = 1:segsize:file_length_in_double_elements-1;  % until end of file
fileindex = (i-1)*segsize*size_of_double;  % this is where we wanna go
fseek(fileID,fileindex,'bof');  % go to i-th data point
current_index = ftell(fileID);  % get file index
current_index  % this is where we went
currData = fread(fileID, 2,'double');
currData  % here is what we read
end
fclose(fileID);

Basically, before adding the analysis part, I would like to output the positions in the file that correspond to the starting point of each segment. I am stuck and do not know where the issue is: the first loop reports a sequence of values that does not seem to have the right interval (which should be 20000*60*17 if the segment is 1 minute long). The second prints just one number, as if the loop ran only once. Thanks in advance for your help!

  5 Comments

Thanks for your comments! @dpb: some lines here and there might be leftovers of something that I changed and did not remove completely. I am new to MATLAB, and I apologize if some lines make no sense. It was 8 BYTES, not bits; I corrected the typo, sorry.

@Walter: I tried to apply what I found in the online documentation. I probably applied it wrongly. If that's the case, I would appreciate it if you could point out the mistakes and, if possible, suggest a fix for them. This while loop does not run forever; I tried it. It just outputs a list of about 5040 numbers (if I am not mistaken). However, this list should contain 60 numbers, as I am trying to divide a 1-hour recording file into 1-minute segments (each minute holds 20000*17*60 values: sampling rate = 20000 Hz, 16 channels plus time = 17, 1 min = 60 s).

I think these loops should work in principle, but perhaps I am doing something wrong with the size of the segment (in samples) versus the size of the file (in bytes).

Walter, I do not really need to use the while loop. I just tried two different approaches to the same problem, and I will choose whichever works best :) Thanks anyway.

@Livio Oboti: What exactly is the problem?

 How to loop fopen/fread/fseek to analyze large binary files in
 shorter segments?

This should be easy. With segsize = 60*Fs*(n_channels+1), it should be trivial to use fseek(fid, n * segsize * 8, 'bof') to move the file pointer to the wanted position.
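A minimal sketch of that offset arithmetic, assuming the layout described in the question (17 doubles per time step, 8 bytes each). The small Fs, the temporary file, and the variable names are stand-ins chosen only so the example runs quickly; fseek is given the origin 'bof' so the offset is measured from the start of the file.

```matlab
% Stand-in parameters: small values so the demo file stays tiny.
n_channels = 16;
Fs = 10;                                  % hypothetical; the real recording uses 20000
segsize = 60*Fs*(n_channels+1);           % doubles per one-minute segment
bytes_per_double = 8;

% Write a 3-minute dummy file whose values are simply 1, 2, 3, ...
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, 1:3*segsize, 'double');
fclose(fid);

% Jump straight to the start of segment n (zero-based) and read from it.
n = 2;
fid = fopen(fname, 'r');
status = fseek(fid, n*segsize*bytes_per_double, 'bof');  % 0 on success, -1 on failure
pos = ftell(fid);                         % byte offset actually reached
first_values = fread(fid, 2, 'double');   % first two doubles of that segment
fclose(fid);
delete(fname);

fprintf('segment %d starts at byte %d\n', n, pos);
```

Here the counter n advances by one per segment and is scaled to bytes only inside the fseek call. Checking the status returned by fseek is worthwhile, because a failed seek leaves the file position where it was.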

I know :) but if I run my code (I think it is the same as what you wrote), I don't get 60 numbers as output, but far more.

This is my code:

file=('C:\Users\Admin\Documents\MATLAB\EOD examples\Examples 16ch\copy.mat');
fileID = fopen(file,'r');
feof(fileID)
index = 0;
n_channels = 16;
Fs = 20000; %sampling frequency
seg = Fs*60*(n_channels+1); %in the bin file data are organized in a vector [t(i),Ch(1)..Ch(i)], with t = time
size_of_double = 8;
while ~feof(fileID)
fseek(fileID,index*size_of_double*seg,'bof'); %this should look for the first data point in the file
position = ftell(fileID); %this should report the current index 
position     
currData = fread(fileID, seg,'double');
currData
index=index+seg;
end

1 Answer

Answer by dpb
on 14 Sep 2018 at 13:29
Edited by dpb
on 16 Sep 2018 at 13:26
 Accepted Answer

file=('C:\Users\Admin\Documents\MATLAB\EOD examples\Examples 16ch\copy.mat');
fid=fopen(file,'r');
n_ch=16;                  % number channels (exclusive of timestamp)
Fs = 20000;               % sampling frequency
index = 0;
while ~feof(fid)
  index=index+1;          % minute number
  currData=fread(fid,[n_ch+1,Fs*60],'double').';  % one minute of data in 2D array
  % NB: must orient [time+N Channels X TimeSteps] and then transpose to
  % order data properly by column-major order internally.  
  disp(sprintf('Processing minute: %d',index))
  % process one minute worth of data here
  ...
end
fid=fclose(fid);

The key is to read just 20k*60 time steps by 1+16 channels on each pass through the loop and tell fread to return the data in that orientation. The time vector will be currData(:,1) and the channels currData(:,2:end).

You might look at the memmapfile object as a higher-level abstraction if you intend to read other than sequentially.
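As a rough illustration of the memmapfile route, the sketch below maps a file as repeated one-minute blocks and pulls out the second minute without reading the first. The small Fs, the dummy file, and the field name 'seg' are assumptions made for the example, not part of the original data.

```matlab
n_ch = 16;
Fs = 10;                        % hypothetical small rate; the real file uses 20000
steps = Fs*60;                  % time steps per one-minute block

% Dummy 3-minute file: the time row counts 1..N, the channels hold zeros.
fname = [tempname '.bin'];
N = 3*steps;
raw = zeros(n_ch+1, N);
raw(1,:) = 1:N;
fid = fopen(fname, 'w');
fwrite(fid, raw, 'double');     % column-major write = one time step after another
fclose(fid);

% Map the file as repeated [n_ch+1 x steps] blocks in a field named 'seg'.
m = memmapfile(fname, 'Format', {'double', [n_ch+1, steps], 'seg'});
n_minutes = numel(m.Data);      % number of complete blocks found in the file
minute2 = m.Data(2).seg.';      % second minute, transposed to [steps x n_ch+1]
t_first = minute2(1,1);         % first time value in that minute

clear m                         % release the mapping before deleting the file
delete(fname);
```

Indexing m.Data(k) touches only the k-th block, which is what makes this convenient for jumping around in a large file.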

ADDENDUM/ERRATUM

Corrected the [sizeA] argument order above to properly reflect that the data are stored in channel order by time step in the file. A partial read must therefore give the nCH+1 dimension first, followed by the desired number of time steps, and then transpose so that the data are column-wise in internal storage order.

If you were reading the whole file, MATLAB would do that automagically, but a portion of the file must be treated as a separate grouping of the time plus data channels.
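The effect of the [sizeA] order can be checked on a toy file. The sketch below assumes only 2 channels and 4 time steps (stand-in values, as are the variable names) and compares the corrected read with the reversed one.

```matlab
n_ch = 2;                       % stand-in: 2 channels instead of 16
steps = 4;                      % stand-in: 4 time steps instead of Fs*60

% Rows are time, ch1, ch2; written column by column the file holds
% t1 c1 c2 t2 c1 c2 ..., the interleaved layout described in the question.
raw = [1:steps; 10*(1:steps); 100*(1:steps)];
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, raw, 'double');
fclose(fid);

fid = fopen(fname, 'r');
good = fread(fid, [n_ch+1, steps], 'double').';  % [steps x n_ch+1], one column per signal
frewind(fid);
bad = fread(fid, [steps, n_ch+1], 'double');     % same shape, but values are interleaved
fclose(fid);
delete(fname);

t = good(:,1);                  % time vector
ch = good(:,2:end);             % channel columns
```

With the reversed order, the first column mixes time and channel values, because fread fills the output column by column in file order.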

  16 Comments

OK, I just added this line before the end of the while loop and it works!

if index > 59, break, end 

Thanks a lot, DB!

OH! One oversight/mistake on my part: since the data are written sequentially by time step, reverse the order of the dimensions in the [sizeA] argument and transpose to get the 2D array in the proper sequence in memory.

currData=fread(FID,[(n_channels+1),Fs*60],'double').';

The while ~feof(fid) loop construct I used should work to read the full file. If you want only a counted number of time steps, then you'll have to ensure there are sufficient data in the file to read.
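One way to make sure there are sufficient data is to work out from the file size how many complete minutes are available before looping. This is only a sketch under assumed values: the small Fs, the dummy file of 2.5 minutes, and the names n_complete and n_read are illustrative.

```matlab
n_ch = 16;
Fs = 10;                                  % hypothetical; the real recording uses 20000
steps = Fs*60;                            % time steps per minute
per_min = (n_ch+1)*steps;                 % doubles per minute
bytes_per_min = per_min*8;                % 8 bytes per double

% Dummy file holding two and a half minutes of zeros.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, zeros(1, 2*per_min + per_min/2), 'double');
fclose(fid);

info = dir(fname);                        % info.bytes is the file size in bytes
n_complete = floor(info.bytes / bytes_per_min);
wanted = 60;                              % minutes the caller would like to process
n_read = min(wanted, n_complete);         % never ask for more than the file holds

fid = fopen(fname, 'r');
for k = 1:n_read
  currData = fread(fid, [n_ch+1, steps], 'double').';  % one complete minute
  % process minute k here
end
fclose(fid);
delete(fname);
```

A counted for loop of this kind also avoids relying on feof, which only reports the end of the file after a read has already reached it.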

That's indeed what I did, as in my comment. Thanks :)
