Splitting a file into multiple files based on trigger words in the first column.

I've got a large data file (.dat) that I need to split wherever a specific text string appears. For example, I've got:
Dataset_1_1 Set Number
1234 1234
.... ....
Dataset_1_2 Set Number2
5678 5678
.... ....
[I need to make the split here]
Dataset_2_1 Set Number
1234 1234
.... ....
Dataset_2_2 Set Number2
5678 5678
.... ....
and so on. I need to keep all of the "Dataset_1" sets together (so "Dataset_1_1" stays in the same file as "Dataset_1_34"), but the split has to be made as soon as "Dataset_2_1" is read. Unfortunately, the number of rows between "Dataset_1" and "Dataset_2" isn't known (it runs to millions of rows) and each dataset is a different size, so I need to split them based on the names.
Can MATLAB read the first column of each line, find where "Dataset_1_1", "Dataset_2_1", "Dataset_3_1", etc. occur, split the file at those points, and save each piece to a new .dat file?

Answers (1)

dpb on 2 Aug 2021
Edited: dpb on 2 Aug 2021
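Given the size of the file and the unknown number of lines between sections, a filter is probably best here: read one line at a time, copy it to the current output file, and open a new output file as soon as the next set's header line shows up. Only a single record ever has to be held in memory.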
fid=fopen('inputfile.dat','r');
if fid<0, error('Cannot open inputfile.dat'), end
fnum=1;
fout=compose("Dataset%04d.dat",fnum);      % initial output file
fod=fopen(fout,'w');                       % open it for writing
fnum=fnum+1;                               % ready for next file
linechk=compose("Dataset_%d_",fnum);       % next set indicator; trailing "_" keeps "Dataset_2_" from matching "Dataset_20_"
while true
    l=fgets(fid);                          % get input line w/ \n
    if ~ischar(l), break, end              % fgets returns -1 at end of file
    if startsWith(strtrim(l),linechk)      % first column starts the next set
        fclose(fod);                       % close the finished output file
        fout=compose("Dataset%04d.dat",fnum); % next output file
        fod=fopen(fout,'w');               % open it for writing
        fnum=fnum+1;                       % get ready for next file
        linechk=compose("Dataset_%d_",fnum); % next set indicator string
    end
    fprintf(fod,'%s',l);                   % echo line from input to output file
end
fclose(fid);                               % close the input file
fclose(fod);                               % and the last output file
"Air code": untested, but I think it's close.
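The loop above assumes the sets appear in strictly increasing order (1, 2, 3, ...); if a number is ever skipped, the check never fires again and everything after the gap lands in one file. A possible variant, sketched here under the assumption that every header line starts with `Dataset_<set>_<sub>` as in the question's example, reads the set number from the first column with `regexp` and starts a new output file whenever that number changes. The function name `splitDatBySet` and its `outPattern` argument are hypothetical, not part of the original answer.

```matlab
function nFiles = splitDatBySet(inFile, outPattern)
% splitDatBySet (hypothetical helper): copy inFile line by line, starting a
% new output file each time the <set> number in a "Dataset_<set>_<sub>"
% header changes. outPattern is an sprintf pattern such as 'Dataset%04d.dat'
% that receives the set number. Returns the number of output files opened.
fid = fopen(inFile, 'r');
if fid < 0
    error('splitDatBySet:open', 'Cannot open %s', inFile);
end
fod = -1;        % no output file open yet
curSet = NaN;    % set number of the file currently being written
nFiles = 0;
while true
    l = fgets(fid);                    % next line, newline kept
    if ~ischar(l)                      % fgets returns -1 at end of file
        break
    end
    % Assumed header format in the first column: Dataset_<set>_<sub>
    tok = regexp(l, '^\s*Dataset_(\d+)_\d+', 'tokens', 'once');
    if ~isempty(tok)
        setNum = str2double(tok{1});
        if setNum ~= curSet            % new set: switch output file (NaN never equal)
            if fod >= 0
                fclose(fod);
            end
            fod = fopen(sprintf(outPattern, setNum), 'w');
            curSet = setNum;
            nFiles = nFiles + 1;
        end
    end
    if fod >= 0                        % lines before the first header are dropped
        fprintf(fod, '%s', l);
    end
end
fclose(fid);
if fod >= 0
    fclose(fod);
end
end
```

For example, `n = splitDatBySet('inputfile.dat','Dataset%04d.dat')` would write Dataset0001.dat, Dataset0002.dat, ... named after the set number rather than a running counter. Because `outPattern` goes through `sprintf`, backslashes in a Windows path would need doubling; a plain file name avoids that. If a set number could reappear later in the file, its output is reopened with 'w' and overwritten, so 'a' would be the safer mode in that case.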

Release

R2021a