Splitting a file into multiple files based on trigger words in the first column.

Scott Spurgeon on 2 Aug 2021
Edited: dpb on 2 Aug 2021
I've got a large data file (.dat) that I need to split whenever a specific text string appears. For example, I've got:
Dataset_1_1 Set Number
1234 1234
.... ....
Dataset_1_2 Set Number2
5678 5678
.... ....
[I need to make the split here]
Dataset_2_1 Set Number
1234 1234
.... ....
Dataset_2_2 Set Number2
5678 5678
.... ....
etc. I need to keep all of the "Dataset_1" sets together, meaning "Dataset_1_1" must end up in the same file as "Dataset_1_34", but the split needs to be made as soon as "Dataset_2_1" is read. Unfortunately, the number of rows between "Dataset_1" and "Dataset_2" isn't known (it can be millions of rows) and each dataset is a different size, so I need to split them based on the names.
Can MATLAB read the first column of each line, find where "Dataset_1_1", "Dataset_2_1", "Dataset_3_1", etc. occur, split the file at those points, and save each piece to a new .dat file?

Answers (1)

dpb on 2 Aug 2021
Edited: dpb on 2 Aug 2021
Given the size of the file and the unknown number of lines between sections, a simple filter is probably best here: read one line at a time and echo it to the current output file, since you never need more than a single record in memory.
fid=fopen('inputfile.dat','r');
if fid<0, error('Could not open inputfile.dat'), end
fnum=1;
fout=compose("Dataset%04d.dat",fnum);      % initial output file
fod=fopen(fout,'w');                       % open it for writing
fnum=fnum+1;                               % ready for next file
linechk=compose("Dataset_%d_",fnum);       % next set indicator string, e.g. "Dataset_2_"
while true
  l=fgets(fid);                            % get input line w/ \n
  if ~ischar(l), break, end                % fgets returns -1 at end of file
  if startsWith(l,linechk)                 % next set starts in column 1
    fclose(fod);                           % close the finished set's file
    fout=compose("Dataset%04d.dat",fnum);  % next output file
    fod=fopen(fout,'w');                   % open it for writing
    fnum=fnum+1;                           % get ready for next file
    linechk=compose("Dataset_%d_",fnum);   % next set indicator string
  end
  fprintf(fod,'%s',l);                     % echo line from input to output file
end
fclose(fid);                               % close the input file
fclose(fod);                               % and the last output file
"Air code", untested, but I think it's close...
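If the set numbers might skip values or not start at 1, a variant that reads the set number from the first column itself may be more robust than counting. This is a sketch under stated assumptions, not the original answer's method: it assumes each header line begins exactly with Dataset_<set>_<subset> and that all lines of one set are contiguous, as in the example. The demo input file (split_demo_input.dat in tempdir) and its sample lines are hypothetical; point infile at the real file and drop the demo section.

```matlab
% Sketch: one output file per top-level set, with the set number parsed from
% the header "Dataset_<set>_<subset>" at the start of the line.

% --- Demo input (hypothetical sample; replace with your real file) ---
infile = fullfile(tempdir, 'split_demo_input.dat');
outdir = tempdir;                          % where DatasetNNNN.dat files go
fid = fopen(infile, 'w');
fprintf(fid, 'Dataset_1_1 Set Number\n1234 1234\nDataset_1_2 Set Number2\n5678 5678\n');
fprintf(fid, 'Dataset_2_1 Set Number\n1234 1234\nDataset_2_2 Set Number2\n5678 5678\n');
fclose(fid);

% --- Split ---
fid = fopen(infile, 'r');
if fid < 0, error('Could not open %s', infile); end
fod = -1;                                  % no output file open yet
curSet = NaN;                              % set number of the open output file
while true
    l = fgets(fid);                        % next line, newline kept
    if ~ischar(l), break; end              % fgets returns -1 at end of file
    if strncmp(l, 'Dataset_', 8)           % cheap prefilter before regexp
        tok = regexp(l, '^Dataset_(\d+)_\d+', 'tokens', 'once');
        if ~isempty(tok)
            setNum = str2double(tok{1});
            if setNum ~= curSet            % first line of a new set: switch files
                if fod >= 0, fclose(fod); end
                fod = fopen(fullfile(outdir, sprintf('Dataset%04d.dat', setNum)), 'w');
                if fod < 0, error('Could not open output for set %d', setNum); end
                curSet = setNum;
            end
        end
    end
    if fod >= 0                            % lines before the first header are skipped
        fprintf(fod, '%s', l);
    end
end
fclose(fid);
if fod >= 0, fclose(fod); end
```

Because output files are opened with 'w', a set that reappears later in the input would overwrite its earlier file; if that can happen, opening with 'a' (after deleting old outputs) would keep both pieces.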
