how to separate a large text file into individual text files based on date, but also export headers?

Hi There,
I have a large text file with some data with the following headings, these run from 1991 to 2019, and i want to extract and generate individual text files , but keep the headings for later analysis. I also want the file name to be saved in a certain format based on the date.
I'm still pretty new to Matlab so any suggestions/ guidence would be great !
Easting Northing Elevation C hainage FC Profile_reg_ID Survey date
258574.81 309931.38 5.033 -65.24 ZZ 20 02/01/1992
Many thanks,
Alex

6 Comments

Hi,
Here's an example of one year, I need to seperate them by dates, so each individual date is stored as a seperate text file with the file name surveyID_YYYYMMDD.
Thank you in advance for any suggestions.
A
hello
in the text file provided , the dates appear with a strange format , not what you originaly posted above
did you make some export from excel (xlsx) to txt file ?
Easting Northing Elevation Chainage FC Profile_reg_ID Survey date
259354.73 319311.53 3.841 -58.24 ZZ 31 33588
oh yes, the original file didn't have the coulumns in the order i needed, so I moved them and exported this example.
Would the original txt be better ?
Oh wait, that just seems to be the odd one out,
here is one for 1992
", I need to seperate them by dates, so each individual date is stored as a seperate text file..."
That's easily-enough done, but I'd ask "Why?" create a zillion different files to have to process with all that extra overhead and code to deal with instead of just processing the data by whatever combination of variables needed? findgroups and groupsummary or rowfun are extremely powerful for such tasks...

Sign in to comment.

 Accepted Answer

Try something like this —
files = {'https://www.mathworks.com/matlabcentral/answers/uploaded_files/1276100/Gwynedd_1991.txt';'https://www.mathworks.com/matlabcentral/answers/uploaded_files/1276165/Gwynedd_1992.txt'}
files = 2×1 cell array
{'https://www.mathworks.com/matlabcentral/answers/uploaded_files/1276100/Gwynedd_1991.txt'} {'https://www.mathworks.com/matlabcentral/answers/uploaded_files/1276165/Gwynedd_1992.txt'}
for k1 = 1:numel(files)
% fprintf('k1 = %d',k1)
T{k1} = readtable(files{k1}, 'VariableNamingRule','preserve');
if isnumeric(T{k1}.('Survey date'))
Date = datetime(T{k1}.('Survey date'), 'ConvertFrom','excel');
else
Date = T{k1}.('Survey date');
end
[G,Y,M,D] = findgroups(year(Date),month(Date),day(Date));
Gu = unique(G);
for k2 = 1:numel(Gu)
% fprintf('k2 = %d',k2)
SurvID = T{k1}.Profile_reg_ID(G==k2,:);
Dk = Date(G==k2);
[y,m,d] = ymd(Dk(1));
fn{k1,k2} = sprintf('%02d_%4d%02d%02d.txt',SurvID(1,1),y,m,d);
writetable(T{k1}(G==k2,:),fn{k1,k2})
fprintf('Written: %s\n',fn{k1,k2})
end
end
Written: 31_19911216.txt Written: 28_19911217.txt Written: 26_19911218.txt Written: 24_19911219.txt Written: 22_19911220.txt Written: 33_19911223.txt Written: 20_19920102.txt Written: 38_19920103.txt Written: 41_19920106.txt Written: 34_19920114.txt Written: 36_19920115.txt Written: 43_19920116.txt Written: 46_19920117.txt Written: 18_19920128.txt Written: 16_19920129.txt Written: 14_19920130.txt Written: 11_19920203.txt Written: 10_19920204.txt Written: 01_19920205.txt Written: 06_19920213.txt Written: 03_19920214.txt Written: 01_19920908.txt Written: 03_19920909.txt Written: 11_19920910.txt Written: 18_19920911.txt Written: 20_19920914.txt Written: 24_19920915.txt Written: 17_19920919.txt Written: 26_19920921.txt Written: 31_19920922.txt Written: 43_19920923.txt Written: 40_19920925.txt
TR1 = readtable(fn{1,1}, 'VariableNamingRule','preserve')
TR1 = 41×7 table
Easting Northing Elevation Chainage FC Profile_reg_ID Survey date __________ __________ _________ ________ ______ ______________ ___________ 2.5935e+05 3.1931e+05 3.841 -58.24 {'ZZ'} 31 33588 2.5934e+05 3.193e+05 3.533 -40.51 {'ZZ'} 31 33588 2.5933e+05 3.193e+05 3.775 -28.43 {'ZZ'} 31 33588 2.5931e+05 3.1929e+05 3.739 -13.47 {'ZZ'} 31 33588 2.5931e+05 3.1929e+05 4.051 -4.85 {'ZZ'} 31 33588 2.593e+05 3.1929e+05 5.093 -2.09 {'ZZ'} 31 33588 2.593e+05 3.1929e+05 4.099 -0.6 {'ZZ'} 31 33588 2.593e+05 3.1929e+05 4.05 0 {'ZZ'} 31 33588 2.5929e+05 3.1928e+05 3.165 11.28 {'ZZ'} 31 33588 2.5929e+05 3.1928e+05 3.467 11.9 {'ZZ'} 31 33588 2.5928e+05 3.1928e+05 3.311 19.42 {'ZZ'} 31 33588 2.5928e+05 3.1928e+05 2.539 20.95 {'ZZ'} 31 33588 2.5927e+05 3.1928e+05 2.168 28.86 {'ZZ'} 31 33588 2.5926e+05 3.1927e+05 1.454 46.7 {'ZZ'} 31 33588 2.5924e+05 3.1926e+05 1.086 65.01 {'ZZ'} 31 33588 2.5924e+05 3.1926e+05 1.264 66.57 {'ZZ'} 31 33588
TR6 = readtable(fn{2,6}, 'VariableNamingRule','preserve')
TR6 = 89×7 table
Easting Northing Elevation Chainage FC Profile_reg_ID Survey date __________ __________ _________ ________ ______ ______________ ___________ 2.5742e+05 3.2977e+05 8.71 -3.82 {'ZZ'} 43 16/01/1992 2.5742e+05 3.2977e+05 9.317 -3.74 {'ZZ'} 43 16/01/1992 2.5742e+05 3.2977e+05 9.07 0 {'ZZ'} 43 16/01/1992 2.5742e+05 3.2977e+05 8.972 1.75 {'ZZ'} 43 16/01/1992 2.5742e+05 3.2977e+05 9.059 1.75 {'ZZ'} 43 16/01/1992 2.5742e+05 3.2977e+05 9.06 2.34 {'ZZ'} 43 16/01/1992 2.5742e+05 3.2977e+05 8.147 2.4 {'ZZ'} 43 16/01/1992 2.5741e+05 3.2977e+05 4.703 4.29 {'ZZ'} 43 16/01/1992 2.5741e+05 3.2977e+05 3.441 10.86 {'ZZ'} 43 16/01/1992 2.574e+05 3.2977e+05 3.02 15.11 {'ZZ'} 43 16/01/1992 2.5739e+05 3.2977e+05 2.33 27.44 {'ZZ'} 43 16/01/1992 2.5738e+05 3.2978e+05 1.821 41.4 {'ZZ'} 43 16/01/1992 2.5736e+05 3.2978e+05 1.49 54.89 {'ZZ'} 43 16/01/1992 2.5735e+05 3.2978e+05 1.158 73.2 {'ZZ'} 43 16/01/1992 2.5733e+05 3.2978e+05 0.905 93.29 {'ZZ'} 43 16/01/1992 2.5731e+05 3.2979e+05 0.579 113.08 {'ZZ'} 43 16/01/1992
EDIT — (27 Jan 2023 at 16:05)
Adapted original code to accommodate both original files.
.

2 Comments

As always, my pleasure!
I edited my original Answer to include and process both the original and added files.
.

Sign in to comment.

More Answers (1)

So here we go
A code that is certainly not as refined as StarStrider's ....
filename = 'Gwynedd_1992.txt';
[outdata] = readcell(filename, 'DateTimeType', 'text');
[m,n] = size(outdata);
% extract header line and data
header_line = outdata(1,:);
data = outdata(2:m,:);
dates = string(data(:,n));
datesarray = datetime(dates);
% find unique dates
[dates_unic,ia,ic] = unique(datesarray);
% split and save individual data blocks (one file per date)
for ck = 1:numel(dates_unic)
start = ia(ck);
if ck == numel(dates_unic)
stop = m-1;
else
stop = ia(ck+1)-1;
end
thisdate = strrep(dates(start),'/','-');
data_out = [header_line; data(start:stop,:)];
% write to txt file
filename = strcat("surveyID_",thisdate,".txt");
writecell(data_out,filename,"Delimiter","tab");
end

Asked:

on 27 Jan 2023

Commented:

on 27 Jan 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!