MATLAB Answers


How do I split a .txt file and save the individual split files in a folder with numerical suffixes?

Asked by EL
on 23 Aug 2019
Latest activity Commented on by dpb
on 24 Aug 2019
EDIT: I know it's long, but I'm just trying to give the scope of everything. Essentially I'm trying to figure out how to:
1) Create a loop that automatically splits multiple files (none of which is exactly divisible by 72,000,000 lines) into 72,000,000-line pieces, ignoring any leftover piece smaller than that. The pieces would be saved sequentially in a new subfolder, with numerical suffixes.
2) Create a loop that then reads every split file in order and runs another script on it. The output files would also be saved in a new subfolder, with sequential suffixes.
Thanks!
----------------------------------------------------------------------------------------------------------------------------
Hey guys,
I'm pretty new to MATLAB. I'm starting to understand the structure of how code is written, but I still don't understand most of the 'words' of the language. I was looking for some help with the following. Any help is greatly appreciated, thanks!
PS: Attached is a portion of a file I'm looking to split.
Step 1) Select files to use
The first step of this process is to use the following code to generate a list that can easily be loaded on a Linux system.
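clear
close all
clc
% Pick one or more .txt files; FileNames is a char (one file) or a cell array (several)
[FileNames, PathNames] = uigetfile('Y:\FOLDER\*.txt', 'Choose files to load:', 'MultiSelect', 'on');
prompt = 'Enter save-name according to: file_mmddyyyy_signal ';
Filenamesave = input(prompt, 's');
Filenamesave = strcat(PathNames, Filenamesave, '.mat');   % the .mat is saved next to the data
% Convert the Windows path to the Linux mount, e.g. L:\dir\ -> /LabData/dir/
PathNames = strrep(PathNames, 'L:', 'LabData');
PathNames = strrep(PathNames, '\', '/');
PathNamesSave = strcat('/', PathNames);
save(Filenamesave, 'FileNames', 'PathNames', 'PathNamesSave');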
This one works, so step 1 is good to go.
Step 2) Select the file generated in step 1 in MATLAB on the Linux system (remotely, from the command line), and split each very large single-column .txt file into smaller .txt files that are small enough to be opened and processed
I cannot seem to find a command that splits a .txt column by lines. I intend for the following prompts to determine the size of each split file:
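A sketch, not a definitive solution: textscan accepts a repeat count as its third argument, so textscan(fid, '%f', N) reads at most N values, and a later call with the same file identifier resumes where the previous one stopped. Calling it in a loop therefore walks through a single-column file N lines at a time. This assumes one numeric value per line; the small temporary file and linesPerBlock = 4 are stand-ins for the real data and 60*20000*filesplit.

```matlab
% Sketch: read a single-column numeric text file linesPerBlock lines at a time.
% A 10-line temporary file stands in for the real data.
srcFile = [tempname '.txt'];
fid = fopen(srcFile, 'w');
fprintf(fid, '%d\n', 1:10);              % values 1..10, one per line
fclose(fid);

linesPerBlock = 4;                       % real data: 60*20000*filesplit
blockSizes = [];
fid = fopen(srcFile, 'r');
while true
    % Read at most linesPerBlock values; the next call resumes where this one stopped
    C = textscan(fid, '%f', linesPerBlock);
    block = C{1};
    if isempty(block)
        break                            % nothing left in the file
    end
    % record how many lines this block contained
    blockSizes(end+1) = numel(block);    %#ok<AGROW>
end
fclose(fid);
delete(srcFile);
disp(blockSizes)
```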
%% Retrieve Files
prompt = 'Enter .mat file to be loaded: '; % the .mat file generated in "Step 1)"
files = input(prompt, 's');
load(files);   % restores FileNames, PathNames and PathNamesSave
%% Display the files selected, in order, to confirm they're correct
if ischar(FileNames)
    FileNames = cellstr(FileNames);   % a single selection is saved as a char, not a cell
end
for i = 1:numel(FileNames)
    % PathNamesSave is the single Linux-style folder from Step 1, shared by all files
    disp(['User selected ', fullfile(PathNamesSave, FileNames{i})])
end
%% Save Files
prompt = 'Enter new file name (mmddyyyy_bug_media_oC): ';
Filenamesave = cellstr(input(prompt, 's'));
%% Split Files
prompt = 'Enter the bin time for file splitting in minutes [60min]: ';
filesplit = str2double(input(prompt, 's'));   % input(...,'s') returns text; convert it to a number
if isnan(filesplit)                           % empty (or non-numeric) entry -> default of 60
    filesplit = 60;
end
So there will be multiple files loaded, anywhere from one to arbitrarily many. Each individual file will have a teeny tiny bit of excess data that I don't need (I let my experiments run just slightly longer, just in case).
Here's what I'm thinking. The loaded files would be split, one file at a time, into chunks of the following number of lines:
60*20000*filesplit (60 s/min * 20,000 lines/s * filesplit minutes, i.e. 72,000,000 lines for the default 60-minute bin)
Each chunk would be saved only if it contained exactly 60*20000*filesplit lines. If the remainder of a file is shorter than that, splitting of that file would end, and the next file listed in the .mat file from %%Retrieve Files would be split. Each of these files will vary in size, so I'm hoping for a script that isn't size dependent, and where any excess 'fat' at the end of each individual file gets cleaved off.
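The chunk arithmetic can be checked on the 62-minute example below, assuming (as the formula implies) 20,000 lines per second of recording: two 30-minute chunks are kept and the last two minutes are dropped. The variable names are hypothetical.

```matlab
% Chunk arithmetic for a 62-minute file split into 30-minute chunks
% (assumes 20,000 lines per second, as implied by 60*20000*filesplit).
linesPerSecond = 20000;
filesplit    = 30;                                    % bin time in minutes
linesPerFile = 60*linesPerSecond*filesplit;           % lines in one saved chunk
totalLines   = 60*linesPerSecond*62;                  % a 62-minute recording
nFull        = floor(totalLines/linesPerFile);        % number of chunks to save
leftover     = totalLines - nFull*linesPerFile;       % lines cleaved off the end
fprintf('%d full chunks, %d lines (%.1f min) discarded\n', ...
        nFull, leftover, leftover/(60*linesPerSecond));
```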
The files would also be saved in order. If I were splitting a 62-minute data segment named DATASEG1 into 30-minute files, I'd want the data files saved as 'DATASEG1_01' and 'DATASEG1_02', with the final 2 minutes of data discarded before the program moves on to split the next file. Then the next 62-minute file, DATASEG2, would be cut into DATASEG2_01 and DATASEG2_02, and so on until every file has been split.
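For the naming scheme, a small sketch: fileparts strips the extension, and sprintf with %02d produces zero-padded suffixes, which also keeps the files sorting in numerical order (up to 99 chunks per file). The .txt extension on the output names and the fixed nFull value are assumptions for illustration.

```matlab
% Build zero-padded output names for one source file (hypothetical values).
srcName = 'DATASEG1.txt';                 % one entry of FileNames
nFull = 2;                                % full chunks found in that file
[~, base] = fileparts(srcName);           % strip the extension -> 'DATASEG1'
names = cell(1, nFull);
for k = 1:nFull
    names{k} = sprintf('%s_%02d.txt', base, k);   % _01, _02, ... sort in order
end
disp(names)
```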
I'm thinking those files could be saved to a newly generated subfolder within the folder I loaded the data from in %%Retrieve Files.
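Putting the splitting together, a minimal sketch of Step 2 under these assumptions: each file is a single numeric column, the output subfolder is called 'split' (a hypothetical name), and values are written back with '%.15g', which reproduces them exactly only if they have at most 15 significant digits. Small generated files stand in for the real data; in the real script, srcDir would be the folder from Step 1 (PathNamesSave), FileNames the list loaded from the .mat file, and linesPerFile = 60*20000*filesplit. A 72,000,000-line chunk read as doubles needs roughly 576 MB of memory.

```matlab
% Sketch of Step 2 (assumptions: one numeric column per file, output subfolder
% named 'split' (hypothetical), values with at most 15 significant digits).
% --- stand-in data: replace with PathNamesSave and FileNames from Step 1 ---
srcDir = tempname;
mkdir(srcDir);
FileNames = {'DATASEG1.txt', 'DATASEG2.txt'};
nLines = [62 45];                          % stand-in file lengths
for f = 1:numel(FileNames)
    fid = fopen(fullfile(srcDir, FileNames{f}), 'w');
    fprintf(fid, '%d\n', 1:nLines(f));
    fclose(fid);
end

linesPerFile = 30;                         % real data: 60*20000*filesplit
outDir = fullfile(srcDir, 'split');        % new subfolder next to the source files
if exist(outDir, 'dir') ~= 7
    mkdir(outDir);
end

for f = 1:numel(FileNames)
    [~, base] = fileparts(FileNames{f});   % 'DATASEG1.txt' -> 'DATASEG1'
    fid = fopen(fullfile(srcDir, FileNames{f}), 'r');
    k = 0;                                 % chunk counter for this file
    while true
        C = textscan(fid, '%f', linesPerFile);   % at most linesPerFile lines
        chunk = C{1};
        if numel(chunk) < linesPerFile
            break                          % short (or empty) remainder: drop it
        end
        k = k + 1;
        outName = fullfile(outDir, sprintf('%s_%02d.txt', base, k));
        out = fopen(outName, 'w');
        fprintf(out, '%.15g\n', chunk);
        fclose(out);
    end
    fclose(fid);
end

d = dir(fullfile(outDir, '*.txt'));
splitNames = sort({d.name});
disp(splitNames)

% remove the stand-in files and folders
delete(fullfile(outDir, '*.txt'));
rmdir(outDir);
delete(fullfile(srcDir, '*.txt'));
rmdir(srcDir);
```

A file whose length is an exact multiple of linesPerFile is handled too: the final read returns an empty chunk, which is shorter than linesPerFile and ends the loop for that file.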
Step 3) Load the split files in ascending order, and have their names autopopulate the inputs I'm currently filling in manually below.
This is where I get extra lost. I have no idea how to turn this into a loop. Here's a brief rundown of what's happening:
I have a script written by a MATLAB guru that does some crazy analysis that I'm not going to touch. That script works, but it requires input. Below is a script that fills in the inputs (signal, epflplot, rawplot, fftplot, ft, bt) and then runs the script "MatLabScripttorun" to process the data. It has to happen in segments because otherwise the Linux system crashes. I'm hoping to create a loop that runs all the files in the subfolder created in step 2, but I'm unsure exactly how to do this. Below is what I've been filling in by hand. You could probably imagine that doing this 48 times or more is very tedious and annoying.
myfile = 'DATASEG1_01';
Filenamesave = 'DATASEG1_Processed_01';
signal = 'DEF';
epflplot = 'N';
rawplot = 'Y';
fftplot = 'Y';
ft = '20000';
bt = '30';
MatLabScripttorun(myfile, Filenamesave, signal, epflplot, rawplot, fftplot, ft, bt);
clear
clc
close all
%%
myfile = 'DATASEG1_02';
Filenamesave = 'DATASEG1_Processed_02';
signal = 'DEF';
epflplot = 'N';
rawplot = 'Y';
fftplot = 'Y';
ft = '20000';
bt = '30';
MatLabScripttorun(myfile, Filenamesave, signal, epflplot, rawplot, fftplot, ft, bt);
clear
clc
close all
%%
myfile = 'DATASEG2_01';
Filenamesave = 'DATASEG2_Processed_03';
signal = 'DEF';
epflplot = 'N';
rawplot = 'Y';
fftplot = 'Y';
ft = '20000';
bt = '30';
MatLabScripttorun(myfile, Filenamesave, signal, epflplot, rawplot, fftplot, ft, bt);
clear
clc
close all
%%
myfile = 'DATASEG2_02';
Filenamesave = 'DATASEG2_Processed_04';
signal = 'DEF';
epflplot = 'N';
rawplot = 'Y';
fftplot = 'Y';
ft = '20000';
bt = '30';
MatLabScripttorun(myfile, Filenamesave, signal, epflplot, rawplot, fftplot, ft, bt);
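For Step 3, a sketch of a counted loop that replaces the hand-written blocks above: list the split files with dir, sort the names (zero-padded suffixes make alphabetical order match numerical order), derive myfile and Filenamesave from each name, and set the fixed inputs once, outside the loop, so nothing has to be cleared between runs. The temporary folder and empty stand-in files are hypothetical, and the call to MatLabScripttorun is commented out so the sketch runs on its own; uncomment it in the real script. It also assumes MatLabScripttorun expects the name without the .txt extension, as in the hand-written calls.

```matlab
% Sketch: process every split file in order instead of one hand-written block each.
% --- stand-in folder and empty files: replace outDir with the Step 2 subfolder ---
outDir = tempname;
mkdir(outDir);
for name = {'DATASEG1_01', 'DATASEG1_02', 'DATASEG2_01', 'DATASEG2_02'}
    fid = fopen(fullfile(outDir, [name{1} '.txt']), 'w');
    fclose(fid);
end

% Fixed inputs: set once here, not retyped (and cleared) for every file
signal = 'DEF'; epflplot = 'N'; rawplot = 'Y'; fftplot = 'Y';
ft = '20000'; bt = '30';

d = dir(fullfile(outDir, 'DATASEG*_*.txt'));
splitFiles = sort({d.name});               % zero-padded suffixes sort in order
processedNames = cell(size(splitFiles));
for k = 1:numel(splitFiles)
    [~, myfile] = fileparts(splitFiles{k});                   % e.g. 'DATASEG1_01'
    tok = regexp(myfile, '^(.*)_(\d+)$', 'tokens', 'once');   % {'DATASEG1', '01'}
    Filenamesave = sprintf('%s_Processed_%s', tok{1}, tok{2});
    processedNames{k} = Filenamesave;
    fprintf('Processing %s -> %s\n', myfile, Filenamesave);
    % MatLabScripttorun(myfile, Filenamesave, signal, epflplot, rawplot, fftplot, ft, bt);
    close all                              % free figures between runs, but keep variables
end

% remove the stand-in files and folder
delete(fullfile(outDir, '*.txt'));
rmdir(outDir);
```

If you would rather number the processed files with one running counter across all data segments (as DATASEG2_Processed_03 in the hand-written version suggests), build the name with sprintf('%s_Processed_%02d', tok{1}, k) instead.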
Thanks for the help, guys. Hope you have a great weekend!

  1 Comment

That's all relatively easily doable as you described. The general idea of processing multiple sequential files is addressed in the doc or the FAQ under the I/O section, although I forget the exact links at the moment; a little poking around should find those sections. The same general ideas work for the processing section. One thing that will help greatly is to get rid of the clear, so you don't wipe out variables you've already set for a given iteration before the next one. Then you could just read the array of names and index through them in a counted loop, using the loop counter to refer to the next file in sequence.
However, instead of all that grief, I'd suggest at least looking at the script you're trying to run and considering either memmapfile or tall arrays, so it can process the files you already have. TMW built those tools for exactly this purpose, to let you bypass such machinations.


Release

R2018a

0 Answers