Read multiple text files and extract part of data by name

Hi,
I have used the code below to read and extract selected data from text files. I use textscan to read each file and
find(~cellfun(@isempty,regexpi(allText,'RainFallID')))
to locate the required data by name. It works well, but when I run it on 10,000 text files it becomes very slow and takes more than three hours. Please help if there is a faster way.
Sincerely,
clc;
clear all;
clc
tic
FileList=dir('D:\Mekala_Backupdata\Matlab2010\Filesfolder\PartofTextFilesData/');
j=1;
for i=3:1:(size(FileList)) %%read all files from folder of specified dir
    FileName{j}=FileList(i).name;
    j=j+1;
end
for j=1:size(FileName,2)
    fid=fopen(['D:\Mekala_Backupdata\Matlab2010\Filesfolder\PartofTextFilesData/',FileName{j}],'r'); %%opening each files and read each line
    allText = textscan(fid,'%s','delimiter','\n');
    numberOfLines = length(allText{1});
    allText=allText{:};
    for k=1:size(allText,1)
        idx_RainFallID=find(~cellfun(@isempty,regexpi(allText,'RainFallID')));
        idx_LMDName=find(~cellfun(@isempty,regexpi(allText,'LMD Name')));
        idx_10Under1=find(~cellfun(@isempty,regexpi(allText,'10 Under Pipe 1.Response Value')));
        idx_RainFallIDtemp=allText(idx_RainFallID);
        idx_RainFallIDtemp2=regexp(idx_RainFallIDtemp,' +','split');
        b(j,1)=str2double(idx_RainFallIDtemp2{1}{1,3});
        Variable{1,1}=char(idx_RainFallIDtemp2{1}{1,1});
    end
    fclose(fid)
end
  2 Comments
Walter Roberson on 23 Jan 2016
You pull out idx_LMDName and idx_10Under1 but you do not do anything with them?
You always write over Variable{1,1} instead of storing for each file?
Is it correct that your desired output is the list of filenames and a vector of the numeric forms of the corresponding RainfallID?
Kanakaiah Jakkula on 23 Jan 2016
I want to store the corresponding value as a number if it is numeric, or as a string if it is text. For example, the value for RainfallID is numeric ("4233"), whereas the value for LMD Name is text ("NSF4_TTD_55+_1002cm_hh_2122N_2022N_3000N_7899uM_IND"). In the same way, I want to look up 10 Under Pipe 1.Response Value by name and store the corresponding value (numeric only; I do not want the units). Finally, I want to store everything in matrix form. Sincerely,
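The requirement described above (a number where the value is numeric, text otherwise, units dropped) could be sketched roughly as follows. The sample strings, the "mm" unit, and the variable names `rawValues` and `parsed` are hypothetical, since the thread does not show the actual file contents; the regular expression is one possible choice, not the only one.

```matlab
% Hypothetical raw values as they might appear next to a field name.
% The "12.5 mm" entry and its unit are assumptions for illustration.
rawValues = {'4233', ...
             'NSF4_TTD_55+_1002cm_hh_2122N_2022N_3000N_7899uM_IND', ...
             '12.5 mm'};

% A leading number, optionally followed by unit text (letters, %, /) only.
numPattern = '^([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*[a-zA-Z%/]*$';

parsed = cell(size(rawValues));
for n = 1:numel(rawValues)
    raw = strtrim(rawValues{n});
    tok = regexp(raw, numPattern, 'tokens', 'once');
    if isempty(tok)
        parsed{n} = raw;                 % not numeric: keep the text as-is
    else
        parsed{n} = str2double(tok{1});  % numeric: keep the number, drop the unit
    end
end
```

Because numbers and text are mixed, the results are collected in a cell array here; a plain numeric matrix could only hold the numeric fields.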


Accepted Answer

Walter Roberson on 23 Jan 2016
The code below should be faster. It makes use of some of the more advanced facilities of regexp, where it is easy to get the pattern incorrect :(
project_dir = 'D:\Mekala_Backupdata\Matlab2010\Filesfolder\PartofTextFilesData';
FileList = dir(project_dir);
FileName = {FileList.name};
FileName([FileList.isdir]) = []; %get rid of . and .. and other directories
pattern = '(?<=RainFallID\s+:\s+)(?<RainFallID>\d+)|(?<=LMD\s+Name\s+:\s+)(?<LMD_Name>\S+)|(?<=10\s+Under\s+Pipe\s+1\.Response\s+Value\s*\S+\s+[a-zA-Z]+\s+)(?<TenUnder1>\S+)';
numfile = length(FileName);
RainFallID = zeros(numfile,1);
LMD_Name = cell(numfile,1);
Value = zeros(numfile,1);
for K = 1 : length(FileName)
    thisfile = fullfile(project_dir, FileName{K});
    filecontent = fileread(thisfile);
    tokens = regexp(filecontent, pattern, 'names'); %1 x 3 struct with mostly empty entries
    RainFallID(K) = str2double( vertcat(tokens.RainFallID) );
    LMD_Name{K} = vertcat(tokens.LMD_Name);
    Value(K) = str2double( vertcat(tokens.TenUnder1) );
end
I allowed for some variability in responses, but if the format differs too much you would encounter problems.
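Since the pattern is easy to get wrong, it may help to try it on a small in-memory string before running it over thousands of files. The sketch below is a simpler variant, not the combined named-token pattern above: it uses one regexp call per field with the 'tokens' and 'once' options. The sample text and its "name : value unit" layout are assumptions made for illustration, so the patterns would need adjusting to the real files.

```matlab
% Hypothetical file content; the "name : value" layout and the "mm" unit
% are assumed for illustration, not taken from a real file.
sample = sprintf('%s\n', ...
    'RainFallID : 4233', ...
    'LMD Name : NSF4_TTD_55+_1002cm_hh', ...
    '10 Under Pipe 1.Response Value : 12.5 mm');

% One regexp call per field; 'once' returns the tokens of the first match,
% or an empty cell if the field is not present.
idTok   = regexp(sample, 'RainFallID\s*:\s*(\d+)', 'tokens', 'once');
nameTok = regexp(sample, 'LMD\s+Name\s*:\s*(\S+)', 'tokens', 'once');
valTok  = regexp(sample, ...
    '10\s+Under\s+Pipe\s+1\.Response\s+Value\s*:\s*([-+]?\d*\.?\d+)', ...
    'tokens', 'once');

% Fall back to NaN or '' when a field is missing.
if isempty(idTok)
    rainFallID = NaN;
else
    rainFallID = str2double(idTok{1});
end
if isempty(nameTok)
    lmdName = '';
else
    lmdName = nameTok{1};
end
if isempty(valTok)
    value = NaN;
else
    value = str2double(valTok{1});   % the unit is outside the captured token
end
```

Inside the file loop, `sample` would be replaced by the output of fileread, and the three results stored at index K as in the code above.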
