MATLAB Answers

0

Extract columns from multiple csv files

Asked by Paul Broadley on 14 Aug 2019
Latest activity Answered by dpb
on 14 Aug 2019
I have a folder containing 27 csv files, of which contain 6 columns each. (I have 32 of these folders, hence my reluctance to do this manually).
I must do some signal processing on one column (column 3) of each file, so I want to write a code allowing me to extract all 3rd columns from each file in the given folder. This can either be saved seperately or as one large table/ array.
My code reads like this (for the first folder: A100):
A100dir = 'C:\Work\TESTING\A100'
A100 = dir(fullfile(A100dir, '*.csv'))
A100 = {A100}
String = cell(1,numel(A100))
n = length(A100)
for i = 1:n
data = csvread(A100(i))
data = readtable(files{i})
colL = data(:,3)
end

  2 Comments

dpb
on 14 Aug 2019
Are the record lengths in each file the same or different? If the latter, is wanted to have all same or keep differences? If each is same length, then a 2D array is the simplest storage mechanism by far.
Or, of course, if you're processing these files for this particular column just once, there would be no need to actually create the secondary file/data array altogether unless it requires more than one signal at a time to do the computation. If not, just go ahead and process each when you read it and go on and save all the memory needed to build the composite array.
Of course, if you are going to need to reprocess all these data multiple times it makes sense to consolidate it...in that case, one presumes that perhaps there would be a need to pick another column besides the third? So, one should write the code to be able to pick the desired column(s) with user input, not hardcoded in the code itself (which is a general tenet to not bury "magic numbers" in code whatever their purpose if there's any chance at all the number could change).
They are of different lengths in each circumstance.
I will need to do a similar process for another column (column 6) and the other columns can be ignored.
It will be required for further analysis later on but for now it will be just the one time analysis. Each aforementioned folder corresponds to a different instance of parameters.

Sign in to comment.

2 Answers

Answer by Neuropragmatist on 14 Aug 2019

Why not just concatenate the data - or am I missing something?
If in the end you want a single vector to analyse you can just concatenate without preallocation (forgive me matlab):
all_data = [];
for i = 1:n
data = csvread(A100(i));
data = readtable(files{i});
all_data = [all_data; data(:,3)]; % if you know the size of data before you can preallocate all_data for better performance
end
Although I would suggest adding a column to keep track of which file what data came from:
all_data = [];
for i = 1:n
data = csvread(A100(i));
data = readtable(files{i});
all_data = [all_data; ones(size(data(:,3))).*i data(:,3)]; % if you know the size of data before you can preallocate all_data for better performance
end
Or you could collect the data in a cell array and use cellfun later to apply your analysis to each cell/data set:
all_data = cell(n,1);
for i = 1:n
data = csvread(A100(i));
data = readtable(files{i});
all_data{i,1} = [ones(size(data(:,3))).*i data(:,3)];
end
Is any of that close to what you want?
I think I would agree with dpb that you should probably load all the data into one big array, probably in a table for convenience. Then just extract the third column from that. This might save time in the long run because you only have to loop over and load files once and it would be easier for users to make modifications to the code in future.
M.

  0 Comments

Sign in to comment.


Answer by dpb
on 14 Aug 2019

targetColumn=3; % don't bury magic numbers, use input() to set, maybe...
rootdir = 'C:\Work\TESTING'; % set the root directory location (could use uigetfile)
dirName = 'A*'; % base pattern for the directories to traverse
dDir=dir(fullfile(rootdir,dirName,'.')); % return the directories of interest A*
for i=1:numel(dDir) % iterate over directories found
d=dir(fullfile(rootdir,dDir(i).Name,'*.csv')); % and pick up all the .csv files therein...
for j = 1:numel(d)
data = csvread(fullfile(rootdir,dDir(i).Name,d(j).name)); % read the jth file in ith directory
res=youranalysisfunction(data(:,targetColumn)); % do analysis of target column
% insert code to save the analysis results here...
...
end
The above does the one-pass, do the analysis while traversing method...warning air-code may be typos, etc., ...
If you do want to save the given column across all for future reference, you can just create a 2D cell array and then inside the inner loop after csvread
alldata{i,j}=data(:,targetColumn)); % save target column given subdir/file as cell array

  0 Comments

Sign in to comment.