MATLAB Answers

Combine three columns with comma for a lot CSV files

112 views (last 30 days)
Ehsan Shooshtari
Ehsan Shooshtari on 23 Jul 2020 at 21:34
Commented: Ehsan Shooshtari on 2 Aug 2020 at 16:20
Thank you Friends
I am working to write a script which it needs to open the folder of my CSV files, more than 1000 files, combine three spacific columns and prepare an xls file as result. Every CSV, 16* 15000 (raw, Col) with same headers after combine and it require to comes up with an xls file 1000 raws and the combine column.(1000*1)(a raw for every CSV).
(I prepared three simple CSV, folder of Examps, which I attach here to exam the script on them and then I need to deal with all 1000 CSV.
Up to now after I import the CSV in Matlab but I couldn't ask to find specific columns for combine.)
clc
close all
fileDir = 'D:\Examps';
outfile = 'D:\XYZ_1.1000.csv';
addpath(fileDir);
fileNames = dir(fileDir);
fileNames = {fileNames.name};
fileNames = fileNames(cellfun(...
@(f)contains(f,'.csv'),fileNames));
%x = xlsread(filenames);
for f = 1:numel(fileNames)
[~, ~, raw] = xlsread(fullfile(fileDir, fileNames{f}))
xlswrite(outfile, raw, fileNames{f})
end.
*******************The rest is my effort to handle it without success*********************
((((((((((((((((((((((((((fid = fopen('.csv');
g = tableread(fid,'%s ' , 'delimiter','\n');
fclose(fid)
c = g{1,1}(10:16);
c = str2num(cell2mat(c));
c = c(:,3); %reads third column
%combine = cat(:,'D','E','F')
%c = f{1,1}(10:16)
%c = c(:,4) %reads third column
% X = f(:,1)
% out = strcat(X,E,F)
%[~,~, col] = xlsread (combine)
%result = fileNames(:,4);
end
%fileNamesTemp = fileNames(:,[4:6])
%for f = 1:numel(fileNames)
%[~, ~, raw] = xlsread( fullfile(fileDir, fileNames{f}));))))))))))))))))))))

  0 Comments

Sign in to comment.

Accepted Answer

dpb
dpb on 23 Jul 2020 at 23:51
Edited: dpb on 24 Jul 2020 at 12:23
fileDir = 'D:\Examps';
d=dir(fullfile(fileDir,'*.csv');
for i = 1:numel(d)
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
data=data(:,3);
end
OK, the above will read all your .csv files in the directory and leave you with the thrid column for each in turn.
I couldn't decipher the desired output form -- one long column or an array of however many columns there are files...?
That controls how to do the write or catenation after reading...can finish up when get clarification of the desired output.

  18 Comments

dpb
dpb on 28 Jul 2020 at 23:54
display('XYZ(i)'); just Shows "XYZ" .
That's what you told it to display. 'XYZ(i)' is a literal string, not a variable.
fprintf shows Error of "Function is not defined for 'cell' inputs."
Yes. Documented. Read the doc for fprintf and cell arrays. Are specialized functions to write cell arrays now; otherwise must dereference the cell array content.
Ehsan Shooshtari
Ehsan Shooshtari on 29 Jul 2020 at 17:15
Hello Dear dpb
Thanks for your time
I solved the order of XYZ, every x atech to the y and z of the point, just I have two problems.
  1. The output just store in one column, not a seperated column for every csv.
  2. The result for first csv is completly correct but for second csv, start with showing first csv and following by second, and for third , show the whole result.
clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
XYZ=cell(1,N); % allocate cell array of Nx1 elements
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
XYZ(i)={data(:,1:3).'}; % select the three columns only, put in XYZ
fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
a=XYZ; % save the output to a variable
fprintf(fid,'%f\n',a{:});
end
fid=fclose(fid);
dpb
dpb on 29 Jul 2020 at 18:51
Of course, on both...
  1. you converted from an array to a cell array so the repeat count in the format string is that array size, not the size of the data in the cell array total, and
  2. you're saving the elements of the cell array each pass through the loop so every loop has one more cell containing somthing -- but you write the entire cell array every time by using a{:} (and there's no need for the temporary variable, anyway)
It's still not clearly defined just how you want things written; again, you built a cell array to keep data together that belongs together; if you then try to write this out on a record basis you're going to destroy all that. The example you gave had, I think I recall, some four sets of 3 coordinates--if you string that together you'll have 12 elements on a record. But if you string all 1000 or so files together, then that would end up as being some 12,000 element-long record that makes no sense.
Why not build the XYZ cell array and then just SAVE() it and LOAD() it again by whatever subsequently needs it?
CSV files are very inefficient and bulky and don't carry full precision unless explicitly write like 15 decimal digts which really adds to the bulk.
As noted from the git-go, the needs here are just not at all clear as to what it is that is the needed end result but one has to guess it's not an optimal way to go about achieving whatever that end objective is.

Sign in to comment.

More Answers (1)

Ehsan Shooshtari
Ehsan Shooshtari on 31 Jul 2020 at 1:57
Hello
Dear MR. dpb
Thank you for response
I was figured out, you was right about XYZ order and I came back to previous arrange. However, I meet two more problems.
First, The below program repeat result for example after write the XYZ again in the same row repeat the numer without change.
Second, I ran this script with actual data finally comes up with a bulky exell 6.5 gig and my excell can not open it.
..............................................
clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ_12000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
XYZ=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
data=XYZ(:,1:3) % select the three columns only
fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
fprintf(fid,fmt,XYZ);
end
fid=fclose(fid);
Best

  2 Comments

dpb
dpb on 31 Jul 2020 at 2:38
"with actual data finally comes up with a bulky exell 6.5 gig"
That's what I told you from the beginning trying to do what you're doing in using csv text files would create very large files...
I suggested two possible workarounds--
  1. Process each file as it is read instead of trying to merge them all at once. Most algorithms are sequential so is at least moderately likely can do so if think through what is that is the end result needed;
  2. If, indeed, it is mandatory to have all data at one time, use .mat files to save the intermediate and load into what code needs the result.
If must for some unfortunate reason use Excel you'll simply have to cut the size down to what it can handle one way or the other...

Sign in to comment.