Combine three columns with comma for a lot CSV files

Thank you Friends
I am working to write a script which it needs to open the folder of my CSV files, more than 1000 files, combine three spacific columns and prepare an xls file as result. Every CSV, 16* 15000 (raw, Col) with same headers after combine and it require to comes up with an xls file 1000 raws and the combine column.(1000*1)(a raw for every CSV).
(I prepared three simple CSV, folder of Examps, which I attach here to exam the script on them and then I need to deal with all 1000 CSV.
Up to now after I import the CSV in Matlab but I couldn't ask to find specific columns for combine.)
clc
close all
fileDir = 'D:\Examps';
outfile = 'D:\XYZ_1.1000.csv';
addpath(fileDir);
fileNames = dir(fileDir);
fileNames = {fileNames.name};
fileNames = fileNames(cellfun(...
@(f)contains(f,'.csv'),fileNames));
%x = xlsread(filenames);
for f = 1:numel(fileNames)
[~, ~, raw] = xlsread(fullfile(fileDir, fileNames{f}))
xlswrite(outfile, raw, fileNames{f})
end.
*******************The rest is my effort to handle it without success*********************
((((((((((((((((((((((((((fid = fopen('.csv');
g = tableread(fid,'%s ' , 'delimiter','\n');
fclose(fid)
c = g{1,1}(10:16);
c = str2num(cell2mat(c));
c = c(:,3); %reads third column
%combine = cat(:,'D','E','F')
%c = f{1,1}(10:16)
%c = c(:,4) %reads third column
% X = f(:,1)
% out = strcat(X,E,F)
%[~,~, col] = xlsread (combine)
%result = fileNames(:,4);
end
%fileNamesTemp = fileNames(:,[4:6])
%for f = 1:numel(fileNames)
%[~, ~, raw] = xlsread( fullfile(fileDir, fileNames{f}));))))))))))))))))))))

 Accepted Answer

dpb
dpb on 23 Jul 2020
Edited: dpb on 24 Jul 2020
fileDir = 'D:\Examps';
d=dir(fullfile(fileDir,'*.csv');
for i = 1:numel(d)
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
data=data(:,3);
end
OK, the above will read all your .csv files in the directory and leave you with the thrid column for each in turn.
I couldn't decipher the desired output form -- one long column or an array of however many columns there are files...?
That controls how to do the write or catenation after reading...can finish up when get clarification of the desired output.

18 Comments

Thanks a lot for responding.
First, below parts makes error,
data=readmatrix(fullfile(fileDir,d(1i).name),'NumHeaderLines',1);
error is " Array indices must be positive integers or logical values".
Second, I need to combine column 3 and 4 and 5 as DEF in a column.
Third, The desire output is an xls file 1000 raws and the combine column.(1000*1)(a raw for every CSV), means a long raw for every CSV.
Fourth, every CSV contain, 16* 15000 (raw, Col) with same headers.
Fifth, This part " xlswrite(outfile, raw, fileNames{f})"
prepare output for my script but make it an xls with combine CSV.
Appreciate
for i = 1:numel(d)
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
...
"I need to combine column 3 and 4 and 5 as DEF in a column."
Don't know precisely what that means...
data=data(:,3:5); % the threecolumns as array
data=data(:); % as a single column???
or
data=data(:,3:5); % the threecolumns as array
data=data(:).'; % as a single row???
Still can't decipher the rest of the wants...show an example of expected output from the provided inputs.
Thank you for your responding.
This work for me
data=data(:).'; % as a single row???
however, combine means mearge like X,Y,Z, all three in a column.
Exactly add content of every cell of column D,E,F and make new single cell for three cells of xyz
example:
X,Y,Z mearge by comma
0.062879,10.59602,-2.83924
0.077432,12.32359,-2.84518
0.093602,14.4944,-2.81749
0.121658,17.87279,-2.83084
finally, it require to comes up with an xls file 1000 raws.(1000*1)(a raw for every CSV) content of X,Y,Z
  1. Load XYZ information from csv files to a Cell structure. The output of this step is a 1× 1000 Cell and each cell records the selected information from each csv.
  2. For each cell/frame, divide the region of interest (ROI) space (X: -100 m to 100m; Y:-100m to 100m; Z: -5m to 5m) into continuous cubes (0.1m cube length).
I updated the upper scrip so, maybe that will help.
Appreciate
"This work for me
data=data(:).'; % as a single row???
however, combine means mearge like X,Y,Z, all three in a column."
We still have a communications issue -- the above will/does combine three columns into a single ROW vector, NOT a column (a column is vertical, a row horizontal in geometric terms to ensure talking same meanings). So, if the above does work, why the subsequent reference to column?
There's no need for referrring to commas; that'll happen automagically if you write the data to a file using the proper syntax to write a csv file. Don't get distracted by external representation as compared to in memory.
Carrying on from the above with the presumption of rows, then there's no need for a cell array.
fileDir = 'D:\Examps'; % the target data directory
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
data=reshape(data(:,3),1,[]); % turn into long row vector
% Do whatever to the frame here...
end
Unfortunately, I've no idea now what the last new request has to do with anything having gone before -- if the three columns were x,y,z coordinates, having just smooshed them all together into one long array has made for trying to separate out pieces far more complicated that was to begin with.
To bindata by coordinates, look at discretize or histcounts but is definitely going to be confused if have done the above on the coordinate arrays first.
It's just completely unclear what is the end objective here to be able to see way to the effective solution and what it is you think would be accomplished by the arrangement into column/vector.
I am sorry about my mistake, Up to this step we have one row for every XYZ .
For next stage to reach a 1*1000 matrix, I need to merge all the row of xyz to reach a row for every CSV.
Also, I ran your code and face a error of
'Invalid expression. When calling a function or indexing a variable, use parentheses. Otherwise, check for mismatched'.
Thanks
Read the error message -- looks like there is an unbalanced parentheses somewhere...that happens when typing in code at the terminal; sometimes one miscounts...
The error will be associated with a specific line; see what isn't balanced on that line.
(HINT: I spotted it and already fixed above; it's good practice to find such things on own...if don't find it quickly just looking, compare to what's there now.)
Thank you for your patience and hospitality, I am so sorry about last accusation, I was very tierd to check the last error.
Up to now, the script gives us X , Y and Z and I yesterday worked on it to arrange XYZ stick togather from a row in one cell and after that I need to store all XYZ in a cell of an Exell file. I bring another parapherase of what I mean from one of my friend here, maybe help to figure it out.
"Each cell will have all the XYZ data of a single CSV on it. It means, you will have a 1*3 cell where 1x1 cell will contain the 4 or however much number of coordinates are in the CSV in that cell. Then the 1x2 position cell will again have all the XYZ information of the coordinates of the second excel file and so on".
I also had writng problem on my xls output, because I could not adress correctly where the program map the result without removing former data.
clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ_1.1000.csv';
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1); % show the whole Data
j=1:N;
dataXYZ = reshape(data(:,j),N+1,[])
xlswrite(outfile,dataXYZ,i,'A1:C3');
%dataY = reshape(data(:,2),1,[]); % turn into long row vector
%dataZ = reshape(data(:,3),1,[]);
%G(1)= mcat(3,'dataX','dataY','dataZ')
end.
Best Regards
No problem on the syntax error -- just a little nudge that it'll be faster (and you'll learn MATLAB more quickly along the way) if you try to find syntax errors and such as this on own instead of waiting for someone on the forum to see and respond. Sorry I made the typo and left off the closing paren, but you'll make such typos or overisghts, too, so practice is good! :)
There's still a problem in the above description though -- xlswrite will NOT write a .csv file; what it would do as written above is write a .xls file with the file extension as .csv instead of .xls which then confuses Excel royally if try to open the file in Excel.
As the doc <XLSWRITE() Documentation> clearly says, xlswrite is not recommended unless you're using a release earlier than R2019a and if so, the caveat above still holds.
To write the above row vectors to a .csv file, without making a big internal array to store them all first, use fprintf
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ_1.1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1); % show the whole Data
data = reshape(data(:,j),1,[]);
fmt=[repmat('%f,',1,numel(data)-1) '%f\n'];
fprintf(fid,fmt,data)
end
fid=fclose(fid);
will write the XYZ array from each file in a row vector sequentially to a .csv file
Why you would want to do this still baffles me entirely, but...
Thank you for your response.
1- I have question about the output of upper code. I bring the output here.Is this output means all data of every CSV are in the one cell? or need another changes? I highlighted the first output Row.
0.0629 0.0774 0.0936 0.1217 ( all X's of a csv) 10.596 12.3236 14.4944 17.8728 ( all Y's of a csv) -2.8392 -2.8452 -2.8175 -2.8308 ( all Z's of a csv)
0.074015 0.090223 0.108839 0.140359 10.60174 12.30791 14.50215 17.87068 -2.8408 -2.84158 -2.81901 -2.83053
0.020369 0.027953 0.097688 0.035417 10.60971 12.3199 43.0548 14.49466 -2.84287 -2.84428 2.256413 -2.81749
2- I see all the X's attach togather , Y's and Z's. However, because of XYX are coordinator points would it be possible to change the code somehow to print every X,Y,Z stick togather?
Best Regards and Wishes
Just transpose first...
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ_1.1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
XYZ=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
XYZ=data(:,1:3).'; % select the three columns only, transpose
fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
fprintf(fid,fmt,XYZ)
end
fid=fclose(fid);
That makes more sense than the previous request...the above also takes only three columns from the sample file; there are actually six but the data are duplicated. If need all sets of points, just remove the expression and transpose XYZ when read it in.
Remember MATLAB is column-oriented storage in memory -- referring to the array as a whole in fprintf will output the entire array in that sequence; no explicit reshape is even needed; just reorder in memory by the transpose operation.
BTW, if you know there are always the same number of elements in every file to write (and the above with the three columns selected always will be), then the format string can be moved outside the for...end loop for efficiency -- but will need the length number set statically first, too, of course.
Dear Mr.dpb
Thank you for your response
first, I looked at the result and figureout this part " XYZ=data(:,1:3).'; " just working for the first csv and repeat the result instead of the rest data. BTW, I am trying to figure out the meaning of "will need the length number set statically first"
second, I brought the exact project note here which is required to meet, writen by a native for more clearancy of my final output. ( the rest of the project)
  1. Load XYZ information from csv files to a Cell structure. The output of this step is a 1× 1000 Cell (1000 csv files) and each cell records the selected information from each csv.
  2. For each cell, divide the region of interest (ROI) space (X: -100 m to 100m; Y:-100m to 100m; Z: -5m to 5m) into continuous cubes (0.1m cube length) .
Appreciate
If you want a cell array, enclose the three elements in braces ("the curlies") on each pass through the loop. That has nothing to do with trying to write a csv file, though.
fileDir = 'D:\Examps'; % the target data directory
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
XYZ=cell(1,N); % allocate cell array of Nx1 elements
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
XYZ(i)={data(:,1:3)}; % select the three columns only, put in XYZ
end
The above will leave with the cell array; the XYZ coordinates in the same orientation as the original; that would seem to actually make some sense to have done.
Dear dpb
Thank you for response
About last coding, ouput is out of reach after I try fprintf -fopen and
dispalay.
display('XYZ(i)'); just Shows "XYZ" .
fprintf shows Error of "Function is not defined for 'cell' inputs."
.................
fileDir = 'D:\Examps'; % the target data directory
%outfile = 'D:\XYZ1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
XYZ=cell(1,N); % allocate cell array of Nx1 elements
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
XYZ(i)={data(:,1:3)}; % select the three columns only, put in XYZ
%display('XYZ(i)');
fprintf(fid,fmt,XYZ)
end
fid=fclose(fid);
.....................
Also I want to use those XYZ for background filtering and normaly with
upper threshold for X, Y, Z will show some fudamental countor line(Intresting lines). Any Idea.
Best
display('XYZ(i)'); just Shows "XYZ" .
That's what you told it to display. 'XYZ(i)' is a literal string, not a variable.
fprintf shows Error of "Function is not defined for 'cell' inputs."
Yes. Documented. Read the doc for fprintf and cell arrays. Are specialized functions to write cell arrays now; otherwise must dereference the cell array content.
Hello Dear dpb
Thanks for your time
I solved the order of XYZ, every x atech to the y and z of the point, just I have two problems.
  1. The output just store in one column, not a seperated column for every csv.
  2. The result for first csv is completly correct but for second csv, start with showing first csv and following by second, and for third , show the whole result.
clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
XYZ=cell(1,N); % allocate cell array of Nx1 elements
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
XYZ(i)={data(:,1:3).'}; % select the three columns only, put in XYZ
fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
a=XYZ; % save the output to a variable
fprintf(fid,'%f\n',a{:});
end
fid=fclose(fid);
Of course, on both...
  1. you converted from an array to a cell array so the repeat count in the format string is that array size, not the size of the data in the cell array total, and
  2. you're saving the elements of the cell array each pass through the loop so every loop has one more cell containing somthing -- but you write the entire cell array every time by using a{:} (and there's no need for the temporary variable, anyway)
It's still not clearly defined just how you want things written; again, you built a cell array to keep data together that belongs together; if you then try to write this out on a record basis you're going to destroy all that. The example you gave had, I think I recall, some four sets of 3 coordinates--if you string that together you'll have 12 elements on a record. But if you string all 1000 or so files together, then that would end up as being some 12,000 element-long record that makes no sense.
Why not build the XYZ cell array and then just SAVE() it and LOAD() it again by whatever subsequently needs it?
CSV files are very inefficient and bulky and don't carry full precision unless explicitly write like 15 decimal digts which really adds to the bulk.
As noted from the git-go, the needs here are just not at all clear as to what it is that is the needed end result but one has to guess it's not an optimal way to go about achieving whatever that end objective is.

Sign in to comment.

More Answers (1)

Hello
Dear MR. dpb
Thank you for response
I was figured out, you was right about XYZ order and I came back to previous arrange. However, I meet two more problems.
First, The below program repeat result for example after write the XYZ again in the same row repeat the numer without change.
Second, I ran this script with actual data finally comes up with a bulky exell 6.5 gig and my excell can not open it.
..............................................
clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ_12000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
XYZ=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
data=XYZ(:,1:3) % select the three columns only
fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
fprintf(fid,fmt,XYZ);
end
fid=fclose(fid);
Best

2 Comments

"with actual data finally comes up with a bulky exell 6.5 gig"
That's what I told you from the beginning trying to do what you're doing in using csv text files would create very large files...
I suggested two possible workarounds--
  1. Process each file as it is read instead of trying to merge them all at once. Most algorithms are sequential so is at least moderately likely can do so if think through what is that is the end result needed;
  2. If, indeed, it is mandatory to have all data at one time, use .mat files to save the intermediate and load into what code needs the result.
If must for some unfortunate reason use Excel you'll simply have to cut the size down to what it can handle one way or the other...

Sign in to comment.

Categories

Find more on Data Import and Analysis in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!