Combine three columns with comma for a lot CSV files

Question

Thank you Friends

I am writing a script that needs to open a folder of my CSV files (more than 1000 files), combine three specific columns, and produce an xls file as the result. Every CSV is 16 * 15000 (rows, columns) with the same headers. After combining, the result should be an xls file with 1000 rows and one combined column (1000 * 1), that is, one row for every CSV.

(I prepared three simple CSV files in a folder, Examps, which I attach here to test the script on; after that I need to deal with all 1000 CSV files.

So far I have imported the CSV files into MATLAB, but I couldn't work out how to select the specific columns to combine.)

clc
close all
fileDir = 'D:\Examps';
outfile = 'D:\XYZ_1.1000.csv';
addpath(fileDir);
fileNames = dir(fileDir);
fileNames = {fileNames.name};
fileNames = fileNames(cellfun(...
    @(f)contains(f,'.csv'),fileNames));
%x = xlsread(filenames);
for f = 1:numel(fileNames)
    [~, ~, raw] = xlsread(fullfile(fileDir, fileNames{f}));
    xlswrite(outfile, raw, fileNames{f});
end
The rest is my attempt to handle it, without success:
fid = fopen('.csv');
g = tableread(fid,'%s ' , 'delimiter','\n');
fclose(fid)
c = g{1,1}(10:16);
c = str2num(cell2mat(c));
c = c(:,3); %reads third column
    %combine = cat(:,'D','E','F')
    %c = f{1,1}(10:16)
%c = c(:,4) %reads third column
   % X = f(:,1)
   % out = strcat(X,E,F)
    %[~,~, col] = xlsread (combine)
%result = fileNames(:,4);
end
%fileNamesTemp = fileNames(:,[4:6])
%for f = 1:numel(fileNames)
    %[~, ~, raw] = xlsread( fullfile(fileDir, fileNames{f}));

Answer 1

dpb on 23 Jul 2020

Edited: dpb on 24 Jul 2020

fileDir = 'D:\Examps';
d=dir(fullfile(fileDir,'*.csv'));
for i = 1:numel(d)
  data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
  data=data(:,3);
end

OK, the above will read all the .csv files in the directory and leave you with the third column of each in turn.

I couldn't decipher the desired output form: one long column, or an array with as many columns as there are files?

That controls how to do the write or catenation after reading; I can finish up once the desired output is clarified.

dpb on 24 Jul 2020

Edited: dpb on 24 Jul 2020

"This works for me

data=data(:).'; % as a single row???

however, combine means merge like X,Y,Z, all three in a column."

We still have a communication issue: the above will/does combine three columns into a single ROW vector, NOT a column (a column is vertical and a row is horizontal, in geometric terms, to make sure we are using the same meanings). So, if the above does work, why the subsequent reference to a column?

There's no need to refer to commas; that'll happen automagically if you write the data to a file using the proper syntax for writing a csv file. Don't get distracted by the external representation as compared to the one in memory.
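
As a minimal sketch of that point (assuming R2019a or later, where writematrix is available; the values and the temporary file name are made up for illustration), the commas only appear when the numeric row is written out:

```matlab
% The data stay numeric in memory; the commas exist only in the file.
row = [1 2 3];                      % hypothetical X,Y,Z values
tmpFile = [tempname '.csv'];        % throwaway file for the demonstration
writematrix(row, tmpFile);          % a .csv extension implies a comma delimiter
txt = strtrim(fileread(tmpFile));   % read the raw text back
disp(txt)
delete(tmpFile);
```

Nothing in the code builds a comma-separated string by hand; the writer supplies the delimiters.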

Carrying on from the above with the presumption of rows, then there's no need for a cell array.

fileDir = 'D:\Examps';              % the target data directory
d=dir(fullfile(fileDir,'*.csv'));    % dir() struct of *.csv files in directory
N=numel(d);                         % total number files found
for i = 1:N                         % iterate over them...
  data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
  data=reshape(data(:,3),1,[]);         % turn into long row vector
  % Do whatever to the frame here...
end

Unfortunately, I've no idea now what the latest request has to do with anything that went before. If the three columns were x,y,z coordinates, having just smooshed them all together into one long array has made separating out the pieces far more complicated than it was to begin with.

To bin data by coordinates, look at discretize or histcounts, but the result is definitely going to be confused if you have done the above to the coordinate arrays first.

It's just completely unclear what the end objective is here, so I can't see the way to an effective solution or what you think would be accomplished by arranging the data into a column/vector.
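
A small sketch of what those two functions return, using made-up coordinate values and bin edges (both are assumptions, not taken from the sample files):

```matlab
x = [0.1 0.5 0.9 1.5];          % hypothetical coordinate values
edges = [0 0.5 1 2];            % hypothetical bin edges
binIdx = discretize(x, edges);  % bin number for each value
counts = histcounts(x, edges);  % number of values falling in each bin
disp(binIdx)
disp(counts)
```

Each bin includes its left edge, so 0.5 lands in the second bin. This only works as intended while x, y and z are still separate columns.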

Ehsan Shooshtari on 26 Jul 2020

Edited: dpb on 26 Jul 2020

Thank you for your patience and hospitality. I am so sorry about my last accusation; I was too tired to check the last error.

Up to now, the script gives us X, Y and Z. Yesterday I worked on arranging them so that the XYZ of each row stick together in one cell, and after that I need to store all the XYZ in a cell of an Excel file. Here is another paraphrase of what I mean, from one of my friends here; maybe it helps to figure it out.

"Each cell will have all the XYZ data of a single CSV in it. That is, you will have a 1*3 cell where the 1x1 cell contains the 4, or however many, coordinates that are in that CSV. Then the cell at position 1x2 will again have all the XYZ information of the coordinates of the second Excel file, and so on."

I also had a problem writing my xls output, because I could not correctly address where the program should put each result without overwriting the earlier data.

clc

close all

fileDir = 'D:\Examps';                % the target data directory
outfile = 'D:\XYZ_1.1000.csv';
d=dir(fullfile(fileDir,'*.csv'));       % dir() struct of *.csv files in directory
N=numel(d);                             % total number files found
for i = 1:N                                 % iterate over them...
  data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);             % show the whole Data
  j=1:N;
  dataXYZ = reshape(data(:,j),N+1,[])                              
   xlswrite(outfile,dataXYZ,i,'A1:C3');
  %dataY = reshape(data(:,2),1,[]);                                  % turn into long row vector
  %dataZ = reshape(data(:,3),1,[]);
  %G(1)= mcat(3,'dataX','dataY','dataZ')
end

Best Regards

dpb on 26 Jul 2020

No problem on the syntax error; just a little nudge that it'll be faster (and you'll learn MATLAB more quickly along the way) if you try to find syntax errors such as this on your own instead of waiting for someone on the forum to see and respond. Sorry I made the typo and left off the closing paren, but you'll make such typos or oversights, too, so practice is good! :)

There's still a problem in the above description, though: xlswrite will NOT write a .csv file. What it would do as written above is write an .xls file with the extension .csv instead of .xls, which then confuses Excel royally if you try to open the file in Excel.

As the xlswrite documentation clearly says, xlswrite is not recommended unless you're using a release earlier than R2019a, and if so, the caveat above still holds.

To write the above row vectors to a .csv file without first making a big internal array to store them all, use fprintf:

fileDir = 'D:\Examps';                % the target data directory
outfile = 'D:\XYZ_1.1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv'));       % dir() struct of *.csv files in directory
N=numel(d);                             % total number files found
for i = 1:N                                 % iterate over them...
  data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);             % read the whole file
  data = reshape(data(:,1:3),1,[]);       % X,Y,Z assumed to be columns 1:3; flatten to one row
  fmt=[repmat('%f,',1,numel(data)-1) '%f\n'];
  fprintf(fid,fmt,data);
end
fid=fclose(fid);

will write the XYZ array from each file in a row vector sequentially to a .csv file

Why you would want to do this still baffles me entirely, but...

dpb on 27 Jul 2020

Edited: dpb on 27 Jul 2020

Just transpose first...

fileDir = 'D:\Examps';                  % the target data directory
outfile = 'D:\XYZ_1.1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv'));       % dir() struct of *.csv files in directory
N=numel(d);                             % total number files found
for i = 1:N                             % iterate over them...
  data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
  XYZ=data(:,1:3).';                    % select the three columns only, transpose
  fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
  fprintf(fid,fmt,XYZ);
end
fid=fclose(fid);

That makes more sense than the previous request. The above also takes only three columns from the sample file; there are actually six, but the data are duplicated. If you need all sets of points, just remove the column-indexing expression and transpose XYZ when you read it in.

Remember that MATLAB stores arrays in column-major order in memory. Passing the array as a whole to fprintf outputs the entire array in that sequence, so no explicit reshape is even needed; just reorder in memory with the transpose operation.
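
To illustrate with a tiny made-up array (sprintf follows the same formatting rules as fprintf, so the strings can be inspected directly):

```matlab
A = [1 2 3; 4 5 6];            % two hypothetical points, one x,y,z per row
s1 = sprintf('%d,', A);        % memory order: runs down each column
s2 = sprintf('%d,', A.');      % transposed: x,y,z of each point stay together
disp(s1)   % 1,4,2,5,3,6,
disp(s2)   % 1,2,3,4,5,6,
```

Without the transpose the x values of all points come out first, which is the ordering problem discussed above.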

Ehsan Shooshtari on 28 Jul 2020

Dear dpb,
Thank you for your response.
About the last code: I cannot get at the output after trying fprintf/fopen and display.
display('XYZ(i)');     just shows "XYZ".
fprintf gives the error "Function is not defined for 'cell' inputs."
.................
fileDir = 'D:\Examps';                  % the target data directory
%outfile = 'D:\XYZ1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv'));       % dir() struct of *.csv files in directory
N=numel(d);                             % total number files found
XYZ=cell(1,N);                         % allocate cell array of Nx1 elements
for i = 1:N                             % iterate over them...
  data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
  XYZ(i)={data(:,1:3)};                  % select the three columns only, put in XYZ
  %display('XYZ(i)');
  fprintf(fid,fmt,XYZ)
end
fid=fclose(fid);
.....................
Also, I want to use those XYZ for background filtering; normally, applying an upper threshold to X, Y and Z will show some fundamental contour lines (the interesting lines). Any ideas?
Best

Ehsan Shooshtari on 29 Jul 2020

Hello Dear dpb

Thanks for your time

I solved the ordering of XYZ: every x is now attached to the y and z of its point. I just have two problems.

The output is stored in just one column, not in a separate column for every CSV.
The result for the first CSV is completely correct, but for the second CSV the output starts with the first CSV's data followed by the second, and for the third it shows the whole result.

clc

close all

fileDir = 'D:\Examps';                  % the target data directory
outfile = 'D:\XYZ1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv'));       % dir() struct of *.csv files in directory
N=numel(d);                             % total number files found
XYZ=cell(1,N);                          % allocate cell array of Nx1 elements
for i = 1:N                                  % iterate over them...
  data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
  XYZ(i)={data(:,1:3).'};              % select the three columns only, put in XYZ
  fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
  a=XYZ;                                   % save the output to a variable
  fprintf(fid,'%f\n',a{:});               
end
fid=fclose(fid);

dpb on 29 Jul 2020

Of course, on both...

You converted from an array to a cell array, so the repeat count in the format string is the size of the cell array, not the total size of the data inside it; and
you're adding an element to the cell array on each pass through the loop, so every pass has one more cell containing something, but you write the entire cell array every time by using a{:} (and there's no need for the temporary variable, anyway).
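
A minimal sketch of the two corrections, assuming one output record per file is what is wanted. The cell array `files` is a made-up stand-in for the matrices readmatrix would return, and %g is used instead of %f only to keep the output short:

```matlab
% Hypothetical stand-ins for the data read from two CSV files.
files = {[1 2 3; 4 5 6], [7 8 9]};
outfile = [tempname '.csv'];            % throwaway output file
fid = fopen(outfile, 'w');
for i = 1:numel(files)
  xyz = files{i}(:,1:3).';              % current file only, transposed
  fmt = [repmat('%g,',1,numel(xyz)-1) '%g\n'];  % sized from the data itself
  fprintf(fid, fmt, xyz);               % one record per file, nothing repeated
end
fclose(fid);
txt = fileread(outfile);                % inspect what was written
delete(outfile);
disp(txt)
```

The format is built from numel of the numeric matrix, and only the current file's data is passed to fprintf, so earlier files are not written again.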

It's still not clearly defined just how you want things written. Again, you built a cell array to keep data together that belong together; if you then try to write this out on a record basis, you're going to destroy all that. The example you gave had, as I recall, some four sets of 3 coordinates; if you string those together you'll have 12 elements in a record. But if you string all 1000 or so files together, that would end up as a record some 12,000 elements long, which makes no sense.

Why not build the XYZ cell array and then just SAVE() it and LOAD() it again in whatever subsequently needs it?

CSV files are very inefficient and bulky, and they don't carry full precision unless you explicitly write something like 15 decimal digits, which really adds to the bulk.

As noted from the get-go, it is just not at all clear what the needed end result is, but one has to guess this is not an optimal way to go about achieving it.
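
A rough sketch of that round trip, with a made-up two-element cell array and a temporary .mat file name (both are assumptions for illustration):

```matlab
XYZ = {[1 2 3; 4 5 6], [7 8 9]};   % hypothetical per-file coordinate arrays
matFile = [tempname '.mat'];       % throwaway file name
save(matFile, 'XYZ');              % binary file, values stored as doubles
S = load(matFile);                 % returns a struct with a field named XYZ
delete(matFile);
same = isequal(S.XYZ, XYZ);        % true if the data came back unchanged
disp(same)
```

The grouping by file survives the round trip, which a flat text record would not preserve.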
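
The trade-off can be seen on a single value. This sketch uses pi purely as an example number:

```matlab
s6  = sprintf('%f', pi);        % default %f keeps six decimals
s15 = sprintf('%.15g', pi);     % fifteen significant digits
err6  = abs(str2double(s6)  - pi);   % error after a text round trip
err15 = abs(str2double(s15) - pi);
fprintf('%s (%d chars), error %.1e\n', s6,  numel(s6),  err6);
fprintf('%s (%d chars), error %.1e\n', s15, numel(s15), err15);
```

The longer format is far more accurate but roughly doubles the characters written per number.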

Answer 2

Ehsan Shooshtari on 31 Jul 2020

Hello

Dear MR. dpb

Thank you for response

I figured it out: you were right about the XYZ order, and I went back to the previous arrangement. However, I have run into two more problems.

First, the program below repeats results: for example, after writing the XYZ values it writes the same numbers again in the same row, unchanged.

Second, when I ran this script on the actual data, it ended up producing a bulky 6.5 GB Excel file, and my Excel cannot open it.

..............................................

clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ_12000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
  XYZ=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
  data=XYZ(:,1:3) % select the three columns only
  fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
  fprintf(fid,fmt,XYZ);
end
fid=fclose(fid);

Best

dpb on 31 Jul 2020

"with actual data finally comes up with a bulky exell 6.5 gig"

That's what I told you from the beginning: trying to do what you're doing with csv text files would create very large files...

I suggested two possible workarounds--

Process each file as it is read instead of trying to merge them all at once. Most algorithms are sequential, so it is at least moderately likely you can do so if you think through what end result is needed;
If, indeed, it is mandatory to have all the data at one time, use .mat files to save the intermediate results and load them into whatever code needs them.

If for some unfortunate reason you must use Excel, you'll simply have to cut the size down to what it can handle, one way or the other...
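
A sketch of the first workaround, under stated assumptions: the thread never says what the end result should be, so the per-file column maximum here is only a placeholder, and the two tiny CSV files are generated in a temporary folder just to make the example self-contained.

```matlab
% Build two small throwaway CSV files (hypothetical data).
demoDir = tempname;
mkdir(demoDir);
fid = fopen(fullfile(demoDir,'a.csv'),'w');
fprintf(fid,'X,Y,Z\n1,2,3\n4,5,6\n');  fclose(fid);
fid = fopen(fullfile(demoDir,'b.csv'),'w');
fprintf(fid,'X,Y,Z\n7,8,9\n');         fclose(fid);

d = dir(fullfile(demoDir,'*.csv'));
N = numel(d);
peak = zeros(N,3);                     % one small summary row per file
for i = 1:N
  data = readmatrix(fullfile(demoDir,d(i).name),'NumHeaderLines',1);
  peak(i,:) = max(data(:,1:3),[],1);   % placeholder per-file result
end                                    % data is overwritten on every pass
rmdir(demoDir,'s');                    % clean up the demonstration folder
disp(peak)
```

Only the N-by-3 summary is kept, so memory use and output size do not grow with the size of the individual files.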

Ehsan Shooshtari on 2 Aug 2020

Appreciate
