How can I convert from numerical data to char data more efficiently?

I wrote a MATLAB script that opens a batch of CSV files one at a time, recalculates the relative time, and then writes the relative time and new time data, along with all of the other data, as char into a new CSV file. The code works fine for smaller files but chokes on files larger than 1 GB. The main problem appears to start at line 156, which I have marked as %%%%%%%%%%%%%%%%Concerned about computation time from here to end%%%%%%%%% Please help. Thank you.
clc;
clear all;
FileListi = dir('*.csv');
N = size(FileListi,1);
ctick=0;
filename_o='Node770.csv';
%filename_onm=str2double(filename_o);
for bb = 1:N
filenamei=FileListi(bb).name
TF = strcmp(filename_o,filenamei)
if TF==1;
movefile(filenamei,'Node770-1.csv');
end
end
FileList = dir('*.csv');
NN = size(FileList,1);
ctick=0;
for k = 1:NN
% get the file name:
filename = FileList(k).name
fid = fopen(filename);
% Gets the character string from the csv files in the header
s = fgetl(fid);
s2 = fgetl(fid);
s3 = fgetl(fid);
s4 = fgetl(fid);
s5 = fgetl(fid);
s6 = fgetl(fid);
s7 = fgetl(fid);
s8 = fgetl(fid);
% Close the file
fclose(fid);
%Takes the time and date from different rows, separates the truncated time
%from the date, and concatenates the higher-precision time to the date.
%Lastly, reformats the date and time and converts from GMT to CST
D1=regexp(s7,',','split');
s_date=regexpi(D1,' ','split');
s_time=regexp(s8,',','split');
s7p=datestr(s_date{1,2}(1,1),'mm/dd/yyyy');
s7pnm(k)=datenum(s7p, 'mm/dd/yyyy');
date_string = strcat(s7p, {' '}, s_time(1,2))
cvnt=0/24; %GMT to UTC %-5/24; % convert GMT to CST
xdate_start = datenum(date_string, 'mm/dd/yyyy HH:MM:SS:FFF')+cvnt;
startdateCST=datestr(xdate_start)
if k>=2
if s7pnm(k)==s7pnm(k-1) && ctick==0
sufsg='_001.csv';
sufsg2='_001';
ctick=1;
else if s7pnm(k)==s7pnm(k-1) && ctick==1;
sufsg='_002.csv';
sufsg2='_002';
ctick=2;
else if s7pnm(k)==s7pnm(k-1) && ctick==2;
sufsg='_003.csv';
sufsg2='_003';
ctick=3;
else if s7pnm(k)==s7pnm(k-1) && ctick==3;
sufsg='_004.csv';
sufsg2='_004';
ctick=4;
else if s7pnm(k)==s7pnm(k-1) && ctick==4;
sufsg='_005.csv';
sufsg2='_005';
ctick=5;
else if s7pnm(k)~=s7pnm(k-1)
sufsg='_000.csv';
sufsg2='_000';
end
end
end
end
end
end
else
sufsg='_000.csv';
sufsg2='_000';
end
%Dataname=datestr(s_date{1,2}(1,1),'yyyy_mm_dd_000');
%newName=[Datename];
Datar=csvread(filename,21,0);
SR=128;
tt=Datar(:,1); %timer tick
tcnts=length(Datar(:,1)); %total cnts HS
%%%%%New method using increments and converting to date in one step%%%%%%
inct(1,1)=1/SR*tt(1,1)*(1/(24*3600)); %First Sample
inc=1;
%%%All other samples%%%%
for i=2:tcnts;
if tt(i-1,1) > tt(i,1)
inct(i,1)=inct(i-1,1)+1/SR*(65536-tt(i-1,1)+tt(i,1))*(1/(24*3600));
inc=inc+1;
else
inct(i,1)=inct(i-1,1)+1/SR*(tt(i,1)-tt(i-1,1))*(1/(24*3600));
end
end
xt(:,1)=xdate_start+inct(:,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Test Calcs / End date
EndDateCST=datestr(xt(end,1))
testdelta=(xt(end,1)-xdate_start)*24*3600;
Dtime_s=inct(:,1)*24*3600;
Data_EX=[Datar(:,1),Dtime_s(:,1),Datar(:,3),Datar(:,4),Datar(:,5),Datar(:,6),...
Datar(:,7),Datar(:,8),Datar(:,9),Datar(:,10),Datar(:,11),Datar(:,12)];
%%%%%%%%%Renames files according to date %%%%%%%%%%%
filedatename=datestr(s_date{1,2}(1,1),'yyyy_mm_dd');
presg='IIBOP_128Hz_';
newName=strcat(presg,filedatename,sufsg);
zipName=strcat(presg,filedatename,sufsg2);
samp=tcnts;
Adate_string=datestr(xt(1:samp,1),'mm/dd/yyyy HH:MM:SS:FFF');
Cdate=cellstr(Adate_string);
Headers = {'Datetime','TimerTick','Relatvie_Time',...
'String_Torque','String_Weight','String_Pressure',...
'String_Acceleration_Z','String_Rotational_Velocity',...
'StringSense_Heart_Beat','StringSense_Voltage',...
'StringSense_Temperature','Node_RSSI','Base_RSSI'};
FUnits = {'Datetime','Unitless','Seconds',...
'Kft-Lbs','KIP','Psi',...
'G','RPM',...
'Unitless','DC_Voltage',...
'C','dBm','dBm'};
%%%%%%%%%%%%%%%%Concerned about computation time from here to end%%%%%%%%%
for mm=1:12;
for i=1:samp;
CData{i,mm}={char(num2str(Data_EX(i,mm),'%.10f \n'))};
end
end
Data = [Cdate,CData];
Final = [Headers;FUnits;Data];
fid = fopen(newName,'wt');
for i=1:size(Final,1)
fprintf(fid, '%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n', char(Final{i,1}),...
char(Final{i,2}),char(Final{i,3}),char(Final{i,4}),char(Final{i,5}),...
char(Final{i,6}),char(Final{i,7}),char(Final{i,8}),char(Final{i,9}),...
char(Final{i,10}),char(Final{i,11}),char(Final{i,12}),char(Final{i,13}));
end
fclose(fid);
zip(zipName,newName);
clearvars -except FileList N ctick k s7pnm
end

Accepted Answer

Jan on 28 Aug 2013
Edited: Jan on 28 Aug 2013
Pre-allocating inct is important for speed. Properly pre-allocating CData would probably have helped as well, but avoiding the creation of CData altogether is even better:
fid = fopen(newName, 'W'); % Uppercase, no text mode
fprintf(fid, '%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\r\n', ...
Headers{:}, FUnits{:});
fmt = ['%s, ', repmat('%.10f, ', 1, 11), '%.10f\r\n'];
for k = 1:samp
fprintf(fid, fmt, Cdate{k}, Data_EX(k, :)); % row k: one sample, all 12 columns
end
fclose(fid);
The 'W' mode uses buffered output, which is faster under some conditions. Instead of calling num2str for each element, converting the returned string to a string by char() (?!), hiding it in a cell element, and getting it back as a string by another char(), it is more efficient to let fprintf do the conversion from double to string directly. I repeat it here, because it might be useful for you:
% Current method for one element:
fprintf(fid, '%s', char({char(num2str(Data_EX(i,j),'%.10f'))}))
% Faster direct way:
fprintf(fid, '%.10f', Data_EX(i,j))
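As a minimal sketch of the point above, the following compares the round trip through num2str, char and a cell with the direct conversion, and then formats one complete output row. sprintf is used instead of fprintf so the result can be inspected without opening a file. The sample value, the sample row and the variable names slow, fast, row and txt are assumptions made for illustration, not part of the original script.

```matlab
x = pi;                                        % hypothetical sample value

% Round trip used in the question: number -> char -> cell -> char
slow = char({char(num2str(x, '%.10f'))});

% Direct conversion, no intermediate cell
fast = sprintf('%.10f', x);

same = isequal(slow, fast);                    % true: both are '3.1415926536'

% One output row: a date string followed by the numeric values of that row
row = [1 2.5 -3];                              % hypothetical data row
fmt = ['%s,', repmat('%.2f,', 1, numel(row) - 1), '%.2f\n'];
txt = sprintf(fmt, '08/28/2013 12:00:00:000', row);
```

The numeric argument has to be a single row of the data matrix, because the format string is consumed once per group of values in the order they are passed.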
By the way, 'mm/dd/yyyy HH:MM:SS:FFF' is an unusual format. Using a decimal dot to separate the fractional seconds is a better choice, perhaps.
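A short sketch of the decimal-dot variant, assuming datestr and datenum are used as in the question. The timestamp is a made-up value, and stamp is a hypothetical variable name.

```matlab
% Hypothetical timestamp: 28 Aug 2013, 12:30:15.25
t = datenum(2013, 8, 28, 12, 30, 15.25);

% Same layout as in the question, but with a dot before the milliseconds
stamp = datestr(t, 'mm/dd/yyyy HH:MM:SS.FFF');
disp(stamp)
```

Any code that later parses these strings would need the same format string, so this change only makes sense if the readers of the file can be adjusted too.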
This line:
Data_EX = [Datar(:,1),Dtime_s(:,1),Datar(:,3),Datar(:,4),Datar(:,5),Datar(:,6),...
Datar(:,7),Datar(:,8),Datar(:,9),Datar(:,10),Datar(:,11),Datar(:,12)];
can be abbreviated to:
Data_EX = [Datar(:,1), Dtime_s(:,1), Datar(:,3:12)];
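Regarding the pre-allocation of inct mentioned at the top of this answer, here is a minimal sketch on made-up timer ticks. The first part is the loop from the question with inct allocated up front. The second part is a vectorized alternative using diff and cumsum, which is a swapped-in technique and not something the question or this answer used. The sample tt values and the names d and inct_vec are assumptions.

```matlab
SR = 128;                           % sample rate in Hz, as in the question
tt = [65530; 65534; 2; 6; 10];      % hypothetical 16-bit timer ticks with one wrap
tcnts = numel(tt);

% Loop version with pre-allocation, so inct does not grow in every iteration
inct = zeros(tcnts, 1);
inct(1) = tt(1) / SR / 86400;       % first sample, in days
for i = 2:tcnts
    d = tt(i) - tt(i-1);
    if d < 0                        % counter wrapped around at 65536
        d = d + 65536;
    end
    inct(i) = inct(i-1) + d / SR / 86400;
end

% Vectorized alternative: correct the wrapped differences, then accumulate
d = diff(tt);
d(d < 0) = d(d < 0) + 65536;
inct_vec = cumsum([tt(1); d]) / SR / 86400;
```

The two results agree up to floating-point rounding, because the order of the additions and divisions differs slightly.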
  2 Comments
Ian on 29 Aug 2013
I tried your code, but a few things didn't work:
1) The first column with the DateTime string is no longer present.
2) A couple of rows of garbage characters appear on rows 4 through 11.
3) The numerical data looks transposed.
I submitted images of the output from the original method (which only works for smaller files but gives the desired output) and from your new suggestion. They are being reviewed.
Do you know what might be happening?
How does the section of code you wrote combine the datetime string column with the numerical data?
fmt = ['%s, ', repmat('%.10f, ', 1, 11), '%.10f\r\n'];
for k = 1:samp
fprintf(fid, fmt, Cdate{k}, Data_EX(:, k));
end
fclose(fid);
Thank you for your help. Ian
Ian on 29 Aug 2013
I found my issues, and your additions to the code sped up the run time on my test files from over 15 min to around 1.5 min. Hopefully it won't crash on the 1 GB files.
Thank You, Ian
