Time converted to a str

11 views (last 30 days)
joshmartinmont
joshmartinmont on 9 Dec 2015
Commented: dpb on 12 Dec 2015
I have Matlab R2012b and am importing a rather large .csv file that is created by another tool. The only column I care about from this file is the one with all of the timestamps in it so I've imported the column as text but it ends up as a cellstring. As I need to convert the timestamp to Seconds of Day to find out if there are any gaps in my data I have done the following:
filename = 'myfilename.csv';
delimiter = ',';
startRow = 2; %it always starts at row 2 as row 1 is the header name
FormatIn = 'HH:MM:SS.FFF'; % defines my time format for the datevec function
formatSpec = '%*s%*s%*s%*s%*s%*s%s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%[^\n\r]';
fileID = fopen(filename,'r');
textscan(fileID, '%[^\n\r]', startRow-1, 'ReturnOnError', false); %reads my file
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'EmptyValue' ,NaN,'ReturnOnError', false); %creates my data array
fclose(fileID);
Time = dataArray{1, 1}; %outputs the column I want from the .csv file
Time1 = num2str(cell2mat(Time)); % where Time is my input data
TimenoDay = Time1(5:end); % This is supposed to remove the 246 and space leaving me with 14:32:54.342 231
TimenoSpaces = TimenoDay(~isspace(TimenoDay)); % This is supposed to remove all spaces leaving me with 14:32:54.342231
[Y M D H MN S] = datevec(TimenoSpaces,formatIn); this is supposed to break the time into a vector array
SOD = (H*3600+MN*60+S); % this should convert all of my time stamps to Seconds of Day
It works fine till I get to the datevec function and it runs very slow, as in 30 minutes+. Now, it is ~ 5 million rows of data, but I did something similar yesterday and it only took a few minutes to run the datevec function with a similar amount of data. The only difference between the two scripts is that the one that want quick had a normal time format and I could import it as a Time format (14:42:36.243213) instead of as text with the script above. I think my problem is because of the Julian day and the extra spaces in the timestamp.
Does anyone have any ideas that could help me speed this up? Make my code better?
Edit#1: So I see why the datevec isn't working. when I run
Time1 = num2str(cell2mat(Time)
it returns a 20 character cell with all 2's in it. But if I do
Time1 = num2str(cell2mat(Time(1:1)))
and just look at the first cell of data it works fine.
  2 Comments
dpb
dpb on 9 Dec 2015
What is the format of the input file? If all you want is the time data I'd just read the date data directly and convert it saving a step. Precise code would depend on the format...
joshmartinmont
joshmartinmont on 9 Dec 2015
It's a .csv file with 5,272,494 rows and spans columns A - AQ. The only column I care about (for this check anyway) is column G which gives me a timestamp of DDD HH:MM:SS.FFF sss (milliseconds I believe). All I need for the check I'm performing is the HH:MM:SS.FF sss, without the space. So I want my end format to look like HH:MM:SS.FFFsss. I need it in a string that I can then perform the datevec function on so I can separate it by H M SS to then convert to Seconds of Day. This will allow me to look for gaps in my data greater than 90 milliseconds.

Sign in to comment.

Accepted Answer

dpb
dpb on 9 Dec 2015
Edited: dpb on 10 Dec 2015
Again, pasting a (partial even) row of text with the actual field of interest would relieve confusion regarding the form, but presuming the 'sss' is actually microseconds (since the .FFF includes down to the msec resolution) and is in the file, then I'd do something like
fmt=[repmat('%*s',1,'G'-'A') '%3f %2f:%2f:%f %*[^\n]'];
d=cell2mat(textscan(fid,fmt,'delimiter',','headerlines',1,'collectoutput',1));
Now you'll have have an array Nx4 that are the [day,hr,min,sec.fff] values and you can pass those directly to datevec with the leading columns yr,mo as zero or some arbitrary start date (it'll all get subtracted out in the end so is immaterial as to absolute value and the time fields are all stored in the fractional portion of the datenum value. Keeping the day will eliminate issues regarding clock rollover if any data were to have been collected over the bewitching midnight hour.
Then, simply looking for
diff(datenum([YR0 1 d])>Setpoint)
will give you the desired answer. The above should be pretty quick.
Again, all above is air-code w/o an example subset of data to 'spearmint with..."salt to suit" :)
ADDENDUM
OK, with the sample data file to check on formatting, the rest is pretty much as outlined. You don't have any calendar data other than DOY and Matlab datenum needs a reference year so just pick something arbitrary as you're only looking for differences so it all gets subtracted out again, anyway.
dn=datenum(2000,0,dat(:,1),dat(:,2),dat(:,3),dat(:,4)+dat(:,5)/1E6);
dt=diff(dn)*86400*1000;
Above takes the date information you have and creates a datenum sequence. As noted before, this will have the benefit of handling clock rollover, etc., automagically. Then the "delta-t" dt is the sample time differential in msec which you can test. For a sample subset of the file supplied, this looks like--
>> dt(10:40)
ans =
0.0905
0.4526
0.7343
0.6940
0.6840
0.6940
0.6940
0.6940
10.2494
0.3722
0.3722
0.6940
0.5532
0.6840
0.1308
0.3017
0.6940
0.5834
0.2112
0.1308
0.4124
14.8058
0.3722
0.3722
0.6940
0.5532
0.6840
0.1408
0.2917
0.1509
0.3923
>>
Seems like the acquisition must be on a time-sharing system where there's a periodic latency, maybe???
Anyway, your end object is now simply to find the locations that don't satisfy your criterion. There are several choices for how to report this; you can use a logical vector of locations that satisfy, find locations that don't, report if any don't or all do, etc., etc., etc., ...
>> isOK=all(dt<90)
isOK =
1
>> sum(dt>10)
ans =
4
>> find(dt>90)
ans =
Empty matrix: 0-by-1
>> find(dt>10)
ans =
18
31
53
61
>>
Above uses a couple different fixed setpoints; you'll make that a variable, of course. I used an example of 10 just to illustrate the case of a few failing. The logical-addressing logic vector solution returns a full-length logical array of T/F, of course. Which turns out easiest to use is all dependent upon end objectives.
  16 Comments
joshmartinmont
joshmartinmont on 11 Dec 2015
/slowly bangs head into desk over and over
I'm confusing my gaps. Sigh. yes, it is doing it right. I had to step away and stop being frustrated. Let me finish running my entire script and compare the output to the output of the tool I'm replacing and I think all shall be good.
thanks for putting up with my ramblings this morning.
dpb
dpb on 12 Dec 2015
Ayup, that's what the dataset you posted shows, anyway. There's one of about 16 msec is the largest in that particular segment of a file (which is, of course, relatively short so don't know what would be max(dt) over a longer acquisition period).
Indeed the 1E6 and 86400*1000 constants are correct for the data file as given; datenums are in integer and fractional days by the implementation.
>> dat(1:4,:)
ans =
246.0000 18.0000 27.0000 8.1520 160.0000
246.0000 18.0000 27.0000 8.1520 852.0000
246.0000 18.0000 27.0000 8.1520 984.0000
246.0000 18.0000 27.0000 8.1530 278.0000
>> num2str(dat(1:4,4)+dat(1:4,5)/1E6,'%.8f')
ans =
8.15216000
8.15285200
8.15298400
8.15327800
shows the precise seconds to the microsecond as given in the input data.
While as you note, datestr won't show more than the msec resolution, you can prove there's more resolution within the date numbers pretty simply--
>> [~,~,~,~,~,s]=datevec(dn(1:4)); % get the seconds individually
Default precision displayed is four digits, but
>> num2str(s,'%.6f')
ans =
8.152199
8.152901
8.153000
8.153297
>>
Ewww...this isn't quite the same--let's see the the rounding error magnitude expected...
>> eps(dn(1))/(1E-6/86400)
ans =
10.0583
>> abs((ss(1)-dv(1,6)))*1E6
ans =
38.7915
>>
The first is the double-precision eps() for a date number at this particular time range divided by the magnitude of 1 usec in fractional days. This shows roughly 10 usec is the theoretical limit for a double precision variable. So, the resulting discrepancy of about 40 usec returned is the order of magnitude that is the limit of what the implementation can provide. For the sample data, the statistics are
>> er=abs((ss-dv(:,6)))*1E6; % difference between input/datenum, usec
>> [max(er) min(er) mean(er)] % range and average difference
ans =
53.8687 1.7163 25.2802
>>
But, given the requirement for checking within msec resolution, this is well within it and has the convenience behind it of dealing with whatever rollover there may be in any particular time sequence irregardless of time of day or even year rollover if the data acq is a continuous process.
If you were to need the full microsecond resolution, then you would have to do the conversion of the time fields independently to get the full double precision within range.

Sign in to comment.

More Answers (0)

Categories

Find more on Characters and Strings in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!