DateStr2Num

version 1.4.0.0 (14.3 KB) by Jan
Convert date string to date number - C-Mex: much faster than DATENUM

8 Downloads

Updated 14 Jun 2018

DATESTR2NUM - Fast conversion of DATESTR to DATENUM
The built-in DATENUM command is very powerful, but if the input is known to be valid and exactly formatted, a specialized MEX function can be much faster.
For a single string, DateStr2Num is about 120 times faster than DATENUM; for a {1 x 10000} cell string, the speed-up factor is 300 to 600 (Matlab 2011b/64, MSVC 2008).
D = DateStr2Num(S, F)
INPUT:
S: String or cell string in DATESTR(F) format.
In contrast to DATENUM, the validity of the input string is *not* checked.
F: Integer number defining the input format. Accepted:
0:    'dd-mmm-yyyy HH:MM:SS'      01-Mar-2000 15:45:17
1:    'dd-mmm-yyyy'               01-Mar-2000
29:   'yyyy-mm-dd'                2000-03-01
30:   'yyyymmddTHHMMSS'           20000301T154517
31:   'yyyy-mm-dd HH:MM:SS'       2000-03-01 15:45:17
230:  'mm/dd/yyyyHH:MM:SS'        12/24/201515:45:17
231:  'mm/dd/yyyy HH:MM:SS'       12/24/2015 15:45:17
240:  'dd/mm/yyyyHH:MM:SS'        24/12/201515:45:17
241:  'dd/mm/yyyy HH:MM:SS'       24/12/2015 15:45:17
1000: 'dd-mmm-yyyy HH:MM:SS.FFF'  01-Mar-2000 15:45:17.123
1030: 'yyyymmddTHHMMSS.FFF'       20000301T154517.123
OUTPUT:
D: Serial date number.
EXAMPLE:
C = {'2010-06-29 21:59:13', '2010-06-29 21:59:13'};
D = DateStr2Num(C, 31)
>> [734318.916122685, 734318.916122685]
Equivalent Matlab command:
D = datenum(C, 'yyyy-mm-dd HH:MM:SS')
The C-file must be compiled before use. This is done automatically on the first call of this function.
Pre-compiled Mex files can be downloaded from: http://www.n-simon.de/mex
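
The arithmetic behind a fixed-format conversion such as format 31 can be sketched in plain C. This is only an illustration, not the submission's source: the names `date_to_number` and `str31_to_num` are hypothetical, the input is assumed to be a plain ASCII `char` string rather than an mxChar array, and, as with DateStr2Num itself, the input is not validated.

```c
/* Days before the first of each month in a non-leap year. */
static const int kCumDays[12] =
    {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};

/* Two ASCII digits at s[i], s[i+1] as an integer.
 * '0' is 48, so 48 * 10 + 48 = 528 is subtracted once. */
#define TWO_DIGITS(s, i) ((s)[(i)] * 10 + (s)[(i) + 1] - 528)

/* Serial date number of a calendar date, counted so that
 * 01-Jan-0000 is day 1 (hypothetical helper, year >= 0). */
static double date_to_number(int year, int mon, int day)
{
    int leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    long days = 365L * year
              + (year + 3) / 4 - (year + 99) / 100 + (year + 399) / 400
              + kCumDays[mon - 1] + day;

    if (leap && mon > 2) {
        days += 1;  /* 29-Feb of the current year has already passed */
    }
    return (double) days;
}

/* Convert "yyyy-mm-dd HH:MM:SS" to a serial date number.
 * The string is assumed to be valid; nothing is checked. */
double str31_to_num(const char *s)
{
    int year = (s[0] - '0') * 1000 + (s[1] - '0') * 100 + TWO_DIGITS(s, 2);
    int mon  = TWO_DIGITS(s, 5);
    int day  = TWO_DIGITS(s, 8);
    int hour = TWO_DIGITS(s, 11);
    int min  = TWO_DIGITS(s, 14);
    int sec  = TWO_DIGITS(s, 17);

    return date_to_number(year, mon, day) +
           (hour * 3600 + min * 60 + sec) / 86400.0;
}
```

Called with the string from the EXAMPLE, `str31_to_num("2010-06-29 21:59:13")` reproduces the serial date number shown there to the printed precision.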

Tested: Matlab 6.5, 7.7, 7.8, 7.13, 32/64bit, WinXP/7
Compiler: LCC 2.4/3.8, BCC 5.5, Open Watcom 1.8, MSVC 2008
Compatibility with MacOS, Linux, 64 bit is assumed, but not tested.

See also: DateConvert (Jan Simon)
http://www.mathworks.com/matlabcentral/fileexchange/25594

Cite As

Jan (2021). DateStr2Num (https://www.mathworks.com/matlabcentral/fileexchange/28093-datestr2num), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (31)

Daniel Kovari

As always, your contributions to Matlab are invaluable.
I recently found myself wanting to parse dates with the format: yyyy-mm-dd HH:MM:SS.FFF [A/P]M

I added an extra handler to your code, maybe someone else will find it useful:

double Str1060Num(const mxArray *S)
{
  // Convert a single string to a serial date number.
  // "yyyy-mm-dd HH:MM:SS.FFF AM"   "2000-03-01 01:45:17.123 PM"
  //

  uint16_T *d16;
  int32_T  year, mon, day, hour, min, sec, mil;
  double   dNum;

  // Check number of characters:
  if (mxGetNumberOfElements(S) < 26) {
     ERROR("BadDateString",
           "Bad string length for [yyyy-mm-dd HH:MM:SS.FFF AM] format.");
  }

  // Extract the date and time:
  d16  = (uint16_T *) mxGetData(S);   // mxChar is UINT16!
  year = (int32_T) ((d16[0] - 48) * 1000 + (d16[1] - 48) * 100 +
                    d16[2] * 10 + d16[3] - 528);
  mon  = d16[5]  * 10 + d16[6]  - 528;  // (d16[5]-48) * 10 + d16[6]-48
  day  = d16[8]  * 10 + d16[9]  - 528;
  hour = d16[11] * 10 + d16[12] - 528;
  min  = d16[14] * 10 + d16[15] - 528;
  sec  = d16[17] * 10 + d16[18] - 528;
  mil  = (d16[20] - 48) * 100 + (d16[21] - 48) * 10 + (d16[22] - 48);

  // Fold the A/P character to lower case and compare with 'p' (112):
  if ((d16[24] | 32) == 112) {
     if (hour < 12) hour += 12;   // 1 PM .. 11 PM; 12 PM stays 12
  } else {
     if (hour == 12) {            // 12 AM
        hour = 0;
     }
  }

  // Calculate the serial date number:
  dNum = DATE_TO_NUMBER +
         (hour * 3600000 + min * 60000 + sec * 1000 + mil) / 86400000.0;
  ADD_LEAP_DAY

  return (dNum);
}
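
The AM/PM branch of the handler above can be isolated into a small function, which makes the two special cases (12 AM and 12 PM) easier to check. This is a minimal sketch; the name `to_24h` is hypothetical and not part of DateStr2Num.

```c
/* Convert a 12-hour clock reading to a 24-hour value.
 * hour12: 1..12, marker: 'A', 'a', 'P' or 'p' (not validated). */
int to_24h(int hour12, char marker)
{
    /* OR with 32 folds an ASCII letter to lower case. */
    int is_pm = ((marker | 32) == 'p');

    if (is_pm) {
        return (hour12 < 12) ? hour12 + 12 : hour12;  /* 12 PM stays 12 */
    }
    return (hour12 == 12) ? 0 : hour12;               /* 12 AM becomes 0 */
}
```

With this rule, 12 AM maps to hour 0, 12 PM stays at hour 12, and 1 PM to 11 PM map to 13 to 23.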

Karamos

Jan, this function is very helpful, as it improves speed.
I am using millisecond data and I don't want to lose this information, but your function doesn't recognise my format, which is {'20140211 113301617'} yyyyMMddHHmmssSSS.
Could you please advise how to use (or modify) your function to capture the milliseconds? Thank you
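
A format like this is not in the list of accepted formats, but because every field sits at a fixed position, the parsing step is short. The sketch below is an assumption-laden illustration rather than a patch to DateStr2Num: it takes the layout from the example string (a single space between date and time), works on a plain ASCII `char` string, does no validation, and uses the hypothetical names `DateFields`, `read_digits`, `parse_compact_ms` and `day_fraction`.

```c
/* Fields of a "yyyymmdd HHMMSSFFF" time stamp. */
typedef struct {
    int year, mon, day, hour, min, sec, ms;
} DateFields;

/* Read n ASCII digits starting at s[pos]. No validation. */
static int read_digits(const char *s, int pos, int n)
{
    int value = 0;
    for (int i = 0; i < n; i++) {
        value = value * 10 + (s[pos + i] - '0');
    }
    return value;
}

/* Split "yyyymmdd HHMMSSFFF" into its fields. */
DateFields parse_compact_ms(const char *s)
{
    DateFields f;
    f.year = read_digits(s, 0, 4);
    f.mon  = read_digits(s, 4, 2);
    f.day  = read_digits(s, 6, 2);
    /* s[8] is the separating space */
    f.hour = read_digits(s, 9, 2);
    f.min  = read_digits(s, 11, 2);
    f.sec  = read_digits(s, 13, 2);
    f.ms   = read_digits(s, 15, 3);
    return f;
}

/* Time of day as a fraction of one day, with millisecond resolution. */
double day_fraction(const DateFields *f)
{
    return (f->hour * 3600000 + f->min * 60000 + f->sec * 1000 + f->ms)
           / 86400000.0;
}
```

The fraction returned by `day_fraction` is the part that would be added to the whole-day serial number of the date, so the milliseconds are carried into the result.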

dymitr ruta

To make the mmm month format immune to letter case ('JAN', 'JAn', 'JaN', 'jan', etc.), note that lower-case letters are offset from upper-case letters by 32. The sum of the 2nd and 3rd letters of the month, taken modulo 32, therefore gives the same unique value for each month, irrespective of the case of any letter. So this modification to Jan's code does the trick: mIndex = (d16[2nd_letter_index] + d16[3rd_letter_index]) % 32; The piece of code that maps mIndex to months would then look like this:

if (mIndex < 10) {  // Split the test in 2 halves for speed
   switch (mIndex) {
      case 1:   mon = 7;   break;  // 'jul'
      case 2:   mon = 4;   break;  // 'apr'
      case 3:   mon = 6;   break;  // 'jun'
      case 5:   mon = 11;  break;  // 'nov'
      case 7:   mon = 2;   break;  // 'feb'
      case 8:   mon = 12;  break;  // 'dec'
      default:
         ERROR("BadDateString", "Bad month for [yyyy*mmm*dd] format.");
   }
} else {
   switch (mIndex) {
      case 15:  mon = 1;   break;  // 'jan'
      case 19:  mon = 3;   break;  // 'mar'
      case 21:  mon = 9;   break;  // 'sep'
      case 23:  mon = 10;  break;  // 'oct'
      case 26:  mon = 5;   break;  // 'may'
      case 28:  mon = 8;   break;  // 'aug'
      default:
         ERROR("BadDateString", "Bad month for [yyyy*mmm*dd] format.");
   }
}

DateStr2Num has already saved me months of processing time, excellent job Jan.
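
The modulo-32 idea can be tried outside of the MEX file with a few lines of standard C. This sketch swaps the two switch statements for a lookup table, which is a different construct with the same mapping; the name `month_from_abbrev` is hypothetical, and like the snippet above it ignores the first letter of the month.

```c
/* Month number (1..12) for a three-letter English month abbreviation
 * in any mix of upper and lower case, or 0 if the hash is unused.
 * Only the 2nd and 3rd letters are inspected, so an invalid first
 * letter is not detected. */
int month_from_abbrev(const char *m)
{
    /* Indexed by (2nd letter + 3rd letter) % 32; 0 marks unused slots. */
    static const int table[32] = {
        [1]  = 7,  /* jul */  [2]  = 4,  /* apr */  [3]  = 6,  /* jun */
        [5]  = 11, /* nov */  [7]  = 2,  /* feb */  [8]  = 12, /* dec */
        [15] = 1,  /* jan */  [19] = 3,  /* mar */  [21] = 9,  /* sep */
        [23] = 10, /* oct */  [26] = 5,  /* may */  [28] = 8   /* aug */
    };

    return table[(m[1] + m[2]) % 32];
}
```

Because each ASCII letter modulo 32 equals its position in the alphabet for both cases, 'JAN', 'Jan' and 'jAn' all hash to 1 + 14 = 15.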

Florian Aendekerk

This should be included in the Statistics and ML toolbox

Stephen Cobeldick

A neat concept neatly implemented. On the occasions when I have needed to process some large collections of data this submission was most useful: thank you Jan Simon! Does exactly what it says on the box :)

Jan

@Aaron Schurger: Now month names are accepted in uppercase also.

Aaron Schurger

One minor glitch: I get an error when the name of the month is in all caps, as in
'03-JAN-2016 09:15:02'
I converted to lower case first and then it worked like a charm.

Jan

@Johan Hagman: Of course more formats can be implemented. A modification of the code should be very simple for your case. I did not implement years with 2 digits, because this would require assumptions for the missing digits. But smart assumptions are definitely not the job of this tool. If you contact me by mail (address found in the code) and explain *exactly* how you want the YY to be completed to YYYY, I can provide a solution - if no "pivot years" or requests of the current date are required.

Johan Hagman

Is there any possibility of adding more date formats? The datenum function is way slow when handling >150 million dates, and our dates are sadly specified in 'dd-mmm-yy HH:MM:SS' or 'dd-mmm-yy HH:MM:SS.FFF', where in this function the input of just two digits for year is not enough. As specified by the formats all dates must be represented by 4 digits for years.

Aman Sethi

Alain

Jan, this is truly great.

Have you thought about datestr also? That one seems even slower - having a DateNum2Str would complement this nicely...

Ingrid

Reading in large csv-files with dates in the first column turned out to be extremely slow due to the datenum function, and this file saved the day.

Felipe

Jan

@joh: What exactly is an "array of dates"? Cell strings are handled internally already.

joh

hi, would you use a for loop for an array of dates?

scott worland

Perfect! I am using 15 minute interval stream records --- my data sets are around 0.5 million lines (Elapsed time is 0.019273 seconds to convert dates to number using DateStr2Num)

Jan

@James: I do not understand the question. Of course I've created the C-file, before it could be compiled by the mex command.
I suggest posting the error you get from the compiler and explaining the required details.

James

Excellent work Jan! How did you create the c file before using the mex command? I keep getting errors from matlab compiler.

Thanks
-Jimmy

Hao Shen

Swasti Khuntia

Excellent stuff !!!!

German Gomez-Herrero

Exactly what I was looking for! This is really good stuff

Jonathan Sullivan

Very nice submission. It is enormously faster than datenum. Another great file from Jan.

Jan

To convert a serial date number to a '2011-08-24 23:38:44' date string:
sprintf('%.4d-%.2d-%.2d %.2d:%.2d:%.2d', datevecmx(now, 1));
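
For readers who want the reverse direction in C as well, the following is a rough counterpart of the one-liner above. It is a sketch under assumptions, not code from this submission: the name `num_to_str31` is hypothetical, the input is assumed to be a serial date number on or after 01-Mar-0000 (day 61), the time is rounded to the nearest second, and the calendar step uses the well-known "civil from days" integer algorithm.

```c
#include <stdio.h>

/* Write a serial date number as "yyyy-mm-dd HH:MM:SS" into out.
 * out_size should be at least 20 bytes. Assumes dnum >= 61. */
void num_to_str31(double dnum, char *out, size_t out_size)
{
    long days = (long) dnum;  /* whole days; truncation is fine for dnum > 0 */
    long secs = (long) ((dnum - (double) days) * 86400.0 + 0.5);

    if (secs >= 86400) {      /* rounding reached the next midnight */
        secs -= 86400;
        days += 1;
    }

    /* Shift so that day 0 is 01-Mar-0000, then split into 400-year eras. */
    long z    = days - 61;
    long era  = z / 146097;
    long doe  = z - era * 146097;                       /* day of era  */
    long yoe  = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    long doy  = doe - (365 * yoe + yoe / 4 - yoe / 100); /* from 1 March */
    long mp   = (5 * doy + 2) / 153;                     /* 0 = March    */
    long day  = doy - (153 * mp + 2) / 5 + 1;
    long mon  = (mp < 10) ? mp + 3 : mp - 9;
    long year = yoe + era * 400 + ((mon <= 2) ? 1 : 0);

    snprintf(out, out_size, "%04ld-%02ld-%02ld %02ld:%02ld:%02ld",
             year, mon, day, secs / 3600, (secs / 60) % 60, secs % 60);
}
```

Counting the year from 1 March puts the leap day at the end of the year, which is why the month and day can be recovered with pure integer arithmetic and no lookup table.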

Saad

ignore my previous comment...the string had some trailing whitespace which was leading the c program to throw an exception.

Saad

Hi, thanks for writing this, it works great. But it seems like the C compiled version doesn't accept a char string? I have a char string of 1x25 which works with the normal .m function, but I get a warning that the format is not acceptable within the compiled version. I have to use the cellstr() function to convert before passing to the compiled version. In the end, the run times of the compiled and .m functions are the same. Any way you can adapt this?

Todd

My bad. Like Nate, I figured it out - what a difference! The command:
mex -O DateStr2Num.c
Generated a DateStr2Num.mexw64 file and wham! My code was at 90s using datenum. Now that section takes 0.02s. Thanks!

Nate Jensen

Sorry, I'm retarded, I figured it out.

Nate Jensen

Good function, I use it all the time. I am retarded at C though. Could you tell me how to run the C function from Matlab? Thanks.

Jan

@Reyna: The new '300' format contains milliseconds also.

Reyna

This is great. Is it possible to add millisec? Particularly to extend format 30 to yyyymmddTHHMMSS.FFF?

MATLAB Release Compatibility
Created with R2016b
Compatible with any release
Platform Compatibility
Windows macOS Linux
