Datastore won't recognize datetime in CSV files (Matlab 2019b)

Michael on 27 Aug 2019
Commented: Steve Gardner on 5 Dec 2019
Hello,
I'm trying to evaluate a datastore of CSV data that I saved with Matlab using writetable. One column contains datetimes and an example of the files' contents is this:
29-Jul-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.8979
31-Aug-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.9029
30-Sep-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.9106
31-Oct-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.9154
30-Nov-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.9227
30-Dec-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.9311
I tried the code below and received the error shown after it. When I create the datastore with 'DatetimeType' set to 'text', it works, but that is obviously inefficient. Can someone explain how to get this to work with datetimes?
Thank You,
Michael
This code works
ds = datastore('tall.csv','DatetimeType','text');
tds = tall(ds);
u = unique(tds.FIELD);
U = gather(u);
This code fails
ds = datastore('tall.csv');
tds = tall(ds);
u = unique(tds.FIELD);
U = gather(u);
The error is
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 0% complete
Evaluation 0% complete
Error using matlab.io.datastore.TabularTextDatastore/readData (line 77)
Unable to read the DATETIME data using the locale setting for your system: 'en_US'
If the data contains month or day names in a language foreign to this locale, use the 'DateLocale' parameter to specify the correct locale.
Learn more about errors encountered during GATHER.
Error in matlab.io.datastore.TabularDatastore/read (line 120)
[t, info] = readData(ds);
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
  1 Comment
Walter Roberson on 27 Aug 2019
Which release are you using?
Which column does FIELD correspond to?
The code seemed to work for me.

Accepted Answer

Michael on 29 Aug 2019
Dear Mr. Roberson,
I changed the dates in the original input CSV files to MM/dd/yyyy and it worked, so I'm going to give up on the original format. If anyone at MathWorks is reading, it would be great if we could supply a DatetimeFormat when specifying the datastore.
Thanks,
Michael
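A possible alternative to rewriting the files, offered as a sketch rather than a confirmed fix for the error above: tabularTextDatastore has a 'TextscanFormats' option that takes one textscan format specifier per column, and %{...}D carries an explicit datetime layout. The temporary sample file, the four-column order (DATE, TICKER, FIELD, VALUE), and the choice of %q and %f for the other columns are assumptions based on the data shown in the question.

```matlab
% Write a tiny sample file that mimics the layout described in the question.
% (The file name and contents are stand-ins for the real tall.csv.)
csvFile = [tempname '.csv'];
fid = fopen(csvFile, 'w');
fprintf(fid, 'DATE,TICKER,FIELD,VALUE\n');
fprintf(fid, '29-Jul-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.8979\n');
fprintf(fid, '31-Aug-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.9029\n');
fclose(fid);

% One textscan format per column; %{...}D states the datetime layout explicitly
% instead of relying on automatic detection.
ds = tabularTextDatastore(csvFile, ...
    'TextscanFormats', {'%{dd-MMM-yyyy HH:mm:ss}D', '%q', '%q', '%f'});

t = read(ds);          % read the sample rows as a table
disp(class(t.DATE))    % shows the type the DATE column was imported as

delete(csvFile);       % clean up the temporary file
```

The same cell array can also be assigned to ds.TextscanFormats after the datastore is created, which may be convenient when only the first column needs overriding.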

More Answers (5)

Nimit Dhulekar on 27 Aug 2019
Hi Michael,
Are you executing this set of commands from outside the US? If so, the datetime formats recognized by default may differ from the ones recognized in the US. Try the following command:
datetime('29-Jul-1983 00:00:00')
You may well get an error similar to the one you posted. To get around this issue, you can supply "DatetimeLocale" as a name-value pair when constructing the datastore.
ds = datastore('tall.csv','DatetimeLocale','en_US');
Hope that helps!
-Nimit

Michael on 27 Aug 2019
Dear Mr. Roberson,
Thank you for responding.
I am using MATLAB version 9.6.0.1150989 (R2019a) Update 4, Windows 10 Pro Version 10.0 (Build 18362), and Java 1.8.0_181-b13.
The column names are DATE, TICKER, FIELD, and VALUE, so for the first line:
29-Jul-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.8979
DATE is represented in this format:
29-Jul-1983 00:00:00
FIELD is
MOV_AVG_50D
But, I don't think FIELD is the problem, even though it's what I'm operating on, for two reasons:
  1. The error reads "Unable to read the DATETIME"
  2. The error disappears when I add the parameter pair 'DatetimeType','text' to the datastore command.
Thank You,
Michael

Michael on 28 Aug 2019
Dear Nimit,
Thank you, but that does not produce an error:
>> datetime('29-Jul-1983 00:00:00')
ans =
datetime
29-Jul-1983 00:00:00
Best,
Michael
  1 Comment
Walter Roberson on 28 Aug 2019
Try all of the months: there might be a difference only for a few of them.
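One way to run that check on all twelve month abbreviations at once is sketched below. The day, year, and time in the test strings are arbitrary placeholders, and the 'en_US' locale and input format are assumptions taken from the error message and the sample data in the question.

```matlab
% English month abbreviations, as they appear in the sample data.
monthNames = {'Jan','Feb','Mar','Apr','May','Jun', ...
              'Jul','Aug','Sep','Oct','Nov','Dec'};

% Build one test string per month, e.g. '15-Jan-1983 00:00:00'.
strs = strcat('15-', monthNames, '-1983 00:00:00');

% Parse with an explicit format and locale rather than automatic detection.
d = datetime(strs, 'InputFormat', 'dd-MMM-yyyy HH:mm:ss', 'Locale', 'en_US');

% True only if every string was read as the month it names.
allMonthsOk = isequal(month(d), 1:12)
```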


Michael on 28 Aug 2019
Dear Mr. Roberson,
Good idea. So, with that in mind, I tried all the dates and they didn't produce errors! See below.
Any other thoughts?
Thanks,
Michael
Input
ds = datastore('bigtall.csv','DatetimeType','text');
tds = tall(ds);
u = unique(tds.DATE);
U = gather(u);
b = false(size(U));   % preallocate the result
for i = 1:length(U)
    b(i) = isnat(datetime(U{i}));
end
any(b)
Output
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 25 min 22 sec
Evaluation completed in 25 min 29 sec
ans =
logical
0
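The loop above can also be written as a single vectorized call, since datetime accepts a cell array of character vectors. This is only a sketch: the three strings stand in for the gathered U, and the explicit 'InputFormat' and 'Locale' values are assumptions based on the sample data and the error message.

```matlab
% Stand-in for the gathered unique DATE strings (U in the code above).
U = {'29-Jul-1983 00:00:00'; '31-Aug-1983 00:00:00'; '30-Dec-1983 00:00:00'};

% Convert all strings in one call with an explicit format and locale.
d = datetime(U, 'InputFormat', 'dd-MMM-yyyy HH:mm:ss', 'Locale', 'en_US');

% Flag any entries that came back as NaT (not-a-time).
bad = isnat(d);
any(bad)
```

Note that this only exercises datetime itself. It does not go through the datastore's own format detection, so a clean result here does not rule out a problem on the datastore side.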
  1 Comment
Walter Roberson on 28 Aug 2019
And if you use tds.FIELD, does it go back to failing? Is it possible that it has decided that tds.FIELD is a datetime?


Michael on 28 Aug 2019
Dear Mr. Roberson,
This code works and uses tds.FIELD. Is that what you mean? I'm not sure I understand how to answer your question.
ds = datastore('tall.csv','DatetimeType','text');
tds = tall(ds);
u = unique(tds.FIELD);
U = gather(u);
Thanks,
Michael
  6 Comments
Steve Gardner on 5 Dec 2019
I too have a similar issue with CSV files whose dates are in yyyy/MM/dd format. datastore converts these to MM/dd/yyyy, which is fine, but for anything after the 12th day of the month it returns NaN; basically, it gets the month and day fields mixed up.
I too gave up and converted the CSV file's date field to MM/dd/yyyy. I just need to remember to do this every time I get a new CSV file, which is a bit of a pain really.
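For the yyyy/MM/dd case, stating the field order explicitly removes the day/month ambiguity when the strings are parsed directly with datetime, as in the sketch below. The two sample dates are made up for illustration. Whether the equivalent format specifier '%{yyyy/MM/dd}D' in the datastore's TextscanFormats resolves the behavior described above is an assumption that has not been confirmed in this thread.

```matlab
% Made-up sample dates: one on or before the 12th, one after it.
s = {'2019/12/05'; '2019/12/25'};

% An explicit InputFormat fixes the order as year, month, day.
d = datetime(s, 'InputFormat', 'yyyy/MM/dd');

% Show the parsed month and day side by side for each entry.
[month(d), day(d)]
```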
