problem with converting dates to numbers

6 views (last 30 days)
I want to convert the 3 rd column "date" to "numbers" using datanum function in a loop.
I have few problems.
1. datenum function does not read 3rd col properly (see the file attached csv file and image for the error) 2. I need to perform this in a loop since I have large number of files.
Please see my code below.
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false); % Load Data
dn = datenum(tr.Var3,'yyyy/mm/dd');

Accepted Answer

Peter Perkins
Peter Perkins on 3 Dec 2015
The version of 01AA002_Daily_Flow_ts.csv that you've attached has column headings and freeform text at the bottom, so
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false);
is gonna return a table with one (cellstr) variable. Assuming you added that disclaimer text just for the purposes of posting the file and just forgot to tell us to take it out, using 'ReadVariableNames',false will give you a table with five cellstr variables. That's almost surely NOT what you want, although in the case of the date strings, they're strings either way.
When I do this
>> tr = readtable('01AA002_Daily_Flow_ts.csv');
>> dn = datenum(tr.Date);
>> dn(1:5)
ans =
everything works fine. Do you get something different when you do that? In your latest code, it fails because you've used the wrong format string when calling datenum.
Damith on 4 Dec 2015
Edited: Damith on 4 Dec 2015
I am kind of lost here. Can you please guide/help me with what you mentioned here?
I have revised the code now. See below.
clear all
cd ('C:\Users\Desktop')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
filePattern = fullfile(myFolder, '*.csv');
csvFiles = dir(filePattern);
for k = 1:length(csvFiles)
fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
out{k} = textscan(fid(k),'%s%*s%s%f%*[^\n]','delimiter',',','headerlines',1);
for jj=1:length(csvFiles)
for ij=1:3
if ij==3
for i=1:-1:0
clear i
"Return the year via datevec and count the number for each unique year found. If not nDays(yr) for the given year then throw that one out"
Finally, I want to write to csv file with station ID, years of data available (complete) and maximum value of each complete year. (Please see the attached csv file). Flow values and years are not correct in the attached csv file.
dpb on 4 Dec 2015
This again seems to have transmuted the original question into asking how to solve another particular processing problem...I posted another code snippet that illustrates as another Answer...

Sign in to comment.

More Answers (2)

dpb on 2 Dec 2015
Edited: dpb on 3 Dec 2015
The problem isn't datenum, it's that you're trying to pass a cellstr array into it. If you're going to use readtable, use a specific format string to convert the dates on input; see doc for details and an example (albeit it converts a non-US date to US as well, but it does show the date formatting string).
Failing that, revert to the way I showed previously to parse the .csv file directly into numeric arrays, bypassing all the higher-level abstractions but leaving you with a set of double arrays that can be handled pretty simply for your needs.
Please cut 'n paste text instead of the images -- they're exceedingly difficult to read plus one can't select them to try to repeat anything you've done...
Anyway, I seem to have misspoken re: cellstring arrays and datenum; it actually accepts them just fine.
I used the import tool and retrieved both files (I have R2012b so don't have readtable so can't test it directly, but they have different forms for the date string. However, each worked just fine with datenum even as the time portion of the one file is ignored.
>> whos VarName5
Name Size Bytes Class Attributes
VarName5 32142x1 3535620 cell
>> VarName5(1:4)
ans =
>> datestr(datenum(VarName5(1:4),'yyyy-mm-dd'))
ans =
>> datestr(datenum(VarName5(1:4),'yyyy-mm-ddTHH:MM:SS'))
ans =
01-Jan-1920 07:00:00
02-Jan-1920 07:00:00
03-Jan-1920 07:00:00
04-Jan-1920 07:00:00
With the new datetime '%d' format string as noted you can interpret the rest of the time string as well but datenum doesn't have that facility.
The other file is an "ordinary" YYYY-MM-DD string; should be no issues whatever with it.
Whatever problem you're having seems to be associated with the "how" of how you're reading the files, but can't see what Matlab actually complained about from the pictures without the full error text in context.
  1 Comment
Damith on 2 Dec 2015
Edited: Damith on 2 Dec 2015
OK. For example, I will show you the evaluation of the first two lines of the code for the attached KH_100303.csv file. (See the images below)
tr = readtable('KH_100303.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%s%s%s%s); % Load Data
dn = datenum(tr.Var5, 'yyyy-mm-ddTHH:MM:SS');
But, this does not work for 01AA002_Daily_Flow_ts.csv file.
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%s); % Load Data
dn = datenum(tr.Var3,'mm/dd/yyyy');
Error using datenum (line 178)
DATENUM failed.
Caused by:
Error using dtstr2dtnummx
Failed on converting date string to date number.

Sign in to comment.

dpb on 4 Dec 2015
Edited: dpb on 5 Dec 2015
You seem to keep retrogressing past what we've already solved/shown solutions for. Why not build on the previously working solution in the previous thread remove-rows-text-at-the-bottom-of-a-csv-file? There I showed a simple way to return the values from the .csv file that mitigates the trailing disclaimer text essentially automagically. Instead you've returned to the previous case of holding all the file content in a cell array of cells which is exceedingly difficult to address owing to the need to get all the curlies and parens correct plus you can't do global addressing of cells with two-layer addressing to get subsets.
The previous file in the above thread used '-' as the date separator whereas this one uses '/' so that's one modification if choose to return the dates as y,m,d values rather than the string so that might mitigate using that altho you'll probably have to fixup the format string for datenum so it's likely a wash in writing generic code; you'll have to deal with the specific format at some point, anyway.
All that aside, start by first reading a single file and returning the specific information needed; namely the max for complete years, then look at wrapping that functionality over the files...
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
'collectoutput',1, ...
fid=fclose(fid); % close input file
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
stn=strtok(d(i).name,'_'); % parse station name from file
% write out the results in other file (presume already open)
fprintf(fido,'%s,%d'\n',stn,length(yr)) % output station, # years
fprintf(fido,'%4d,%.1f\n', [yr mx].';) % year, max for each
You'll have to put in the housekeeping to create and open the output file(s*) then close after done and such, but the basic processing should be taken care of in the above...
You'll note I didn't bother to parse the station name from the file; that's just a complication of a bunch of meaningless text; I just parsed it from the input file name. The output file format is
(*) I basically presumed in the above the idea is to consolidate all these into a single file; hence the station and number of entries in each section to aid reading. If again want one per station, then as the sample in the other thread demonstated, find some common name-generating pattern here as well.
Also note the utility function isleapyr is one of my little helpers...
function is=isleapyr(yr)
% returns T for input year being a leapyear
dpb on 10 Dec 2015
Edited: dpb on 10 Dec 2015
How many times do you have to be told to open the output file first, not inside the loop? This isn't rocket science...
And, of course, don't close it until after done writing into it...
Damith on 10 Dec 2015
OK. Figured it out. But, having a hard time writing the " mxary" row by row in to the same csv file corresponding to each row. See the image for station name and year columns.
See the code below:
clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
fileID = fopen(filename,'w');
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
fid=fclose(fid); % close input file
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
stn=strtok(d(i).name,'_'); % parse station name from file
fprintf(fileID,'%s,%d\n', stn,length(yr));

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!