How to extract a certain column of date-time data from a text file?

I have a .cef text file that is a mixture of strings and data. Part of the file is shown below:
% more irrelevant description on top
DEPEND_0 = time_tags__C3_CP_EDI_EGD
end_variable = kine_flag__C3_CP_EDI_EGD
! END_CEFMERGE_INCLUDE = "C3_CH_EDI_EGD_DATASET.ceh"
!
DATA_UNTIL=EOF
!
2014-12-17T20:14:24.220514Z, 109.196, 1.526, 1, 0
2014-12-17T20:14:45.373833Z, 109.315, 0.763, 1, 0
2014-12-17T20:15:21.192066Z, 108.480, 0.763, 1, 0
2014-12-17T20:15:48.907884Z, 107.527, 0.763, 1, 0
2014-12-17T20:16:26.787885Z, 107.646, 0.763, 1, 0
2014-12-17T20:16:45.387492Z, 107.169, 0.763, 1, 0
2014-12-17T20:17:04.818033Z, 106.573, 0.763, 1, 0
2014-12-17T20:17:48.323033Z, 106.454, 0.763, 1, 0
...
I hope to extract the first column of the data, which holds dates in ISO 8601 format, and use it to plot a graph. To do this my code has to skip all of the description above the data, as well as the remaining columns.
I have thought of using dlmread, textscan, xlsread, sscanf, fopen, but since the data are date-times written as a mix of digits and characters, it is quite a challenge for me. The output I am after is:
2014-12-17T20:14:24.220514Z
2014-12-17T20:14:45.373833Z
2014-12-17T20:15:21.192066Z
2014-12-17T20:15:48.907884Z
2014-12-17T20:16:26.787885Z

5 Comments

Thank you for all the code. It works fine on the example provided.
However, it does not work on my file, which has a lot more text in it.
Attached is the text file I am trying to read. I would welcome solutions and suggestions.
P.S. I converted this file from .cef to .txt using Microsoft Word. It would be better to have a script that reads the .cef file directly, since the readtable function could not read a .cef file.
What is a .cef file?
Attach the original file itself. The extension has no bearing on the content...a text file is a text file unless it has specific other encoding, the filename and extension are just external user names that the system may (or may not) use to assign a particular application link with.
I can't attach the file here since the file format is unsupported. I have attached a link to the file exchange website.
Just rename it, then.
I didn't realize TMW had such a parochial mindset, but just tried it and see you're correct.
Do
copyfile('test.cef','test.txt')
substituting your filename for 'test', and then attach that file here. Should work.
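As an alternative to copying the file, readtable accepts a 'FileType' option that makes it treat a file as delimited text whatever its extension. The following is only a sketch: the scratch file, its made-up first header line, and the header count of 3 are assumptions for illustration, not the real file.

```matlab
% Write a small scratch file with a .cef extension.
% The first header line is invented; the data rows are from the sample above.
fname = [tempname '.cef'];
txtLines = {'! header line', ...
            'DATA_UNTIL=EOF', ...
            '!', ...
            '2014-12-17T20:14:24.220514Z, 109.196, 1.526, 1, 0', ...
            '2014-12-17T20:14:45.373833Z, 109.315, 0.763, 1, 0'};
fid = fopen(fname,'w');
fprintf(fid,'%s\n',txtLines{:});   % format is reused for each cell element
fclose(fid);

% 'FileType','text' tells readtable to parse the file as delimited text
t = readtable(fname,'FileType','text','Delimiter',',', ...
              'HeaderLines',3,'ReadVariableNames',false);
delete(fname)                      % remove the scratch file
```

The number of header lines still has to match the actual file.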


 Accepted Answer

And, w/ all you considered, you neglected probably the easiest... :)
Putting your first sample data into a file I named 'jie.dat',
t=readtable('jie.dat','headerlines',7);
t.DT=datetime(t.Var1,'InputFormat','uuuu-MM-dd''T''HH:mm:ss.SSSSSSZ','TimeZone','UTC');
t.DT.Format=[t.DT.Format '.SSSSSS'];
The result of the above is...you can, of course, remove any variables not needed...
>> t
t =
8×6 table
Var1 Var2 Var3 Var4 Var5 DT
_____________________________ ______ _____ ____ ____ ___________________________
'2014-12-17T20:14:24.220514Z' 109.2 1.526 1 0 17-Dec-2014 20:14:24.220514
'2014-12-17T20:14:45.373833Z' 109.31 0.763 1 0 17-Dec-2014 20:14:45.373833
'2014-12-17T20:15:21.192066Z' 108.48 0.763 1 0 17-Dec-2014 20:15:21.192066
'2014-12-17T20:15:48.907884Z' 107.53 0.763 1 0 17-Dec-2014 20:15:48.907884
'2014-12-17T20:16:26.787885Z' 107.65 0.763 1 0 17-Dec-2014 20:16:26.787885
'2014-12-17T20:16:45.387492Z' 107.17 0.763 1 0 17-Dec-2014 20:16:45.387492
'2014-12-17T20:17:04.818033Z' 106.57 0.763 1 0 17-Dec-2014 20:17:04.818033
'2014-12-17T20:17:48.323033Z' 106.45 0.763 1 0 17-Dec-2014 20:17:48.323033
>>
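Since the goal was to plot against the time stamps, the next step might look like the sketch below. It rebuilds three of the sample rows in memory so that it runs on its own; the names Var1 and Var2 mirror the readtable defaults shown above, and plotting Var2 against time is an assumption about what is wanted.

```matlab
% Rebuild a few sample rows so the sketch is self-contained
Var1 = {'2014-12-17T20:14:24.220514Z'; ...
        '2014-12-17T20:14:45.373833Z'; ...
        '2014-12-17T20:15:21.192066Z'};
Var2 = [109.196; 109.315; 108.480];
t = table(Var1,Var2);

% Same conversion as in the answer: ISO 8601 text to datetime in UTC
t.DT = datetime(t.Var1,'InputFormat','uuuu-MM-dd''T''HH:mm:ss.SSSSSSZ','TimeZone','UTC');
t.DT.Format = [t.DT.Format '.SSSSSS'];   % show the fractional seconds

t.Var1 = [];                             % drop the raw text column once converted

% Plot the (assumed) quantity of interest against time
figure
plot(t.DT,t.Var2,'-o')
xlabel('Time (UTC)')
ylabel('Var2')
```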
ADDENDUM:
function n=readHdrLines(filename)
% Return the number of header lines to skip, including the "!" line after DATA_UNTIL=EOF
fid=fopen(filename,'r'); % open file...ensure filename fully-qualified name or in path
if fid<0, error('Could not open %s',filename); end % bail out on invalid fid
n=0; % initialize counter
while ~feof(fid) % loop until find key phrase
  n=n+1; % increment line counter
  if contains(fgetl(fid),'DATA_UNTIL=EOF'), break, end % found it; quit
end
n=n+1; % account for the comment line after the key phrase, before the first data
fclose(fid); % close the file
end
then just fixup the above sample code to something like--
filename=fullfile(yourpath,yourfile);
nHdr=readHdrLines(filename);
t=readtable(filename,'headerlines',nHdr);
...
ADDENDUM 2:
The grep utility will do this a whole lot faster; and can also be done with regexp if have file in memory. Don't know how that would compare to fgetl, but the portion of the file header section isn't that long so performance won't be terribly slow however you choose to do it.
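A minimal sketch of the in-memory regexp variant, assuming the marker is always written as DATA_UNTIL=EOF (optionally with spaces around the equals sign) and is followed by exactly one "!" line. The helper name countHdrLines and the scratch file contents are hypothetical.

```matlab
% Write a small scratch file so the sketch runs on its own (contents are illustrative)
fname = [tempname '.txt'];
txtLines = {'! some header text', ...
            'DEPEND_0 = time_tags__C3_CP_EDI_EGD', ...
            'DATA_UNTIL=EOF', ...
            '!', ...
            '2014-12-17T20:14:24.220514Z, 109.196, 1.526, 1, 0'};
fid = fopen(fname,'w');
fprintf(fid,'%s\n',txtLines{:});
fclose(fid);

nHdr = countHdrLines(fname);   % marker is on line 3, so 4 lines precede the data
delete(fname)

function n = countHdrLines(filename)
% Hypothetical helper: number of lines up to and including the "!" line
% that follows the DATA_UNTIL=EOF marker.
txt = fileread(filename);                            % whole file as one char vector
pos = regexp(txt,'DATA_UNTIL\s*=\s*EOF','once');     % start index of the marker
if isempty(pos)
    error('countHdrLines:noMarker','Marker not found in %s.',filename);
end
% newlines before the marker = complete lines above it;
% +1 for the marker line itself, +1 for the "!" line after it
n = sum(txt(1:pos) == newline) + 2;
end
```

The returned count can be passed as the 'headerlines' value in the same way as the result of readHdrLines.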

8 Comments

Hi dpb, your code works fine on my example but does not work on my text file. I have attached the text file on a comment below my question.
Well, did you fix up the 'headerlines' argument to match the actual file?
You'll have to either know that a priori or scan the file to find the location first...
Figured it would... :) As long as there really wasn't something embedded in the actual file other than ASCII text...which was the point of the other exercise to prove was/wasn't.
Is it possible to read the text line by line and detect automatically the line at which the column data start?
I am planning to write code to read files in a similar format, but the number of rows to skip will differ from file to file.
t=readtable('jie.dat','headerlines',7);
So if I use the original approach I will have to change the number (e.g. 7) to a different one every time. Is there a better approach to this?
It works, thanks.
Looking at the code, I assume that the following line:
if contains(fgetl(fid),'DATA_UNTIL=EOF'), break, end
checks whether the current line of the file contains the text it is looking for. I just wonder how it knows the type of data it is looking for, or which line to start at, since we did not define the structure of the data?
I don't really understand what the line of code above does.
Have you read the doc for each function???
fgetl() returns the next line of the file as character string so the data is always character to it...just experiment at the command line with the sequence of commands and see how it works...just don't start the while() loop from the command line; just enter a few cases manually until get the picture.
As for " which line to start" it starts at the beginning... fopen returns the file handle and leaves the file ready at the beginning sans the 'append' flag for writing or some other movement of the filepointer by fseek before calling fgetl
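The kind of command-line experiment suggested above could go roughly like this. The three-line scratch file is made up for illustration; the calls themselves (fopen, fgetl, contains, fclose) are the ones used in readHdrLines.

```matlab
% Write a three-line scratch file (contents invented for the experiment)
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n','! header','DATA_UNTIL=EOF','2014-12-17T20:14:24.220514Z, 109.196');
fclose(fid);

fid = fopen(fname,'r');                  % file pointer starts at the beginning
l1 = fgetl(fid);                         % first line as char, newline stripped
l2 = fgetl(fid);                         % second line
found = contains(l2,'DATA_UNTIL=EOF');   % plain substring test on the text
l3 = fgetl(fid);                         % first data line, still just text
l4 = fgetl(fid);                         % nothing left: fgetl returns numeric -1
fclose(fid);
delete(fname)
```

Each fgetl call advances the file pointer by one line, which is why the loop only needs a counter and a substring test; no data structure has to be declared.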


Asked: on 26 Jul 2019
Edited: dpb on 8 Aug 2019
