extract date and time preceded by strings with variable length from text file

11 views (last 30 days)
Hi all,
I am facing the challenge of extracting date and time from the following table.
>>460110419137703 0 2018-09-28 01:23:10 0.000000 0.000000
>>460028186815737 0368ceefe32555 2018-09-28 01:19:25 113.944056 22.504925
I have tried the following command but it only manages to extract the second line correctly.
>>tmp = readtable('1.txt','ReadVariableNames',false,'Format','%f%s%{yyyy-MM-dd HH:mm:ss}D%f%f');
One way to go around this is to replace '0' in the first line with a string of fourteen 0's to match the string length and then use the following command.
>>tmp = readtable('1.txt','ReadVariableNames',false,'Format','%f%14c%{yyyy-MM-dd HH:mm:ss}D%f%f');
But I think there may be a single command that can handle both lines without manual modification as above. Could somebody give me an idea how this could be done? Thanks!

Accepted Answer

dpb
dpb on 6 Oct 2018
Edited: dpb on 6 Oct 2018
fmt='%f%s%{yyyy-MM-dd HH:mm:ss}D%f%f';
dat=textscan(fid,fmt,'delimiter','\t','whitespace','');
This is directly related tp the discussion of parsing datetime with other fields we just had at Answer_339313>, and more particularly at your previous Q? discussion that Walter linked to there.
You've got to clear the whitespace from being seen inside the datetime field so textscan won't break the date and time into two pieces; fortunately your file here is tab-delimited.
Unfortunately, there's another insurmountable issue with the hex data without reverting to the fscanf level; I'd recommend you also submit enhancement request for missing '%x' conversion format string. I've complained about it previously; I've no idea why TMW would have left it out...one isn't supposed to have hex data???.
fmt='%f%x%{yyyy-MM-dd HH:mm:ss}D%f%f';
fid=fopen('bayer.dat','r');
>> textscan(fid,fmt,'delimiter','\t','whitespace','')
Error using textscan
Unable to parse the format character vector at position 3 ==> %x%{yyyy-MM-dd HH:mm:ss}D%f%f
Unsupported format specifier '%x'. See the documentation for TEXTSCAN for supported formats.
>> frewind(fid)
>> fmt='%f%s%{yyyy-MM-dd HH:mm:ss}D%f%f';
>> textscan(fid,fmt,'delimiter','\t','whitespace','')
ans =
1×5 cell array
{2×1 double} {2×1 cell} {2×1 datetime} {2×1 double} {2×1 double}
>> fid=fclose(fid);
>> t=readtable('bayer.dat')
t =
3×5 table
Var1 Var2 Var3 Var4 Var5
__________ ____ ___________________ ______ ______
4.6011e+14 0 2018-09-28 01:23:10 0 0
4.6003e+14 368 NaT 2018 -9
-28 1 NaT 113.94 22.505
>> opt=setvartype(opt,'Var2','char');
>> t=readtable('bayer.dat',opt)
t =
2×5 table
Var1 Var2 Var3 Var4 Var5
__________ ________________ ____________________ ______ ______
4.6011e+14 '0' 28-Sep-2018 01:23:10 0 0
4.6003e+14 '0368ceefe32555' 28-Sep-2018 01:19:25 113.94 22.505
>>
Your choice of how to handle the hex string; either convert to decimal or categorical depending on how it is to be used. One might consider also converting the first column to long integer as well...or it might also be categorical depending on just what it represents.
  5 Comments
dpb
dpb on 12 Oct 2018
Edited: dpb on 13 Oct 2018
What it will return by default will depend on what if finds in the file; for the file here the first row is a valid integer and it returns 'double'; if you use detectImportOptions on a file wherein the first row is one of the hex values with a non-decimal character in it; then it will not be able to convert it with '%f' and will return 'char'. To be robust and handle any file of this type, you need to set it manually (or ensure you only scan a file in which the first element contains a hex digit other than 0-9 which means much more coding effort than simply setting the known variable type manually).
BTW, when you have multiple files of the same type, create the import object once and re-use it; don't waste compute cycles re-inventing the wheel.
Here's the result for the test file I used here--
>> type bayer.dat
460110419137703 0 2018-09-28 01:23:10 0.000000 0.000000
460028186815737 0368ceefe32555 2018-09-28 01:19:25 113.944056 22.504925
>> opt=detectImportOptions('bayer.dat')
opt =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 2 more}
VariableTypes: {'double', 'double', 'datetime' ... and 2 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 2 more}
VariableOptions: Show all 5 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>>
The second column is 'double'...
bayern science
bayern science on 15 Oct 2018
I now see the reason behind the use of 'detectImportOptions'. Many thanks for the clarification and the suggestion of reusing the import object dpb!

Sign in to comment.

More Answers (0)

Categories

Find more on Data Type Conversion in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!