extract date and time preceded by strings with variable length from text file

Question

bayern science on 6 Oct 2018

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/422506-extract-date-and-time-preceded-by-strings-with-variable-length-from-text-file

Commented: bayern science on 15 Oct 2018

Accepted Answer: dpb

Open in MATLAB Online

Hi all,

I am facing the challenge of extracting date and time from the following table.

>>460110419137703  0  2018-09-28 01:23:10  0.000000  0.000000
>>460028186815737  0368ceefe32555  2018-09-28 01:19:25  113.944056  22.504925

I have tried the following command but it only manages to extract the second line correctly.

>>tmp = readtable('1.txt','ReadVariableNames',false,'Format','%f%s%{yyyy-MM-dd HH:mm:ss}D%f%f');

One way to go around this is to replace '0' in the first line with a string of fourteen 0's to match the string length and then use the following command.

>>tmp = readtable('1.txt','ReadVariableNames',false,'Format','%f%14c%{yyyy-MM-dd HH:mm:ss}D%f%f');

But I think there may be a single command that can handle both lines without manual modification as above. Could somebody give me an idea how this could be done? Thanks!

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

dpb on 6 Oct 2018

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/422506-extract-date-and-time-preceded-by-strings-with-variable-length-from-text-file#answer_340050

Edited: dpb on 6 Oct 2018

Open in MATLAB Online

fmt='%f%s%{yyyy-MM-dd HH:mm:ss}D%f%f';
dat=textscan(fid,fmt,'delimiter','\t','whitespace','');

This is directly related tp the discussion of parsing datetime with other fields we just had at Answer_339313>, and more particularly at your previous Q? discussion that Walter linked to there.

You've got to clear the whitespace from being seen inside the datetime field so textscan won't break the date and time into two pieces; fortunately your file here is tab-delimited.

Unfortunately, there's another insurmountable issue with the hex data without reverting to the fscanf level; I'd recommend you also submit enhancement request for missing '%x' conversion format string. I've complained about it previously; I've no idea why TMW would have left it out...one isn't supposed to have hex data???.

fmt='%f%x%{yyyy-MM-dd HH:mm:ss}D%f%f';
fid=fopen('bayer.dat','r');
>> textscan(fid,fmt,'delimiter','\t','whitespace','')
Error using textscan
Unable to parse the format character vector at position 3 ==> %x%{yyyy-MM-dd HH:mm:ss}D%f%f
Unsupported format specifier '%x'. See the documentation for TEXTSCAN for supported formats. 
>> frewind(fid)
>> fmt='%f%s%{yyyy-MM-dd HH:mm:ss}D%f%f';
>> textscan(fid,fmt,'delimiter','\t','whitespace','')
ans =
  1×5 cell array
    {2×1 double}    {2×1 cell}    {2×1 datetime}    {2×1 double}    {2×1 double}
>> fid=fclose(fid);
>> t=readtable('bayer.dat')
t =
  3×5 table
       Var1       Var2           Var3             Var4      Var5 
    __________    ____    ___________________    ______    ______
    4.6011e+14      0     2018-09-28 01:23:10         0         0
    4.6003e+14    368     NaT                      2018        -9
           -28      1     NaT                    113.94    22.505
>> opt=setvartype(opt,'Var2','char');
>> t=readtable('bayer.dat',opt)
t =
  2×5 table
       Var1             Var2                  Var3             Var4      Var5 
    __________    ________________    ____________________    ______    ______
    4.6011e+14    '0'                 28-Sep-2018 01:23:10         0         0
    4.6003e+14    '0368ceefe32555'    28-Sep-2018 01:19:25    113.94    22.505
>>

Your choice of how to handle the hex string; either convert to decimal or categorical depending on how it is to be used. One might consider also converting the first column to long integer as well...or it might also be categorical depending on just what it represents.

5 Comments
Show 3 older commentsHide 3 older comments

dpb on 12 Oct 2018

Edited: dpb on 13 Oct 2018

Open in MATLAB Online

What it will return by default will depend on what if finds in the file; for the file here the first row is a valid integer and it returns 'double'; if you use detectImportOptions on a file wherein the first row is one of the hex values with a non-decimal character in it; then it will not be able to convert it with '%f' and will return 'char'. To be robust and handle any file of this type, you need to set it manually (or ensure you only scan a file in which the first element contains a hex digit other than 0-9 which means much more coding effort than simply setting the known variable type manually).

BTW, when you have multiple files of the same type, create the import object once and re-use it; don't waste compute cycles re-inventing the wheel.

Here's the result for the test file I used here--

>> type bayer.dat
460110419137703  0  2018-09-28 01:23:10  0.000000  0.000000
460028186815737  0368ceefe32555  2018-09-28 01:19:25  113.944056  22.504925
>> opt=detectImportOptions('bayer.dat')
opt = 
  DelimitedTextImportOptions with properties:
   Format Properties:
                    Delimiter: {'\t'}
                   Whitespace: '\b '
                   LineEnding: {'\n'  '\r'  '\r\n'}
                 CommentStyle: {}
    ConsecutiveDelimitersRule: 'split'
        LeadingDelimitersRule: 'keep'
                EmptyLineRule: 'skip'
                     Encoding: 'windows-1252'
   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'addvars'
   Variable Import Properties:   Set types by name using setvartype
                VariableNames: {'Var1', 'Var2', 'Var3' ... and 2 more}
                VariableTypes: {'double', 'double', 'datetime' ... and 2 more}
        SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 2 more}
              VariableOptions: Show all 5 VariableOptions 
  Access VariableOptions sub-properties using setvaropts/getvaropts
   Location Properties:
                     DataLine: 1
            VariableNamesLine: 0
               RowNamesColumn: 0
            VariableUnitsLine: 0
     VariableDescriptionsLine: 0
>>

The second column is 'double'...

bayern science on 15 Oct 2018

I now see the reason behind the use of 'detectImportOptions'. Many thanks for the clarification and the suggestion of reusing the import object dpb!

Sign in to comment.

extract date and time preceded by strings with variable length from text file

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

5 Comments
Show 3 older commentsHide 3 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

extract date and time preceded by strings with variable length from text file

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

5 Comments Show 3 older commentsHide 3 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

5 Comments
Show 3 older commentsHide 3 older comments