Remove non time string values in a time matrix

Hi,
I have a time string matrix (592x1 cell) that looks something like this. (The time strings are parsed output from a serial port communication link.)
time_mat = {'00:21:51.000',.........................'00:22:16.200','00:22:16.400','00:22:16.600','2019/05/30','00:22:17.000'....'22Rover6'.......,'2620517.2165',......................}
The entries that are not time strings (such as '2019/05/30', '22Rover6', and '2620517.2165') need to be removed and replaced with [].
I tried a string comparison check and a size-matching criterion to remove the unnecessary data, but it didn't work. Can anyone suggest a better approach? I have also attached the time_mat file for your perusal.
Thanks for your time and help.
Ravi

Accepted Answer

I would use datetime() to convert your cell array of strings to a datetime array. This will return NaT (not a time) for elements that are not in the specified format.
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
Comparison:
table(dtMat(110:115), time_mat(110:115),'VariableNames',{'datetime', 'original'})
ans =
  6×2 table
      datetime          original
    ____________    ______________
    00:22:16.400    '00:22:16.400'
    00:22:16.600    '00:22:16.600'
    NaT             '2019/05/30'
    00:22:17.000    '00:22:17.000'
    00:22:17.200    '00:22:17.200'
    00:22:17.800    '00:22:17.800'
To fill in the missing values with linear interpolation, use fillmissing() (R2016b or later):
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
dtMatFill = fillmissing(dtMat,'linear');
% To see the missing data
natIdx = isnat(dtMat); %index of missing data
dtMatFill(natIdx)
If you'd rather work with the cell array of strings, you can replace the bad elements with empties like this:
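% Anchor the pattern (^...$) and escape the dot so only whole HH:mm:ss.SSS strings count as valid
badIdx = cellfun(@isempty,regexp(time_mat,'^\d{2}:\d{2}:\d{2}\.\d{3}$'));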
time_mat(badIdx) = {[]};
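As a quick check of the regular-expression approach, here is a minimal sketch on a small made-up sample (the variable names sample, pat, isTime, cleaned and blanked are hypothetical, not from the attached file). It shows both options: dropping the bad entries entirely, or keeping their positions and blanking them.

```matlab
% Hypothetical sample standing in for time_mat
sample = {'00:22:16.400'; '2019/05/30'; '00:22Rover6'; '-2620517.2165'; '00:22:17.000'};

% Anchored pattern: the whole string must look like HH:mm:ss.SSS
pat = '^\d{2}:\d{2}:\d{2}\.\d{3}$';

% With 'once', regexp returns [] for each element that does not match
isTime = ~cellfun(@isempty, regexp(sample, pat, 'once'));

% Option 1: drop the bad entries (the vector gets shorter)
cleaned = sample(isTime);

% Option 2: keep the positions and replace the bad entries with []
blanked = sample;
blanked(~isTime) = {[]};
```

Blanking keeps the indices aligned with any position data read alongside the time stamps; dropping is simpler if only the times matter.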

6 Comments

Thanks Adam. That worked. The sensor I am reading through a serial port should ideally send a data packet consisting of time and x, y and z position data every 0.2 s, so I should get five values every second. However, as this communication is happening over a radio link, there are always missing data/packets.
Is there a smart way to interpolate the lost data and obtain a continuous stream (of time and position data)?
For example, if you take a close look at the time matrix (after removing the bad elements), it looks something like this:
Index Time
110 '00:22:16.400'
111 '00:22:16.600'
112 '00:22:17.000'
113 '00:22:17.200'
...
'00:29:05.600'
'00:29:35.200'
'00:29:35.400'
'00:29:39.600'
'00:29:39.800'
'00:29:40.000'
'00:29:49.200'
'00:29:49.400'
There may be a similar missing pattern in the position record as well.
Thanks,
Ravi
Convert to a timeseries object and see the interpolation schemes there for one builtin solution.
@Ravi I updated my answer to show how to fill in the NaT values but as you mentioned, there are still missing samples. For example:
table(dtMat(110:115), dtMatFill(110:115),'VariableNames',{'datetime', 'filled'})
ans =
  6×2 table
      datetime         filled
    ____________    ____________
    00:22:16.400    00:22:16.400
    00:22:16.600    00:22:16.600
    NaT             00:22:16.800
    00:22:17.000    00:22:17.000
    00:22:17.200    00:22:17.200
    00:22:17.800    00:22:17.800    % <-- missing .400
dpb's suggestion above would fill in those as well.
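The remaining gaps can be located by comparing successive time stamps with the nominal 0.2 s interval. A minimal sketch, assuming the NaT values have already been filled; the sample vector t and the names gap, nMissing and gapIdx are made up for illustration:

```matlab
% Hypothetical cleaned time stamps; packets are lost between 17.200 and 17.800
t = datetime({'00:22:16.400'; '00:22:16.600'; '00:22:16.800'; ...
              '00:22:17.000'; '00:22:17.200'; '00:22:17.800'}, ...
             'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');

sampInt = seconds(0.2);          % nominal sample interval
gap = diff(t);                   % duration between consecutive samples

% Number of packets lost between sample k and sample k+1
% (round guards against tiny floating-point differences)
nMissing = round(gap ./ sampInt) - 1;

% Positions k where at least one packet is missing after t(k)
gapIdx = find(nMissing > 0);
```

sum(nMissing) then gives the total number of lost packets, which is a quick sanity check before interpolating.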
Although, as Guillaume pointed out in another similar thread, it's the timetable object that is probably the more useful one and the one I was actually intending to recommend. I've yet to figure out anything useful to do with the timeseries one itself.
@Ravi, on second thought, if you know the start time (time_mat(1)) and the sampling interval (0.2 sec), you could just produce the vector of time samples instead of reading them in.
% Convert your strings to datetime format
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
% Fill in the NaT values
dtMatFill = fillmissing(dtMat,'linear');
% sample interval
sampInt = seconds(0.2);
% Total duration of series
totalDur = dtMatFill(end) - dtMatFill(1);
% Expected number of samples given total time and sample interval
nSamples = floor(totalDur/sampInt);
% produce the complete time vector, starting at the first sample (offset 0)
dtMatComplete = dtMatFill(1) + (0:nSamples)'*sampInt;
As I am reading the date, time and position values (5 every 1s) real-time, the start time is kind of arbitrary. Since I have a dynamic system, I should check for missing sample(s) in the data flow and interpolate to fill the vacant spots.
I was out testing so didn't get a chance to test it further but I was hoping the method you suggested works on missing position data as well. It works fine for completing the time vector (after a quick check).
Thanks for your time and help.
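For the position data, one possible approach is to put the received packets into a timetable and resample it onto the regular 0.2 s grid with retime. This is only a sketch under assumptions: the values of t, x, y and z are made up, and it works on data that has already been collected rather than on a live stream.

```matlab
% Hypothetical received packets: the sample at 00:22:16.800 was lost
t = datetime({'00:22:16.400'; '00:22:16.600'; '00:22:17.000'; '00:22:17.200'}, ...
             'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
x = [1.0; 1.2; 1.6; 1.8];
y = [0; 0; 0; 0];
z = [5; 6; 8; 9];

% Time stamps become the row times; x, y, z become the variables
tt = timetable(t, x, y, z);

% Regular time grid from the first to the last received sample
sampInt = seconds(0.2);
n = round((t(end) - t(1)) / sampInt);
newTimes = t(1) + (0:n)' * sampInt;

% Linearly interpolate every variable onto the regular grid
ttFull = retime(tt, newTimes, 'linear');
```

Linear interpolation is just one choice; retime accepts other methods (for example 'spline' or 'previous'), and which one is appropriate depends on how the rover actually moves between samples.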


More Answers (2)

dpb on 1 Jun 2019
Edited: dpb on 1 Jun 2019
Using the datetime class is probably easiest; see if this does it:
tm=datetime(time_mat,'InputFormat','HH:mm:ss.SSS'); % convert to datetime (HH = 24-hour clock); failures result in NaT
isnt=isnat(tm); % logical vector of those locations
>> time_mat(isnt) % the identified bum records...see if match expectations
ans =
11×1 cell array
{'2019/05/30' }
{'00:22Rover6' }
{'-2620517.2165'}
{'3.6' }
{'2019/05/30' }
{'0.1677' }
{'3954309.3750' }
{'2' }
{'2019/05/30' }
{'00Rover6' }
{'-4250201.7507'}
>> find(isnt) % the locations in the original vector
ans =
112
207
327
333
360
361
430
475
478
547
558
>>
ADDENDUM:
To fill in missing and otherwise clean up the transmission, something like the following:
tu=unique(tm); % there are some duplicated times
tt=timetable(tu,[1:numel(tu)].'); % build a time table from them
tt(isnat(tt.tu),:)=[]; % remove the NaT values to replace
ttnew=retime(tt,tt.tu(1):seconds(0.2):tt.tu(end),'linear'); % build a new table with interpolated values
There were two particular locations with same timestamp--
>> find(diff(t)==0)
ans =
45
139
>> t(40:50)
ans =
11×1 datetime array
...
12:22:00.2
12:22:00.4
12:22:00.4
12:22:00.8
...
What you do with those before you build the timetable I dunno--you could average them or select first/last ignoring the others as the above does...just depends on what's actually happening in your setup as to what you want to do, methinks...
After that, it's just make a new continuous time vector and interpolate -- the existing data will just be replaced with same, you can choose from alternate interpolating schemes as desired depending on the characteristics of the data you're collecting.
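The two options for duplicated time stamps (average them, or keep only the first) could be sketched like this. The data in t and x are made up, and the names xMean, firstIdx and xFirst are hypothetical:

```matlab
% Hypothetical data with one duplicated time stamp (00:22:00.400)
t = datetime({'00:22:00.200'; '00:22:00.400'; '00:22:00.400'; '00:22:00.800'}, ...
             'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
x = [10; 20; 30; 40];

% tu holds the unique times; ic maps each original row to its unique time
[tu, ~, ic] = unique(t);

% Option 1: average all values that share a time stamp
xMean = accumarray(ic, x, [], @mean);

% Option 2: keep only the first value received for each time stamp
firstIdx = accumarray(ic, (1:numel(t))', [], @min);
xFirst = x(firstIdx);

% Either result can then go into the timetable used for retime
tt = timetable(tu, xMean);
```

Which option makes sense depends on why the duplicates occur; if a repeated stamp is really a mislabelled later packet, neither averaging nor dropping is quite right.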
ADDENDUM 2:
You can make a more meaningful name for the time vector -- I was keeping separate variables for the original time and then the unique times, etc., so if I made a slip didn't have to go back more than one or two steps--so the tu got morphed into the table as the time variable name. You can fix this to more meaningful as
ttnew.Properties.DimensionNames(1)={'Time'};
for example. If you do this before the retime, then that's the variable name to use therein instead, of course.

3 Comments

Thank you so much for the input. I also have an issue of missing data packets and I was wondering if there is a method to obtain a continuous data stream. Please see my response to Adam.
See amended answer...
@dpb, thanks for your comments. I will explore the timeseries object. The issue is that I am reading in date, time and position data from multiple sensors through an RF radio using a single COM port, and even with flow control I see a lot of missing packets (which is usually the case with RF).
I will test your method and also follow Adam's inputs to see if I can at least read a continuous data stream on my end.
Thanks for your time and help.


These don't strike me as being datetime values, they're duration values. The same technique others have suggested (try converting them and look for missing values) will work with duration as worked with datetime. One benefit of converting to duration is that there's no date information added. From the datetime help: "If INFMT does not include a date portion, datetime assumes the current day. If INFMT does not include a time portion, datetime assumes midnight."
time_mat = {'00:21:51.000','00:22:16.200','00:22:16.400','00:22:16.600',...
'2019/05/30','00:22:17.000','22Rover6','2620517.2165'}
dt = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS')
du = duration(time_mat)
Elements 5, 7, and 8 of both dt and du are missing and so can be identified using ismissing or removed with rmmissing.
ismissing(dt)
ismissing(du)
You could use either a datetime or a duration as the RowTimes in a timetable.
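A short sketch of the duration route, assuming (as described above) that duration() returns a missing value for text it cannot parse. The sample strings and the position values in x are made up for illustration:

```matlab
% Hypothetical mixed input: two valid time strings and two bad records
timeStr = {'00:22:16.400'; '2019/05/30'; '00:22:16.600'; '22Rover6'};
x = [1.0; NaN; 1.2; NaN];     % made-up position values, one per record

% Convert to duration; unparseable entries are expected to come back missing
du = duration(timeStr);

% Keep only the rows whose time string converted successfully
ok = ~ismissing(du);

% Use the durations as the row times of a timetable
tt = timetable(du(ok), x(ok), 'VariableNames', {'x'});
```

From here the timetable can be resampled with retime in the same way as one built on datetime row times.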

2 Comments

I have a hard time (so to speak! :) ) wrapping my head around a sampled timestamp being a duration, Steven. I grok it's the only way with the new classes one can have any time standing alone without an associated date, but it still just doesn't seem right nomenclature.
I've not gotten comfortable-enough as yet with the duration to be able to tell if there's something that doesn't agree with the use that way, but it never occurs to me naturally as yet to make use that way.
I really fail to see why a datetime can't have a void date portion other than it wasn't designed to allow for it...with the venerable datenum it was simple to just save only the fractional day.
Maybe eventually I'll come to grips with "the new normal", but as yet it's still a stretch... :)
A sampled timestamp is the amount of time that has elapsed since a certain basetime, right? The basetime could be the start of an experiment, the time a piece of hardware was turned on, or the start of a new day (midnight.) So the timestamp represents the duration of the experiment so far, the duration of the current run of that hardware, or the duration that's elapsed today.
datetime can answer the question "when?" while duration can answer the question "how long?" Upon rereading the original post, I can see that the data could be the answer to either of those questions. It could be thought of as representing when events occurred, it could also be thought of as representing how long after midnight (or the time the serial port became active) the events occurred. Since the expression in the data representing a date was unwanted, I interpreted it as the latter.


Asked: 1 Jun 2019
Commented: 6 Jun 2019
