Identifying gaps in time-series data

90 views (last 30 days)
Hi there!
I'm fairly new and inexperienced at Matlab so please forgive me if my question is either really easy to solve or if my question is hard to understand. I will try to be as clear as possible!
I am currently trying to write a code which will detect gaps in time-series data. These gaps are not recorded as NaN or 0 values in the raw data, rather they are simply skipped over and no gap is recorded.
Ex: 20110222T180326.761Z 3.216852
20110222T180327.011Z 3.216852
20110222T180844.239Z 3.216751
20110222T180844.489Z 3.216751
Here you can see there is a gap between 18:03:27 and 18:08:44. I want to be able to write a code that can detect these gaps quickly so that I don't have to manually find them.
The process of detecting these gaps is also slightly more complicated because I am not using a full version of Matlab. Rather, the project I am working for (Neptune Canada) provides users the ability to write and run code from our website but this doesn't allow access to many of Matlabs features besides the command window. As a result, I am having trouble manipulating the data because of the specific way the data must be accessed.
Release: R2009a (Coming Soon R2011a) Toolboxes: Database, Statistics
The code that I have at the moment is as follows:
% Get current time for a reference of processing time
tic
% Device and Sensor IDs:
deviceid=22663;
neptune(1).sensorID=7253; % conductivity
neptune(2).sensorID=7254; % pressure
neptune(3).sensorID=7255; % salinity
% Enter start date and end date for desired time series using Oracle timestamp data type. For example:
startdate='22-Feb-2011 06:00:00 PM';
enddate='22-Feb-2011 07:00:00 PM';
% RETRIEVE TIME SERIES AND BASIC METADATA FOR EACH SENSOR
[neptune,meta,nd]=getData(neptune,startdate,enddate,deviceid);
% neptune is a structure array containing the data for each sensor
% meta is a structure containing basic metadata
% nd is a boolean which will be 1 if there is time-series data, and 0 otherwise.
I didn't write this code myself, it was provided as a template to the user to access data from the database for a specific instrument. For this specific instrument I know that the sampling period is 0.250 seconds so I wrote a code to determine the difference between timestamps as:
t=datenum(neptune(1).time,30);
dt=diff(t)*86400;
I thought I could somehow find all the instances where dt>0.500 and insert NaN but I am not sure how to do that and I'm also not sure that would even solve my problem since I want to know the specific time where the gap occurs so that when I plot the data it appears as a gap not as line connecting the two points.
I know this is a long-winded, wordy question but if anyone has any suggestions I could try or resources I could refer to I would really appreciate it. I am a Co-op student and I have used Matlab before but not with large data sets and especially not with data sets that are missing information! Please let me know what you think or if I need to elaborate further.
Cheers.
  1 Comment
Fangjun Jiang
Fangjun Jiang on 18 Jul 2011
You explained your problem well although many of the info are irrelevant. Please provide an example data of "t" and explain what outcome do you expect. I am sure a solution could be found.

Sign in to comment.

Accepted Answer

Oleg Komarov
Oleg Komarov on 18 Jul 2011
EDIT per Fangjun's suggestion
% Suppose input is (two gaps over the threshold)
neptune.time = datenum({'20110222T180326.761'
'20110222T180327.011'
'20110222T180844.239'
'20110222T180844.444'
'20110222T180844.665'
'20110222T181874.239'}, 'yyyymmddTHHMMSS.FFF');
% Threshold, all gaps, idx of bad gaps
threshold = 1/86400*0.25;
gap = diff(neptune.time);
idx = find(gap > threshold);
% Gaps that exceed threshold
gap(idx)
% Dates
strcat(datestr(neptune.time(idx),'yyyy/mm/dd HH:MM:SS'),...
' -',...
datestr(neptune.time(idx+1),' yyyy/mm/dd HH:MM:SS'))
Note: the threshold for 0.25 secs sampling is:
1/86400*0.25
  13 Comments
Fangjun Jiang
Fangjun Jiang on 20 Jul 2011
@Oleg, I like this better. Why don't you include this with the expanded example data to make it a complete answer? It shows a good example illustrating date manipulation.
Brianne Zimmerman
Brianne Zimmerman on 20 Jul 2011
Using strcat(datestr(neptune.time(idx-1),'yyyy/mm/dd HH:MM:SS'),' -', datestr(neptune.time(idx),' yyyy/mm/dd HH:MM:SS')) worked!!
THANK YOU! For some reason the technique below wouldn't work. It had a problem with using datenum(neptune.time,'yyy...') because the "input was not an array of strings" (???). Is this because neptune.time returns the time data in 1.0e+05 * 7.3456 format (vector format if I'm understanding correctly)?
Anyways, I really appreciate all the help the two of you have given me. I have another question about replacing these "gaps" with NaN so that the user can view the entire timeseries with gaps marked by NaN. Should I post a new question to the Answers forum?
Cheers
Brianne

Sign in to comment.

More Answers (1)

Fangjun Jiang
Fangjun Jiang on 19 Jul 2011
Okay, I am not trying to steal the answer from Oleg. Here is an example using the approach he suggested.
In = {'20110222T180326.761'
'20110222T180327.011'
'20110222T180844.239'
'20110222T180854.239'
'20110222T180964.239'
'20110222T181874.239'};
a=datenum(In, 'yyyymmddTHHMMSS.FFF');
idx=find(diff(a)>0.003);
left=In(idx);
right=In(idx+1);
result=[left right]';
output=sprintf('%s -> %s\n',result{:})
  4 Comments
Fangjun Jiang
Fangjun Jiang on 20 Jul 2011
@Walter, I though that (left = In{idx}; right = In{idx+1}; result = [left ' ' right]) would work too. But when I tried it, it only gives one line.
Fangjun Jiang
Fangjun Jiang on 20 Jul 2011
@Brianne Zimmerman, to get your desired format shouldn't be hard. See edited answer above.

Sign in to comment.

Categories

Find more on Structures in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!