Got Questions? Get Answers.
Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Replacing gaps with NaN

Subject: Replacing gaps with NaN

From: Pawel

Date: 19 Apr, 2013 15:34:09

Message: 1 of 3

My data looks like this:
ID Date Temperature Humidity Pressure
12100 2012-12-01 00:00:00.000 0,3 96 1029,1
12100 2012-12-01 01:00:00.000 0,9 98 1029,3
12100 2012-12-01 02:00:00.000 1,2 91 1029,2
12100 2012-12-01 04:00:00.000 2,8 80 1029,6
...
12105 2012-12-01 00:00:00.000 2,2 87 1029,9
12105 2012-12-01 01:00:00.000 1,3 88 1029,8
...

Is it for 62 days (2012-12-01 to 2013-01-31) for 24 hours/day and should be every 1 hour. But there are always some gaps in time series (ex. between 2 and 4hr), so I'd like to replace them by NaN's. My idea was to convert dates to single number (datenum) so I probably need to erase all '-' and ':' in date column to get this

12105 2012 12 01 00 00 00.000 2,2 87 1029,9
12105 2012 12 01 01 00 00.000 1,3 88 1029,8

and finally create a matrix like below, with NaN's for every hour:

12105 735204 NaN NaN NaN
12105 735204.041666667 NaN NaN NaN
12105 735204.083333333 NaN NaN NaN
12105 735204.125 NaN NaN NaN

Now my problem. Is it possible to update somehow this matrix with NaN's with data from first matrix to get something like this:

ID Date Temperature Humidity Pressure
12100 735204 0,3 96 1029,1
12100 735204.041666667 0,9 98 1029,3
12100 735204.083333333 1,2 91 1029,2
12100 735204.125 NaN NaN NaN
12100 735204.20833333 2,8 80 1029,6

and so on for every day and every ID? Or maybe you have different and better idea. It will be helpful for me because data file has many thousands of line and I don't really want to manually find these gaps.

Subject: Replacing gaps with NaN

From: dpb

Date: 19 Apr, 2013 18:39:38

Message: 2 of 3

On 4/19/2013 10:34 AM, Pawel wrote:
> My data looks like this:
...

> Is it for 62 days (2012-12-01 to 2013-01-31) for 24 hours/day and should
> be every 1 hour. But there are always some gaps in time series (ex.
> between 2 and 4hr), so I'd like to replace them by NaN's. My idea was to
> convert dates to single number (datenum) so I probably need to erase all
> '-' and ':' in date column to get this
>
> 12105 2012 12 01 00 00 00.000 2,2 87 1029,9
> 12105 2012 12 01 01 00 00.000 1,3 88 1029,8
>
> and finally create a matrix like below, with NaN's for every hour:
>
> 12105 735204 NaN NaN NaN
> 12105 735204.041666667 NaN NaN NaN
> 12105 735204.083333333 NaN NaN NaN
> 12105 735204.125 NaN NaN NaN
>
> Now my problem. Is it possible to update somehow this matrix with NaN's
> with data from first matrix to get something like this:
>
> ID Date Temperature Humidity Pressure
> 12100 735204 0,3 96 1029,1
> 12100 735204.041666667 0,9 98 1029,3
> 12100 735204.083333333 1,2 91 1029,2
> 12100 735204.125 NaN NaN NaN
> 12100 735204.20833333 2,8 80 1029,6
>
> and so on for every day and every ID? Or maybe you have different and
> better idea. It will be helpful for me because data file has many
> thousands of line and I don't really want to manually find these gaps.

A few ideas...flesh out and simplify as suits...

Don't need to do anything w/ the dates but use the proper formatting in
datenum...I made a file that holds just the first section...

 >> type papkin.dat

ID Date Temperature Humidity Pressure
12100 2012-12-01 00:00:00.000 0.3 96 1029.1
12100 2012-12-01 01:00:00.000 0.9 98 1029.3
12100 2012-12-01 02:00:00.000 1.2 91 1029.2
12100 2012-12-01 04:00:00.000 2.8 80 1029.6

Read it in...

 >> fid=fopen('papkin.dat','rt');
 >> fmt=['%d %s %s' repmat('%f',1,3)];
 >> d=textscan(fid,fmt,'headerlines',1,'collectoutput',1);
 >> fid=fclose(fid);

Convert to datenum the datestamp strings..
 >> dn=datenum([char(d{2}{:,1}) char(d{2}{:,2})], ...
             'yyyy-mm-ddHH:MM:SS.FFF');
Make a vector of hourly datenums for insertion purposes...
 >> dhr=datenum(2012,12,1,[0:4]',0,0);
And the data array to fill...
 >> data=nan(length(dhr),5);
 >> data(ismember(dhr,dn),1)=d{1}; %The ID column
 >> data(:,2)=dhr; % stuck in datenum; use timestamp instead
And now paste on the floating point array at the end...
 >> data(ismember(dhr,dn),3:end)=d{3}(1:4,:)
data =
    1.0e+05 *
     0.1210 7.3520 0.0000 0.0010 0.0103
     0.1210 7.3520 0.0000 0.0010 0.0103
     0.1210 7.3520 0.0000 0.0009 0.0103
        NaN 7.3520 NaN NaN NaN
     0.1210 7.3520 0.0000 0.0008 0.0103
 >>

That's in a bunch of steps to give you ideas/practice in manipulating.
It could be done in just a couple of steps; the question I didn't think
too much about is the multiple ID sections.

The only real "trick" in any of the above is the use of ISMEMBER to get
the locations of the right rows that match the dates w/ the logical
addressing vector--

--

Subject: Replacing gaps with NaN

From: Pawel

Date: 19 Apr, 2013 22:31:09

Message: 3 of 3

Thank you so much. I had many questions after first use of your code, but now I think I can understand what you've written. But I can't promise I won't need your help again so more questions possible. Thank you again.

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us