18 views (last 30 days)

Show older comments

Does somebody have a idea? In the end the average value for 1 hour should be the same.

John D'Errico
on 19 Jul 2017

Note that any use of a standard interpolation tool will NOT ensure maintaining the hourly average. They simply are not designed to solve that problem.

Could you solve the problem with custom written interpolating spline code, designed to employ that constraint? Yes, I am sure you could do so. But I seriously doubt it is worth the effort to maintain the exact hourly average.

Andrei Bobrov
on 19 Jul 2017

TT = timetable(minutes(10*(1:10)'),randi([-20,100],30,1),'V',{'var'}); % Let TT - your data

TT_15min = retime(TT,minutes(15*(1:25)'),'pchip');

Grzegorz Knor
on 19 Jul 2017

t1 = 10:10:60

av1 = [3, 5, 6, 4, 6, 3]

t2 = 15:15:60

av2 = interp1(t1,av1,t2)

plot(t1,av1,'Marker','.')

hold on

plot(t2,av2,'Marker','o')

John D'Errico
on 19 Jul 2017

Edited: John D'Errico
on 19 Jul 2017

Note that any use of a standard interpolation tool will NOT ensure maintaining the hourly average. They simply are not designed to solve that problem.

Could you solve the problem with custom written interpolating spline code, designed to employ that constraint? Yes, I am sure you could do so. But I seriously doubt it is worth the effort to maintain an exact hourly average. As well, one would need to understand fully the goals.

Lets try a simple example, so we can see what is the problem.

% 3 hours worth of data.

T10 = 0:10:180;

Y10 = rand(size(T))

% Our new sampling should be at increments of

% 15 minutes. So our new times will be at

T15 = 0:15:180;

% simplistically, that means we could just retain

% the data at times [0,30,60,90,120,150,180] minutes.

% then average the values at 10 and 20 minutes to

% gain a new value at 15. Do the dame thing for 45

% minutes, taking the average from 40 and 50. This

% is implicitly what interp1 will do if you use

% linear interpolation. Is that right here? Not really.

Y15 = interp1(T10,Y10,T15);

plot(T10,Y10,'r-o',T15,Y15,'bx-')

So, the red curve is the 10 minute data. The blue curve is the 15 minute data, created using linear interpolation. As you can see, the 0,30,60,90... points are identical to the originals, and the intermediate points are the average of their neighbors.

Is that a good thing? Perhaps. It is simple. Even a spline interpolation would have been easy.

Y15s = interp1(T10,Y10,T15,'spline');

But does either maintain an hourly average? No. The problem arises from what I'll call a weighting issue. Lets compute the hourly averages to see:

% The hourly average of the three hours, for

% the 10 minute sampling was:

Ybar10 = mean(reshape(Y10(1:18),[],3))

Ybar10 =

0.68547 0.57097 0.32839

% Whereas if we compute the same thing for

% the 15 minute data, we get a different result.

Ybar15 = mean(reshape(Y15(1:12),[],3))

Ybar15 =

0.74018 0.60488 0.26241

So it is different. Is is seriously different? Not hugely so.

Again, the problem has an easily identifiable source, IF we write out what we did. The mean for the first hour is essentially:

(Y10(1) + Y10(2) + Y10(3) + Y10(4) + Y10(5) + Y10(6))/6

That is, we tke sae values at times [0,10,20,30,40,50], and average them. Then we do the same for times 60:110, and 120:170. This is a "rectangle rule" based hourly average. I could have been more sophisticated and split the value at time 60 between the two hourly averages, then done the same for the value at time 120. That would have been a "trapezoidal rule" scheme to compute an hourly average.

So why don't we get the same result for the 15 minute average? For the first hour, that would be written as

(Y15(1) + Y15(2) + Y15(3) + Y15(4))/4

So we use the values at times 0,15,30,45. to compute the first hourly average. But if we go back to the original data, this reduces to:

(Y10(1) + (Y10(2) + Y10(3))/2 + Y10(4) + (Y10(5) + y10(6))/2)/4

So we got the 15 minute time by averaging the results for time 10 and 20, etc. Expanding, we see the first hourly average, in terms of the original data:

Y10(1)/4 + Y10(2)/8 + Y10(3)/8 + Y10(4)/4 + Y10(5)/8 + y10(6)/8

So this "average" has different weights for each time than the basic average of 10 minute values. In that original average, the weights were all 1/6. Here the different sample have different importance in the final 15 minute compilation versus the 10 minute compilation.

The point of all this? Just using simple interpolation will almost always fail to produce a result that satisfies your stated goal of maintaining an hourly average. In fact, ANY interpolation, linear, spline, pchip, interp1, retime, etc., ALL will fail in general. (Only for very simple data will the results be correct. I.e., if your data is absolutely constant, then you can do anything you darn well wish to do, and anything will all work.)

Can you fix it? Well, yes I arguably can fix it. Is it worth the effort? Sigh. Do I really need to get that deeply into the linear algebra to show how to make it work perfectly? Note that the simple interpolation will not be too terrible if your curve is fairly smooth.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!