Time based moving average - is my approach correct?
Show older comments
Hello Matlab folks,
I'd like to get a critique for my code because I am not sure if I am right. First of all, I have a set of data like
735728,605428241 80,6717800000000 11,6650710000000
735728,605451389 80,6717800000000 9,12566720000000
735728,605474537 80,1416200000000 9,24074830000000
735728,605497685 80,1416200000000 9,20919460000000
735728,605520833 80,1416200000000 9,47081590000000
735728,605543982 80,3848000000000 9,28115980000000
735728,605567130 80,3848000000000 8,55264740000000
735728,605590278 80,3848000000000 9,47107710000000
735728,605613426 80,5608400000000 9,42203640000000
735728,605636574 80,6196400000000 9,32575490000000
735728,605659722 80,6196400000000 9,25049830000000
735728,605682870 80,6998100000000 9,48914560000000
735728,605706019 80,6998100000000 9,52505890000000
735728,605729167 80,6998100000000 9,12462120000000
The first row is the time in format 'dd.mm.yyyy hh:mm:ss'. The sample rate is two seconds. Now I want to plane that data to get it a bit smoother and I want to do this by using a moving average of 60sec. To do so, I found the function movavg to be correct. I ended up with the following code
clc
clear all
close all
% Get the data
filepath=''; % Note: There is a path in the original code
fprintf('Path: %s\n', filepath)
fprintf('\nLoad totaldev ...\n')
totaldev = xlsread(strcat(filepath,'totaldev.xlsx'));
fprintf(' ...done!\nLoad plant_load ...')
plant_load = xlsread(strcat(filepath,'load.xlsx'));
fprintf(' ...done!\n')
data=[plant_load(:,1),plant_load(:,2),totaldev(:,2)];
data(:,1)=x2mdate(data(:,1));
fprintf('Prepare data ...\n')
% Remove values below 0 and above 100 (plant load is 0% to 100%
data(any(data(:,2)<0,2),:)=[];
data(any(data(:,2)>100,2),:)=[];
% Remove values below -100 and above 100 (error can't be more than +-100%)
data(any(data(:,3)<-100,2),:)=[];
data(any(data(:,3)>100,2),:)=[];
% Compute average of time series
% Sample rate is 2sec, so 30 values cover a minute
data_avg=[data(:,1), movavg(data(:,2),30,30), movavg(data(:,3),30,30)];
fprintf(' ...done!\n')
fprintf('\nCreate plot')
% Plot 'em
plot(data(:,1),data_avg(:,2),data(:,1),data_avg(:,3))
datetick('x','keepticks','keeplimits');
I'd like to know now if I did a mistake or if the average is wrong. Maybe someone can help.
Cheers from Germany
10 Comments
dpb
on 25 May 2014
It should be easy enough to manually compute
mean(data(1:30,2:3))
mean(data(2:31,2:3))
and see, shouldn't it?
>> which movavg
'movavg' not found.
>>
so not sure if it is working as above as a moving window or an averaging window. Is the output length/30 or length-30?
Star Strider
on 25 May 2014
The movavg function is in the Financial Toolbox. Documentation doesn’t say anything other than that it exists, so I imagine it’s a ‘ones(1,60)/60’ filter.
To view its frequency response:
freqz((1/60)*ones(60,1), 1, [], 0.5)
title('Frequency Response of ‘Moving Average’ Filter, Fs = 0.5')
xlabel('Frequency (Hz)')
dpb
on 25 May 2014
Actually, reading that, it's more than that as it is both a leading and lagging moving average. The OP has returned the first return value only which is the leading so I doubt it's what he intended, just guessing.
Star Strider
on 25 May 2014
I thought about doing filter with delays, but that involved generating a signal, filtering, doing ffts, and calculating the transfer function. Using freqz was easier.
dpb
on 25 May 2014
Yeah, I was simply commenting on the guts of movavg and what I suspect the poster had in mind given the question...
Star Strider
on 25 May 2014
The problem as I see it is that with a sampling period of 2s, the Nyquist frequency is 0.25 Hz.
I didn't read the whole discussion (sorry I don't have much time tonight), but I was wondering at first why you were not using a convolution.. e.g.
avgSize = 10 ;
kernel = 1/avgSize * ones( 1, avgSize ) ;
movingAvg = conv( x, kernel, 'same' ) ;
Then I realized that you were removing entries, which breaks the regularity time wise. In such context, all methods which do not explicitly take the time (vector) into account will be wrong (if you have e.g. a 10 minutes gap in your data, the average before the start of the gap will take in account values from just after the gap and vice versa).
EDIT: here is a small example to illustrate:
t = 0: 0.1 : 10 ;
x = sin( t ) - 0.5 + rand( size( t )) ;
avgSize = 20 ;
kernel = 1/avgSize * ones( avgSize, 1 ) ;
xSmooth = conv( x, kernel, 'same' ) ;
figure( 1 ) ; clf ; grid on ; hold on ;
plot( t, x, 'b' ) ;
plot( t, xSmooth, 'g', 'LineWidth', 2 ) ;
id = t >= 2 & t <= 4 ;
t(id) = [] ; x(id) = [] ;
xSmooth = conv( x, kernel, 'same' ) ;
plot( t, xSmooth, 'r' ) ;
hRect = patch( [2,4,4,2], 1.5*[-1,-1,1,1], 'Yellow' ) ;
set( hRect, 'EdgeColor', 'None' ) ;
alpha( hRect, 0.2 ) ;

In this figure, the yellow region indicates missing data. The plot shows that the moving average based on missing data (red curve) on the left of the no-data region decreases dramatically instead of following the green curve (moving average including missing data), because it is skewed by values from the right of the patch (negative values after t=4), and vice versa.
Star Strider
on 26 May 2014
Hi Cedric,
My impression is that what Marc wants are block averages for every minute. I hypothesized some code to do that, so we’ll see if my impression is correct.
Hi Star Strider,
Ok. I'll think more about it, but my guess is that most of the usual methods will fail to produce a correct block average when there are missing data/entries. One solution could be to build a minuteID (=1 for the first n<=29 entries, =2 for the next 29 entries, etc ) before removing invalid entries, and then to use ACCUMARRAY on remaining entries, using the minuteID as a block ID..
Star Strider
on 26 May 2014
Were it my data, I’d isolate the one-minute segments, do the thresholding on the selected block of data, compute the mean of the rest, and continue on. That just seems to me to be the easiest way to do the processing. I assume the data and times are continuous in every record.
I am prepared to stand corrected...
Accepted Answer
More Answers (2)
Marc
on 25 May 2014
0 votes
1 Comment
Star Strider
on 25 May 2014
I believe you are seeing ‘aliasing’. I don’t use moving average filters because they have undesirable frequency characteristics. If you want to filter out noise, I suggest a low pass filter.
You will likely need to do a Fourier transform on your data (if you have not already) to see its frequency content and make the appropriate filter choices. The Signal Processing Toolbox makes filter design and implementation quite easy.
Marc
on 25 May 2014
0 votes
2 Comments
Star Strider
on 25 May 2014
If you want the average over each minute, the moving average filter is not your optimal choice. It takes the mean over samples 1:30 then 2:31, 3:32, etc. If that is what you want, use the moving average filter in spite of its suboptimal passband characteristics. (I don’t have the Financial Toolbox, so I can only simulate what I believe the movavg function does. It is not well-documented from a signal processing perspective.)
I infer by a ‘ 1 min average ’ that you want the average of the first minute (samples 1:30), the second minute (samples 31:60), etc. The documentation for mean has all the information you need to do that.
Marc
on 26 May 2014
Categories
Find more on Statistics and Linear Algebra in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

