Time based moving average - is my approach correct?

Hello Matlab folks,
I'd like to get a critique of my code because I am not sure whether it is right. First of all, I have a set of data like this:
735728,605428241 80,6717800000000 11,6650710000000
735728,605451389 80,6717800000000 9,12566720000000
735728,605474537 80,1416200000000 9,24074830000000
735728,605497685 80,1416200000000 9,20919460000000
735728,605520833 80,1416200000000 9,47081590000000
735728,605543982 80,3848000000000 9,28115980000000
735728,605567130 80,3848000000000 8,55264740000000
735728,605590278 80,3848000000000 9,47107710000000
735728,605613426 80,5608400000000 9,42203640000000
735728,605636574 80,6196400000000 9,32575490000000
735728,605659722 80,6196400000000 9,25049830000000
735728,605682870 80,6998100000000 9,48914560000000
735728,605706019 80,6998100000000 9,52505890000000
735728,605729167 80,6998100000000 9,12462120000000
The first column is the time as a serial date number (it corresponds to 'dd.mm.yyyy HH:MM:SS' when formatted). The sample rate is two seconds. Now I want to smooth that data a bit, and I want to do this with a moving average over 60 s. For that, the function movavg seemed to be the right choice. I ended up with the following code:
clc
clear all
close all
% Get the data
filepath=''; % Note: There is a path in the original code
fprintf('Path: %s\n', filepath)
fprintf('\nLoad totaldev ...\n')
totaldev = xlsread(strcat(filepath,'totaldev.xlsx'));
fprintf(' ...done!\nLoad plant_load ...')
plant_load = xlsread(strcat(filepath,'load.xlsx'));
fprintf(' ...done!\n')
data=[plant_load(:,1),plant_load(:,2),totaldev(:,2)];
data(:,1)=x2mdate(data(:,1));
fprintf('Prepare data ...\n')
% Remove values below 0 and above 100 (plant load is 0% to 100%)
data(any(data(:,2)<0,2),:)=[];
data(any(data(:,2)>100,2),:)=[];
% Remove values below -100 and above 100 (error can't be more than +-100%)
data(any(data(:,3)<-100,2),:)=[];
data(any(data(:,3)>100,2),:)=[];
% Compute average of time series
% Sample rate is 2sec, so 30 values cover a minute
data_avg=[data(:,1), movavg(data(:,2),30,30), movavg(data(:,3),30,30)];
fprintf(' ...done!\n')
fprintf('\nCreate plot')
% Plot 'em
plot(data(:,1),data_avg(:,2),data(:,1),data_avg(:,3))
datetick('x','keepticks','keeplimits');
I'd now like to know whether I made a mistake or whether the average is wrong. Maybe someone can help.
Cheers from Germany

10 Comments

It should be easy enough to manually compute
mean(data(1:30,2:3))
mean(data(2:31,2:3))
and see, shouldn't it?
>> which movavg
'movavg' not found.
>>
so not sure if it is working as above as a moving window or an averaging window. Is the output length/30 or length-30?
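One way to settle that without the toolbox is to build a trailing moving average with FILTER and compare it against a manual window mean (this assumes movavg behaves like a trailing ones(1,30)/30 filter, which is only a guess here):

```matlab
% Sanity check: does a trailing 30-sample moving average match the
% manually computed window mean, and what length does it return?
x  = rand(100,1);
w  = 30;
ma = filter(ones(1,w)/w, 1, x);   % trailing moving average, same length as x
k  = 50;                          % any index >= w
abs(ma(k) - mean(x(k-w+1:k)))     % should be ~0 (within rounding)
numel(ma)                         % FILTER returns length(x), not length(x)-w
```

So with FILTER the output has the same length as the input; the first w-1 samples are start-up transients.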
The movavg function is in the Financial Toolbox. Documentation doesn’t say anything other than that it exists, so I imagine it’s a ‘ones(1,60)/60’ filter.
To view its frequency response:
freqz((1/60)*ones(60,1), 1, [], 0.5)
title('Frequency Response of ‘Moving Average’ Filter, Fs = 0.5')
xlabel('Frequency (Hz)')
Actually, reading that, it's more than that: it returns both a leading and a lagging moving average. The OP has kept the first return value only, which is the leading one, so I doubt it's what he intended; just guessing.
I thought about doing filter with delays, but that involved generating a signal, filtering, doing ffts, and calculating the transfer function. Using freqz was easier.
Yeah, I was simply commenting on the guts of movavg and what I suspect the poster had in mind given the question...
The problem as I see it is that with a sampling period of 2s, the Nyquist frequency is 0.25 Hz.
I didn't read the whole discussion (sorry I don't have much time tonight), but I was wondering at first why you were not using a convolution.. e.g.
avgSize = 10 ;
kernel = 1/avgSize * ones( 1, avgSize ) ;
movingAvg = conv( x, kernel, 'same' ) ;
Then I realized that you were removing entries, which breaks the regularity time-wise. In such a context, all methods that do not explicitly take the time (vector) into account will be wrong (if you have e.g. a 10-minute gap in your data, the average just before the start of the gap will take into account values from just after the gap, and vice versa).
EDIT: here is a small example to illustrate:
t = 0: 0.1 : 10 ;
x = sin( t ) - 0.5 + rand( size( t )) ;
avgSize = 20 ;
kernel = 1/avgSize * ones( avgSize, 1 ) ;
xSmooth = conv( x, kernel, 'same' ) ;
figure( 1 ) ; clf ; grid on ; hold on ;
plot( t, x, 'b' ) ;
plot( t, xSmooth, 'g', 'LineWidth', 2 ) ;
id = t >= 2 & t <= 4 ;
t(id) = [] ; x(id) = [] ;
xSmooth = conv( x, kernel, 'same' ) ;
plot( t, xSmooth, 'r' ) ;
hRect = patch( [2,4,4,2], 1.5*[-1,-1,1,1], 'Yellow' ) ;
set( hRect, 'EdgeColor', 'None' ) ;
alpha( hRect, 0.2 ) ;
In this figure, the yellow region indicates missing data. The plot shows that, to the left of the gap, the moving average computed on the data with the gap removed (red curve) drops dramatically instead of following the green curve (the moving average of the complete data), because it is skewed by values from the right of the gap (negative values after t=4), and vice versa.
Hi Cedric,
My impression is that what Marc wants are block averages for every minute. I hypothesized some code to do that, so we’ll see if my impression is correct.
Cedric on 26 May 2014
Edited: Cedric on 26 May 2014
Hi Star Strider,
Ok. I'll think more about it, but my guess is that most of the usual methods will fail to produce a correct block average when there are missing data/entries. One solution could be to build a minuteID (=1 for the first 30 entries, =2 for the next 30 entries, etc.) before removing invalid entries, and then to use ACCUMARRAY on the remaining entries, using the minuteID as a block ID.
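A rough sketch of that minuteID + ACCUMARRAY idea (variable names and the validity test are mine, just for illustration):

```matlab
% Label each sample with the minute it belongs to BEFORE removing
% invalid rows, then let ACCUMARRAY average each surviving block.
nPerMin  = 30;                              % 2 s sampling -> 30 samples/min
n        = size(data, 1);
minuteID = ceil((1:n).' / nPerMin);         % 1,1,...,1,2,2,... block labels

keep     = data(:,2) >= 0 & data(:,2) <= 100;   % example validity mask
blockAvg = accumarray(minuteID(keep), data(keep,2), [], @mean);
```

This way a minute with a few removed samples is averaged over the samples it actually has, instead of borrowing samples from the next minute.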
Were it my data, I’d isolate the one-minute segments, do the thresholding on the selected block of data, compute the mean of the rest, and continue on. That just seems to me to be the easiest way to do the processing. I assume the data and times are continuous in every record.
I am prepared to stand corrected...

Sign in to comment.

 Accepted Answer

My pleasure!
If you save the one-minute means to a '.mat' file, a loop isn't much of a problem, since you only have to run it once per data set.
Assuming your matrix to be X, it’s probably as easy as:
X = rand(180,3);
Xm = zeros(fix(size(X,1)/30), 3);     % preallocate
for k1 = 1:fix(size(X,1)/30)
    k2 = (k1-1)*30 + 1;               % first sample of minute k1
    Xm(k1,:) = mean(X(k2:k2+29,:));
end
That runs without error, with Xm storing the one-minute means. Be sure to use the save function to save the averaged matrices, then use load to retrieve them when you need them. Then you only have to do the loop calculation once for each data set.
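The save/load caching could be wired together like this (file and variable names are just examples, not from the original code):

```matlab
% Compute the one-minute means once, then reuse them from a .mat cache.
cachefile = 'one_minute_means.mat';
if exist(cachefile, 'file')
    load(cachefile, 'Xm');            % reuse the precomputed means
else
    X  = rand(180,3);                 % stand-in for the real data matrix
    Xm = zeros(fix(size(X,1)/30), 3);
    for k1 = 1:fix(size(X,1)/30)
        k2 = (k1-1)*30 + 1;
        Xm(k1,:) = mean(X(k2:k2+29,:));
    end
    save(cachefile, 'Xm');            % cache for the next run
end
```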

4 Comments

Thank you very much again star strider, that does the trick.
However, I was curious, so I compared the results.
Figure 1 shows a close up of the two averages and the original data. The dotted red line is your computation, the black dashed line is the movavg version and the blue line shows the original data.
Figure 2 shows a wider range of time, the illustration of the values is the same as in Figure 1. As one can see, both averages are more or less the same, the difference is small.
Figure 3 shows the same time-window as Figure 2 but with the original data included. As one can see, the major aim, to smooth the data, is fulfilled by both computation methods.
The entire code is as follows.
clc
clear all
close all
% Get the data
filepath=' '; % Note: There is of course a path in the original file ;)
fprintf('Path: %s\n', filepath)
if exist(strcat(filepath,'\','tdevload.m'),'file')
    fprintf('Load data...\n')
    load(strcat(filepath,'\','tdevload.m'),'-mat');
    fprintf(' ...done\n')
else
    fprintf('\nLoad from Excel...\nLoad totaldev ...\n')
    totaldev = xlsread(strcat(filepath,'totaldev.xlsx'));
    fprintf(' ...done!\nLoad plant_load ...\n')
    plant_load = xlsread(strcat(filepath,'load.xlsx'));
    fprintf(' ...done!\n')
    data=[plant_load(:,1),plant_load(:,2),totaldev(:,2)];
    data(:,1)=x2mdate(data(:,1));
    fprintf('Prepare data ...\n')
    % Remove values below 0 and above 100 (plant load is 0% to 100%)
    data(data(:,2)<0,:)=[];
    data(data(:,2)>100,:)=[];
    % Remove values below -100 and above 100 (error can't be more than +-100%)
    data(data(:,3)<-100,:)=[];
    data(data(:,3)>100,:)=[];
    % Compute average of time series
    % Sample rate is 2 sec, so 30 values cover a minute
    datam=zeros(fix(size(data,1)/30),3);
    for k1 = 1:fix(size(data,1)/30)
        k2 = (k1-1)*30+1;                 % first sample of minute k1
        datam(k1,:) = mean(data(k2:k2+29,:));
    end
    data_avg=[data(:,1), movavg(data(:,2),30,30), movavg(data(:,3),30,30)];
    data_avg(1:30,2)=data(1:30,2);
    data_avg(1:30,3)=data(1:30,3);
    fprintf(' ...done!\n')
    save(strcat(filepath,'\','tdevload.m'));
end
fprintf('\nCreate plot')
% Plot 'em
hold on
plot(data(:,1),data(:,2),data(:,1),data(:,3))
plot(data_avg(:,1),data_avg(:,2),'--k',data_avg(:,1),data_avg(:,3),'--k')
plot(datam(:,1),datam(:,2),':r',datam(:,1),datam(:,3),':r')
NumTicks = 20;
L = get(gca,'XLim');
set(gca,'XTick',linspace(L(1),L(2),NumTicks))
datetick('x','HH:MM','keepticks','keeplimits');
It is really fast, especially with the pre-computed values stored in the .m file :)
Thanks again, this forum is of great support and help!
Cheers from Germany
My pleasure!
Nothing wrong with curiosity!
I would expect the results to be similar, but when I simulate data with a random number sequence and then calculate the transfer functions of the two methods, the block-mean has better low-frequency characteristics, closer to that of a low-pass filter than the moving average filter. (I’m approaching this from a signal processing perspective, so I have my biases.)
No, there's nothing wrong with curiosity. But you should be passionately curious to do science. However, you shouldn't be too curious because then you will write a whole lot of interesting but very unnecessary stuff into your thesis ;)
Anyway, good to know. There's nothing wrong with your signal processing perspective; your approach is definitely more correct, so I will take it. My background is also sort of signal processing (electrical engineering, automation and information systems), but today I am doing more economics, feasibility studies and things like that, so my MATLAB days are a bit further back ;)
Cheers!
I completely agree about being passionately curious about science. I pity those who are not. Being too curious is fine. Being selective and focused about your writing is also.
My applicable background here is in biomedical engineering (largely electrical), statistics and biomedical signal processing. So I’m sensitive about filters that aren’t designed for a specific purpose. They can do unkind things to one’s data.
Have fun!

Sign in to comment.

More Answers (2)

A lot of discussion going on here :D So, does my code work as expected, or am I seeing things where there are none?

1 Comment

I believe you are seeing ‘aliasing’. I don’t use moving average filters because they have undesirable frequency characteristics. If you want to filter out noise, I suggest a low pass filter.
You will likely need to do a Fourier transform on your data (if you have not already) to see its frequency content and make the appropriate filter choices. The Signal Processing Toolbox makes filter design and implementation quite easy.
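To make that concrete, a hedged sketch of the workflow (the cutoff frequency here is purely an example; you would pick it after looking at the spectrum):

```matlab
% Inspect the frequency content, then low-pass filter instead of averaging.
Fs = 0.5;                              % 2 s sampling period -> Fs = 0.5 Hz
x  = data(:,2);                        % e.g. the plant load column
% One-sided amplitude spectrum of the demeaned signal:
X  = fft(x - mean(x));
f  = (0:numel(x)-1) * Fs/numel(x);
plot(f(1:floor(end/2)), abs(X(1:floor(end/2))))
% Example: 3rd-order Butterworth low-pass at 1/60 Hz (60 s variations):
[b,a] = butter(3, (1/60)/(Fs/2));      % cutoff normalized to Nyquist
xfilt = filtfilt(b, a, x);             % zero-phase, so no time shift
```

filtfilt avoids the phase delay a one-pass filter would introduce, which matters if the smoothed curve is plotted against the original timestamps.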

Sign in to comment.

So, you mean I didn't compute the 1 min average!? All I want is to produce one-minute averages to smooth the data. That can't be too hard, can it?

2 Comments

If you want the average over each minute, the moving average filter is not your optimal choice. It takes the mean over samples 1:30 then 2:31, 3:32, etc. If that is what you want, use the moving average filter in spite of its suboptimal passband characteristics. (I don’t have the Financial Toolbox, so I can only simulate what I believe the movavg function does. It is not well-documented from a signal processing perspective.)
I infer from 'a 1 min average' that you want the average of the first minute (samples 1:30), the second minute (samples 31:60), etc. The documentation for mean has all the information you need to do that.
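For completeness, non-overlapping block means can also be done without a loop by combining mean with RESHAPE (a sketch; it assumes the row count is trimmed to a multiple of 30 first):

```matlab
% Loop-free one-minute block means via RESHAPE.
w  = 30;                                   % samples per block (2 s * 30 = 1 min)
n  = floor(size(data,1)/w) * w;            % drop the incomplete tail block
Xm = squeeze(mean(reshape(data(1:n,:), w, [], size(data,2)), 1));
% Xm is (n/w)-by-size(data,2): one row per minute.
```

Note this only works on regularly sampled data; once invalid rows have been removed, the blocks no longer line up with real minutes.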
Thank you star strider. I agree with you, that I want the average minute by minute. I also agree with you, that I am not sure if movavg does what I want.
I know that I can use the function mean in a for loop to compute the average window by window. However, my matrices are quite long, around 119,000 rows, and I wanted to avoid loops because I expect the computation to take quite a while (ten matrices with up to 43 columns and a length of about 119,000...). That is why I wanted to use a built-in function :)
Anyway, I don't see anything in the documentation of mean about computing the average window by window without a loop. Is that because the loop is the only way, or am I blind ;)?
Thanks in advance.

Sign in to comment.

Asked: 25 May 2014
Edited: 10 Nov 2017
