Return the moving average of 2D matrix X, given a running window w
Y1 = fastmovav(X, w);
Y2 = accuratemovav(X, w);
The two functions return the moving average for the columns of matrix X, given a running window w. The first element of each column in Y1 or Y2 is the sample mean of the first w elements of the corresponding column of X. Thus, if X is mbyn, Y1 and Y2 are (mw+1)byn. The sliding window w must be greater than 1 and not greater than m.
The main functions are in two mex files, "fastmovingaverage.c" and "accuratemovingaverage.c", which must be compiled before using the functions in "fastmovav.m" and "accuratemovav.m".
The function fastmovav is extremely fast compared to any other alternative Matlab function that can be currently found on the web and that I am aware of. The drawback is that it may be slightly inaccurate.
The function accuratemovav is less fast than fastmovav but more accurate, and it is still very fast compared to any other alternative I am aware of.
% Example
% First of all, compile the mex functions:
mex fastmovingaverage.c
mex accuratemovingaverage.c
% Some data:
T = 5000; N = 300; X = cumsum(randn(T, N)); w = ceil(T / 2);
% Fast moving average
tic, Y1 = fastmovav(X, w); toc
% Elapsed time is 0.034531 seconds.
% Accurate moving average
tic, Y2 = accuratemovav(X, w); toc
% Elapsed time is 6.504260 seconds.
% Check slow alternative with loops
tic
Y3 = zeros(size(X, 1)  w + 1, size(X, 2));
for hh = 1:(size(X, 1)  w + 1)
Y3(hh, :) = mean(X(hh:hh+w1, :));
end
toc
% Elapsed time is 43.973435 seconds.
% Check accuracy
all(abs(Y1(:)  Y3(:)) < 1e12)
all(abs(Y1(:)  Y3(:)) < 1e15)
all(abs(Y2(:)  Y3(:)) < 1e15)
% Another example with fastmovav  try a large matrix
T = 1e4; N = 2e3; X = cumsum(randn(T, N)); w = ceil(T / 2);
tic, Y = fastmovav(X, w); toc
% Elapsed time is 0.597617 seconds.
1.5  minor edits 

1.4  Improvements following Jan Simon's comment 

1.3  Radical improvements. There are two functions now: fastmovav is fast and slightly inaccurate; accuratemovav is slower but more accurate. 

1.2  Small improvement of the first example reported in the description, added some information regarding trade off accuracy vs speed of the computation, added details for sliding window. 

1.1  Added few lines of explanation in the description above. 
Liber Eleutherios (view profile)
@Oleg Komarov In the example I am comparing the accuracy of Y2 and Y3 and I show that 'accuratemovav' is as accurate as can possibly be. Using 'filter' will be faster than looping but obviously **less accurate**:
tic, Y4 = filter(ones(1,w)/w,1,X); toc, Y4(1:2499, :) = [];
all(abs(Y4(:)  Y3(:)) < 1e12)
all(abs(Y4(:)  Y3(:)) < 1e15)
For this reason you should compare its speed with 'fastmovav', not with 'accuratemovav'. In fact you are supposed to use the latter only if you believe that accuracy is going to be an issue.
Liber Eleutherios (view profile)
@Oleg Komarov In the example I am comparing the accuracy of Y1 and Y3 and I show that 'accuratemovav' is as accurate as can possibly be. Using 'filter' will be faster than the loop but obviously **less accurate**:
tic, Y4 = filter(ones(1,w)/w,1,X); toc, Y4(1:2499, :) = [];
all(abs(Y4(:)  Y3(:)) < 1e12)
all(abs(Y4(:)  Y3(:)) < 1e15)
Oleg Komarov (view profile)
A loop is definitely a slow solution, but you can also use filter(), which is still significantly slower than fastmovav() but also significantly faster than the accurate version:
Y2 = filter(ones(1,w)/w,1,X);
Liber Eleutherios (view profile)
@Jan Simon, thank you for your very useful comment.
I have just submitted the new functions with some of the improvements you suggested.
"size(size(X), 2)" is now replaced by "ndims(X)". The inplace operation "a = a + b" is now replaced by "a += b" and "a = a / b" is avoided in favour of "a /= b".
My understanding is that, in the mex functions, counters "hh", "ii" and "jj" should be of type "mwSignedIndex"; "w", "m", "n" and "m2" should be of type "mwSize". This is what I have done, please let me know if you think that different types should be applied.
Checks are still in the .m file as 1) I am not confident enough to implement them in the mex file yet (perhaps I will do this in the next weeks) and 2) I reckon checks are a reasonable price to pay (and if some user thinks otherwise they can still call the mex functions directly).
Jan Simon (view profile)
Calculating the sum at first an dividing by the window length afterwards is not efficient, because this doubles the memory requirements. It would be faster to perform the division directly in the Ccode.
The inplace operations in C are nicer and faster:
y[ii * m2] = y[ii * m2] + x[ii * m + hh];
Better:
y[ii * m2] += x[ii * m + hh]
Using INT as type for indexing operations is a bad idea in times of 32 and 64 bit machines. Better rely on the well defined types mwIndex, mwSignedIndex and mwSize.
Usind NDIMS() is slightly nicer than "size(size(X), 2) == 2". The overhead of checking the size and type of the input is much smaller, when you implement it inside the MEX function. For small arrays almost all of the time is spent with the checks in the Mfile now, while such checks have almost no measurable dely inside a MEX function.
Liber Eleutherios (view profile)
@Jos, thank you for your comment. Your function is fast but has some limitations: 1. apparently accepts only odd numbers for the sliding window; 2. it's certainly slower than my function (please test this with large matrices, for example 10000by3000 with a window equal to 5001); 3. it doesn't seem to provide more accurate results than mine.
Earlier today I have submitted a new version where two functions are proposed, one is very fast and a bit inaccurate and the other is very accurate and a bit slower. In few hours it should be visible. Please let me know your opinion  I think that the moving average is so important that it should have a builtin function, a mex function is probably the closest option.
Possible drawbacks of my functions: they don't accept NaN values; only accept double precision values; only work for column vectors or 2D matrices; etc.
Jos (10584) (view profile)
Did you check other submissions like the one using Acklam's cumsum trick, implemented here:
http://www.mathworks.com/matlabcentral/fileexchange/10113runmean