FFT-based FIR filtering using overlap-add method
y = fftfilt(d,x)
y = fftfilt(d,x,n)
y = fftfilt(gpuArrayb,gpuArrayX,n)
fftfilt filters data using the efficient
FFT-based method of overlap-add, a frequency
domain filtering technique that works only for FIR filters.
the data in vector
x with the filter described
by coefficient vector
b. It returns the data vector
The operation performed by
fftfilt is described
in the time domain by the difference equation:
An equivalent representation is the Z-transform or frequency domain description:
fftfilt chooses an FFT length
and a data block length that guarantee efficient execution time.
x is a matrix,
its columns. If
b is a matrix,
the filter in each column of
b to the signal vector
x are both matrices
with the same number of columns, the ith column
b is used to filter the ith
determine the length of the FFT. See Algorithms for information.
y = fftfilt(d,x,n) uses
determine the length of the FFT.
y = fftfilt(gpuArrayb,gpuArrayX,n) filters
the data in the
with the FIR filter coefficients in the
See Establish Arrays on a GPU (Parallel Computing Toolbox) for
details on gpuArray objects. Using
requires Parallel Computing Toolbox™ software and a CUDA-enabled
NVIDIA GPU with compute capability 1.3 or above. See http://www.mathworks.com/products/parallel-computing/requirements.html for
details. The filtered data,
y, is a gpuArray object.
See Overlap-Add Filtering on the GPU for
example of overlap-add filtering on the GPU.
fftfilt works for both real and complex inputs.
When the input signal is relatively large, it is advantageous
fftfilt instead of
which performs N multiplications for each sample
x, where N is the filter
fftfilt performs 2 FFT operations —
the FFT of the signal block of length L plus the
inverse FT of the product of the FFTs — at the cost of
where L is the block length. It then performs L point-wise multiplications for a total cost of
L + Llog2L = L(1 + log2L)
multiplications. The cost ratio is therefore
L(1+log2L) / (NL) = (1 + log2L)/N
which is approximately log2L / N.
fftfilt becomes advantageous when
log2Lis less than N.
filter is more efficient for smaller operands and
fftfilt is more efficient for large operands. Filter
random numbers with two random filters: a short one, with 20 taps, and a long one, with 2000. Use
toc to measure the execution times. Repeat the experiment 100 times to improve the statistics.
rng default N = 100; shrt = 20; long = 2000; tfs = 0; tls = 0; tfl = 0; tll = 0; for kj = 1:N x = rand(1,1e6); bshrt = rand(1,shrt); tic sfs = fftfilt(bshrt,x); tfs = tfs+toc/N; tic sls = filter(bshrt,1,x); tls = tls+toc/N; blong = rand(1,long); tic sfl = fftfilt(blong,x); tfl = tfl+toc/N; tic sll = filter(blong,1,x); tll = tll+toc/N; end
Compare the average times.
fprintf('%4d-tap filter averages: fftfilt: %f s; filter: %f s\n',shrt,tfs,tls)
20-tap filter averages: fftfilt: 0.477844 s; filter: 0.010168 s
fprintf('%4d-tap filter averages: fftfilt: %f s; filter: %f s\n',long,tfl,tll)
2000-tap filter averages: fftfilt: 0.063682 s; filter: 0.084514 s
The following example requires Parallel Computing Toolbox software and a CUDA-enabled NVIDIA GPU with compute capability 1.3 or above. See http://www.mathworks.com/products/parallel-computing/requirements.html for details.
Create a signal consisting of a sum of sine waves in white Gaussian additive noise. The sine wave frequencies are 2.5, 5, 10, and 15 kHz. The sampling frequency is 50 kHz.
Fs = 50e3; t = 0:1/Fs:10-(1/Fs); x = cos(2*pi*2500*t)+0.5*sin(2*pi*5000*t)+0.25*cos(2*pi*10000*t)+ ... 0.125*sin(2*pi*15000*t)+randn(size(t));
Design a lowpass FIR equiripple filter using
d = designfilt('lowpassfir','SampleRate',Fs, ... 'PassbandFrequency',5500,'StopbandFrequency',6000, ... 'PassbandRipple',0.5,'StopbandAttenuation',50); B = d.Coefficients;
Filter the data on the GPU using the overlap-add method.
Put the data on the GPU using
the output to the MATLAB® workspace using
plot the power spectral density estimate of the filtered data.
y = fftfilt(gpuArray(B),gpuArray(x)); periodogram(gather(y),rectwin(length(y)),length(y),50e3)
implement the overlap-add method , a technique that combines successive
frequency domain filtered blocks of an input sequence.
an input sequence
x into length
blocks, where L must be greater than the filter length N.
and convolves each block with the filter
y = ifft(fft(x(i:i+L-1),nfft).*fft(b,nfft));
nfft is the FFT length.
successive output sections by
n-1 points, where
the length of the filter, and sums them.
fftfilt chooses the key parameters
different ways, depending on whether you supply an FFT length
on the lengths of the filter and signal. If you do not specify a value
n (which determines FFT length),
these key parameters automatically:
length(x)is greater than
values that minimize the number of blocks times the number of flops
length(b) is greater than or
a single FFT of length
2^nextpow2(length(b) + length(x) - 1)
This essentially computes
y = ifft(fft(B,nfft).*fft(X,nfft))
If you supply a value for
an FFT length,
a data block length of
n is less than
 Oppenheim, Alan V., Ronald W. Schafer, and John R. Buck. Discrete-Time Signal Processing. 2nd Ed. Upper Saddle River, NJ: Prentice Hall, 1999.