How to best modify FFT bin amplitudes before IFFT (DFT, windowing)?

Question

0 votes

I wish to do the following:

Read a mono 44.1kHz audio file.

Chop this audio in short overlapping (windowed?) segments.

Do FFT on these segments.

Read best as possible the amplitudes of the frequency bins.

Modify some of the amplitudes of some of these frequency bins (based on an algorithm I wrote).

With IFFT reconstruct the audio segments with these modified amplitudes of some of these frequency bins.

Stitch these audio segments together to get an audio file which has the modified amplitudes at certain frequencies at certain points in time, with minimal side effects.

Now I'm mostly just beginning with Matlab and am looking for any relevant examples from which I can learn on how to do the above.

Also, some things are not yet clear to me regarding windowing and FFT.

For windowing. Am I correct in thinking that for the above example I can best window and overlap the short segments in such a way that by simply adding the windowed overlapping segments I get the original audio again? So for instance if I use triangular windowing with 50% overlap on both sides, that I will get the original audio back once I stitch these segments together again? Are there other windows that will work in this way? (for instance Hann?) Or am I altogether thinking wrong on how to best use windowing for what I want to do?

For FFT. I understand that the first half of the resulting frequency bins are the bins with the relevant amplitudes (for FFT length of 512, bins 0 to 255 represent the relevant frequencies and contain their amplitudes, bin 256 contains the nyquist if I understood correctly). The second half of the bins (257 to 512), can I just ignore those when modifying the amplitude of the first half? For instance, if I have a 1kHz sine wave, do the FFT, modify the amplitude of the bin that contains the 1kHz tone by dividing the amplitude in half, then do an IFFT, will the end result be that 1kHz sine reduced in amplitude by 6dB, or am I missing something?

Many thanks for any help / pointers!


Answer 1

William Rose on 20 Sep 2021

1 vote

@Pythagorean,

You say "For FFT. I understand that the first half of the resulting frequency bins are the bins with the relevant amplitudes (for FFT length of 512, bins 0 to 255 represent the relevant frequencies and contain their amplitudes, bin 256 contains the nyquist if I understood correctly)."

That is not correct. For the FFT of a 512-point segment, bin 0 is the scaled mean value of the signal. Its imaginary part will always be zero if the original signal is real. Bins 1-255 are the complex numbers representing half of the FFT. Let's call it the bottom half. We could also call it the positive frequency part of the FFT. Bin 256 contains the scaled amplitude of the component sinusoid at the Nyquist frequency (fs/2, half the sampling rate). For a real signal its imaginary part will always be 0, for any FFT with an even number of samples. Bins 257-511 are the other half ("top half", or negative frequency part) of the FFT. If the original signal is real, and yours is, then the top half values will be the complex conjugates of the values in bins 1-255, where bin 257 = conj(bin 255), bin 258 = conj(bin 254), ..., bin 511 = conj(bin 1). Whatever you do to the "bottom half" you must also do to the corresponding element of the "top half". Before you do the inverse FFT, be sure that the top half of the modified FFT is the complex conjugate of the flipped-around bottom half. If that is not true, then you will get complex numbers from the inverse FFT, and that indicates an error.
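As a minimal sketch of the point about mirrored bins, the following halves a 1 kHz sine by applying the same real gain to its positive-frequency bin and to the matching negative-frequency bin. The segment length N = 441 is an assumption chosen only so that 1 kHz falls exactly on a bin at 44.1 kHz (no leakage); note that MATLAB indices start at 1, so zero-based bin k is element k+1.

```matlab
% Halve a 1 kHz tone by scaling its bin AND the mirrored bin.
fs = 44100;                 % sampling rate in Hz (from the question)
N  = 441;                   % assumed segment length: 1 kHz lands exactly on bin 10
n  = (0:N-1).';
x  = sin(2*pi*1000*n/fs);   % test tone, exactly 10 cycles in the segment

Y    = fft(x);
k    = round(1000*N/fs);    % zero-based bin number of the tone (10 here)
iPos = k + 1;               % MATLAB index of the positive-frequency bin
iNeg = N - k + 1;           % MATLAB index of the mirrored (negative-frequency) bin

% The same real gain on both bins preserves conjugate symmetry.
Y([iPos iNeg]) = 0.5 * Y([iPos iNeg]);

y       = ifft(Y);
maxImag = max(abs(imag(y)));     % rounding-error level if symmetry was kept
y       = real(y);
gainDb  = 20*log10(max(abs(y)) / max(abs(x)));   % 20*log10(0.5), about -6.02 dB
```

If only `Y(iPos)` were scaled, the result of `ifft` would no longer be real, which is the error condition described above.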

The other part of your question is: May I segment the signal, do FFTs, manipulate the FFTs, invert the manipulated FFTs, and paste the results back together, to get a signal whose frequencies have been "shaped", as if with a graphic equalizer? The answer is you may, but you will probably end up with glitches at the segment boundaries. Initially, the signal is smooth across the segment boundaries. If you do an FFT and inverse FFT of each segment, without mean or trend removal, and without any frequency adjustments, you can paste the inverse FFT segments together and get back the original signal exactly. But if you do mean or trend removal or other adjustment of particular frequencies, then the pasted-together signal will have glitches, or discontinuities, at the segment boundaries. This is true for both overlapping and non-overlapping segmentation.
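The segment, FFT, modify, IFFT, and paste-together loop can be sketched roughly as below. This is only an illustration under assumptions that are not from the thread: a periodic Hann window built by hand (so no toolbox is needed), 50% overlap, N = 512, and a two-tone test signal standing in for the audio file. With the gain vector `g` left at all ones, the overlapped windows sum to one and the interior of the signal is recovered; any real modification would go into `g`, and how audible the boundary effects become then depends on that modification.

```matlab
% Windowed overlap-add sketch; all parameter values are assumptions.
fs  = 44100;
N   = 512;                                 % segment length
hop = N/2;                                 % 50% overlap
w   = 0.5 - 0.5*cos(2*pi*(0:N-1).'/N);     % periodic Hann: w(n) + w(n+N/2) = 1

L = 20*hop;                                % test signal length, a multiple of hop
t = (0:L-1).'/fs;
x = sin(2*pi*440*t) + 0.5*sin(2*pi*3000*t);   % stand-in for the audio file

% Per-bin real gain. To keep the output real it must be mirror-symmetric:
% g(k+1) == g(N-k+1) for k = 1..N/2-1. All ones means "no modification".
g = ones(N, 1);

y = zeros(size(x));
nFrames = (L - N)/hop + 1;
for s = 0:nFrames-1
    idx    = s*hop + (1:N).';
    Y      = fft(w .* x(idx));             % analysis: window, then FFT
    Y      = g .* Y;                       % modify bin amplitudes
    y(idx) = y(idx) + real(ifft(Y));       % synthesis: IFFT, then overlap-add
end

% The first and last half-segments are covered by only one window,
% so compare the interior only.
interior = (hop+1):(L-hop);
err = max(abs(y(interior) - x(interior)));
```

A triangular window with 50% overlap can be substituted for `w` in the same way, provided its overlapped copies also sum to a constant.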

Another way of understanding the issue is that the sampling of the signal in the frequency domain is different with segmented signals than with the original signal. You lose samples of the "in-between" frequencies, including the lowest frequencies. Example: Suppose the original signal is sampled at Fs=1000 Hz, for N=1000 samples. The bin spacing is Fs/N = 1 Hz, so the frequencies of the FFT (up to Nyquist) are 0, 1, 2, ..., 498, 499, 500 Hz. Now I divide it into 10 segments of Nseg=100 points each. The bin spacing becomes Fs/Nseg = 10 Hz, so the frequencies of the FFT of each segment are 0, 10, 20, ..., 480, 490, 500 Hz.

14 Comments

Pythagorean on 22 Sep 2021

@William Rose Thanks. Though just when I thought I had it figured out a new problem presents itself.

The bin for a certain frequency has two numbers: a real part and an imaginary part. For a bin value a+bi, we can calculate the complex magnitude as sqrt(a^2+b^2).

I thought that the real number represented magnitude and the imaginary number represented phase. But apparently I'm incorrect about this as it cannot be the case. Because if I have a bin with a large imaginary number I cannot bring the complex magnitude to 0 by only modifying the real number of that bin and not modifying the imaginary number. For instance, a bin with real number 0.0000 and imaginary number 11.0000 will not get a complex magnitude of 0 but will get a minimum complex magnitude of 11 because sqrt(0^2+11^2) = 11.

So I don't know for sure where my understanding is going wrong. Am I understanding complex magnitude wrongly and should I not see this as representing the amplitude of the bin? (unlikely) Or am I wrong in assuming the imaginary number represents phase, and do I indeed need to modify both the real number and the imaginary number to be able to fully modify the amplitude of a certain bin/frequency? (this seems to be the case, based on what I guess now)

If this is indeed the case, then in order to fully modify the amplitude (complex magnitude) of each bin I have to modify both the real number and the imaginary number of that bin. But I have no idea how much to modify each (though perhaps by running various signals through fft I can get an idea). If it's easy to explain how to do this / what's behind it, then I would be very thankful if you could help once more. If it's a long complex story, any pointers/links on what I should read up on would be great.

This fft thing has gotten a bit complex all of a sudden but I must get it right so I will not quit until I do :)
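A minimal sketch of how magnitude and phase relate to the real and imaginary parts, using the 0 + 11i bin as the example. Neither part alone is the magnitude or the phase; both come from the pair together, so scaling the whole complex value by a real gain changes the amplitude and leaves the phase alone. The gain g = 0.5 is an assumed value for illustration.

```matlab
% Magnitude and phase of one FFT bin, and how to change only the magnitude.
Y   = 0 + 11i;        % example bin: real part 0, imaginary part 11
mag = abs(Y);         % sqrt(real(Y)^2 + imag(Y)^2), here 11
ph  = angle(Y);       % atan2(imag(Y), real(Y)), here pi/2

g  = 0.5;             % assumed amplitude gain (about -6 dB)
Y2 = g * Y;           % scales real and imaginary parts by the same factor

% Equivalent route: rebuild the bin from a new magnitude and the old phase.
Y3 = (g*mag) * exp(1i*ph);

mag2 = abs(Y2);       % g*mag, here 5.5
ph2  = angle(Y2);     % unchanged, still pi/2
```

Setting `g = 0` brings the complex magnitude of the bin to zero, which cannot be done by changing the real part alone.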

William Rose on 24 Sep 2021

@Pythagorean,

If fs=sampling rate in Hz, and N=number of samples in signal x(i), and y=fft(x), then y is a vector of complex numbers with N elements. The vector of frequencies corresponding to the elements of y is

f=fs*(0:N-1)/N;

About half the frequencies in vector f are higher than the Nyquist frequency (fs/2). Those are the "top half" frequencies of the fft. An alternate name for Nyquist frequency is "folding frequency", since the spectrum above fs/2 is the folded-over copy of the spectrum from 0 to fs/2.
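To make the folding concrete, here is a small sketch that builds the frequency vector, marks the bins above fs/2, re-expresses them as negative frequencies, and checks the conjugate symmetry of a real signal's FFT. The values fs = 44100 and N = 512, and the test signal, are assumptions chosen for illustration.

```matlab
% Frequency vector, "top half" bins, and folding about the Nyquist frequency.
fs = 44100;                    % assumed sampling rate in Hz
N  = 512;                      % assumed (even) number of samples
f  = fs*(0:N-1)/N;             % frequency of each element of fft(x)

fNyq = fs/2;
top  = f > fNyq;               % logical mask of the "top half" bins
nTop = sum(top);               % N/2 - 1 bins for even N

fSigned      = f;
fSigned(top) = f(top) - fs;    % the same bins written as negative frequencies

% For a real signal the top half mirrors the bottom half as complex conjugates.
n = 0:N-1;
x = cos(2*pi*5*n/N) + 0.3*sin(2*pi*40*n/N);   % arbitrary real test signal
y = fft(x);
bottom   = y(2:N/2);           % zero-based bins 1 .. N/2-1
mirrored = y(N:-1:N/2+2);      % zero-based bins N-1 .. N/2+1
symErr   = max(abs(bottom - conj(mirrored)));   % rounding-error level
```

Plotting against `fSigned` (after `fftshift`) rather than `f` is one common way to show the spectrum centered on 0 Hz.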

Pythagorean on 24 Sep 2021

@William Rose Ah yes thanks. I understand how to plot the right frequencies but this isn't relevant to my plugin so I was lazy with the plots.

Still experimenting with different ways of tapering / windowing regarding precise frequency resolution vs spectral leakage. And the amount of resolution I actually need for my algorithm to work best.

I'm doing the FFT on the band outputs of a linear phase perfect reconstruction filter bank (already made this in Matlab). So I can do shorter FFTs on the higher frequency bands and longer FFTs on the lower frequency bands. The end result should be good enough frequency resolution and good enough time resolution. Trying to find the optimal balance for audio processing results.
