How to best modify FFT bin amplitudes before IFFT (DFT, windowing)?

I wish to do the following:
1. Read a mono 44.1 kHz audio file.
2. Chop this audio into short overlapping (windowed?) segments.
3. Do an FFT on each segment.
4. Read the amplitudes of the frequency bins as accurately as possible.
5. Modify the amplitudes of some of these frequency bins (based on an algorithm I wrote).
6. Reconstruct the audio segments with an IFFT, using these modified bin amplitudes.
7. Stitch the audio segments back together to get an audio file whose amplitudes are modified at certain frequencies and at certain points in time, with minimal side effects.
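The steps above are essentially windowed FFT analysis followed by overlap-add resynthesis. Below is a minimal NumPy sketch of that loop (MATLAB's `fft`/`ifft` behave the same way, with 1-based indexing). It assumes a periodic Hann window with 50% overlap, whose shifted copies sum to exactly 1, so no synthesis window is needed. `stft_process` and its `gain_fn` argument are hypothetical names standing in for your own algorithm; `gain_fn` returns one real gain per bin, which changes amplitudes while leaving phases alone.

```python
import numpy as np

def stft_process(x, n_fft=512, gain_fn=None):
    """Windowed FFT -> per-bin gain -> IFFT -> overlap-add.

    Assumes a periodic Hann analysis window and hop = n_fft/2 (50% overlap),
    so the overlapping windows sum to 1 and, with no gain applied, the
    output equals the input. gain_fn(frame_index, freqs) is a hypothetical
    callback returning a real gain for each rfft bin; freqs are in
    cycles/sample (0 .. 0.5).
    """
    hop = n_fft // 2
    k = np.arange(n_fft)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * k / n_fft)  # periodic Hann
    # Pad so every input sample is covered by two frames.
    padded = np.concatenate([np.zeros(hop), x, np.zeros(n_fft)])
    out = np.zeros(len(padded))
    freqs = np.fft.rfftfreq(n_fft)
    n_frames = (len(padded) - n_fft) // hop + 1
    for m in range(n_frames):
        start = m * hop
        frame = padded[start:start + n_fft] * window
        spectrum = np.fft.rfft(frame)            # bins 0 .. n_fft/2 only
        if gain_fn is not None:
            spectrum = spectrum * gain_fn(m, freqs)  # real gain keeps phase
        # irfft rebuilds the conjugate-symmetric top half, so the result is real
        out[start:start + n_fft] += np.fft.irfft(spectrum, n=n_fft)
    return out[hop:hop + len(x)]
```

For example, `stft_process(x, gain_fn=lambda m, f: np.where(f * 44100 > 5000, 0.5, 1.0))` would halve everything above 5 kHz in a 44.1 kHz signal. Because the gains are applied after an analysis window only, strong per-frame changes can still leave small artifacts at frame edges; a common alternative is a square-root Hann window for both analysis and synthesis.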
I'm just beginning with MATLAB and am looking for relevant examples from which I can learn how to do the above.
Also, some things are not yet clear to me regarding windowing and FFT.
Windowing: Am I correct in thinking that, for the above, it is best to window and overlap the short segments in such a way that simply adding the windowed overlapping segments gives me the original audio again? For instance, if I use a triangular window with 50% overlap on both sides, will I get the original audio back once I stitch the segments together again? Are there other windows that work this way (for instance Hann)? Or am I thinking about windowing the wrong way altogether for what I want to do?
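One way to test the windowing idea numerically is to add up shifted copies of a window spaced half a window apart and check whether the sum is flat. The sketch below (NumPy, with a hypothetical helper `overlap_add_sum`) compares a triangular window, a periodic Hann window, and NumPy's `np.hanning`, which is the symmetric Hann variant with zeros at both ends. The first two sum to a constant with 50% overlap; the symmetric Hann leaves a small ripple. In MATLAB, `hann(N,'periodic')` gives the periodic version.

```python
import numpy as np

def overlap_add_sum(window, hop, n_frames=8):
    """Sum shifted copies of `window` (hop samples apart) and return the
    steady-state middle region, where every sample sees the same number
    of overlapping frames."""
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for m in range(n_frames):
        total[m * hop:m * hop + n] += window
    return total[n:-n]  # drop the ramp-up and ramp-down at the ends

N = 512
k = np.arange(N)
hann_periodic = 0.5 - 0.5 * np.cos(2 * np.pi * k / N)
triangle = 1 - np.abs(k - N / 2) / (N / 2)  # 0 at k = 0, 1 at k = N/2
hann_symmetric = np.hanning(N)              # zero at both ends

for name, w in [("periodic Hann", hann_periodic),
                ("triangle", triangle),
                ("symmetric Hann", hann_symmetric)]:
    s = overlap_add_sum(w, N // 2)
    print(f"{name:15s} min={s.min():.6f} max={s.max():.6f}")
```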
FFT: I understand that the first half of the resulting frequency bins are the bins with the relevant amplitudes (for an FFT length of 512, bins 0 to 255 represent the relevant frequencies and contain their amplitudes, and bin 256 contains the Nyquist frequency, if I understood correctly). Can I just ignore the second half of the bins (257 to 512) when modifying the amplitudes of the first half? For instance, if I have a 1 kHz sine wave, do the FFT, halve the amplitude of the bin that contains the 1 kHz tone, and then do an IFFT, will the end result be that 1 kHz sine reduced in amplitude by 6 dB, or am I missing something?
Many thanks for any help / pointers!

 Accepted Answer

You say "For FFT. I understand that the first half of the resulting frequency bins are the bins with the relevant amplitudes (for FFT length of 512, bins 0 to 255 represent the relevant frequencies and contain their amplitudes, bin 256 contains the nyquist if I understood correctly)."
That is not correct. For the FFT of a 512-point segment, bin 0 is the scaled mean value of the signal. Its imaginary part will always be zero if the original signal is real. Bins 1-255 are the complex numbers representing half of the FFT. Let's call it the bottom half; we could also call it the positive-frequency part of the FFT. Bin 256 contains the scaled amplitude of the component sinusoid at the Nyquist frequency (fs/2). Its imaginary part will always be 0, for any FFT with an even number of samples. Bins 257-511 are the other half ("top half", or negative-frequency part) of the FFT. If the original signal is real, and audio signals are, then the top-half values will be the complex conjugates of the values in bins 1-255, where bin 257 = conj(bin 255), bin 258 = conj(bin 254), ..., bin 511 = conj(bin 1). Whatever you do to an element of the "bottom half" you must also do to the corresponding element of the "top half". Before you do the inverse FFT, make sure that the top half of the modified FFT is the complex conjugate of the flipped-around bottom half. If it is not, the inverse FFT will return complex numbers, and that indicates an error.
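The conjugate-symmetry rule can be checked with a pure tone. This NumPy sketch uses a tone centred exactly on bin 12 of a 512-point FFT at 44.1 kHz (chosen so there is no leakage) and halves the bin in two ways: only the positive-frequency bin, and the bin together with its mirror N - k. Bin numbers here are 0-based as in the answer; in MATLAB, bin k is element k+1, and `ifft(Y,'symmetric')` can be used to force a real result.

```python
import numpy as np

N = 512
fs = 44100.0
k0 = 12                      # bin-centred tone, so there is no leakage
f0 = k0 * fs / N             # about 1033.6 Hz
n = np.arange(N)
x = np.cos(2 * np.pi * f0 * n / fs)

X = np.fft.fft(x)

# Wrong: halve only the positive-frequency bin.
X_wrong = X.copy()
X_wrong[k0] *= 0.5
y_wrong = np.fft.ifft(X_wrong)
print("max |imag|, one bin halved:", np.abs(y_wrong.imag).max())

# Right: halve bin k0 and its mirror N - k0, keeping conjugate symmetry.
X_right = X.copy()
X_right[k0] *= 0.5
X_right[N - k0] *= 0.5
y_right = np.fft.ifft(X_right)
print("max |imag|, both bins halved:", np.abs(y_right.imag).max())
print("peak amplitude after:", np.abs(y_right.real).max())
```

Halving only one bin gives a complex result whose real part is a cosine of amplitude 0.75 (about -2.5 dB, not -6 dB); halving both bins gives a real cosine of amplitude 0.5, i.e. the expected -6 dB.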
The other part of your question is: may I segment the signal, do FFTs, manipulate the FFTs, invert the manipulated FFTs, and paste the results back together, to get a signal whose frequencies have been "shaped", as if with a graphic equalizer? The answer is that you may, but you will probably end up with glitches at the segment boundaries. Initially, the signal is smooth across the segment boundaries. If you do an FFT and inverse FFT of each segment, without mean or trend removal and without any frequency adjustments, you can paste the inverse-FFT segments together and get back the original signal exactly. But if you do mean or trend removal, or adjust particular frequencies, then the pasted-together signal will have glitches, or discontinuities, at the segment boundaries. This is true for both overlapping and non-overlapping segmentation.
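Both claims can be seen in a small NumPy experiment. The test signal (a 3.7 Hz plus a 42 Hz sine at 1000 Hz) is an arbitrary choice for illustration. With unmodified FFT/IFFT the pasted segments match the original exactly; zeroing bin 0 of each 100-point segment (mean removal) makes the signal step at the segment boundaries, where the step sizes are compared with the largest sample-to-sample changes inside segments.

```python
import numpy as np

fs = 1000
n = np.arange(1000)
x = np.sin(2 * np.pi * 3.7 * n / fs) + 0.3 * np.sin(2 * np.pi * 42 * n / fs)
segments = x.reshape(10, 100)   # 10 non-overlapping 100-point segments

# 1) FFT and inverse FFT with no changes: pasting back gives x exactly.
unchanged = np.concatenate([np.fft.ifft(np.fft.fft(s)).real for s in segments])
print("max error, unmodified:", np.abs(unchanged - x).max())

# 2) Remove each segment's mean by zeroing bin 0.
def zero_dc(s):
    S = np.fft.fft(s)
    S[0] = 0.0
    return np.fft.ifft(S).real

modified = np.concatenate([zero_dc(s) for s in segments])
jumps = np.abs(np.diff(modified))
boundary = np.arange(99, 999, 100)  # diff index between samples 99|100, 199|200, ...
print("largest step inside segments:", np.delete(jumps, boundary).max())
print("largest step at boundaries:  ", jumps[boundary].max())
```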
Another way of understanding the issue is that the signal is sampled differently in the frequency domain when it is segmented than when it is not. You lose samples of the "in-between" frequencies, including the lowest frequencies. Example: suppose the original signal is sampled at Fs = 1000 Hz, for N = 1000 samples. Then the frequencies of the FFT are 0, 1, 2, ..., 498, 499, 500 Hz. Now divide it into 10 segments of Nseg = 100 points each. The frequencies of the FFT of each segment are 0, 10, 20, ..., 480, 490, 500 Hz.
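The two frequency grids in this example can be generated with NumPy's `rfftfreq`, which lists the bin frequencies from 0 up to Fs/2 for a real-input FFT. The spacing is Fs/N, so the 100-point segments are 10 times coarser than the full record, and nothing between 0 and 10 Hz (that is, below 1/Tseg) gets its own bin.

```python
import numpy as np

fs = 1000.0
f_full = np.fft.rfftfreq(1000, d=1 / fs)   # whole 1000-point record
f_seg = np.fft.rfftfreq(100, d=1 / fs)     # one 100-point segment
print(f_full[:4], "...", f_full[-2:])
print(f_seg[:4], "...", f_seg[-2:])
print("spacing:", f_full[1], "Hz vs", f_seg[1], "Hz")
```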

14 Comments

@Pythagorean, if you use tapered windows, the glitches at the boundaries will be reduced. You will lose the power in the signal at frequencies lower than 1/Tseg, where Tseg is the segment duration. That happens because of the altered sampling in the frequency domain when you segment, as I explained above. For audio files with segment durations of 1/10 of a second or longer, this is not a problem, since we cannot hear frequencies of 10 Hz and below, and normal microphones do not pick up those low frequencies.
I recommend making and using simulated recordings as well as actual recordings to test your code. For example, make a file with a pure sinusoid at middle C. Apply your signal processing algorithm, and listen to the results, and plot the results.
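A simulated recording like the one suggested can be written with nothing but the Python standard library. This sketch (the function name `write_test_tone` is hypothetical) writes a mono 16-bit WAV file containing a pure sinusoid at middle C, taken as 261.63 Hz in standard A440 tuning. In MATLAB, `audiowrite` does the same job.

```python
import math
import struct
import wave

def write_test_tone(path, freq=261.63, fs=44100, seconds=2.0, amplitude=0.5):
    """Write a mono 16-bit WAV containing a pure sinusoid (default: middle C).

    Returns the number of samples written."""
    n = int(round(fs * seconds))
    frames = bytearray()
    for i in range(n):
        sample = amplitude * math.sin(2 * math.pi * freq * i / fs)
        frames += struct.pack("<h", int(round(sample * 32767)))  # little-endian int16
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(fs)
        w.writeframes(bytes(frames))
    return n
```

Running your processing on such a file, where the exact answer is known, makes it easy to see whether a change at one frequency leaks into others.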
@William Rose Thank you so much for taking the time to explain all this. I understand the things you've mentioned, and your explanation has been of great help to me. Thank you again!
@William Rose Actually I do have one more question.
If bin 0 is the scaled mean value of the signal. Then if I modify the amplitude of certain bins (I've figured out how to do this through the Pythagorean theorem) then shouldn't I also modify bin 0 after this because the scaled mean value of the signal will change?
If I'm correct in thinking this, how do I calculate the new scaled mean value of the signal from the new bin values, to put in bin 0 before the ifft?
Edit: Never mind. I understand now that this is the DC component. (I had misunderstood what "scaled mean value of the signal" meant.)
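A quick NumPy check of the DC bin: with the usual unnormalized forward FFT (the convention used by both NumPy and MATLAB), bin 0 is the sum of the samples, i.e. N times the mean. Scaling any other conjugate pair of bins does not touch bin 0, so the mean of the reconstructed signal stays the same and bin 0 needs no recalculation. The sample values are arbitrary.

```python
import numpy as np

x = np.array([1.0, 3.0, -2.0, 6.0, 0.5, 2.5])
X = np.fft.fft(x)
print(X[0].real, len(x) * x.mean())   # bin 0 = N * mean = sum of the samples

# Scaling a non-DC bin pair leaves bin 0, and hence the mean, untouched.
Y = X.copy()
Y[1] *= 0.5
Y[-1] *= 0.5      # mirror bin N-1 keeps conjugate symmetry
y = np.fft.ifft(Y).real
print(y.mean(), x.mean())
```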
@Pythagorean, OK, good, I'm glad you figured it out. Good luck!
@William Rose Thanks. Though just when I thought I had it figured out a new problem presents itself.
The bin for a certain frequency has two numbers: a real part and an imaginary part. If the bin is a + bi, its complex magnitude is sqrt(a^2 + b^2).
I thought that the real part represented magnitude and the imaginary part represented phase. But apparently that cannot be the case, because if a bin has a large imaginary part, I cannot bring its complex magnitude to 0 by modifying only the real part. For instance, a bin with real part 0.0000 and imaginary part 11.0000 cannot reach a complex magnitude of 0 by changing only the real part; the smallest it can get is 11, because sqrt(0^2 + 11^2) = 11.
So I don't know for sure where my understanding is going wrong. Am I understanding complex magnitude wrongly, and should I not see it as representing the amplitude of the bin? (unlikely) Or am I wrong in assuming the imaginary part represents phase, and do I indeed need to modify both the real and imaginary parts to fully modify the amplitude of a certain bin/frequency? (this seems to be the case, based on what I now guess)
If this is indeed the case, then to fully modify the amplitude (complex magnitude) of each bin I have to modify both its real and imaginary parts. But I have no idea how much to modify each (though perhaps by running various signals through fft I can get an idea). If it's easy to explain how to do this and what's behind it, I would be very thankful if you could help once more. If it's a long, complex story, any pointers or links on what I should read up on would be great.
This FFT thing has suddenly gotten a bit complex, but I must get it right, so I won't quit until I do :)
@Pythagorean, This is why I review complex number arithmetic near the beginning when I teach signal processing.
The real part (A) of the FFT at a given frequency is the amplitude of the cosine wave, and the imaginary part (B) is the amplitude of the sine wave, at that frequency. When you add the cosine and the sine, you get the total sinusoid at that frequency. It has amplitude C and (negative) phase ϕ, where C and ϕ are related to A and B by the equations below:
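$$C = \sqrt{A^2 + B^2}$$
$$\phi = \tan^{-1}(B/A) \quad (\tan^{-1} = \text{arctangent})$$
Another way of saying it with equations is as follows:
$$A\cos(\omega t) + B\sin(\omega t)$$
is equal to
$$C\cos(\omega t - \phi),$$
when A, B, C, and ϕ are related by the equations above.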
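The identity can be checked numerically. This sketch picks arbitrary values for A, B and the frequency, computes C and ϕ, and asserts that both sides agree at several times. It uses `atan2(B, A)` rather than a plain arctangent of B/A, because `atan2` returns the correct angle in all four quadrants (a plain arctangent of B/A is only right when A > 0); MATLAB's `atan2(B,A)` behaves the same.

```python
import math

A, B = 0.8, -1.5              # arbitrary cosine and sine amplitudes
C = math.hypot(A, B)          # sqrt(A^2 + B^2)
phi = math.atan2(B, A)        # quadrant-correct arctangent of B/A
w = 2 * math.pi * 5.0         # 5 Hz, chosen arbitrarily

for t in [0.0, 0.13, 0.42, 1.7]:
    lhs = A * math.cos(w * t) + B * math.sin(w * t)
    rhs = C * math.cos(w * t - phi)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
print(C, phi)
```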
@William Rose Aaah now I get it! Never read this anywhere before. There sure are a lot of confusing descriptions of fft on the web :)
But this I can work with :) Easy now to reduce the amplitude while keeping the phase the same. Thanks again! You've saved me a lot of headaches.
@Pythagorean, I'm glad you get it now. You are right: there are a lot of confusing descriptions of FFTs.
Note that the transforms between (A,B) and (C,ϕ) are identical to the Cartesian-polar coordinate transforms between (x,y) and (r,θ). You probably know that complex numbers are often plotted on a "complex plane" graph. A number on the complex plane can be described by its magnitude (i.e. radius) and phase, or by its real and imaginary parts. The arrays for FFTs use the real, imaginary representation.
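The Cartesian/polar view applies directly to an FFT bin. In this NumPy sketch (values chosen arbitrarily), a bin-centred signal A cos + B sin gives bin k equal to (N/2)(A - jB): with the e^(-j...) sign convention used by NumPy and MATLAB, the sine amplitude appears with a minus sign, and the bin's angle is -ϕ, which is the "(negative) phase" mentioned above. Converting the bin to magnitude and angle, shrinking the magnitude, and converting back is the same as multiplying the bin by a real gain, so the phase is unchanged. MATLAB's `abs` and `angle` play the roles of `np.abs` and `np.angle`.

```python
import numpy as np

N, k = 64, 5
n = np.arange(N)
A, B = 0.8, -1.5
x = A * np.cos(2 * np.pi * k * n / N) + B * np.sin(2 * np.pi * k * n / N)
X = np.fft.fft(x)
print(X[k] / (N / 2))            # ≈ A - 1j*B  (note the minus sign on B)

# Cartesian -> polar, shrink the magnitude, polar -> Cartesian.
r, theta = np.abs(X[k]), np.angle(X[k])
new_bin = (0.5 * r) * np.exp(1j * theta)
print(np.isclose(new_bin, 0.5 * X[k]))   # same as multiplying by a real gain
```

Remember that the mirror bin N - k must be scaled by the same gain before the inverse FFT.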
@William Rose No, I just learned about imaginary numbers for the first time by looking at the FFT on Wikipedia and seeing a YouTube video on the FFT :) I'm a high school dropout with little math knowledge and almost no MATLAB experience, so it's a bit challenging for me. But I'm used to diving into things I don't know anything about yet and always succeeding in the end, so I'm not worried. I know how to make my algorithm work now (and I understand the limitations of the FFT); just some minor scripting learning and writing left to do.
@Pythagorean, Wow, I am very impressed with your knowledge! What you are doing is very advanced. You have an amazing ability to learn quickly and from non-traditional sources. You are not afraid to ask questions, try things, make mistakes, fix them, and continue forward. That is great. Many people, including me, are not so good at that.
Thank you. That may be the best compliment I've ever received :)
And I can compliment you the same for your teaching ability :)
The code is working now, by the way. Below, fig1 is a 6 kHz plus a 9 kHz sine, fig2 is a 6 kHz cosine, fig3 is fig1 with the amplitude of the cosine subtracted by modifying the FFT bin amplitudes (leaving only the 9 kHz sine, since it is not present in fig2), and fig4 is what the 9 kHz sine looks like when generated by itself.
The FFT bin amplitude modification method gives practically the same result in this case, with only a very slight decrease in the side-lobe (frequency bleeding) amplitude, so little that it's of no concern in practice. A perfect result as far as my application is concerned (which is a long story I won't bore you with).
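The exact method behind fig3 is not shown in the thread, so the following is only a sketch of one common way to do it, magnitude (spectral) subtraction: subtract the reference's bin magnitudes from the mix's, floor at zero, and keep the mix's phase. The 0.1 s length is an assumption chosen so that 6 kHz and 9 kHz fall exactly on bins (10 Hz spacing at 44.1 kHz); with off-bin frequencies, leakage makes the result only approximate.

```python
import numpy as np

fs, N = 44100, 4410                       # 0.1 s, so bins are 10 Hz apart
n = np.arange(N)
mix = np.sin(2 * np.pi * 6000 * n / fs) + np.sin(2 * np.pi * 9000 * n / fs)  # like fig1
ref = np.cos(2 * np.pi * 6000 * n / fs)                                      # like fig2

M, R = np.fft.rfft(mix), np.fft.rfft(ref)
# Subtract magnitudes bin by bin (floored at zero) and keep the mix's phase.
new_mag = np.maximum(np.abs(M) - np.abs(R), 0.0)
out = np.fft.irfft(new_mag * np.exp(1j * np.angle(M)), n=N)

target = np.sin(2 * np.pi * 9000 * n / fs)                                   # like fig4
print("max deviation from a lone 9 kHz sine:", np.abs(out - target).max())
```

Because only magnitudes are compared, a sine and a cosine at the same frequency cancel each other here even though their phases differ.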
Thanks again!
@Pythagorean, Thank you for your kind remark.
Your plots look great.
The x-axis values do not match the frequencies you cited (6 kHz, 9 kHz). Are you familiar with the formula relating the bin number (horizontal axis) to the actual frequency in Hz?
If fs = sampling rate in Hz, N = number of samples in signal x(i), and y = fft(x), then y is a vector of complex numbers with N elements. The vector of frequencies corresponding to the elements of y is
f=fs*(0:N-1)/N;
About half the frequencies in vector f are higher than the Nyquist frequency (fs/2). Those are the "top half" frequencies of the fft. An alternate name for the Nyquist frequency is "folding frequency", since the spectrum above fs/2 is the folded-over copy of the spectrum from 0 to fs/2.
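The same frequency vector in NumPy, plus the folding: subtracting fs from every frequency above fs/2 turns the top-half bins into negative frequencies, which is what `np.fft.fftfreq` reports (it differs only at the Nyquist bin N/2, which it labels -fs/2). The hypothetical helper `bin_for` inverts the formula to give a frequency's fractional (0-based) bin position; in MATLAB add 1 for the array index. For the 1 kHz tone from the original question at 44.1 kHz with N = 512, the position is about 11.61, between bins 11 and 12, so its energy is spread over several bins rather than sitting in one.

```python
import numpy as np

fs, N = 44100, 512
f = fs * np.arange(N) / N                    # same as f=fs*(0:N-1)/N in MATLAB
f_signed = np.where(f > fs / 2, f - fs, f)   # top-half bins are negative frequencies

ref = np.fft.fftfreq(N, d=1 / fs)
# They agree everywhere except the Nyquist bin N/2, which fftfreq labels -fs/2.
mask = np.arange(N) != N // 2
assert np.allclose(f_signed[mask], ref[mask])

def bin_for(freq_hz, fs=fs, N=N):
    """Fractional 0-based bin position of a frequency; an integer means no leakage."""
    return freq_hz * N / fs

print(bin_for(1000.0))    # 1 kHz lands between bins 11 and 12
```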
@William Rose Ah yes thanks. I understand how to plot the right frequencies but this isn't relevant to my plugin so I was lazy with the plots.
I'm still experimenting with different ways of tapering/windowing, trading precise frequency resolution against spectral leakage, and with how much resolution I actually need for my algorithm to work best.
I'm doing the FFT on the band outputs of a linear-phase, perfect-reconstruction filter bank (which I already made in MATLAB). So I can do shorter FFTs on the higher-frequency bands and longer FFTs on the lower-frequency bands. The end result should be good enough frequency resolution and good enough time resolution. I'm trying to find the optimal balance for audio processing results.
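The balance described here comes down to two numbers per FFT size: bin spacing fs/N and frame length N/fs, whose product is always 1, so gaining one costs the other. This small sketch tabulates them for a few sizes at 44.1 kHz; the sizes are arbitrary examples, and if a filter-bank band is downsampled, that band's own sample rate should be used as fs.

```python
def resolution_table(fs=44100, sizes=(256, 1024, 4096, 16384)):
    """Return (N, bin spacing in Hz, frame length in ms) for each FFT size."""
    return [(n, fs / n, 1000.0 * n / fs) for n in sizes]

for n_fft, spacing_hz, frame_ms in resolution_table():
    print(f"N={n_fft:6d}: bins {spacing_hz:8.2f} Hz apart, frame {frame_ms:7.2f} ms long")
```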

Release

R2021a