## Nonstationary Gabor Frames and the Constant-Q Transform

Nonstationary Gabor frames enable you to implement time-adaptive or frequency-adaptive analysis of signals. The functions `cqt` and `icqt` use nonstationary Gabor frames to compute a constant-Q (frequency-adaptive) transform (CQT) of a signal and to invert it. A notable strength of nonstationary Gabor frames is that they enable the construction of stable inverses, yielding perfect reconstruction.

The theory of nonstationary Gabor transforms (NSGTs) was introduced by Jaillet [1] and Balazs, Dörfler, Jaillet, Holighaus, and Velasco [2]. The theory enables efficient implementations of NSGTs using FFT-based methods. Dörfler, Holighaus, Grill, and Velasco [3], [4] develop a framework for an efficient, perfectly invertible CQT. The algorithms in [3], [4] implement a phase-locked version of the CQT that does not preserve the same phases that would be obtained by naïve convolution. In [5], Schörkhuber, Klapuri, Holighaus, and Dörfler develop efficient algorithms for the CQT and inverse CQT that do mimic the coefficients obtained by naïve convolution. The Large Time-Frequency Analysis Toolbox [6] provides an extensive set of algorithms for nonstationary Gabor analysis and synthesis.

In standard Gabor analysis, a window of fixed size tiles the time-frequency plane. A nonstationary Gabor frame is a collection of window functions of various sizes that are used to tile the time-frequency plane. Wavelet analysis tiles the time-frequency plane in a similar manner. This construction lets you vary the sampling density in time or in frequency. Nonstationary Gabor frames are useful in areas such as audio signal processing, where fixed-size time-frequency windows are not optimal. Unlike the short-time Fourier transform, the windows used in the constant-Q transform have adaptable bandwidth and sampling density. In frequency space, the windows are centered at logarithmically spaced center frequencies.

### Decomposing the Time-Frequency Plane

The Fourier transform of *f(t)* is the correlation of *f(t)* with $${e}^{j\omega t}$$:

$$F(\omega )={\displaystyle {\int}_{-\infty}^{\infty}f}(t){e}^{-j\omega t}dt.$$

Since $${e}^{j\omega t}$$ does not have compact support, the Fourier transform is not an ideal choice for studying nonstationary signals. If the frequency content of a signal changes over time, the Fourier transform does not capture what those changes are or when those changes occur. In the time-frequency plane, the Fourier transform corresponds to a partition into frequency bands, each of which extends over the entire time axis.
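A minimal NumPy sketch of this limitation (an illustration only, unrelated to the `cqt` implementation): two signals built from the same two tones, played in opposite order, have identical DFT magnitudes, because one is a circular shift of the other. The tone frequencies and signal length are arbitrary choices.

```python
import numpy as np

N = 1024
n = np.arange(N // 2)
low = np.sin(2 * np.pi * 8 * n / (N // 2))    # 8 cycles per half of the signal
high = np.sin(2 * np.pi * 32 * n / (N // 2))  # 32 cycles per half of the signal

x_low_first = np.concatenate([low, high])
x_high_first = np.concatenate([high, low])    # equals np.roll(x_low_first, N // 2)

# A circular shift changes only the phase of the DFT, never its magnitude,
# so the spectrum cannot tell which tone came first.
mag_a = np.abs(np.fft.fft(x_low_first))
mag_b = np.abs(np.fft.fft(x_high_first))

print(np.allclose(mag_a, mag_b))                # True
print(np.allclose(x_low_first, x_high_first))   # False
```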

To perform a time-frequency analysis of a nonstationary signal *f(t)*, use a window function $$g(t)$$ that is:

- Even and real-valued.
- Effectively nonzero over only a finite interval.
- Of unit norm.

The Fourier transform of $$g(t)$$ is centered at zero and is lowpass.
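As a sketch of these requirements, the NumPy code below samples a Gaussian window, truncates it at four standard deviations so that it is effectively nonzero only on a finite interval, and scales it to unit norm. It then checks that the window is even, real, and normalized, and that almost all of its spectral energy lies near zero frequency. The helper name `unit_norm_gaussian` and all parameter values are illustrative assumptions.

```python
import numpy as np

def unit_norm_gaussian(half_length: int, sigma: float) -> np.ndarray:
    """Sampled Gaussian on n = -half_length..half_length, scaled to unit l2 norm."""
    n = np.arange(-half_length, half_length + 1)
    g = np.exp(-0.5 * (n / sigma) ** 2)
    return g / np.linalg.norm(g)

g = unit_norm_gaussian(half_length=64, sigma=16.0)   # truncated at 4 sigma

is_even = np.allclose(g, g[::-1])        # g[-n] == g[n]
is_real = np.isrealobj(g)
has_unit_norm = np.isclose(np.linalg.norm(g), 1.0)

# Lowpass check: fraction of spectral energy within |f| < 0.05 cycles/sample.
G = np.fft.fft(g, 1024)
f = np.fft.fftfreq(1024)
energy_near_dc = np.sum(np.abs(G[np.abs(f) < 0.05]) ** 2) / np.sum(np.abs(G) ** 2)

print(is_even, is_real, has_unit_norm)   # True True True
print(energy_near_dc > 0.99)             # True
```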

Slide the window $$g(t)$$ over *f(t)* and take the Fourier transform of the result:

$$SF(u,\zeta )={\displaystyle \int_{-\infty}^{\infty} f}(t)\,g(t-u)\,{e}^{-j\zeta t}\,dt.$$

Correlating *f(t)* with the Gabor atoms $$g(t-u){e}^{j\zeta t}$$ is standard Gabor analysis. By varying *u*, you consider only values of *f(t)* near time *u*. The support of $$g(t)$$ determines the size of the neighborhood near time *u*. The Fourier transform of $${g}_{u,\zeta}(t)=g(t-u){e}^{j\zeta t}$$ is, up to a phase factor, the translation by ζ of the Fourier transform of $$g(t)$$:

$${\widehat{g}}_{u,\zeta}(\omega )={e}^{-ju(\omega -\zeta )}\widehat{g}(\omega -\zeta ).$$
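A discrete analogue of this identity can be checked numerically. Assuming circular (length-*N*) indexing, the DFT of the atom g[n − u]·e^{j2πk₀n/N} equals e^{−j2π(k−k₀)u/N}·G[k − k₀], where G is the DFT of g: a translation by k₀ bins combined with a phase factor that depends on the time shift u. The values of N, u, k₀, and the window below are illustrative.

```python
import numpy as np

N = 256
u, k0 = 40, 25                     # time shift (samples) and modulation (DFT bin)
n = np.arange(N)
k = np.arange(N)

g = np.exp(-0.5 * ((n - N // 2) / 12.0) ** 2)   # any window works here

# Gabor atom: circularly shifted window times a complex exponential.
atom = np.roll(g, u) * np.exp(2j * np.pi * k0 * n / N)

lhs = np.fft.fft(atom)
G = np.fft.fft(g)
# np.roll(G, k0)[k] == G[(k - k0) mod N]: translation in frequency.
rhs = np.exp(-2j * np.pi * (k - k0) * u / N) * np.roll(G, k0)

print(np.allclose(lhs, rhs))       # True
```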

The energy concentration of $${\widehat{g}}_{u,\zeta}(\omega )$$ is centered at ζ and has spread σ_{ω}, the same as that of $$\widehat{g}(\omega )$$. If the atoms $${g}_{u,\zeta}(t)=g(t-u){e}^{j\zeta t}$$ are placed on a regular grid of (*u*, ζ) values, the coefficients *SF*(*u*, ζ) form the short-time Fourier transform (STFT). The STFT tiles the time-frequency plane with a regular grid of equally sized boxes, each centered at (*u*, ζ).
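A minimal STFT sketch in NumPy, assuming a window that starts (rather than is centered) at each grid position *u* and phases measured from the start of each frame. Under these assumptions the coefficients match *SF*(*u*, ζ) in magnitude but not in phase. The helper `simple_stft` is hypothetical and is not MATLAB's `stft`. Applied to a signal whose low tone precedes a high tone, the first frame peaks at the low tone's bin and the last frame at the high tone's bin, so the STFT records when each tone occurs.

```python
import numpy as np

def simple_stft(f, g, hop, n_fft):
    """Windowed FFTs on a regular time grid u = 0, hop, 2*hop, ...

    Rows index nonnegative frequency bins; columns index window positions.
    """
    positions = range(0, len(f) - len(g) + 1, hop)
    frames = [np.fft.rfft(f[u:u + len(g)] * g, n_fft) for u in positions]
    return np.array(frames).T          # shape: (n_fft // 2 + 1, len(positions))

N = 1024
n = np.arange(N // 2)
x = np.concatenate([np.sin(2 * np.pi * 8 * n / (N // 2)),    # low tone first
                    np.sin(2 * np.pi * 32 * n / (N // 2))])  # then high tone

S = simple_stft(x, np.hanning(256), hop=128, n_fft=256)
print(S.shape)                                                  # (129, 7)
print(np.argmax(np.abs(S[:, 0])), np.argmax(np.abs(S[:, -1])))  # 4 16
```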

The set of functions $$\left\{{g}_{u,\zeta}\right\}$$ is known as a *Gabor frame*. The elements of this set are called *Gabor atoms*. A frame is a set of functions $$\left\{{h}_{k}(t)\right\}$$ that satisfies the following condition: there exist constants 0 < A ≤ B < ∞, called the *frame bounds*, such that for any function *f(t)*,

$$A\Vert f{\Vert}^{2}\le \sum_{k}|\langle f,{h}_{k}\rangle {|}^{2}\le B\Vert f{\Vert}^{2}.$$
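In finite dimensions, the optimal frame bounds are the smallest and largest eigenvalues of the frame operator $$Sf=\sum_k\langle f,h_k\rangle h_k$$. The NumPy sketch below assumes a small circular discrete Gabor system, built from a Hann window of length 16 with hop 4 and 16 modulations on N = 64 samples (all illustrative choices). It computes A and B this way and checks the frame inequality for a random signal. The helper `gabor_atoms` is hypothetical.

```python
import numpy as np

def gabor_atoms(g, N, hop, n_freqs):
    """Rows are circular atoms g[n - u] * exp(2j*pi*m*n/n_freqs) of length N."""
    n = np.arange(N)
    padded = np.zeros(N)
    padded[:len(g)] = g
    atoms = [np.roll(padded, u) * np.exp(2j * np.pi * m * n / n_freqs)
             for u in range(0, N, hop) for m in range(n_freqs)]
    return np.array(atoms)

N = 64
H = gabor_atoms(np.hanning(16), N, hop=4, n_freqs=16)

# Analysis operator: c_k = <f, h_k> = sum_n f[n] * conj(h_k[n]).
T = H.conj()
S = T.conj().T @ T                   # frame operator, Hermitian positive semidefinite
eigs = np.linalg.eigvalsh(S)         # ascending eigenvalues
A, B = eigs[0], eigs[-1]             # optimal frame bounds

rng = np.random.default_rng(0)
f = rng.standard_normal(N)
energy = np.sum(np.abs(T @ f) ** 2)  # sum_k |<f, h_k>|^2
norm2 = np.sum(f ** 2)
print(A > 0, A * norm2 <= energy <= B * norm2)   # True True
```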

The energy concentration of $$g(t)$$ in time has spread (standard deviation) σ_{t}, and the energy concentration of $$\widehat{g}(\omega )$$ in frequency has spread σ_{ω}. These spreads determine how well the window localizes the signal in time and frequency. By the time-frequency uncertainty principle, there is a limit to how well you can localize simultaneously in both the time and frequency domains:

$${\sigma}_{t}{\sigma}_{\omega}\ge \frac{1}{2}.$$

Narrowing the window in one domain results in poorer localization in the other domain. Gabor showed that the time-frequency area σ_{t}σ_{ω} is minimal, equal to 1/2, when $$g(t)$$ is Gaussian.
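The bound can be checked numerically. The sketch below treats σ_{t} and σ_{ω} as root-mean-square spreads of $$|g(t)|^2$$ and $$|\widehat{g}(\omega)|^2$$ (angular frequency). By Parseval, it computes σ_{ω}² as ∫|g′|²/∫|g|². It compares a Gaussian with a Hann-shaped window cos²(πt/2) on [−1, 1], both illustrative choices sampled on fine grids. The helper `time_frequency_spread` is hypothetical.

```python
import numpy as np

def time_frequency_spread(t, g, dg):
    """Return sigma_t * sigma_omega for a real window g centered at t = 0.

    sigma_t^2 = sum t^2 g^2 / sum g^2; by Parseval,
    sigma_omega^2 = sum (dg)^2 / sum g^2. Grid spacing cancels in the ratios.
    """
    energy = np.sum(g ** 2)
    sigma_t2 = np.sum(t ** 2 * g ** 2) / energy
    sigma_w2 = np.sum(dg ** 2) / energy
    return np.sqrt(sigma_t2 * sigma_w2)

# Gaussian exp(-t^2 / 2) and its derivative, sampled well into the tails.
t = np.linspace(-10, 10, 200001)
gauss = np.exp(-t ** 2 / 2)
gauss_prod = time_frequency_spread(t, gauss, -t * gauss)

# Hann-shaped window cos^2(pi t / 2) on [-1, 1] and its derivative.
t_h = np.linspace(-1, 1, 200001)
hann = np.cos(np.pi * t_h / 2) ** 2
hann_prod = time_frequency_spread(t_h, hann, -(np.pi / 2) * np.sin(np.pi * t_h))

print(round(gauss_prod, 4), round(hann_prod, 4))   # about 0.5 and 0.5131
```

The Gaussian attains the lower bound, while the Hann-shaped window exceeds it slightly.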

### Constant-Q Transform

In the CQT, the bandwidth and sampling density in frequency are varied. The windows are constructed and applied directly in the frequency domain. Different windows have different center frequencies and bandwidths, but the ratio of the center frequency to bandwidth remains constant. Maintaining a constant ratio implies:

- Resolution in time improves at higher frequencies.
- Resolution in frequency improves at lower frequencies.

Because of the uncertainty principle, the time shift for each window depends on its bandwidth.

The CQT depends on:

- The window functions *g*_{k}, which are real-valued, even functions. In the frequency domain, the Fourier transform of *g*_{k} is defined on the interval [-ζ_{s}/2, ζ_{s}/2].
- The sampling rate, ζ_{s}.
- The number of bins per octave, *b*.
- The minimum and maximum frequencies, ζ_{min} and ζ_{max}.

Choose a minimum frequency ζ_{min} and number of bins per octave *b*. Next, form a sequence of geometrically spaced frequencies,

$${\zeta}_{k}={\zeta}_{\mathrm{min}}\times {2}^{k/b}$$

for *k = 0,...,K* where *K* is an integer such that ζ_{K} is the largest frequency strictly less than the Nyquist frequency ζ_{s}/2. The bandwidth at the *k*th frequency is set to Ω_{k} = ζ_{k+1}-ζ_{k-1}. Given this sampling, the ratio of the *k*th center frequency to the window bandwidth is independent of *k*:

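$$Q=\frac{{\zeta}_{k}}{{\Omega}_{k}}=\frac{{\zeta}_{k}}{{\zeta}_{k}\left({2}^{1/b}-{2}^{-1/b}\right)}={\left({2}^{1/b}-{2}^{-1/b}\right)}^{-1}.$$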

To ensure perfect reconstruction, the DC component and Nyquist frequency are prepended and appended, respectively, to the sequence.
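The frequency bookkeeping can be sketched in NumPy. The function name `cq_center_frequencies` and the parameter values (ζ_{s} = 8000 Hz, ζ_{min} = 100 Hz, *b* = 12) are illustrative assumptions, and the upper limit is taken as the Nyquist frequency rather than a separate ζ_{max}. The code checks that ζ_{k}/Ω_{k} is the same for every *k* and then adds the DC and Nyquist centers.

```python
import numpy as np

def cq_center_frequencies(f_min, bins_per_octave, fs):
    """Centers zeta_k = f_min * 2**(k/b) strictly below fs/2, and Omega_k."""
    b = bins_per_octave
    # Largest integer k with f_min * 2**(k/b) < fs/2.
    K = int(np.ceil(b * np.log2((fs / 2) / f_min))) - 1
    k = np.arange(K + 1)
    zeta = f_min * 2.0 ** (k / b)
    # Bandwidth Omega_k = zeta_{k+1} - zeta_{k-1}.
    omega = f_min * (2.0 ** ((k + 1) / b) - 2.0 ** ((k - 1) / b))
    return zeta, omega

fs, f_min, b = 8000.0, 100.0, 12
zeta, omega = cq_center_frequencies(f_min, b, fs)
Q = zeta / omega
print(len(zeta), zeta[-1] < fs / 2)                        # 64 True
print(np.allclose(Q, 1 / (2 ** (1 / b) - 2 ** (-1 / b))))  # True

# For perfect reconstruction, DC and Nyquist are added as extra centers.
centers = np.concatenate([[0.0], zeta, [fs / 2]])
```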

The window functions *g*_{k} are formed from a single window function *W*(ω). *W*(ω) is a real-valued, even, continuous function that is centered at 0, positive in the interval [-½, ½], and 0 elsewhere. *W*(ω) is translated to each center frequency ζ_{k} and then scaled. Evaluating a scaled and translated version of *W*(ω) yields the filter coefficients *g*_{k}[*m*], given by

$${g}_{k}[m]=W\left(\frac{m{\zeta}_{s}/L-{\zeta}_{k}}{{\Omega}_{k}}\right)$$

for *m* = 0, …, *L*-1, where *L* is the signal length. By default, `cqt` uses the `'hann'` window.
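A NumPy sketch of this formula, assuming *W* is the Hann shape W(x) = ½(1 + cos 2πx) on [−½, ½] and zero elsewhere. This shape is positive on the open interval and zero at the endpoints. The names `hann_w` and `cq_filters` and the parameter values are hypothetical, DC and Nyquist windows are omitted, and the code illustrates the formula rather than the `cqt` implementation. The number of nonzero bins per filter grows with ζ_{k}, as the constant-Q design requires.

```python
import numpy as np

def hann_w(x):
    """Hann-shaped W(x): real, even, continuous, zero outside [-1/2, 1/2]."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 0.5, 0.5 + 0.5 * np.cos(2 * np.pi * x), 0.0)

def cq_filters(fs, f_min, b, L):
    """Rows are g_k[m] = W((m * fs / L - zeta_k) / Omega_k), m = 0, ..., L - 1."""
    K = int(np.ceil(b * np.log2((fs / 2) / f_min))) - 1
    k = np.arange(K + 1)
    zeta = f_min * 2.0 ** (k / b)
    omega = f_min * (2.0 ** ((k + 1) / b) - 2.0 ** ((k - 1) / b))
    nu = np.arange(L) * fs / L             # frequency of DFT bin m, in Hz
    return hann_w((nu[None, :] - zeta[:, None]) / omega[:, None]), zeta

fs, f_min, b, L = 8000.0, 100.0, 12, 4096
g, zeta = cq_filters(fs, f_min, b, L)
support = (g > 0).sum(axis=1)              # nonzero bins per filter
print(g.shape)                             # (64, 4096)
print(support[0] < support[-1])            # True: wider filters at higher zeta_k
```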

By the uncertainty principle, the bandwidth constrains the time shifts. To satisfy the frame inequality, the shift *a*_{k} (in samples) of *g*_{k} must satisfy

$${a}_{k}\le \frac{{\zeta}_{s}}{{\Omega}_{k}}.$$

As mentioned previously, the windows are applied in the frequency domain. The filters *g*_{k}, centered at ζ_{k}, are formed and applied to the Fourier transform of the signal. Taking the inverse Fourier transform of each filtered channel yields the constant-Q coefficients.
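To illustrate why this frequency-domain construction admits a stable inverse, the NumPy sketch below makes several simplifying assumptions, none of which describe the `cqt`/`icqt` algorithm:

- *W* is a Hann shape.
- The DC and Nyquist windows have hypothetical widths chosen so that they reach the first and last constant-Q centers.
- The windows are defined on |ν| so that a real signal gives real coefficients.
- Every channel is kept at full rate (hop 1 sample), so the system is highly redundant.

Under these assumptions the frame operator is diagonal in frequency, so dividing by Σ_{k}|*g*_{k}|² gives the canonical dual and perfect reconstruction. All function names are hypothetical.

```python
import numpy as np

def hann_w(x):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 0.5, 0.5 + 0.5 * np.cos(2 * np.pi * x), 0.0)

def frequency_domain_filters(fs, f_min, b, L):
    """Real, even responses: DC window, constant-Q windows, Nyquist window."""
    K = int(np.ceil(b * np.log2((fs / 2) / f_min))) - 1
    k = np.arange(K + 1)
    zeta = f_min * 2.0 ** (k / b)
    omega = f_min * (2.0 ** ((k + 1) / b) - 2.0 ** ((k - 1) / b))
    # Assumed DC and Nyquist windows that reach the first and last centers.
    centers = np.concatenate([[0.0], zeta, [fs / 2]])
    widths = np.concatenate([[2 * zeta[0]], omega, [2 * (fs / 2 - zeta[-1])]])
    nu = np.abs(np.fft.fftfreq(L, d=1 / fs))   # |frequency| of each DFT bin, Hz
    return hann_w((nu[None, :] - centers[:, None]) / widths[:, None])

def analyze(x, G):
    """Filter in the frequency domain, then inverse FFT each channel."""
    return np.fft.ifft(np.fft.fft(x)[None, :] * G, axis=1)

def synthesize(c, G):
    """Canonical dual of this painless system: divide by sum_k |G_k|^2."""
    denom = np.sum(G ** 2, axis=0)
    return np.fft.ifft(np.sum(np.fft.fft(c, axis=1) * G, axis=0) / denom).real

fs, L = 8000.0, 4096
G = frequency_domain_filters(fs, f_min=100.0, b=12, L=L)
print(np.sum(G ** 2, axis=0).min() > 0)    # True: every bin is covered

t = np.arange(L) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t) + 0.1 * rng.standard_normal(L)
c = analyze(x, G)
print(np.allclose(synthesize(c, G), x))    # True
```

Subsampling each channel by a hop *a*_{k} > 1, as a practical CQT does, reduces the redundancy. The reconstruction argument then also depends on the condition on *a*_{k}.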

### References

[1] Jaillet, Florent. “Représentation et traitement temps-fréquence des signaux audionumériques pour des applications de design sonore.” Ph.D. dissertation, Université de la Méditerranée, Aix-Marseille II, 2005.

[2] Balazs, P., M. Dörfler, F. Jaillet, N. Holighaus, and G. Velasco. “Theory, Implementation and Applications of Nonstationary Gabor Frames.” *Journal of Computational and Applied Mathematics* 236, no. 6 (October 2011): 1481–96. https://doi.org/10.1016/j.cam.2011.09.011.

[3] Holighaus, Nicki, M. Dörfler, G. A. Velasco, and T. Grill. “A Framework for Invertible, Real-Time Constant-Q Transforms.” *IEEE Transactions on Audio, Speech, and Language Processing* 21, no. 4 (April 2013): 775–85. https://doi.org/10.1109/TASL.2012.2234114.

[4] Velasco, G. A., N. Holighaus, M. Dörfler, and T. Grill. "Constructing an invertible constant-Q transform with nonstationary Gabor frames." In *Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11)*. Paris, France: 2011.

[5] Schörkhuber, C., A. Klapuri, N. Holighaus, and M. Dörfler. "A MATLAB® Toolbox for Efficient Perfect Reconstruction Time-Frequency Transforms with Log-Frequency Resolution." Submitted to the *AES 53rd International Conference on Semantic Audio*. London, UK: 2014.

[6] Průša, Z., P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Balazs. *The Large Time-Frequency Analysis Toolbox 2.0*. Sound, Music, and Motion, Lecture Notes in Computer Science 2014, pp. 419–442. https://github.com/ltfat