How to write code for F0 extraction from a noisy signal using autocorrelation

F0-ESTIMATION USING SHIFT-ACF
To estimate the time-varying F0, we first compute the shift-ACF for successive time frames of a speech signal $y$. For this, we compute the spectrogram $SG[y]$, where the $j$-th column $SG[y]_{:,j}$ is obtained by computing the discrete Fourier transform of a suitably windowed version of the $j$-th time frame $(y_{jS}, \ldots, y_{jS+N-1})$ of length $N$, extracted from $y$ using step size $S$. Then the spectral shift-ACF of type $t$ is defined by
$$\mathrm{SpACF}_t[y](s, j) := \mathrm{ACF}_t\big[SG[y]_{:,j}\big](s),$$
i.e., by independently computing the shift-ACF for each spectrogram column.
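
A minimal Python/NumPy sketch of this step may help. The excerpt does not spell out the general type-t shift-ACF, so the code below covers only the type 0 case, i.e. the classical ACF taken along the frequency axis of each spectrogram column. The function names `spectrogram` and `spectral_acf`, the Hann window, and the use of the magnitude spectrum are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def spectrogram(y, N, S):
    """Magnitude spectrogram; column j is the DFT of the windowed
    frame (y[j*S], ..., y[j*S + N - 1])."""
    y = np.asarray(y, dtype=float)
    window = np.hanning(N)                      # assumed window choice
    n_frames = 1 + (len(y) - N) // S
    frames = np.stack(
        [y[j * S : j * S + N] * window for j in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))  # shape (N//2 + 1, n_frames)

def spectral_acf(SG, max_lag):
    """Classical (type 0) ACF of every spectrogram column, computed
    along the frequency axis for lags 0..max_lag (in DFT bins)."""
    K, J = SG.shape
    out = np.zeros((max_lag + 1, J))
    for s in range(max_lag + 1):
        # correlate each column with itself shifted by s frequency bins
        out[s] = np.sum(SG[: K - s] * SG[s:], axis=0)
    return out
```

For a harmonic signal, each column of the result should peak near the lag s whose frequency s * fs / N is closest to F0, since the harmonics are spaced F0 apart. Very small lags are dominated by the window main lobe, so a search would normally start at some minimum lag.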
Fig. 3 shows the spectrogram (1) of a clean speech signal (male speaker) of length 2.4 seconds taken from the Kiel corpus [10]. For illustration, only frequencies up to 2 kHz are shown. In the center (2), the type 100 spectral shift-ACF is shown, where columns were postprocessed by normalization and thresholding by the median. The F0 is clearly visible as sharp temporal trajectories between 130 and 190 Hz. For comparison, (3) shows the type 0 spectral shift-ACF, corresponding to the classical ACF. Here, trajectories are more blurred and significant energy is present at harmonic lags.
Now we extract significant time-varying F0 trajectories from the spectral shift-ACF. First, a peak picking step is performed. As F0 trajectories evolve in temporal direction, this is done by successively considering each column $c_j := \mathrm{SpACF}_t[y]_{:,j}$. After thresholding $c_j$ by a smoothed, median-filtered version, peaks are picked iteratively. Using a greedy approach, in each step a maximum position is selected. In subsequent iterations, the neighborhoods of already chosen positions are ignored. In Fig. 4 (1), peaks extracted from a region of our example in Fig. 3 (2) are shown as white circles.
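
The greedy peak picking described above could be sketched as follows in Python/NumPy. The exact form of the thresholding is not given in the excerpt, so subtracting a running median and clipping at zero is an assumption, as are the helper names `running_median` and `pick_peaks` and the parameters `max_peaks`, `exclusion`, and `median_width`. The additional smoothing of the median-filtered curve is omitted for brevity.

```python
import numpy as np

def running_median(c, width):
    """Median filter with an odd window width and edge padding."""
    half = width // 2
    padded = np.pad(c, half, mode="edge")
    return np.array([np.median(padded[i : i + width]) for i in range(len(c))])

def pick_peaks(c, max_peaks=5, exclusion=3, median_width=15):
    """Greedy peak picking on one column c of the spectral ACF.
    Returns the selected lag indices, strongest first."""
    c = np.asarray(c, dtype=float)
    baseline = running_median(c, median_width)
    # assumed thresholding: keep only what exceeds the median baseline
    residual = np.where(c > baseline, c - baseline, 0.0)
    peaks = []
    for _ in range(max_peaks):
        k = int(np.argmax(residual))
        if residual[k] <= 0:
            break                      # nothing left above the threshold
        peaks.append(k)
        lo, hi = max(0, k - exclusion), min(len(c), k + exclusion + 1)
        residual[lo:hi] = 0.0          # ignore this neighborhood from now on
    return peaks
```

Applied to every column in turn, this yields a set of (frame, lag) peak positions that can then be linked into trajectories.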
For trajectory extraction, we consider the set of $m$ extracted peaks as nodes in a graph. We then enforce paths by connecting each node to exactly one successor node by computing a bijection $\pi : [1:m] \to [1:m]$ such that the total cost $\sum_{i=1}^{m} C_{i,\pi(i)}$ of connecting nodes is minimized. The costs $C_{i,j}$ of connecting node $i$ to $j$ are chosen to provide reasonable F0 trajectories: $C_{i,j}$ is set to the Euclidean distance between peaks $i$ and $j$, where $C_{i,j} := \infty$ if peak $i$ temporally occurs after peak $j$. Furthermore, $C_{i,i} := \infty$ to prohibit 1-cycles. By introducing additional dummy nodes at a suitable maximum distance of each node, we furthermore allow a path to start or end at each node. The resulting optimization problem is a special case of a linear assignment problem (LAP), which can be efficiently solved using, e.g., the algorithm proposed in [11]. A result of the path extraction for our running example is shown in Fig. 4 (2).
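
One possible way to set up this assignment problem is sketched below in Python, using `scipy.optimize.linear_sum_assignment` as the solver rather than the algorithm of [11]. Several details are assumptions: the function name `link_peaks`, the large finite constant standing in for infinity, the dummy-node cost of `max_dist / 2` (so that a link is only worthwhile if it is shorter than `max_dist`), and the requirement that a successor lies in a strictly later frame, which is slightly stricter than the text.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_peaks(peaks, max_dist):
    """peaks: sequence of (frame, lag) pairs.
    Returns a list of paths, each a list of indices into peaks."""
    P = np.asarray(peaks, dtype=float)
    m = len(P)
    BIG = 1e9                                  # finite stand-in for infinity

    # Euclidean distance between all pairs of peaks
    diff = P[:, None, :] - P[None, :, :]
    C = np.sqrt((diff ** 2).sum(axis=2))
    # forbid links to peaks that are not strictly later (includes i == j)
    C[P[None, :, 0] <= P[:, None, 0]] = BIG

    # augmented 2m x 2m problem with one "end" and one "start" dummy per node
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = C
    A[:m, m:] = BIG
    A[m:, :m] = BIG
    idx = np.arange(m)
    A[idx, m + idx] = max_dist / 2.0           # node i ends its path
    A[m + idx, idx] = max_dist / 2.0           # a path starts at node i
    # dummy-to-dummy assignments (lower-right block) stay at zero cost

    rows, cols = linear_sum_assignment(A)
    succ = {int(i): int(j) for i, j in zip(rows, cols)
            if i < m and j < m and C[i, j] < BIG}

    # follow successor links from every node that has no predecessor
    has_pred = set(succ.values())
    paths = []
    for start in range(m):
        if start in has_pred:
            continue
        path = [start]
        while path[-1] in succ:
            path.append(succ[path[-1]])
        paths.append(path)
    return paths
```

Because every link points forward in time, the successor map cannot contain cycles, so following it from each start node always terminates.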
Finally, paths which are too short or have only insignificant energy are discarded. For this, we use a trajectory sharpness measure such as in [1]. This measure, as illustrated in Fig. 5, basically computes a logarithmic energy ratio between an inner region $I_\tau$ around the estimated trajectory and an outer region $O_\tau := O_\tau^1 \cup O_\tau^2$. By construction, existing trajectories result in positive sharpness values. Fig. 5 shows the sharpness measure evaluated for the finally resulting F0 trajectories in white color.
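
The precise sharpness measure is defined in [1] and not reproduced in the excerpt, so the following Python/NumPy sketch only illustrates the general idea of a logarithmic energy ratio. The region shapes are assumptions: the inner region is taken as a band of `inner` lags on either side of the trajectory, and the outer region as the two bands directly below and above it, out to `outer` lags. The function name `sharpness` and the use of mean squared values as energy are likewise hypothetical.

```python
import numpy as np

def sharpness(M, frames, lags, inner=1, outer=4):
    """Log energy ratio between a band around a trajectory and the two
    bands flanking it. M is a 2-D array indexed as M[lag, frame];
    frames and lags give the trajectory coordinates."""
    eps = 1e-12                                # avoids log(0) and 0/0
    e_in, e_out = [], []
    for j, s in zip(frames, lags):
        col = np.asarray(M[:, j], dtype=float) ** 2
        lo_in, hi_in = max(0, s - inner), min(len(col), s + inner + 1)
        lo_out, hi_out = max(0, s - outer), min(len(col), s + outer + 1)
        e_in.append(col[lo_in:hi_in].mean())
        # outer region: lower band and upper band, excluding the inner band
        flank = np.concatenate([col[lo_out:lo_in], col[hi_in:hi_out]])
        if flank.size:
            e_out.append(flank.mean())
    if not e_out:
        return float("nan")                    # no outer samples available
    return float(np.log((np.mean(e_in) + eps) / (np.mean(e_out) + eps)))
```

With this definition a ridge that stands out from its surroundings gives a positive value, while a path through a flat region gives a value near zero, so a threshold slightly above zero could be used to discard spurious paths.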

Answers (0)
