Filterbank is versatile but also tricky. Here, I’d like to share what I have learned from my many years of practice (mainly for audio processing). The math and design methods are from my paper. It is a straightforward time domain design framework that can cover most cases (DFT and DCT filterbanks, DWT and complex DWT, constraints like latency, symmetry and phase linearity, control on sidelobe tails, ...).
Topics:
MIRROR symmetry if latency is unconcerned
SAME symmetry if low latency is crucial
Symmetries and least squares reconstruction
Designs with lower aliasing do not necessarily performs better for tasks like AEC
Squeezing synthesis filter does not help low latency filterbank design
Replacing STFT with filterbank
Frequency shift on analytic signal only
Remove the systemic phase biases
DCT filterbank for tasks like AEC?
Optimization of discrete design freedoms
Exact phase linearity, a luxury or a must-have?
Fine balance between resolution and leakage
This script generates the following plot illustrating what is a DCT-IV modulated filterbank.
From top to bottom, we have 1) the prototype filter; 2) the four cosine modulation sequences; 3) the four modulated filters, i.e., element-wise products between the prototype filter and modulation sequences; 4) frequency responses of the four modulated filters. We see that these four filters cover the whole frequency range nicely.
These examples illustrate a few special designs. The wavelet and complex wavelet examples also are good for explaining the concept of filterbank due to their simplicity.
Here, I mainly focus on the DFT modulated filterbanks. Still, all these modulated filterbanks share the same math, and only difference in modulation sequences. Notably,
- The periodicity of modulation sequences induces the polyphase structure.
- Trigonometric modulation series make FFT like fast algorithms possible.
MIRROR symmetry imposes constraint
MIRROR symmetry leads to the least aliasing and also the least squares solution for signal reconstruction, but brings two possible drawbacks:
- Generally it is not good for low latency design as here, latency+1 must equal filter length.
- Typically neither the analysis nor the synthesis filter has good linear phase property (surely, their combined phase is linear as the analysis-synthesis filter is symmetric).
SAME symmetry imposes constraint
I gradually increase filter length until overshoot, i.e., bumpy mainlobe, arises. Another strategy is to start from a large filter length, and then gradually increase
Setting symmetry=[0;0;0] releases all the design freedoms, and typically finds solutions with much lower aliasing. Lowering aliasing benefits certain applications like low, high or band pass filtering (LPF/HPF/BPF), sample-rate conversion (SRC), etc. However, symmetric design has nicer numerical properties and better fits other tasks like AEC. Another merit of the SAME symmetry is that both the analysis and synthesis filters must have approximately linear phase in the pass band as otherwise, nearly perfect reconstruction (NPR) is not possible. In the extreme case of latency+1 = filter length, the SAME symmetry leads to exactly linear phase prototype filters.
Symmetry is important due to the following numerical property:
- With the MIRROR symmetry, filterbank synthesis gives the least squares solution for fitting a given set of analysis results, whether consistent or not.
Synthesis with the SAME symmetry generally only approximates the least squares solution for fitting a given set of analysis results due to the phase linearity in the pass band, not necessarily the stop band. If the phase also is linear in the stop band, then the DFT filterbank has both the MIRROR and SAME symmetries.
To show this property, let's consider a STFT with analysis and synthesis prototype filters
The inverse STFT (ISTFT) on STFT results
By design,
In general, we can show that with the MIRROR symmetry,
One application of this property is signal reconstruction from magnitude spectra
with the Griffin-Lim algorithm, i.e., alternatively optimizing
Lower aliasing is a necessary, but not sufficient, condition for better AEC performance. This script shows what a naive time domain impulse looks like in the frequency domain. Intuitively, subband domain filters using designs with low aliasing but long prototype filters will need many taps to fit even simple time domain filters, causing slow convergence and high misadjustment. Symmetric and compact designs have better numerical condition number and are thus preferred.
Some manually designed low latency filterbanks have long and refined filters for analysis, and short and coarse filters for synthesis. Such a practice may not yield balanced and efficient designs as the analysis filters mainly focus on old samples, not the recent samples that are to be decomposed and re-synthesized. Aliasing caused by poor synthesis filters is another concern. Anyway, our time-frequency analysis cannot break the uncertainty principle The math is still the same. This script generates the following extremely low latency design suitable for applications like hearing aid. Note that since the oversampling ratio is high, I have halved the default mainlobe width by setting the cutoff frequency to
Also see the phase linearity discussions for more examples.
STFT is a special DFT filterbank with prototype filter length equalling FFT size. Except for certain time-frequency analysis requiring nonnegative windows, generally we can replace STFT with filterbanks without much concern. This code generates the following comparison results where both the filterbank and STFT are subject to the same latency, hop size and FFT size. Aliasings of the analysis (same for synthesis) filters of the STFT and filterbank are -14.5 dB and -35.7 dB, respectively. A virtually free design gain of 20+ dB!
The same argument holds for the modified DCT (MDCT) as well. MDCT is a special DCT-IV filterbank with prototype filter length equalling half of the period of modulation sequences. We can get significantly better designs by increasing the prototype filter length while keeping everything else the same.
SRC can be expensive and invokes extra latency. But, a time domain SRC may be unnecessary in many cases. This example shows how we can save a 16KHz
Actually, resampling in the frequency domain could be easier (with the help of a good FFT lib) than in the time domain when the conversion ratio is complicated, e.g., 44.1KHz
Frequency shift is a common operation to cut off the acoustic feedback in systems like hearing aid and public address (PA) systems. One possible choice is to shift the signals in the frequency domain. This practice may be improper as the synthesis filters are not shifted accordingly, and this misalignment may cause artifacts like beats even with shift of a few Hz. This code shows an alternative practice: first split the signal into two analytic parts; then shift the low frequency part less, and high frequency part more. For this example, we see that the absolute normalized cross correlation between input and output reduces to about zero by tens Hz of shifting while without any audible artifacts.
One possible neglect when dealing with the phases, say phase unwrapping, is to ignore the systemic or structural bias of the phases for frequency domain signals obtained by a DFT filterbank. I summarize these biases in the card below. Note that
This script compares measured against predicted phase differences to generate the results as below. They do match well.
The same script also demonstrates how to exploit these structural biases for accurate phase unwrapping. Removal of these biases lets us deal with the inherent transient properties of phases, and leads to consistently and significantly cleaner phase unwrapping across all the frequencies.
DCT filterbanks like the MDCT are widely used in subband codecs. Occasionally these critically decimated DCT filterbanks are used for other tasks like AEC. This may not work well as the aliasing due to downsampling is too high. Still, an oversampled DCT filterbank works for AEC as well as any good oversampled DFT filterbanks. This script compares a critically decimated and an oversampled DCT filterbanks to produce the following plot. The critically decimated one clearly suffers a lot from aliasing, while the oversampled one does not.
Special designs: wavelet, complex wavelet, quadrature mirror filter (QMF), customized, nonsubsampled and nonuniform filterbanks
Discrete wavelet (DWT) and QMF are DFT filterbanks with
DWT is sensitive to signal shift. By replacing the two modulation sequences in DWT with
Nonsubsampled filterbanks are the ones with
Nonuniform filterbanks can be built from uniform ones by either splitting certain bands, e.g., the wavelet packet decomposition (WPD), or merging bands under certain conditions, e.g., shown in this nonuniform design example.
The same method also supports customized designs. See the nonuniform design example, where the modulation sequences are that of DFT's shifted by
The QMF examples generates the following sample designs:
The complex wavelet generate this sample design:
The nonuniform example generates this sample design:
Filterbank design is an optimization problem loaded with local minima. We are almost sure to find the global minima for the continuous design freedoms by starting the filter coefficients from diverse random initial guesses. But, those discrete design freedoms are more tricky to optimize.
Certain discrete design freedoms, say the phase shift pair
Let's consider a practical design example where the block size is
A linear phase filterbank suggests that all the modulated analysis and synthesis filters have linear phases, i.e., either symmetric or anti-symmetric. Hence, a linear phase prototype filter does not always imply a linear phase filterbank. For example, the MDCT, with either the rectangular or sine windows, is not a linear phase DCT filterbank, although the prototype filters, i.e., the rectangular or sine windows, are symmetric. Nevertheless, a DFT filterbank with a linear phase prototype filter is always a linear phase filterbank.
For audio processing, our human ears are deaf to absolute phases. Thus, phase linearity might not be a must-have. For image processing, noticeable phase distortion is not acceptable. But, image processing is acausal as all the neighbors of any pixel are available for filtering. Thus, filters with large latency but linear phase are affordable and good to have there.
This script generates the following comparison results to show how it looks like to impose linear phase and low latency constraints at the same time. Here, I set symmetry=[0;1;1] to force both the analysis and synthesis filters to have linear phases, and different filter lengths and weights to get high resolution analysis filters and low resolution synthesis filters. The performance gap between linear phase and SAME symmetry designs is huge. Not a surprise. The SAME symmetry only forces phase linearity in the pass band, while symmetry=[0;1;1] forces the phase to be linear in both the pass and stop bands, thus wasteful.
Exact phase linearity is such a strong constraint and sometimes cannot coexist with the perfect reconstruction (PR) condition. Two examples:
- By revisiting the frequency domain PR condition, e.g., here, we see that a critically decimated linear phase QMF filter with odd filter lengths (or even order) will have gain
$0$ at$\omega=\pi/2$ , which contradicts the PR condition. Still, odd or even order linear phase oversampled QMF always exists. - For a DCT-IV filterbank with filter length
$T$ , the first cosine modulation sequences can be symmetric or anti-symmetric with a proper shift$i$ , but always has zero mean. Then with a symmetric prototype filter, the first modulated filter must have zero DC gain, and thus fails the PR condition at$\omega=0$ .
Nevertheless, it is trivial to have approximately linear phase in the pass band for either the DFT or DCT filterbank. Luckily, that could be sufficient in most cases.
In the same way as balancing the spectral resolution and leakage in window designs, say Hann vs Hamming, we also provide such choices for filterbank design. The following card summarizes the common stopband losses for filterbank design. One also can come up with his own customized loss. The default choice is momentum
This script generates the following comparison results showing the tradeoff between resolution and leakage. The default setting, i.e., minimizing the 0th momentum, has the narrowest mainlobe, but largest sidelobe tails. The design minimizing the 2nd momentum, i.e., the momentum of inertia, has the widest mainlobe, but significantly lower sidelobe tails.
1, Periodic sequences modulated filter banks, IEEE SPL, 2018. (Notation and math are from this paper. A typo in Table I:
2, The matlab filterbank design code is slightly adapted from the original version here. I also have Tensorflow and Pytorch implementations of the DFT modulated filterbank, not polished but works, useful for end2end differentiable DSP designs.