This project provides an efficient implementation of the Fast Fourier Transform (FFT) algorithm using the Cooley-Tukey radix-2 algorithm. The code supports both complex and real-valued sequences. Additionally, it includes functions for performing the bit-reversal operation and inverse FFT.
- Computes the FFT of both complex and real-valued sequences.
- Uses a radix-2 FFT algorithm for efficient computation.
- Includes a bit-reversal function to reorder sequence indices.
- Inverse FFT function to compute the inverse transform.
- C++11 or later for code compatibility.
- OpenMP for parallel processing of FFT computations.
The Fast Fourier Transform (FFT) is an efficient algorithm to compute the Discrete Fourier Transform (DFT) of a sequence, or its inverse. The DFT is a fundamental operation in signal processing and many other fields such as image analysis, audio processing, and communications.
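The DFT of a sequence $x[n]$ of length $N$ is defined as:

$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1$$

Where:
- $X[k]$ are the frequency domain components,
- $x[n]$ are the time domain samples,
- $i$ is the imaginary unit ($i = \sqrt{-1}$),
- $N$ is the size of the sequence.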
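To make the definition concrete, here is a minimal C++ sketch that evaluates the DFT sum directly. The name `naive_dft` and the `cd` alias are illustrative choices, not part of this project's API. The routine costs $O(N^2)$ operations, so it is mainly useful as a reference for checking faster implementations.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Direct evaluation of the DFT definition: O(N^2) operations.
// naive_dft is a hypothetical name used only for this illustration.
std::vector<cd> naive_dft(const std::vector<cd>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            // Angle of the complex exponential e^{-i 2 pi k n / N}.
            const double angle =
                -2.0 * pi * static_cast<double>(k * n) / static_cast<double>(N);
            sum += x[n] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}
```

For the input $[1, 2, 3, 4]$ the sum gives $X = [10, -2 + 2i, -2, -2 - 2i]$, which can be verified by hand from the definition.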
The Cooley-Tukey radix-2 algorithm is the most commonly used FFT algorithm. It is used when the sequence length $N$ is a power of 2.
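Using the split approach, the DFT of a sequence $x[n]$ of length $N$ can be separated into the contributions of its even-indexed and odd-indexed samples:

$$X[k] = \sum_{m=0}^{N/2-1} x[2m] \, e^{-i 2\pi k (2m) / N} + \sum_{m=0}^{N/2-1} x[2m+1] \, e^{-i 2\pi k (2m+1) / N}$$

Where we just rearranged the terms. Rewriting the exponential terms:

$$X[k] = \sum_{m=0}^{N/2-1} x[2m] \, e^{-i 2\pi k m / (N/2)} + e^{-i 2\pi k / N} \sum_{m=0}^{N/2-1} x[2m+1] \, e^{-i 2\pi k m / (N/2)}$$

Now we can call $E[k]$ and $O[k]$ the DFTs of size $N/2$ of the even-indexed and odd-indexed samples. Both are periodic with period $N/2$, and $e^{-i 2\pi (k + N/2) / N} = -e^{-i 2\pi k / N}$, so for $k = 0, \dots, N/2 - 1$:

$$X[k] = E[k] + e^{-i 2\pi k / N} \, O[k], \qquad X[k + N/2] = E[k] - e^{-i 2\pi k / N} \, O[k]$$

So we reduced the computation of a DFT of size $N$ to the computation of two DFTs of size $N/2$.
If $N$ is a power of 2, the same split can be applied recursively until the subsequences have length 1, where the DFT of a single sample is the sample itself.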
For each step of the FFT, the size of the problem is halved, resulting in a logarithmic depth of recursion. This allows us to compute the DFT in $O(N \log N)$ operations instead of the $O(N^2)$ required by the direct definition.
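The recursion can be sketched in a few lines of C++. This is an illustrative recursive version, not the iterative routine this project ships. The name `fft_recursive` is hypothetical, and the local variables `E` and `O` hold the half-size DFTs of the even-indexed and odd-indexed samples.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 decimation-in-time FFT (illustrative sketch).
std::vector<cd> fft_recursive(const std::vector<cd>& x) {
    const std::size_t N = x.size();
    assert(N > 0 && (N & (N - 1)) == 0);  // N must be a power of 2
    if (N == 1) return x;                 // DFT of one sample is the sample

    // Split into even-indexed and odd-indexed samples.
    std::vector<cd> even(N / 2), odd(N / 2);
    for (std::size_t m = 0; m < N / 2; ++m) {
        even[m] = x[2 * m];
        odd[m] = x[2 * m + 1];
    }
    const std::vector<cd> E = fft_recursive(even);
    const std::vector<cd> O = fft_recursive(odd);

    // Combine the two half-size DFTs with the twiddle factor e^{-i 2 pi k / N}.
    const double pi = std::acos(-1.0);
    std::vector<cd> X(N);
    for (std::size_t k = 0; k < N / 2; ++k) {
        const double angle =
            -2.0 * pi * static_cast<double>(k) / static_cast<double>(N);
        const cd t = std::polar(1.0, angle) * O[k];
        X[k] = E[k] + t;
        X[k + N / 2] = E[k] - t;
    }
    return X;
}
```

Each level of the recursion does $O(N)$ work in the combine loop, and there are $\log_2 N$ levels, which is where the overall $O(N \log N)$ cost comes from.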
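If $x[n]$ is real-valued, its DFT is conjugate-symmetric: $X[N-k] = X^*[k]$.
We can use this property to optimize the computation of the DFT in case of real-valued sequences.
Let $x_1[n]$ and $x_2[n]$ be two real-valued sequences of length $M$, and let $z[n] = x_1[n] + i \, x_2[n]$ be the complex sequence built from them.
We know that:

$$x_1[n] = \frac{z[n] + z^*[n]}{2}, \qquad x_2[n] = \frac{z[n] - z^*[n]}{2i}$$

Since the DFT is linear:

$$X_1[k] = \frac{\mathrm{DFT}\{z\}[k] + \mathrm{DFT}\{z^*\}[k]}{2}, \qquad X_2[k] = \frac{\mathrm{DFT}\{z\}[k] - \mathrm{DFT}\{z^*\}[k]}{2i}$$

Knowing that $\mathrm{DFT}\{z^*\}[k] = Z^*[M-k]$, where $Z[k] = \mathrm{DFT}\{z\}[k]$ and the index $M-k$ is taken modulo $M$,
So we can write

$$X_1[k] = \frac{Z[k] + Z^*[M-k]}{2}, \qquad X_2[k] = \frac{Z[k] - Z^*[M-k]}{2i}$$

If we now take $x_1[n] = x[2n]$ and $x_2[n] = x[2n+1]$, the even-indexed and odd-indexed samples of a real-valued sequence $x[n]$ of length $N = 2M$, a single complex DFT of size $N/2$ yields both $X_1[k]$ and $X_2[k]$.
Using these DFTs, we can then compute the DFT of $x[n]$ with the same combination step used in the even/odd split of the radix-2 algorithm, for $k = 0, \dots, N/2 - 1$:

$$X[k] = X_1[k] + e^{-i 2\pi k / N} \, X_2[k], \qquad X[k + N/2] = X_1[k] - e^{-i 2\pi k / N} \, X_2[k]$$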
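As a rough illustration of this idea, the sketch below packs the even-indexed and odd-indexed samples of a real sequence into one complex sequence of half the length, transforms it once, and then separates and recombines the two half-size spectra. The names `real_dft` and `dft_reference` are hypothetical. The helper is a plain $O(M^2)$ DFT, used here only to keep the example self-contained, and any complex FFT could take its place.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Stand-in complex DFT (direct evaluation); an FFT could be substituted.
static std::vector<cd> dft_reference(const std::vector<cd>& z) {
    const std::size_t M = z.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> Z(M);
    for (std::size_t k = 0; k < M; ++k)
        for (std::size_t n = 0; n < M; ++n)
            Z[k] += z[n] * std::polar(1.0, -2.0 * pi * static_cast<double>(k * n) /
                                               static_cast<double>(M));
    return Z;
}

// DFT of a real sequence of even length N from one complex DFT of length N/2.
std::vector<cd> real_dft(const std::vector<double>& x) {
    const std::size_t N = x.size();
    assert(N >= 2 && N % 2 == 0);
    const std::size_t M = N / 2;

    // Pack even samples into the real part and odd samples into the imaginary part.
    std::vector<cd> z(M);
    for (std::size_t n = 0; n < M; ++n) z[n] = cd(x[2 * n], x[2 * n + 1]);
    const std::vector<cd> Z = dft_reference(z);

    const double pi = std::acos(-1.0);
    std::vector<cd> X(N);
    for (std::size_t k = 0; k < M; ++k) {
        const cd Zc = std::conj(Z[(M - k) % M]);     // Z*[M - k], index modulo M
        const cd X1 = 0.5 * (Z[k] + Zc);             // DFT of the even samples
        const cd X2 = cd(0.0, -0.5) * (Z[k] - Zc);   // (Z - Z*) / (2i), odd samples
        const cd t = std::polar(1.0, -2.0 * pi * static_cast<double>(k) /
                                         static_cast<double>(N)) * X2;
        X[k] = X1 + t;
        X[k + M] = X1 - t;
    }
    return X;
}
```

The saving comes from running the complex transform on $N/2$ points instead of $N$; the separation and recombination loop adds only $O(N)$ work.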
The Inverse DFT (IDFT) is used to recover the original time-domain sequence from its frequency-domain representation. The formula for the IDFT has the same structure as the one for the DFT, except for the sign of the exponent and a normalization factor of $1/N$.
Mathematically, the inverse DFT is defined as:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, e^{i 2\pi k n / N}, \qquad n = 0, 1, \dots, N-1$$

The same divide-and-conquer reasoning can therefore be applied to compute the inverse DFT.
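The similarity between the two transforms can be seen in a small sketch where one routine handles both directions. The name `dft_direct` and its `inverse` flag are assumptions made for this example. It evaluates the sums directly in $O(N^2)$ to keep the focus on the two differences: the sign of the exponent and the $1/N$ scaling.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Direct evaluation of the DFT (inverse == false) or the IDFT (inverse == true).
// dft_direct is a hypothetical name used only for this illustration.
std::vector<cd> dft_direct(const std::vector<cd>& in, bool inverse) {
    const std::size_t N = in.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;  // exponent sign flips for the inverse
    std::vector<cd> out(N);
    for (std::size_t k = 0; k < N; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double angle = sign * 2.0 * pi * static_cast<double>(k * n) /
                                 static_cast<double>(N);
            sum += in[n] * std::polar(1.0, angle);
        }
        // Only the inverse transform is scaled by 1/N.
        out[k] = inverse ? sum / static_cast<double>(N) : sum;
    }
    return out;
}
```

Applying the forward transform and then the inverse one should return the original samples up to floating-point rounding.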
In this implementation, we chose an iterative approach for the Fast Fourier Transform (FFT). The algorithm processes the sequence in stages, each stage combining the results of smaller subsequences into larger ones and updating the sequence accordingly. By avoiding recursion, the iterative approach removes the function call overhead of the recursive formulation.
The iterative algorithm can run in place: it is possible to overwrite the input sequence directly during the computation, which would further reduce memory usage. However, for clarity, we opted to keep the input and output separate, so an auxiliary array holds the results and the input sequence is left unmodified.
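A minimal sketch of such an iterative scheme is shown below. It is not the project's actual code: the name `fft_iterative` is hypothetical, and the twiddle factors are computed on the fly for brevity. Like the approach described above, it writes into a separate output array and leaves the input untouched.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Iterative radix-2 FFT writing into a separate output array (illustrative).
std::vector<cd> fft_iterative(const std::vector<cd>& in) {
    const std::size_t N = in.size();
    assert(N > 0 && (N & (N - 1)) == 0);  // N must be a power of 2

    std::size_t bits = 0;                 // bits = log2(N)
    while ((std::size_t(1) << bits) < N) ++bits;

    // Copy the input into the output in bit-reversed order.
    std::vector<cd> out(N);
    for (std::size_t i = 0; i < N; ++i) {
        std::size_t r = 0;
        for (std::size_t b = 0; b < bits; ++b)
            if (i & (std::size_t(1) << b)) r |= std::size_t(1) << (bits - 1 - b);
        out[r] = in[i];
    }

    // Combine blocks of length 2, 4, ..., N with butterfly operations.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= N; len <<= 1) {
        const cd w_len = std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t start = 0; start < N; start += len) {
            cd w(1.0, 0.0);
            for (std::size_t j = 0; j < len / 2; ++j) {
                const cd u = out[start + j];
                const cd t = w * out[start + j + len / 2];
                out[start + j] = u + t;
                out[start + j + len / 2] = u - t;
                w *= w_len;  // advance to the next twiddle factor
            }
        }
    }
    return out;
}
```

Once the bit-reversed copy has been made, every butterfly reads and writes the same two slots of `out`, which is why the same loops could also operate directly on the input array.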
The Cooley-Tukey Radix-2 FFT algorithm requires the input sequence to be rearranged according to bit-reversal order. To understand this, consider the sequence indices as binary numbers. At each stage of the algorithm, the sequence is split into two sub-sequences based on whether the indices are even or odd.
In binary terms, this corresponds to right-shifting the index positions and separating those ending in 0 from those ending in 1. By recursively applying this process, we observe that the indices are reordered in a way that reflects the bit-reversal of their original positions.
Ultimately, when the recursion reaches the base case, the sequence is fully rearranged according to the reversed binary order of the indices.
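Let's understand it with an example. Let's take a sequence of length $N = 8$:

$$x = [x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7]$$

For the first step of the algorithm we should consider the two sequences:

$$[x_0, x_2, x_4, x_6] \quad \text{and} \quad [x_1, x_3, x_5, x_7]$$

Now considering just the first sub-sequence with the successive step we would have:

$$[x_0, x_4] \quad \text{and} \quad [x_2, x_6]$$

so we should rearrange our sequence as:

$$[x_0, x_4, x_2, x_6, x_1, x_5, x_3, x_7]$$

The new positions of the elements in $x$ are the bit-reversals of their original indices: for example, $x_1$ (index $001$) moves to position $4$ ($100$), and $x_3$ (index $011$) moves to position $6$ ($110$).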
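The reordering can be expressed as a small helper that reverses the lowest `bits` bits of an index. The function names below are hypothetical and the code is a sketch of the idea rather than the project's own routine.

```cpp
#include <cstddef>
#include <vector>

// Reverse the lowest `bits` bits of `index` (e.g. 3 bits: 001 -> 100).
std::size_t bit_reverse(std::size_t index, unsigned bits) {
    std::size_t reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        reversed = (reversed << 1) | (index & 1);  // append the lowest bit
        index >>= 1;                               // move to the next bit
    }
    return reversed;
}

// Bit-reversal permutation for a sequence of length 2^bits:
// perm[i] is the position that element i moves to.
std::vector<std::size_t> bit_reversal_permutation(unsigned bits) {
    const std::size_t N = std::size_t(1) << bits;
    std::vector<std::size_t> perm(N);
    for (std::size_t i = 0; i < N; ++i) perm[i] = bit_reverse(i, bits);
    return perm;
}
```

For 3-bit indices the permutation is $[0, 4, 2, 6, 1, 5, 3, 7]$. Bit reversal is its own inverse, so the same table also tells which original element ends up in each position.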
To optimize the DFT computation, we can avoid calculating the exponential of a complex number at each iteration. Instead, we precompute all sine and cosine values before the iterations begin, significantly speeding up the process.
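This is achieved by creating an array of sine values that is filled once, before the main loop starts.
During the loop, the sine value needed at each step is read from this array by index instead of being recomputed.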
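One possible way to organize such a table is sketched below. The layout is an assumption of this example and not necessarily the one used in this project: the table stores $\sin(2\pi j / N)$ for $j = 0, \dots, N-1$, and cosines are read from the same table through the identity $\cos\theta = \sin(\theta + \pi/2)$, which corresponds to an index shift of $N/4$. The names `make_sine_table` and `twiddle` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Precompute sin(2 pi j / N) for j = 0 .. N-1 (assumed table layout).
std::vector<double> make_sine_table(std::size_t N) {
    const double pi = std::acos(-1.0);
    std::vector<double> table(N);
    for (std::size_t j = 0; j < N; ++j)
        table[j] = std::sin(2.0 * pi * static_cast<double>(j) / static_cast<double>(N));
    return table;
}

// Twiddle factor e^{-i 2 pi j / N} = cos(2 pi j / N) - i sin(2 pi j / N),
// built from table lookups only. Requires N to be a multiple of 4 so that
// the quarter-period shift N/4 is an integer index.
cd twiddle(const std::vector<double>& table, std::size_t j) {
    const std::size_t N = table.size();
    assert(N >= 4 && N % 4 == 0);
    const double s = table[j % N];
    const double c = table[(j + N / 4) % N];  // cos(t) = sin(t + pi/2)
    return cd(c, -s);
}
```

With a table like this, the butterfly loop performs two array reads per twiddle factor instead of calling `sin` and `cos` each time.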
Parallelism was implemented using the OpenMP library by applying parallelization directives to for loops. The inherently parallelizable sections are the bit-reversal computation, the precomputation of the sine values, and the calculation of the DFT values for each subproblem of a given size.
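As an illustration of the last point, the sketch below parallelizes one stage of an iterative FFT across its subproblems. The function name `butterfly_stage` and the choice to parallelize the outer loop are assumptions of this example. Blocks of length `len` touch disjoint elements, so the iterations are independent. The input is expected to be already in bit-reversed order.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// One FFT stage: combine pairs of half-blocks into blocks of length `len`.
// butterfly_stage is a hypothetical name used only for this illustration.
void butterfly_stage(std::vector<cd>& a, std::size_t len) {
    // Signed loop indices keep the loop valid for older OpenMP versions.
    const std::ptrdiff_t N = static_cast<std::ptrdiff_t>(a.size());
    const std::ptrdiff_t L = static_cast<std::ptrdiff_t>(len);
    assert(L >= 2 && N % L == 0);
    const double pi = std::acos(-1.0);

    // Each block [start, start + L) is processed independently of the others.
    #pragma omp parallel for
    for (std::ptrdiff_t start = 0; start < N; start += L) {
        for (std::ptrdiff_t j = 0; j < L / 2; ++j) {
            const cd w = std::polar(1.0, -2.0 * pi * static_cast<double>(j) /
                                             static_cast<double>(L));
            const std::size_t lo = static_cast<std::size_t>(start + j);
            const std::size_t hi = static_cast<std::size_t>(start + j + L / 2);
            const cd u = a[lo];
            const cd t = w * a[hi];
            a[lo] = u + t;
            a[hi] = u - t;
        }
    }
}
```

Calling `butterfly_stage` with `len = 2, 4, ..., N` on a bit-reversed sequence completes the transform. With GCC or Clang, the directive takes effect when compiling with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially. In the last stages there are only a few large blocks, so a real implementation might parallelize the inner loop there instead.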