1 | 1 | # PHFT
2 | 2 |
3 | 3 | **PH**ast**FT** (PHFT) is a high-performance, "quantum-inspired" Fast Fourier Transform (FFT) library written in pure
4 |   | -and
5 |   | -safe Rust.
  | 4 | +and safe Rust. It is the fastest pure-Rust FFT library according to our benchmarks.
6 | 5 |
7 |   | -What's with the name? Great question!
  | 6 | +## Features
8 | 7 |
9 |    | -The name, **PHFT**, is derived from the implementation of the
10 |    | -[Quantum Fourier Transform](https://en.wikipedia.org/wiki/Quantum_Fourier_transform) (QFT). Namely, the
11 |    | -[quantum circuit implementation of QFT](https://en.wikipedia.org/wiki/Quantum_Fourier_transform#Circuit_implementation)
12 |    | -consists of the **P**hase gates and **H**adamard gates. Hence, **PH**ast**FT**.
   | 8 | +- Takes advantage of the latest CPU features up to and including AVX-512, but performs well even without them.
   | 9 | +- Zero `unsafe` code.
   | 10 | +- Python bindings (via [PyO3](https://github.com/PyO3/pyo3)).
   | 11 | +- Optional parallelization of some steps to 2 threads (with even more parallelization planned).
   | 12 | +- Did we mention it is really fast?!
13 | 13 |
14 |    | -In general, the FFT is equivalent to applying gates to all qubits in `[0, n)`. This approach creates to oppurtunity to
15 |    | -leverage the same memory access patterns as high-performance quantum state simulator. This results in a fast and
16 |    | -efficient FFT implementation that surpasses the performance of existing Rust FFT crates, including RustFFT.
   | 14 | +## Limitations
17 | 15 |
18 |    | -## Features
   | 16 | +- No runtime CPU feature detection (yet). Right now, achieving the highest performance requires compiling with `-C target-cpu=native` or [`cargo multivers`](https://github.com/ronnychevalier/cargo-multivers).
   | 17 | +- Requires a nightly Rust compiler due to the use of portable SIMD.
   | 18 | +
   | 19 | +## How is it so fast?
   | 20 | +
   | 21 | +PHFT is designed around the capabilities and limitations of modern hardware (that is, anything made in the last 10 years or so).
   | 22 | +
   | 23 | +The two major bottlenecks in FFT are **CPU cycles** and **memory accesses**.
19 | 24 |
20 |    | -- Performance ...
21 |    | -- Python bindings (via PyO3) ...
22 |    | -- Safety ...
   | 25 | +We picked an FFT algorithm that maps well to modern CPUs. The implementation can make use of the latest CPU features, such as AVX-512, but performs well even without them.
   | 26 | +
   | 27 | +Our key insight for speeding up memory accesses is that an FFT on `2^n` points is equivalent to applying gates to all qubits in `[0, n)`.
   | 28 | +This creates the opportunity to leverage the same memory access patterns as a [high-performance quantum state simulator](https://github.com/QuState/spinoza).
   | 29 | +
   | 30 | +We also use the Cache-Optimal Bit Reversal Algorithm ([COBRA](https://csaws.cs.technion.ac.il/~itai/Courses/Cache/bit.pdf))
   | 31 | +on large datasets and can optionally run it on 2 parallel threads, accelerating it even further.
   | 32 | +
   | 33 | +All of this combined results in a fast and efficient FFT implementation that surpasses the performance of existing Rust FFT crates,
   | 34 | +including [RustFFT](https://crates.io/crates/rustfft/), on both large and small inputs while using significantly less memory.
23 | 35 |
24 | 36 | ## Getting Started
25 | 37 |
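The "How is it so fast?" section describes the FFT as gates applied to every qubit, followed by a bit-reversal step. A minimal sketch of that idea, assuming a textbook radix-2 decimation-in-frequency (DIF) FFT over split real/imaginary slices: each stage works on index pairs that differ only in bit `t`, which is the access pattern a state-vector simulator uses to apply a single-qubit gate to qubit `t`. This is not PHFT's code or API; `fft_dif_sketch` and `bit_reverse_permute` are hypothetical names, and the naive bit reversal stands in for COBRA.

```rust
use std::f64::consts::PI;

/// Hypothetical helper (not PHFT's API): in-place radix-2 DIF FFT on split
/// real/imaginary slices whose length N is a power of two.
/// Computes X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N), returned in natural order.
pub fn fft_dif_sketch(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    assert_eq!(n, im.len(), "real and imaginary parts must have equal length");
    assert!(n.is_power_of_two(), "length must be a power of two");
    if n == 1 {
        return;
    }
    let num_qubits = n.trailing_zeros(); // N = 2^num_qubits

    // Visit "qubits" from the most significant (stride N/2) down to qubit 0 (stride 1).
    for t in (0..num_qubits).rev() {
        let dist = 1usize << t; // indices lo and lo + dist differ only in bit t
        for start in (0..n).step_by(2 * dist) {
            for j in 0..dist {
                let (lo, hi) = (start + j, start + j + dist);
                // Twiddle factor w = exp(-2*pi*i * j / (2 * dist)).
                let angle = -PI * j as f64 / dist as f64;
                let (w_re, w_im) = (angle.cos(), angle.sin());
                let (a_re, a_im) = (re[lo], im[lo]);
                let (b_re, b_im) = (re[hi], im[hi]);
                // Butterfly: the sum goes to lo, (difference * twiddle) goes to hi.
                re[lo] = a_re + b_re;
                im[lo] = a_im + b_im;
                let (d_re, d_im) = (a_re - b_re, a_im - b_im);
                re[hi] = d_re * w_re - d_im * w_im;
                im[hi] = d_re * w_im + d_im * w_re;
            }
        }
    }
    // DIF leaves the output in bit-reversed order; undo that permutation.
    bit_reverse_permute(re, num_qubits);
    bit_reverse_permute(im, num_qubits);
}

/// Naive O(N) bit-reversal permutation (NOT the cache-optimal COBRA algorithm).
/// Requires num_qubits >= 1.
fn bit_reverse_permute(data: &mut [f64], num_qubits: u32) {
    for i in 0..data.len() {
        // Reverse only the low num_qubits bits of i.
        let r = i.reverse_bits() >> (usize::BITS - num_qubits);
        if i < r {
            data.swap(i, r);
        }
    }
}
```

Calling `fft_dif_sketch(&mut re, &mut im)` on length-8 slices should reproduce a naive O(N²) DFT up to floating-point rounding; the cache-friendliness of the real library comes from details (SIMD, COBRA, memory layout) that this sketch deliberately leaves out.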
@@ -88,3 +100,10 @@ Finally, run:
88 | 100 | ```bash
89 | 101 | ./profile.sh
90 | 102 | ```
    | 103 | +
    | 104 | +## What's with the name?
    | 105 | +
    | 106 | +The name, **PHFT**, is derived from the implementation of the
    | 107 | +[Quantum Fourier Transform](https://en.wikipedia.org/wiki/Quantum_Fourier_transform) (QFT). Namely, the
    | 108 | +[quantum circuit implementation of QFT](https://en.wikipedia.org/wiki/Quantum_Fourier_transform#Circuit_implementation)
    | 109 | +consists of **P**hase gates and **H**adamard gates. Hence, **PH**ast**FT**.
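To illustrate the naming, here is a minimal sketch of the textbook QFT circuit built only from Hadamard and controlled phase gates, followed by the final qubit-order reversal. It assumes a plain state vector stored as split real/imaginary slices, where bit `q` of an index is qubit `q`. The function names are hypothetical and this is not how PHFT is implemented. Note that the QFT uses a `+` sign in the exponent and a `1/√N` normalization, so it equals an inverse DFT scaled by `√N` rather than the forward FFT.

```rust
use std::f64::consts::{FRAC_1_SQRT_2, PI};

/// Hypothetical gate: Hadamard on qubit q (pairs of indices differing only in bit q).
fn hadamard(re: &mut [f64], im: &mut [f64], q: u32) {
    let dist = 1usize << q;
    for i in 0..re.len() {
        if (i & dist) == 0 {
            let j = i | dist;
            let (a_re, a_im, b_re, b_im) = (re[i], im[i], re[j], im[j]);
            re[i] = FRAC_1_SQRT_2 * (a_re + b_re);
            im[i] = FRAC_1_SQRT_2 * (a_im + b_im);
            re[j] = FRAC_1_SQRT_2 * (a_re - b_re);
            im[j] = FRAC_1_SQRT_2 * (a_im - b_im);
        }
    }
}

/// Hypothetical gate: controlled phase, multiplying amplitudes whose
/// control and target bits are both 1 by exp(i * theta).
fn controlled_phase(re: &mut [f64], im: &mut [f64], control: u32, target: u32, theta: f64) {
    let mask = (1usize << control) | (1usize << target);
    let (c, s) = (theta.cos(), theta.sin());
    for i in 0..re.len() {
        if (i & mask) == mask {
            let (r, m) = (re[i], im[i]);
            re[i] = r * c - m * s;
            im[i] = r * s + m * c;
        }
    }
}

/// Hypothetical QFT: y[k] = (1/sqrt(N)) * sum_j x[j] * exp(+2*pi*i*j*k/N).
pub fn qft_sketch(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    assert!(n.is_power_of_two() && n == im.len());
    let num_qubits = n.trailing_zeros();
    // Most significant qubit first: H, then phases controlled by each lower qubit.
    for q in (0..num_qubits).rev() {
        hadamard(re, im, q);
        for c in (0..q).rev() {
            let theta = 2.0 * PI / 2.0_f64.powi((q - c + 1) as i32);
            controlled_phase(re, im, c, q, theta);
        }
    }
    // The circuit's final SWAP layer is a bit-reversal permutation of the indices.
    if num_qubits > 0 {
        for i in 0..n {
            let r = i.reverse_bits() >> (usize::BITS - num_qubits);
            if i < r {
                re.swap(i, r);
                im.swap(i, r);
            }
        }
    }
}
```

Each gate here sweeps the whole state vector, which is the memory access pattern PHFT borrows from state-vector simulators; a fast FFT fuses and reorders this work rather than applying gates one at a time.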