diff --git a/README.md b/README.md
index cc0c3ad..e9c70b5 100644
--- a/README.md
+++ b/README.md
@@ -1,25 +1,37 @@
 # PHFT
 
 **PH**ast**FT** (PHFT) is a high-performance, "quantum-inspired" Fast Fourier Transform (FFT) library written in pure
-and
-safe Rust.
+and safe Rust. It is the fastest pure-Rust FFT library according to our benchmarks.
 
-What's with the name? Great question!
+## Features
 
-The name, **PHFT**, is derived from the implementation of the
-[Quantum Fourier Transform](https://en.wikipedia.org/wiki/Quantum_Fourier_transform) (QFT). Namely, the
-[quantum circuit implementation of QFT](https://en.wikipedia.org/wiki/Quantum_Fourier_transform#Circuit_implementation)
-consists of the **P**hase gates and **H**adamard gates. Hence, **PH**ast**FT**.
+- Takes advantage of latest CPU features up to and including AVX-512, but performs well even without them.
+- Zero `unsafe` code
+- Python bindings (via [PyO3](https://github.com/PyO3/pyo3)).
+- Optional parallelization of some steps to 2 threads (with even more parallelization planned).
+- Did we mention it is really fast?!
 
-In general, the FFT is equivalent to applying gates to all qubits in `[0, n)`. This approach creates to oppurtunity to
-leverage the same memory access patterns as high-performance quantum state simulator. This results in a fast and
-efficient FFT implementation that surpasses the performance of existing Rust FFT crates, including RustFFT.
+## Limitations
 
-## Features
+ - No runtime CPU feature detection (yet). Right now achieving the highest performance requires compiling with `-C target-cpu=native` or [`cargo multivers`](https://github.com/ronnychevalier/cargo-multivers).
+ - Requires nightly Rust compiler due to use of portable SIMD
+
+## How is it so fast?
+
+PHFT is designed around the capabilities and limitations of modern hardware (that is, anything made in the last 10 years or so).
+
+The two major bottlenecks in FFT are the **CPU cycles** and **memory accesses.**
 
-- Performance ...
-- Python bindings (via PyO3) ...
-- Safety ...
+We picked an FFT algorithm that maps well to modern CPUs. The implementation can make use of latest CPU features such as AVX-512, but performs well even without them.
+
+Our key insight for speeding up memory accesses is that FFT is equivalent to applying gates to all qubits in `[0, n)`.
+This creates to oppurtunity to leverage the same memory access patterns as a [high-performance quantum state simulator](https://github.com/QuState/spinoza).
+
+We also use the Cache-Optimal Bit Reveral Algorithm ([COBRA](https://csaws.cs.technion.ac.il/~itai/Courses/Cache/bit.pdf))
+on large datasets and optionally run it on 2 parallel threads, accelerating it even further.
+
+All of this combined results in a fast and efficient FFT implementation that surpasses the performance of existing Rust FFT crates,
+including [RustFFT](https://crates.io/crates/rustfft/), on both large and small inputs and while using significantly less memory.
 
 ## Getting Started
 
@@ -88,3 +100,10 @@ Finally, run:
 ```bash
 ./profile.sh
 ```
+
+## What's with the name?
+
+The name, **PHFT**, is derived from the implementation of the
+[Quantum Fourier Transform](https://en.wikipedia.org/wiki/Quantum_Fourier_transform) (QFT). Namely, the
+[quantum circuit implementation of QFT](https://en.wikipedia.org/wiki/Quantum_Fourier_transform#Circuit_implementation)
+consists of the **P**hase gates and **H**adamard gates. Hence, **PH**ast**FT**.
\ No newline at end of file
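
The new "How is it so fast?" text mentions a bit-reversal step (COBRA) without showing what bit reversal does. As a rough illustration only, here is a naive in-place bit-reversal permutation in safe Rust. It is not COBRA and is not taken from the PHFT source; the function names `reverse_low_bits` and `bit_reverse_permute` are hypothetical. COBRA produces the same permutation but reorders the memory accesses to be cache-friendly.

```rust
/// Reverses the lowest `bits` bits of `i` (hypothetical helper, not PHFT's API).
fn reverse_low_bits(i: usize, bits: u32) -> usize {
    if bits == 0 {
        // A shift by the full word width would overflow, so handle n = 1 here.
        return 0;
    }
    // Reverse all bits of the word, then shift the result back down.
    i.reverse_bits() >> (usize::BITS - bits)
}

/// Naive in-place bit-reversal permutation.
/// Assumes `data.len()` is a power of two, as radix-2 FFTs require.
fn bit_reverse_permute<T>(data: &mut [T]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let bits = n.trailing_zeros(); // log2(n)
    for i in 0..n {
        let j = reverse_low_bits(i, bits);
        // Swap each pair only once; indices that are their own reversal stay put.
        if i < j {
            data.swap(i, j);
        }
    }
}
```

For an 8-element input, index 1 (`001`) trades places with index 4 (`100`) and index 3 (`011`) with index 6 (`110`), which is the element reordering an in-place radix-2 FFT needs before or after its butterfly passes.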