Commit bdc185f

Rewrite README
1 parent 666b7d6 commit bdc185f

1 file changed

+33
-14
lines changed

README.md

Lines changed: 33 additions & 14 deletions
@@ -1,25 +1,37 @@
11
# PHFT
22

33
**PH**ast**FT** (PHFT) is a high-performance, "quantum-inspired" Fast Fourier Transform (FFT) library written in pure
4-
and
5-
safe Rust.
4+
and safe Rust. It is the fastest pure-Rust FFT library according to our benchmarks.
65

7-
What's with the name? Great question!
6+
## Features
87

9-
The name, **PHFT**, is derived from the implementation of the
10-
[Quantum Fourier Transform](https://en.wikipedia.org/wiki/Quantum_Fourier_transform) (QFT). Namely, the
11-
[quantum circuit implementation of QFT](https://en.wikipedia.org/wiki/Quantum_Fourier_transform#Circuit_implementation)
12-
consists of the **P**hase gates and **H**adamard gates. Hence, **PH**ast**FT**.
8+
- Takes advantage of the latest CPU features up to and including AVX-512, but performs well even without them.
9+
- Zero `unsafe` code
10+
- Python bindings (via [PyO3](https://github.com/PyO3/pyo3)).
11+
- Optional parallelization of some steps to 2 threads (with even more parallelization planned).
12+
- Did we mention it is really fast?!
1313

14-
In general, the FFT is equivalent to applying gates to all qubits in `[0, n)`. This approach creates the opportunity to
15-
leverage the same memory access patterns as a high-performance quantum state simulator. This results in a fast and
16-
efficient FFT implementation that surpasses the performance of existing Rust FFT crates, including RustFFT.
14+
## Limitations
1715

18-
## Features
16+
- No runtime CPU feature detection (yet). Right now, achieving the highest performance requires compiling with `-C target-cpu=native` or [`cargo multivers`](https://github.com/ronnychevalier/cargo-multivers).
17+
- Requires a nightly Rust compiler due to the use of portable SIMD.
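
To illustrate what "no runtime CPU feature detection" means in practice, here is a minimal sketch that reports which x86 SIMD features were enabled when the binary was compiled. The function name `compiled_simd_features` is hypothetical and not part of PHFT; it relies only on the standard `cfg!(target_feature = "...")` macro, which is evaluated at compile time.

```rust
/// Returns the names of a few x86 SIMD features that were enabled at
/// *compile time* (for example via `-C target-cpu=native`).
/// Hypothetical helper for illustration; not part of the PHFT API.
pub fn compiled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` expands to a constant `true` or `false` during compilation,
    // so the result never depends on the CPU the program later runs on.
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        features.push("avx512f");
    }
    features
}
```

Built with default flags on x86-64, this would typically report only `sse2`; building with `-C target-cpu=native` on a newer machine should add the wider instruction sets that the compiler is then allowed to use.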
18+
19+
## How is it so fast?
20+
21+
PHFT is designed around the capabilities and limitations of modern hardware (that is, anything made in the last 10 years or so).
22+
23+
The two major bottlenecks in FFT are **CPU cycles** and **memory accesses**.
1924

20-
- Performance ...
21-
- Python bindings (via PyO3) ...
22-
- Safety ...
25+
We picked an FFT algorithm that maps well to modern CPUs. The implementation can make use of the latest CPU features such as AVX-512, but performs well even without them.
26+
27+
Our key insight for speeding up memory accesses is that FFT is equivalent to applying gates to all qubits in `[0, n)`.
28+
This creates the opportunity to leverage the same memory access patterns as a [high-performance quantum state simulator](https://github.com/QuState/spinoza).
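
The gate analogy can be sketched with a textbook radix-2 decimation-in-frequency FFT, where each stage touches pairs of elements whose indices differ in exactly one bit, the same way a single-qubit gate touches pairs of amplitudes in a state vector. This is a generic scalar illustration, not PHFT's actual kernel; the names `butterfly_stage` and `fft_dif_unordered` are made up for this example, and storing the real and imaginary parts in separate slices is an assumption.

```rust
use std::f64::consts::PI;

/// One butterfly stage acting on "qubit" `t`: it pairs every two elements
/// whose indices differ only in bit `t`. Hypothetical helper for illustration.
fn butterfly_stage(re: &mut [f64], im: &mut [f64], t: u32) {
    let dist = 1usize << t; // distance between the two paired elements
    let chunk = dist << 1; // each chunk holds `dist` independent pairs
    for start in (0..re.len()).step_by(chunk) {
        for j in 0..dist {
            let (i0, i1) = (start + j, start + j + dist);
            let (ar, ai) = (re[i0], im[i0]);
            let (br, bi) = (re[i1], im[i1]);
            // Twiddle factor w = exp(-2*pi*i * j / chunk).
            let angle = -2.0 * PI * (j as f64) / (chunk as f64);
            let (ws, wc) = angle.sin_cos();
            // Upper output: a + b.
            re[i0] = ar + br;
            im[i0] = ai + bi;
            // Lower output: (a - b) * w.
            let (dr, di) = (ar - br, ai - bi);
            re[i1] = dr * wc - di * ws;
            im[i1] = dr * ws + di * wc;
        }
    }
}

/// In-place forward FFT (decimation in frequency). The result is left in
/// bit-reversed order, so a separate permutation pass is still needed.
pub fn fft_dif_unordered(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    assert!(n.is_power_of_two() && n == im.len());
    let bits = n.trailing_zeros();
    // Visit the "qubits" from most significant to least significant.
    for t in (0..bits).rev() {
        butterfly_stage(re, im, t);
    }
}
```

Every stage sweeps the arrays in contiguous chunks and reads each pair exactly once, which is the kind of access pattern a state-vector simulator uses when applying a gate to one qubit.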
29+
30+
We also use the Cache-Optimal Bit Reversal Algorithm ([COBRA](https://csaws.cs.technion.ac.il/~itai/Courses/Cache/bit.pdf))
31+
on large datasets and optionally run it on 2 parallel threads, accelerating it even further.
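
For reference, the permutation that COBRA computes can be written naively as below. This is a plain in-place bit-reversal sketch, deliberately not COBRA itself: it produces the same result but jumps across the whole array on each swap, which is the cache-unfriendly behavior a cache-optimal algorithm avoids on large inputs. The function name `bit_reverse_permute` is hypothetical.

```rust
/// Naive in-place bit-reversal permutation for power-of-two lengths.
/// Moves the element at index `i` to the index obtained by reversing the
/// lowest `log2(n)` bits of `i`. Hypothetical helper; this is NOT COBRA.
pub fn bit_reverse_permute<T>(data: &mut [T]) {
    let n = data.len();
    assert!(n.is_power_of_two());
    let bits = n.trailing_zeros();
    if bits == 0 {
        return; // a single element is already in place
    }
    for i in 0..n {
        // Reverse all bits of `i`, then keep only the top `bits` of them.
        let j = i.reverse_bits() >> (usize::BITS - bits);
        // Swap each pair once; indices that map to themselves are skipped.
        if i < j {
            data.swap(i, j);
        }
    }
}
```

For an 8-element input `[0, 1, 2, 3, 4, 5, 6, 7]`, this yields `[0, 4, 2, 6, 1, 5, 3, 7]`.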
32+
33+
All of this combined results in a fast and efficient FFT implementation that surpasses the performance of existing Rust FFT crates,
34+
including [RustFFT](https://crates.io/crates/rustfft/), on both large and small inputs, while using significantly less memory.
2335

2436
## Getting Started
2537

@@ -88,3 +100,10 @@ Finally, run:
88100
```bash
89101
./profile.sh
90102
```
103+
104+
## What's with the name?
105+
106+
The name, **PHFT**, is derived from the implementation of the
107+
[Quantum Fourier Transform](https://en.wikipedia.org/wiki/Quantum_Fourier_transform) (QFT). Namely, the
108+
[quantum circuit implementation of QFT](https://en.wikipedia.org/wiki/Quantum_Fourier_transform#Circuit_implementation)
109+
consists of the **P**hase gates and **H**adamard gates. Hence, **PH**ast**FT**.
