Update README
Shnatsel committed Feb 3, 2024
1 parent 2d1b905 commit 82b2757
Showing 1 changed file with 16 additions and 8 deletions.
24 changes: 16 additions & 8 deletions README.md
@@ -4,31 +4,39 @@
# PhastFT

PhastFT is a high-performance, "quantum-inspired" Fast Fourier
-Transform (FFT) library written in pure and safe Rust. It is the fastest
-pure-Rust FFT library according to our benchmarks.
+Transform (FFT) library written in pure Rust.
+Despite its simplicity, it is competitive with and often outperforms
+the fastest Rust FFT libraries, including [RustFFT](https://crates.io/crates/rustfft/).

## Features

-- Takes advantage of latest CPU features up to and including AVX-512, but performs well even without them.
-- Simple implementation using a single, general-purpose FFT algorithm.
- Zero `unsafe` code
-- Python bindings (via [PyO3](https://github.com/PyO3/pyo3)).
+- Simple implementation using a single, general-purpose FFT algorithm and no costly "planning" step
-- Optional parallelization of some steps to 2 threads (with even more planned).
+- Takes advantage of latest CPU features up to and including AVX-512, but performs well even without them.
+- Optional parallelization of some steps to 2 threads (with even more planned)
+- 2x lower memory usage than [RustFFT](https://crates.io/crates/rustfft/)
+- Python bindings (via [PyO3](https://github.com/PyO3/pyo3))

## Limitations

- No runtime CPU feature detection (yet). Right now achieving the highest performance requires compiling
with `-C target-cpu=native` or [`cargo multivers`](https://github.com/ronnychevalier/cargo-multivers).
+- Requires nightly Rust compiler due to use of portable SIMD

+## Planned features

+- Runtime CPU feature detection
+- More multi-threading
+- More work on cache-optimal FFT
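
Editor's note on the planned "Runtime CPU feature detection" item: the README does not say how it will be implemented. As a rough sketch only, the standard library's `is_x86_feature_detected!` macro is one common way to pick a code path at run time instead of relying on `-C target-cpu=native`. The function name `best_simd_level` and the returned labels are hypothetical and are not part of PhastFT.

```rust
/// Hypothetical helper: reports the widest SIMD level this CPU supports.
/// A real library would dispatch to a matching kernel instead of returning a label.
fn best_simd_level() -> &'static str {
    // The detection macro only exists on x86 targets, so gate it with cfg.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return "avx512f";
        }
        if std::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    // Fallback for older x86 CPUs and for every non-x86 architecture.
    "baseline"
}

fn main() {
    println!("selected code path: {}", best_simd_level());
}
```

The check runs once at startup in this sketch; a library would typically cache the result so the hot loop does not repeat the detection.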

## How is it so fast?

PhastFT is designed around the capabilities and limitations of modern hardware (that is, anything made in the last 10
years or so).

The two major bottlenecks in FFT are the **CPU cycles** and **memory accesses.**

-We picked an FFT algorithm that maps well to modern CPUs. The implementation can make use of latest CPU features such as
+We picked an efficient, general-purpose FFT algorithm. Our implementation can make use of latest CPU features such as
AVX-512, but performs well even without them.

Our key insight for speeding up memory accesses is that FFT is equivalent to applying gates to all qubits in `[0, n)`.
@@ -41,7 +49,7 @@ on large datasets and optionally run it on 2 parallel threads, accelerating it e

All of this combined results in a fast and efficient FFT implementation that surpasses the performance of existing Rust
FFT crates,
-including [RustFFT](https://crates.io/crates/rustfft/), on both large and small inputs and while using significantly
+including [RustFFT](https://crates.io/crates/rustfft/) on large inputs and while using significantly
less memory.
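
Editor's note on the "applying gates to all qubits" remark: the diff does not show PhastFT's actual kernels, so the following is only a textbook radix-2 decimation-in-frequency FFT, offered as a sketch of the general idea. Each stage pairs elements whose indices differ in exactly one bit, much as a single-qubit gate touches pairs of amplitudes, and the stages walk through the index bits from highest to lowest. The names `fft_dif_sketch` and `dft_naive` are hypothetical, and the split real/imaginary slices are an assumption made for illustration.

```rust
use std::f64::consts::PI;

/// Textbook radix-2 decimation-in-frequency FFT, in place, on split
/// real/imaginary slices. Illustrative only; not PhastFT's implementation.
fn fft_dif_sketch(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    assert_eq!(n, im.len(), "real and imaginary parts must have equal length");
    if n <= 1 {
        return;
    }
    assert!(n.is_power_of_two(), "length must be a power of two");
    let bits = n.trailing_zeros();

    // One stage per index bit, highest bit first. `dist` is the index
    // distance between the two elements of each butterfly in this stage.
    let mut dist = n / 2;
    while dist >= 1 {
        for start in (0..n).step_by(2 * dist) {
            for j in 0..dist {
                let (a, b) = (start + j, start + j + dist);
                // Twiddle factor exp(-2*pi*i * j / (2 * dist)).
                let angle = -PI * j as f64 / dist as f64;
                let (w_re, w_im) = (angle.cos(), angle.sin());
                let (d_re, d_im) = (re[a] - re[b], im[a] - im[b]);
                re[a] += re[b];
                im[a] += im[b];
                re[b] = d_re * w_re - d_im * w_im;
                im[b] = d_re * w_im + d_im * w_re;
            }
        }
        dist /= 2;
    }

    // The stages leave the output in bit-reversed order; undo that.
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            re.swap(i, j);
            im.swap(i, j);
        }
    }
}

/// O(n^2) reference DFT, used only to check the sketch above.
fn dft_naive(re: &[f64], im: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let n = re.len();
    let mut out_re = vec![0.0; n];
    let mut out_im = vec![0.0; n];
    for k in 0..n {
        for t in 0..n {
            let angle = -2.0 * PI * (k * t) as f64 / n as f64;
            let (c, s) = (angle.cos(), angle.sin());
            out_re[k] += re[t] * c - im[t] * s;
            out_im[k] += re[t] * s + im[t] * c;
        }
    }
    (out_re, out_im)
}

fn main() {
    let mut re: Vec<f64> = (0..8).map(|i| i as f64).collect();
    let mut im: Vec<f64> = vec![0.0; 8];
    let (want_re, want_im) = dft_naive(&re, &im);
    fft_dif_sketch(&mut re, &mut im);
    for k in 0..8 {
        assert!((re[k] - want_re[k]).abs() < 1e-9);
        assert!((im[k] - want_im[k]).abs() < 1e-9);
    }
    println!("FFT matches the naive DFT: {:?}", re);
}
```

The sketch makes no attempt at the SIMD, cache, or threading optimizations the README describes; it only shows the stage-per-bit structure.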

## Quickstart