Not tested on real hardware. Do not use unless you really know what you are doing.
69378 cycles per Curve25519 multiplication (as of commit d0a51a88b0), using fifteen 17-bit multipliers and a number of 42-bit adders. Intended to run in constant time.
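
The multiplier count and width line up with 15 * 17 = 255, the bit length of a field element modulo 2^255 - 19. One plausible reading, which is an assumption and not taken from the design itself, is that a field element is split into 15 limbs of 17 bits each. The Python sketch below models a field multiplication under that assumption. The names (to_limbs, from_limbs, mul_mod_p) are hypothetical, the final reduction uses Python's big integers as a shortcut, and nothing here reflects the actual datapath, adder widths, or cycle count.

```python
# Illustrative model only. ASSUMPTION: 15 little-endian limbs of 17 bits each.
# The real hardware may organize its arithmetic differently.
P = 2**255 - 19
LIMBS = 15
BITS = 17
MASK = (1 << BITS) - 1


def to_limbs(x):
    """Split an integer below 2**255 into 15 little-endian 17-bit limbs."""
    return [(x >> (BITS * i)) & MASK for i in range(LIMBS)]


def from_limbs(limbs):
    """Recombine limbs (of any magnitude) into a single integer."""
    return sum(limb << (BITS * i) for i, limb in enumerate(limbs))


def mul_mod_p(a, b):
    """Multiply two field elements using only 17x17-bit partial products."""
    al, bl = to_limbs(a), to_limbs(b)
    cols = [0] * LIMBS
    for i in range(LIMBS):
        for j in range(LIMBS):
            prod = al[i] * bl[j]  # each partial product fits in 34 bits
            k = i + j
            if k >= LIMBS:
                # 2**255 is congruent to 19 modulo P, so columns past the
                # top limb wrap around with a factor of 19.
                k -= LIMBS
                prod *= 19
            cols[k] += prod
    # Software shortcut: carry propagation and final reduction in one step.
    return from_limbs(cols) % P
```

Python integers do not run in constant time, so this sketch is useful only for checking arithmetic, not for reasoning about timing.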
Public Domain.