Raku package with distance functions implemented in C. Apple's Accelerate library is used if available.
The primary motivation for making this library is to have fast sorting and nearest neighbors computations over collections of LLM-embedding vectors.
Make a large (largish) collection of large vectors and find Euclidean distances over them:
use Math::DistanceFunctions::Native;
my @vecs = (^1000).map({ (^1000).map({1.rand}).cache.Array }).Array;
my @searchVector = (^1000).map({1.rand});
my $start = now;
my @dists = @vecs.map({ euclidean-distance($_, @searchVector)});
my $tend = now;
say "Total time of computing {@vecs.elems} distances: {round($tend - $start, 10 ** -6)} s";
say "Average time of a single distance computation: {($tend - $start) / @vecs.elems} s";
Use CArray
vectors instead:
use NativeCall;
my @cvecs = @vecs.map({ CArray[num64].new($_) });
my $cSearchVector = CArray[num64].new(@searchVector);
$start = now;
my @cdists = @cvecs.map({ euclidean-distance($_, $cSearchVector)});
$tend = now;
say "Total time of computing {@cvecs.elems} distances: {round($tend - $start, 10 ** -6)} s";
say "Average time of a single distance computation: {($tend - $start) / @cvecs.elems} s";
I.e., we get ≈ 200 times speed-up using CArray
vectors and the functions of this package.
The loading of this package automatically loads the (C-implemented) function edit-distance
of
"Math::DistanceFunctions::Edit".
Here is an example usage:
edit-distance('racoon', 'raccoon')