thedatumorg/ATLAS

ATLAS

ATLAS: The Landscape of Approximate Similarity Search — Two Decades of Algorithmic Advances

📄 Overview

Similarity search methods efficiently retrieve the vectors most similar to a given query and play a central role in a wide range of applications. Among the variants, approximate similarity search methods offer high accuracy with substantially better efficiency than exact methods. Despite substantial progress, existing studies suffer from major limitations: (i) omission of key algorithmic families; (ii) neglect of recent methodological advances; (iii) lack of rigorous statistical validation; and (iv) evaluation on only a limited number of datasets that reflect modern AI applications.

To address these gaps, we introduce ATLAS, the most comprehensive benchmark of approximate nearest neighbor search (ANNS) methods to date. Our contributions are fourfold: (i) a systematic review of five major algorithmic categories; (ii) a large-scale evaluation of 45 methods across 58 datasets; (iii) a new measure that captures latency over a recall range, offering a threshold-free, unbiased assessment of query efficiency; and (iv) statistical analysis to ensure the robustness of the conclusions.

Our findings reveal seven key insights: (i) modern quantization-based methods achieve query efficiency comparable to graph-based algorithms while requiring substantially less memory; (ii) across four categories, previously unreported top performers emerge, two of which show statistically significant improvements; (iii) relative algorithm rankings vary across data modalities and vector dimensionality; (iv) parameter settings do not transfer consistently across datasets, and performance is highly sensitive to data characteristics; (v) the performance of hardware-accelerated methods depends on the architecture; (vi) performance is highly sensitive to implementation quality; and (vii) both indexing strategies and hardware acceleration yield substantial throughput gains at the cost of reduced accuracy. Collectively, these findings sharpen our understanding of the ANNS landscape, uncover previously unexplored behaviors, and guide future research.
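The overview mentions recall and a measure of latency over a recall range without defining either. As a rough illustration only, the sketch below computes recall@k against a brute-force baseline and averages latency across a recall interval by trapezoidal integration of a latency-recall curve. The function names, the linear interpolation between measured points, and the averaging itself are assumptions made for this example; they are not taken from ATLAS, whose actual measure may be defined differently.

```python
import math


def exact_knn(data, query, k):
    """Brute-force k nearest neighbors by Euclidean distance (exact baseline)."""
    order = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors present in the approximate result."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_latency_over_recall_range(points, lo, hi):
    """Average latency for recall in [lo, hi] (hypothetical summary measure).

    points: (recall, latency) pairs, e.g. from sweeping a search parameter.
    Assumption: latency is linearly interpolated between measured points.
    """
    pts = sorted(points)

    def latency_at(r):
        # Find the measured segment containing r and interpolate within it.
        for (r0, l0), (r1, l1) in zip(pts, pts[1:]):
            if r0 <= r <= r1:
                if r1 == r0:
                    return l0
                return l0 + (l1 - l0) * (r - r0) / (r1 - r0)
        raise ValueError("recall value outside the measured range")

    # Integration knots: both interval ends plus every measured recall inside.
    knots = [lo] + [r for r, _ in pts if lo < r < hi] + [hi]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        area += (latency_at(a) + latency_at(b)) / 2 * (b - a)
    return area / (hi - lo)
```

A summary of this kind avoids picking a single recall threshold (such as "latency at 0.95 recall"), so a method is not favored just because the chosen threshold happens to suit it.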

🗄️ Dataset

Because of GitHub's upload size limits, the datasets are hosted at a different location. Please download them using the following links:

⚙️ Algorithms

| Method | Folder |
| --- | --- |
| SymphonyQG | SymphonyQG |
| VSAG | vsag |
| SVS-LVQ | ScalableVectorSearch-0.0.2 |
| SVS | ScalableVectorSearch-0.0.2 |
| GLASS-HNSW | pyglass |
| GLASS-NSG | pyglass |
| FLATNAV | flatnav |
| FINGER | FINGER |
| DISKANN | DiskANN |
| NSSG | SSG |
| HCNNG | WEAVESS |
| NSG | faiss-1.7.3 |
| DPG | WEAVESS |
| HNSW-PECOS | FINGER |
| HNSW | faiss-1.7.3 |
| EFANNA | WEAVESS |
| BOLT | bolt |
| VAQ | VAQ |
| PQFS | faiss-1.7.3 |
| OPQ | faiss-1.7.3 |
| ITQ | faiss-1.7.3 |
| PQ | faiss-1.7.3 |
| DB-LSH | DB-LSH |
| PM-LSH | PM-LSH |
| R2-LSH | R2LSH |
| LCCS-LSH | lccs-lsh |
| AWS-LSH | aws_alsh |
| QALSH | QALSH_Mem |
| C2LSH | lccs-lsh |
| SPTAG-KDT | SPTAG |
| SPTAG-BKT | SPTAG |
| SCANN | scann |
| MRPT | mrpt |
| ANNOY | annoy |
| FLANN | flann |
| KD-TREE | flann |
| VP_TREE | nmslib |
| DUMPY-FUZZY | Dumpy |
| DUMPY | Dumpy |
| ISAX2+ | lernaean-hydra |
| DS_TREE | lernaean-hydra |
| VA+FILE | lernaean-hydra |
| IVFPQ | faiss-1.7.3 |
| IMI-OPQ | faiss-1.7.3 |
| IMI-PQ | faiss-1.7.3 |
| RaBitQ | RaBitQ-Library |

✉️ Contact

If you have any questions or suggestions, feel free to contact:
