Skip to content

Pure python implementation of product quantization for nearest neighbor search

License

Notifications You must be signed in to change notification settings

matsui528/nanopq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

26c227f · Nov 2, 2023
Oct 26, 2023
Nov 2, 2023
Oct 26, 2023
Sep 7, 2023
Jul 28, 2018
Nov 2, 2023
Jul 18, 2018
Jul 23, 2018
Mar 23, 2021
Oct 26, 2023
Sep 7, 2023
Mar 23, 2021

Repository files navigation

nanopq

Build Status Documentation Status PyPI version Downloads

Nano Product Quantization (nanopq): a vanilla implementation of Product Quantization (PQ) and Optimized Product Quantization (OPQ) written in pure python without any third party dependencies.

Installing

You can install the package via pip. This library works with Python 3.5+ on linux.

pip install nanopq

Example

import nanopq
import numpy as np

N, Nt, D = 10000, 2000, 128
X = np.random.random((N, D)).astype(np.float32)  # 10,000 128-dim vectors to be indexed
Xt = np.random.random((Nt, D)).astype(np.float32)  # 2,000 128-dim vectors for training
query = np.random.random((D,)).astype(np.float32)  # a 128-dim query vector

# Instantiate with M=8 sub-spaces
pq = nanopq.PQ(M=8)

# Train codewords
pq.fit(Xt)

# Encode to PQ-codes
X_code = pq.encode(X)  # (10000, 8) with dtype=np.uint8

# Results: create a distance table online, and compute Asymmetric Distance to each PQ-code 
dists = pq.dtable(query).adist(X_code)  # (10000, ) 

Author

Contributors

  • @Hiroshiba fixed a bug of importlib (#3)
  • @calvinmccarter implemented parametric initialization for OPQ (#14)
  • @de9uch1 exntended the interface to the faiss so that OPQ can be handled (#19)
  • @mpskex implemented (1) initialization of clustering and (2) dot-product for computation (#24)
  • @lsb fixed a typo (#26)

Reference