
argmaxml/vecsim


VecSim - A unified interface for similarity servers

A standard, lightweight interface to popular vector similarity servers.

The problems we are trying to solve:

  1. Standard API: different vector similarity servers have different APIs, so switching between them is not trivial.
  2. Identifiers: some vector similarity servers support string IDs and some do not, so we keep track of the mapping.
  3. Partitions: in most cases, pre-filtering is needed before querying; we abstract this concept away.
  4. Aggregations: in some cases, one item is indexed as multiple vectors.
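The identifier and aggregation problems above can be illustrated with a small stand-alone sketch. It is not vecsim's implementation: the class `RowRegistry` and its methods are hypothetical, and the sketch assumes the engine only understands integer row IDs and returns `(distance, row)` pairs where a smaller distance means more similar. Keeping the smallest distance per item is one possible aggregation policy; a mean or a sum would be equally valid choices.

```python
from collections import defaultdict


class RowRegistry:
    """Hypothetical helper: the engine stores vectors under integer rows,
    while callers use string item IDs. One item may own several rows."""

    def __init__(self):
        self.row_to_item = []  # row index -> string item ID

    def add_vector(self, item_id):
        # Each new vector gets the next integer row; that row is what the engine sees.
        self.row_to_item.append(item_id)
        return len(self.row_to_item) - 1

    def aggregate(self, hits, k):
        """Turn engine hits [(distance, row), ...] into the k nearest items,
        keeping each item's smallest distance (an assumed aggregation policy)."""
        best = defaultdict(lambda: float("inf"))
        for dist, row in hits:
            item = self.row_to_item[row]
            best[item] = min(best[item], dist)
        # Rank items from nearest to farthest and keep the top k.
        ranked = sorted(best.items(), key=lambda kv: kv[1])
        return ranked[:k]


reg = RowRegistry()
for item_id in ["item_1", "item_1", "item_2", "item_3"]:
    reg.add_vector(item_id)  # item_1 is indexed as two vectors (rows 0 and 1)

# Pretend the engine returned these (distance, row) pairs.
hits = [(0.30, 1), (0.10, 0), (0.20, 2), (0.50, 3)]
print(reg.aggregate(hits, k=2))  # [('item_1', 0.1), ('item_2', 0.2)]
```

Because the caller only ever sees string IDs, the same registry could sit in front of an engine that lacks native string-ID support.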

Supported engines:

  1. Scikit-learn, via NearestNeighbors
  2. RediSearch
  3. Faiss
  4. Elasticsearch
  5. Pinecone
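For the scikit-learn engine, the underlying primitive is `sklearn.neighbors.NearestNeighbors`. The sketch below shows how a cosine index can be fitted and queried directly with scikit-learn, and how integer results map back to string IDs. It assumes a wrapper does roughly this internally; it is not taken from vecsim's source, and the IDs and data are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
ids = ["item_" + str(i) for i in range(100)]
data = rng.random((100, 32))

# scikit-learn's tree-based algorithms do not support cosine distance, so use brute force.
nn = NearestNeighbors(n_neighbors=10, metric="cosine", algorithm="brute")
nn.fit(data)

query = rng.random(32)
# kneighbors expects a 2-D array of queries and returns (distances, row indices).
dists, rows = nn.kneighbors(query.reshape(1, -1), n_neighbors=10)
# Map integer rows back to the caller's string IDs.
neighbors = [ids[r] for r in rows[0]]
print(list(zip(neighbors, dists[0].round(3))))
```

Distances come back sorted from nearest to farthest, so the first entry is the best match.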

QuickStart example

import numpy as np
# Import a similarity server of your choice:
# scikit-learn (best for small datasets or testing)
from vecsim import SciKitIndex
sim = SciKitIndex(metric='cosine', dim=32)

# 100 users and 100 items, each a random 32-dimensional vector
user_ids = ["user_" + str(1 + i) for i in range(100)]
user_data = np.random.random((100, 32))
item_ids = ["item_" + str(101 + i) for i in range(100)]
item_data = np.random.random((100, 32))
# Add each group under its own partition
sim.add_items(user_data, user_ids, partition="users")
sim.add_items(item_data, item_ids, partition="items")
# Index the data
sim.init()
# Run nearest neighbor vector search
query = np.random.random(32)
dists, items = sim.search(query, k=10)  # the 10 nearest neighbors across both users and items
dists, items = sim.search(query, k=10, partition="users")  # the 10 nearest neighbors among users only
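To sanity-check results from any backend, it can help to run the same query by brute force. The sketch below is a reference under stated assumptions, not vecsim code: the function `brute_force_search` and the per-row `partitions` list are hypothetical, partition filtering is applied before ranking, and cosine distance is taken to be 1 minus cosine similarity, which is assumed to be what `metric='cosine'` means here.

```python
import numpy as np


def brute_force_search(data, ids, partitions, query, k=10, partition=None):
    """Hypothetical reference search: cosine distance with an optional partition pre-filter."""
    mask = np.ones(len(ids), dtype=bool)
    if partition is not None:
        # Pre-filter: keep only rows that belong to the requested partition.
        mask = np.array([p == partition for p in partitions])
    rows = np.flatnonzero(mask)
    candidates = data[rows]
    # Cosine distance = 1 - (a . b) / (|a| |b|); smaller means more similar.
    sims = candidates @ query / (np.linalg.norm(candidates, axis=1) * np.linalg.norm(query))
    dists = 1.0 - sims
    order = np.argsort(dists)[:k]
    return dists[order], [ids[rows[i]] for i in order]


rng = np.random.default_rng(0)
user_ids = ["user_" + str(1 + i) for i in range(100)]
item_ids = ["item_" + str(101 + i) for i in range(100)]
data = rng.random((200, 32))
ids = user_ids + item_ids
partitions = ["users"] * 100 + ["items"] * 100

query = rng.random(32)
dists, found = brute_force_search(data, ids, partitions, query, k=10, partition="users")
print(found[:3], dists[:3])
```

Filtering before ranking guarantees that all `k` results come from the requested partition; filtering after ranking could return fewer than `k` matches.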

For more examples, please read our documentation.
