GitHub - microsoft/VBench: An Approximate Vector-Analytics Benchmark for Relational Databases

VBench is a benchmark for evaluating vector-analytic queries through a SQL interface. VBench uses the Recipe1M dataset augmented with scalar attributes, and provides a comprehensive set of vector-analytic queries that use standard SQL operators, including Join, GroupBy, Filter, and TopK.

In this repo, we provide instructions on:

how to cook the VBench dataset
how to evaluate the vector-analytic engines on it

VBench Dataset

The VBench dataset consists of two tables: the Recipe Table and the Tag Table.

Recipe Table

Column Name	Data Type	Example	Notes
recipe_id	Identifier	1	primary key
images	list of String	['data/images/1/0.jpg', ...]	paths of images
description	Text	[ingredients] + [instruction]	sparse vector
images_embedding	Vector	[-0.0421, 0.0296, ..., 0.0273]	dense vector, 1024 dimensions
description_embedding	Vector	[0.0056, -0.0487, ..., 0.0034]	dense vector, 1024 dimensions
price	Integer	18	price of the dish

Tag Table

Column Name	Data Type	Example	Notes
id	Identifier	1	primary key
tag_name	Text	"salad"	name of the tag
tag_vector	Vector	[-0.0137, 0.0421, ..., 0.0183]	embedding or weight vector, 1024 dimensions

Please refer to dataset_generation/README.md for detailed instructions on how to generate these two tables.
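As a rough illustration of the two schemas above, the following Python sketch models one row of each table as a dataclass. The class names `Recipe` and `Tag`, the constant `EMBEDDING_DIM`, and the choice of Python types are assumptions made for this example only; the actual column types depend on the target database and on the scripts in dataset_generation.

```python
from dataclasses import dataclass
from typing import List

# Dimensionality stated in the Recipe and Tag tables.
EMBEDDING_DIM = 1024


@dataclass
class Recipe:
    """One row of the Recipe Table (hypothetical in-memory form)."""
    recipe_id: int                      # primary key
    images: List[str]                   # paths of images
    description: str                    # ingredients + instructions
    images_embedding: List[float]       # dense vector, 1024 dimensions
    description_embedding: List[float]  # dense vector, 1024 dimensions
    price: int                          # price of the dish

    def __post_init__(self):
        # Reject rows whose embeddings do not have the documented size.
        for name in ("images_embedding", "description_embedding"):
            if len(getattr(self, name)) != EMBEDDING_DIM:
                raise ValueError(f"{name} must have {EMBEDDING_DIM} dimensions")


@dataclass
class Tag:
    """One row of the Tag Table (hypothetical in-memory form)."""
    id: int                  # primary key
    tag_name: str            # name of the tag
    tag_vector: List[float]  # embedding or weight vector, 1024 dimensions

    def __post_init__(self):
        if len(self.tag_vector) != EMBEDDING_DIM:
            raise ValueError(f"tag_vector must have {EMBEDDING_DIM} dimensions")
```

The dimension check in `__post_init__` is a convenience for catching malformed rows early; it is not something the benchmark itself is known to enforce.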

VBench Queries

VBench has 12 queries, which can be divided into four categories:

Top-K
Vector filtering
Join
Group By

The queries use standard SQL operators over vector and scalar columns. Please refer to queries.sql for details.
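To make the four categories concrete, here is a minimal pure-Python sketch of what each kind of query computes, using exact (brute-force) evaluation on toy 3-dimensional vectors. It is not a transcription of queries.sql: the function names, the thresholds, the use of L2 distance, and the toy rows are all assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict


def l2(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy rows: (recipe_id, embedding, price). Real VBench embeddings have 1024 dimensions.
recipes = [
    (1, [0.0, 0.0, 0.0], 18),
    (2, [1.0, 0.0, 0.0], 25),
    (3, [0.0, 2.0, 0.0], 18),
    (4, [3.0, 3.0, 3.0], 40),
]
# Toy rows: (id, tag_name, tag_vector).
tags = [
    (1, "salad", [0.0, 0.0, 0.0]),
    (2, "soup", [3.0, 3.0, 3.0]),
]
query = [0.0, 0.0, 0.0]


def top_k(rows, q, k):
    """Top-K: ids of the k recipes closest to the query vector."""
    return [r[0] for r in heapq.nsmallest(k, rows, key=lambda r: l2(r[1], q))]


def vector_filter(rows, q, max_dist, max_price):
    """Vector filtering: a distance predicate combined with a scalar predicate."""
    return [r[0] for r in rows if l2(r[1], q) <= max_dist and r[2] <= max_price]


def vector_join(rows, tag_rows, max_dist):
    """Join: pair each recipe with every tag whose vector lies within max_dist."""
    return [(r[0], t[1]) for r in rows for t in tag_rows if l2(r[1], t[2]) <= max_dist]


def group_by_price(rows, q, max_dist):
    """Group By: count the recipes near the query, grouped by price."""
    counts = defaultdict(int)
    for r in rows:
        if l2(r[1], q) <= max_dist:
            counts[r[2]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(top_k(recipes, query, 2))
    print(vector_filter(recipes, query, 2.5, 20))
    print(vector_join(recipes, tags, 1.5))
    print(group_by_price(recipes, query, 2.5))
```

An approximate engine would replace the brute-force distance scans with an index lookup; exact results like these are the kind of reference such an engine's output could be compared against.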

Evaluation

Please refer to evaluation/README.md for detailed instructions on how to evaluate different vector search engines.

License

The entire codebase is under the MIT license.
