[FEA] Binary IVF Flat Index by tarang-jain · Pull Request #1099 · rapidsai/cuvs

tarang-jain · 2025-07-09T23:15:26Z

Depends on rapidsai/raft#2770

Implements a binary IVF-Flat index, i.e. IVF-Flat with the bitwise Hamming (BitwiseHamming) metric.

Key Features

1. Binary Index Structure

Added a binary_centers_ field that stores cluster centers as packed uint8_t arrays for binary data.
The index automatically detects the BitwiseHamming metric and configures itself for binary operation.
Only uint8_t inputs are supported with BitwiseHamming, and each newly added kernel has a single instantiation.
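The packed layout can be illustrated with a small host-side sketch: a vector of `dim` 0/1 values is packed into `dim / 8` bytes, and the Hamming distance between two packed codes is the number of set bits in their XOR. The helper names `pack_bits` and `hamming_packed`, the LSB-first bit order, and the requirement that `dim` be a multiple of 8 are assumptions for illustration; they are not the cuVS API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs a vector of 0/1 values into bytes, 8 bits per byte.
// Bit j of the input goes to byte j / 8 at bit position j % 8 (LSB-first, an assumption).
std::vector<uint8_t> pack_bits(const std::vector<uint8_t>& bits) {
  assert(bits.size() % 8 == 0);  // assume dim is a multiple of 8
  std::vector<uint8_t> packed(bits.size() / 8, 0);
  for (std::size_t j = 0; j < bits.size(); ++j) {
    if (bits[j]) { packed[j / 8] |= static_cast<uint8_t>(1u << (j % 8)); }
  }
  return packed;
}

// Hypothetical helper: bitwise Hamming distance between two packed codes.
uint32_t hamming_packed(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
  assert(a.size() == b.size());
  uint32_t dist = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    uint8_t x = static_cast<uint8_t>(a[i] ^ b[i]);  // bits that differ are set
    while (x) {
      x = static_cast<uint8_t>(x & (x - 1));  // clear the lowest set bit
      ++dist;
    }
  }
  return dist;
}
```

A 128-dimensional binary vector therefore occupies 16 bytes, which is what a row of binary_centers_ would hold under this layout.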

2. K-means Clustering for Binary Data

The clustering approach for binary data required special handling:

Expanded-space clustering: binary data (uint8_t) is expanded to a signed representation (int8_t) in which each bit becomes ±1.
- The 0 → -1, 1 → +1 mapping makes centroid computation meaningful.
- Clustering is performed with the L2 distance in the expanded space.
Centroid quantization: the float centroids computed in the expanded space are converted back to binary format:
- Centroids are stored as packed uint8_t arrays.
- KMeans (coarse) prediction is done on these quantized centroids with the BitwiseHamming distance.
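Why L2 clustering in the expanded space fits the Hamming metric: for x, y in {-1, +1}^D, each differing coordinate contributes (±2)^2 = 4 and each matching one contributes 0, so ||x - y||^2 = 4 * d_H(x, y). Likewise, the ±1 vector closest in L2 to a float centroid c is obtained by taking the sign of each component of c. The sketch below shows the expansion, that identity, and a sign-based quantization step. The PR does not spell out the exact quantization rule, so the `> 0 → 1` threshold, the LSB-first bit order, and the function names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: expand a packed code (LSB-first bit order assumed) into int8_t ±1 values.
std::vector<int8_t> expand_to_signed(const std::vector<uint8_t>& packed) {
  std::vector<int8_t> out(packed.size() * 8);
  for (std::size_t j = 0; j < out.size(); ++j) {
    const bool bit = (packed[j / 8] >> (j % 8)) & 1u;
    out[j] = bit ? int8_t{1} : int8_t{-1};  // 0 -> -1, 1 -> +1
  }
  return out;
}

// Squared L2 distance between two expanded vectors; equals 4 * Hamming distance.
int64_t squared_l2(const std::vector<int8_t>& x, const std::vector<int8_t>& y) {
  assert(x.size() == y.size());
  int64_t sum = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const int64_t d = int64_t{x[i]} - int64_t{y[i]};
    sum += d * d;
  }
  return sum;
}

// Hypothetical: quantize a float centroid (expanded space) back to a packed code.
// Assumes sign thresholding: component > 0 -> bit 1, otherwise bit 0.
std::vector<uint8_t> quantize_centroid(const std::vector<float>& centroid) {
  assert(centroid.size() % 8 == 0);
  std::vector<uint8_t> packed(centroid.size() / 8, 0);
  for (std::size_t j = 0; j < centroid.size(); ++j) {
    if (centroid[j] > 0.0f) { packed[j / 8] |= static_cast<uint8_t>(1u << (j % 8)); }
  }
  return packed;
}
```

Components that are exactly 0 are ties; mapping them to bit 0 is an arbitrary choice in this sketch.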

3. Distance Kernels

Coarse Search (Cluster Selection)

Implemented a specialized bitwise_hamming_distance_op for query-to-centroid distances, used to compute the pairwise distances.
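A minimal sketch of the coarse step, assuming codes are packed into 32-bit words: an accumulate-style Hamming op XORs two words and counts the set bits, and each query is assigned to the centroid with the smallest total. `hamming_op_sketch` and `assign_to_lists` are hypothetical names; the real bitwise_hamming_distance_op plugs into the cuVS pairwise-distance machinery, whose interface is not reproduced here. The sketch returns only the single nearest centroid (the KMeans-predict case); IVF search would instead keep the `n_probes` closest lists.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical distance op: accumulates Hamming distance one 32-bit word at a time.
struct hamming_op_sketch {
  uint32_t accumulate(uint32_t acc, uint32_t x, uint32_t y) const {
    return acc + static_cast<uint32_t>(std::bitset<32>(x ^ y).count());
  }
};

// Coarse assignment sketch: for each query, return the index of the nearest centroid.
// Queries and centroids are row-major, `words` uint32_t per row (packed bits).
std::vector<std::size_t> assign_to_lists(const std::vector<uint32_t>& queries,
                                         const std::vector<uint32_t>& centroids,
                                         std::size_t words) {
  assert(words > 0 && queries.size() % words == 0 && centroids.size() % words == 0);
  const std::size_t n_q = queries.size() / words;
  const std::size_t n_c = centroids.size() / words;
  hamming_op_sketch op;
  std::vector<std::size_t> labels(n_q, 0);
  for (std::size_t q = 0; q < n_q; ++q) {
    uint32_t best = std::numeric_limits<uint32_t>::max();
    for (std::size_t c = 0; c < n_c; ++c) {
      uint32_t d = 0;
      for (std::size_t w = 0; w < words; ++w) {
        d = op.accumulate(d, queries[q * words + w], centroids[c * words + w]);
      }
      if (d < best) {  // keep the closest centroid seen so far
        best = d;
        labels[q] = c;
      }
    }
  }
  return labels;
}
```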

Fine-Grained Search (Within Clusters)

Extended the interleaved scan kernel (ivf_flat_interleaved_scan.cuh) with specialized templates for BitwiseHamming:

Veclen-based optimization: different code paths depending on the vectorization width (veclen):
- Veclen=16,8,4: load data as uint32_t and use __popc(x ^ y) to compute the Hamming distance 4 bytes at a time
- Veclen=1,2: byte-wise XOR and population count
Efficient memory access patterns:
- Maintains interleaved data layout for coalesced memory access
- Specialized loadAndComputeDist templates for uint8_t that leverage vectorized loads
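The two veclen paths can be mimicked on the host as follows. For veclen of 4, 8 or 16 the chunk is read as 32-bit words and the distance is the popcount of their XOR (on the GPU this is `__popc(x ^ y)`; `std::bitset<32>::count()` stands in for it here); for veclen of 1 or 2, bytes are XORed and counted one at a time. `hamming_chunk` is a hypothetical name, and the sketch ignores the interleaved layout and the warp-level organization of the real kernel.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side mimic of the per-chunk distance step in the interleaved scan.
// `veclen` is the number of bytes handled per load.
uint32_t hamming_chunk(const uint8_t* query, const uint8_t* data, int veclen) {
  assert(veclen == 1 || veclen == 2 || veclen == 4 || veclen == 8 || veclen == 16);
  uint32_t dist = 0;
  if (veclen % 4 == 0) {
    // veclen = 4, 8, 16: treat the chunk as veclen / 4 32-bit words.
    for (int w = 0; w < veclen / 4; ++w) {
      uint32_t x, y;
      std::memcpy(&x, query + 4 * w, sizeof(x));  // memcpy avoids aliasing/alignment UB
      std::memcpy(&y, data + 4 * w, sizeof(y));
      dist += static_cast<uint32_t>(std::bitset<32>(x ^ y).count());
    }
  } else {
    // veclen = 1, 2: byte-wise XOR and popcount.
    for (int i = 0; i < veclen; ++i) {
      dist += static_cast<uint32_t>(std::bitset<8>(query[i] ^ data[i]).count());
    }
  }
  return dist;
}
```

Because the popcount of an XOR does not depend on byte order, reading four bytes as one uint32_t gives the same result as the byte-wise path while needing a quarter as many popcount operations.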

Binary size increase (as of 10/17/2025):
- branch-25.12 (CUDA 12.9 + x86): 1232.414 MB
- This PR (CUDA 12.9 + x86): 1251.051 MB (+18.637 MB, about 1.5%)


copy-pr-bot · 2025-07-09T23:15:30Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



tarang-jain · 2025-12-24T18:25:46Z

/ok to test 07354d1

tarang-jain · 2025-12-30T17:40:23Z

/ok to test 07e1837

cjnolet · 2026-01-28T02:45:57Z

/ok to test d546471

tarang-jain added 2 commits July 9, 2025 16:11

first commit

bedcf4c

Merge branch 'branch-25.08' of https://github.com/rapidsai/cuvs into binary-kmeans

09f9a22

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Jul 9, 2025

github-project-automation bot moved this to Todo in Vector Search, ML, & Data Mining Release Board Jul 9, 2025

github-actions bot added the cpp label Jul 9, 2025

tarang-jain self-assigned this Jul 9, 2025

tarang-jain added feature request, non-breaking, DO NOT MERGE labels Jul 9, 2025

index header

51836d4

cjnolet moved this from Todo to In Progress in Vector Search, ML, & Data Mining Release Board Jul 11, 2025

tarang-jain and others added 18 commits July 11, 2025 10:54

populate functions;ivf_list type;kmeans_predict

23ef877

Merge branch 'branch-25.08' of https://github.com/rapidsai/cuvs into binary-kmeans

a7fce8e

hamming_op

6a98a88

Merge branch 'branch-25.08' into binary-kmeans

76c9ee5

rm binary_ivf

916a4cf

Merge branch 'binary-kmeans' of https://github.com/tarang-jain/cuvs into binary-kmeans

8ec4d59

modify ivf_flat_build

1941b2e

rm binary_ivf_flat

cd00b83

rm unused

4cffe84

updates

2bc9007

quantize

7803850

cleanup

ff7be4a

pre-commit

3149192

update kmeans_predict

dd1b0d4

src kmeans

2b9bef4

style

6ec32d8

corrections to logic

5c59753

clang

2271809

tarang-jain and others added 14 commits December 4, 2025 14:50

doc

a9599bd

Merge branch 'main' of https://github.com/rapidsai/cuvs into binary-kmeans

ffbcdcc

simplify ivf-flat build

3ffba85

fix compilation errors

c0a99e2

bug fixes

dbb6423

debug

248911c

more corrections to kmeans

0656ee6

Merge branch 'main' into binary-kmeans

1423356

Merge branch 'main' into binary-kmeans

a25ddac

Merge branch 'main' of https://github.com/rapidsai/cuvs into binary-kmeans

e8a8152

bug fixes

8feebb8

debug

997ddde

working impl;rm debug statements

89b54a1

rm debug prints:

07354d1

Merge branch 'main' into binary-kmeans

07e1837

tarang-jain changed the base branch from main to release/26.02 January 20, 2026 21:42

tarang-jain added 3 commits January 20, 2026 13:42

Merge branch 'release/26.02' into binary-kmeans

510bafd

Merge branch 'release/26.02' into binary-kmeans

0ddce5d

Merge branch 'release/26.02' into binary-kmeans

d546471

AyodeAwe and others added 2 commits February 4, 2026 15:46

REL v26.02.00 release

a2f5a8b

Merge branch 'release/26.02' into binary-kmeans

19cfe12

tarang-jain changed the base branch from release/26.02 to main March 17, 2026 00:38

tarang-jain requested a review from a team as a code owner March 17, 2026 00:38

tarang-jain and others added 4 commits March 16, 2026 17:49

rebase

e143514

fix compilation

2505349

Merge branch 'main' into binary-kmeans

61ca5b7

Merge branch 'main' into binary-kmeans

18f0caf

tarang-jain wants to merge 152 commits into rapidsai:main from tarang-jain:binary-kmeans


5 participants
