This repository has been archived by the owner on Jan 24, 2025. It is now read-only.

How to set the page size? #31

Open
TiyCHEN opened this issue Jan 21, 2025 · 3 comments

Comments


TiyCHEN commented Jan 21, 2025

Hi,
Thank you for your excellent work. I want to run Starling on 1024-dimensional vectors, but I hit a float overflow error. It runs successfully with smaller dimensions, so I suspect the default 4KB page size is too small to store a 1024-dimensional vector. If that's the case, could you tell me how to adjust the page size?
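
A rough way to check this guess is to estimate how many nodes fit in one page. The sketch below assumes a DiskANN-style on-disk layout in which each node stores its full-precision vector followed by a neighbor count and up to R neighbor IDs of 4 bytes each. That layout, the function names, and the value R = 64 are assumptions for illustration and are not taken from the Starling source.

```python
def node_bytes(dim: int, bytes_per_component: int, max_degree: int) -> int:
    """Bytes for one node under the assumed layout:
    the vector, then a 4-byte neighbor count and max_degree 4-byte neighbor IDs."""
    return dim * bytes_per_component + (max_degree + 1) * 4


def nodes_per_page(page_bytes: int, dim: int, bytes_per_component: int, max_degree: int) -> int:
    """Number of whole nodes that fit in one page (integer division)."""
    return page_bytes // node_bytes(dim, bytes_per_component, max_degree)


if __name__ == "__main__":
    # float32 components (4 bytes), hypothetical max degree R = 64, 4KB page
    for dim in (128, 512, 1024):
        print(dim, node_bytes(dim, 4, 64), nodes_per_page(4096, dim, 4, 64))
```

Under these assumptions a 1024-dimensional float32 node needs 4356 bytes, so no whole node fits in a 4096-byte page. A page that holds zero nodes could plausibly cause a division by zero later in the build, though that is a guess and not something confirmed in this thread.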

Collaborator

PwzXxm commented Jan 22, 2025

Thank you for trying it out. Currently, it does not support a tunable page size. Both the graph partitioner and the search process are fixed to a 4KB page size to align with the SSD minimum page size.

You could try searching for SECTOR_LEN and changing it to a larger size, e.g. 8192 or larger (a multiple of 4KB is recommended). Unfortunately, it may require some debugging.
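
To pick a candidate value for SECTOR_LEN, one option is to round the required bytes up to the next multiple of 4KB. This is only a sketch: the helper name `min_sector_len` and the example node size of 4356 bytes (a 1024-dimensional float32 vector plus a hypothetical 65 four-byte neighbor slots) are assumptions, and the real per-node size depends on the index layout and build parameters.

```python
BASE_PAGE = 4096  # 4KB, the page size discussed in this thread


def min_sector_len(node_size: int, nodes_wanted: int = 1) -> int:
    """Smallest multiple of BASE_PAGE that holds `nodes_wanted` nodes of `node_size` bytes."""
    if node_size <= 0 or nodes_wanted <= 0:
        raise ValueError("node_size and nodes_wanted must be positive")
    needed = node_size * nodes_wanted
    pages = -(-needed // BASE_PAGE)  # ceiling division without floats
    return pages * BASE_PAGE


if __name__ == "__main__":
    print(min_sector_len(4356))     # room for one node per sector
    print(min_sector_len(4356, 3))  # room for three nodes per sector
```

A larger sector means more bytes read per I/O, so the smallest value that fits the desired number of nodes is a reasonable starting point before experimenting further.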

Author

TiyCHEN commented Jan 22, 2025

Thanks for your feedback!
If the page size stays unchanged, what is the standard approach for handling 1024-dimensional vector data? Currently, I set disk_PQ=256 (uncompressed vectors trigger an error in the first step, shown below) and then apply sq=1 during graph partitioning. The index builds, but it appears to report low recall. Could you advise on best practices for this scenario?

Building disk index...
./run_benchmark.sh: line 51: 30823 Floating point exception(core dumped) ${EXE_PATH}/tests/build_disk_index --data_type $DATA_TYPE --dist_fn $DIST_FN --data_path $BASE_PATH --index_path_prefix $INDEX_PREFIX_PATH -R $R -L $BUILD_L -B $B -M $M -T $BUILD_T > ${INDEX_PREFIX_PATH}build.log

Collaborator

PwzXxm commented Jan 24, 2025

I think you are on the right track. You can either increase page size, or quantize the vectors so that more vectors can fit in a 4KB page. Another approach would be looking into techniques to reduce dimensions, such as PCA.
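
As an illustration of the quantization option, here is a minimal sketch of per-vector min/max scalar quantization to one byte per component. It is a generic technique chosen for the example and is not claimed to match Starling's sq option or its PQ implementation; the function names are hypothetical.

```python
from array import array


def quantize_uint8(vec: list[float]) -> tuple[bytes, float, float]:
    """Map each float to one byte using the vector's own min/max range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in vec)
    return codes, lo, scale


def dequantize_uint8(codes: bytes, lo: float, scale: float) -> list[float]:
    """Approximate reconstruction of the original values."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    vec = [i / 1023 for i in range(1024)]  # synthetic 1024-dimensional vector
    codes, lo, scale = quantize_uint8(vec)
    # Byte sizes as float32 and as uint8 codes
    print(len(array("f", vec).tobytes()), len(codes))
```

Going from 4 bytes to 1 byte per component cuts the vector payload to a quarter, which leaves room for the neighbor list within a 4KB page, at the cost of a reconstruction error of at most half a quantization step per component. That lost precision is one possible contributor to lower recall.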
