Conversation

mccullocht
Contributor

@mccullocht mccullocht commented Sep 29, 2025

Partial implementation of #15155

So far this is no faster than the baseline. On an AMD RYZEN AI MAX+ 395:

baseline:
Results:
recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.913        1.635   1.630        0.997  1000000   100     100       32        250     8 bits     6824      0.00      Infinity            0.04             1         3759.67      3677.368      747.681       HNSW

candidate:
Results:
recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.913        1.671   1.661        0.994  1000000   100     100       32        250     8 bits     6824      0.00      Infinity            0.04             1         3759.67      3677.368      747.681       HNSW

DO NOT MERGE
Performance observations: on an AVX-512 host the profiles are quite different. The original path spends most of its time in dotProductBody512, followed by Int512Vector.reduceLanes(). The new path spends much more time in reduceLanes(), and also spends more time loading the input vectors -- a 128-bit load from a MemorySegment instead of from a heap array. This could be memory latency, but in that case why doesn't the load into the heap array show up in the profile?
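
For context, a minimal Panama Vector API sketch of the two load paths being discussed (this is not the PR's code; the species choices and the byte-to-int widening are simplified assumptions):

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Simplified sketch: the arithmetic is identical, only the source of the second vector differs.
final class DotProductLoadSketch {
  private static final VectorSpecies<Byte> BYTE_128 = ByteVector.SPECIES_128;
  private static final VectorSpecies<Integer> INT_512 = IntVector.SPECIES_512;

  // Baseline path: both operands are 128-bit loads from heap byte[] arrays.
  static int dotOnHeap(byte[] a, byte[] b) {
    IntVector acc = IntVector.zero(INT_512);
    int i = 0;
    for (; i <= a.length - BYTE_128.length(); i += BYTE_128.length()) {
      ByteVector va = ByteVector.fromArray(BYTE_128, a, i);
      ByteVector vb = ByteVector.fromArray(BYTE_128, b, i);
      // widen 16 bytes to 16 ints, then multiply-accumulate
      IntVector ia = (IntVector) va.convertShape(VectorOperators.B2I, INT_512, 0);
      IntVector ib = (IntVector) vb.convertShape(VectorOperators.B2I, INT_512, 0);
      acc = acc.add(ia.mul(ib));
    }
    int sum = acc.reduceLanes(VectorOperators.ADD); // hot in both profiles
    for (; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  // Candidate path: the document vector is read directly from a MemorySegment (mmapped data)
  // instead of being copied into a heap array first.
  static int dotOffHeap(byte[] a, MemorySegment b, long bOffset) {
    IntVector acc = IntVector.zero(INT_512);
    int i = 0;
    for (; i <= a.length - BYTE_128.length(); i += BYTE_128.length()) {
      ByteVector va = ByteVector.fromArray(BYTE_128, a, i);
      ByteVector vb = ByteVector.fromMemorySegment(BYTE_128, b, bOffset + i, ByteOrder.LITTLE_ENDIAN);
      IntVector ia = (IntVector) va.convertShape(VectorOperators.B2I, INT_512, 0);
      IntVector ib = (IntVector) vb.convertShape(VectorOperators.B2I, INT_512, 0);
      acc = acc.add(ia.mul(ib));
    }
    int sum = acc.reduceLanes(VectorOperators.ADD);
    for (; i < a.length; i++) sum += a[i] * b.get(ValueLayout.JAVA_BYTE, bOffset + i);
    return sum;
  }
}
```

The final reduceLanes(VectorOperators.ADD) is the same in both variants; the only structural difference is whether the 128-bit loads come from a heap byte[] or from a MemorySegment.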

thecoop and others added 28 commits September 24, 2025 09:32
Slowly moving the legacy formats to the backwards codecs.

I have most of the logic moved, but there are additional things to figure out.

I don't think we can easily move the Lucene99ScalarQuantizedVectorScorer just yet, but we should be able to prevent users from using the old quantized formats.
…s for non-accountable query size (apache#15124)

* Use RamUsageEstimator to calculate query size instead of the default 1024 bytes for non-accountable queries

* Use try-with-resources for directory and indexWriter

* Adding changes to cache query size and queries per clause to reduce impact of repeated visit() calls during RamUsageEstimator.sizeOf() (see the sketch after this commit list)

* Adding changelog entry

* Making queries per clause list immutable

* Adding unit test to verify query size is cached correctly

* Renaming QueryMetadata to Record

* Changing query metadata to record type and removing boolean query changes
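
A hedged sketch of the caching change described in the commits above, assuming RamUsageEstimator.sizeOfObject is the estimator in play; the wrapper class and field names are illustrative, not the PR's actual classes:

```java
import org.apache.lucene.search.Query;
import org.apache.lucene.util.RamUsageEstimator;

/** Illustrative only: compute a non-accountable query's size once and reuse it. */
final class CachedQuerySize {
  private final Query query;
  private volatile long cachedBytes = -1; // -1 means "not computed yet"

  CachedQuerySize(Query query) {
    this.query = query;
  }

  long ramBytesUsed() {
    long bytes = cachedBytes;
    if (bytes == -1) {
      // estimate the real footprint instead of assuming a flat 1024 bytes
      bytes = RamUsageEstimator.sizeOfObject(query);
      cachedBytes = bytes;
    }
    return bytes;
  }
}
```
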
Fix some issues found by `actionlint`, `shellcheck`, and `zizmor -p`
More issues remain, this is just incremental progress.
Co-authored-by: Kaival Parikh <kaivalp2000@gmail.com>
…e array filter applied and never cache dictionaries from custom locations (apache#15237)

* Fix SmartChinese to only serialize data from classpath with a native array filter applied and never cache dictionaries from custom locations

* fix exception handling

* add CHANGES.txt

* fix exception handling

* fix typo in CHANGES.txt

* Restore the code to regenerate the serialized file

* Disallow any serialization in test-framework as early as possible (we can't do that via sysprops due to Gradle)

* Install a serialization filter that only allows Gradle's test runner config deserialization (see the sketch after this commit list)

* fix errorprone

* use the same logic as for the security manager to install the filter

* add new filter to CHANGES.txt

* Improve performance by only installing a filter if we're not called from Gradle

* simplify

* Add a test for the deserialization filter

* fix typo

* improve test
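
For reference on the filter commits above: a JVM-wide deserialization filter can be installed with the standard java.io.ObjectInputFilter API. The pattern below only illustrates the allow-list shape; it is not the exact filter the commit installs:

```java
import java.io.ObjectInputFilter;

public final class InstallFilterSketch {
  public static void main(String[] args) {
    // Illustrative allow-list: permit Gradle's classes and core JDK packages, reject the rest.
    ObjectInputFilter filter =
        ObjectInputFilter.Config.createFilter("org.gradle.**;java.**;!*");
    // A process-wide filter can only be set once; this throws if one is already installed.
    ObjectInputFilter.Config.setSerialFilter(filter);
  }
}
```
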
…pache#15242)

Add .github/actionlint.yaml and .github/workflows/actions.yml to enable
workflow validation with actionlint and security scanning with zizmor.

High-severity issues are addressed, but other issues remain. This is just an incremental step.
* ci: tune codeql for java to use default query pack

Set security-extended queries only for actions and python.

Security-extended for java contains too many noisy checks (e.g. every
single place an implicit narrowing cast happens from a compound
assignment).
Some of the security-extended checks were actually useful; the pack has only one extremely noisy rule, just like the default queries have one extremely noisy rule.

Disable both of the noisy rules via a configuration file instead.
@benwtrent
Member

@mccullocht maybe we only do the byte part of the comparisons off-heap, then apply the corrections all on heap? I would assume applying corrections is pretty cheap, and even if it isn't, applying them in bulk on heap might be fast enough.
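
That split might look something like the rough sketch below; the class, field, and method names are hypothetical, not existing Lucene APIs. The off-heap step does only the int8 comparisons, and the scalar-quantization corrections are applied on heap in a bulk pass:

```java
/** Hypothetical sketch of splitting byte comparisons (off-heap) from corrections (on-heap). */
abstract class SplitCorrectionScorer {
  final float scale;            // quantization scale factor (assumed)
  final float[] docCorrections; // per-document correction terms, kept on heap (assumed)

  SplitCorrectionScorer(float scale, float[] docCorrections) {
    this.scale = scale;
    this.docCorrections = docCorrections;
  }

  /** Off-heap part: only the raw int8 dot product against the memory-mapped vector data. */
  abstract int rawDotProductOffHeap(byte[] query, int docId);

  /** On-heap part: apply the corrections for a whole batch of candidates at once. */
  float[] scoreBulk(byte[] query, float queryCorrection, int[] docIds) {
    int[] raw = new int[docIds.length];
    for (int i = 0; i < docIds.length; i++) {
      raw[i] = rawDotProductOffHeap(query, docIds[i]);
    }
    float[] scores = new float[docIds.length];
    for (int i = 0; i < docIds.length; i++) {
      // simple scalar loop; cheap relative to the byte comparisons and easy for C2 to unroll
      scores[i] = raw[i] * scale + queryCorrection + docCorrections[docIds[i]];
    }
    return scores;
  }
}
```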
