-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-15946: Revamp SAI metrics for fetched/returned keys/partitions/rows/tombstones #2132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Checklist before you submit for review
|
457f399 to
2b837e7
Compare
|
The three test failures above are junit timeouts, probably unrelated since those tests take quite long in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to finish reviewing some of the tests, you added so many, that's great! But I wanted to push my first comments and questions for consideration. I am a bit nervous around the transformations' changes, (reason - I haven't done a lot in that area of the code) so I am trying to be very carefully reasoning about all the cases.
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/plan/CountFetchedRowsTransformation.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/plan/CountFetchedTransformation.java
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/plan/CountFetchedRowsTransformation.java
Outdated
Show resolved
Hide resolved
2b837e7 to
69e1d77
Compare
pkolaczk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Those new metric names make a lot more sense.
Much more readable and complete than before!
Thank you! :)
|
Looks like the only new CI failures are unrelated timeouts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between QueryContextTest and SlowSAIQueryLoggerTest, I was wondering what other scenarios we may want to add. The only thing I can think of are collections and TTL. WDYT? Does it make sense to add a few small tests?
Also, I think it is quite surprising with the current documentation and naming how partitionsFetched works with ANN queries. It is confusing for users. I guess changing the metric for ANN queries is hard but then we should change name/document better? Thoughts?
CC @pkolaczk
The rest looks great!
src/java/org/apache/cassandra/index/sai/plan/CountFetchedTransformation.java
Show resolved
Hide resolved
I don't see lots of value, but why not. I have added tests to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job fixing and testing these metrics, thanks.
I see CI errored-out so the merge is pending on clean CI and approval of the other PR.
…ows/tombstones The metrics partitionsRead, rowsFiltered, rowsPreFiltered and shadowedPrimaryKeyCount, which dodn't always work as expected, are replaced by these metrics: * keysFetched: Number of partition/row keys that will be used to fetch rows from the base table. * partitionsFetched: Number of live partitions fetched from the base table, before post-filtering and sorting. Note that currently ANN fetches one partition per index key, without any grouping of same-partition clusterings. * partitionsReturned: Number of live partitions returned to the coordinator, after post-filtering and sorting. * partitionTombstonesFetched: Number of deleted partitions found when reading the base table. * rowsFetched: Number of live rows fetched from the base table, before post-filtering and sorting. * rowsReturned: Number of live rows returned to the coordinator, after post-filtering and sorting. * rowTombstonesFetched: Number deleted rows or row ranges found when reading the base table. StorageAttachedIndexSearcher is modified to use the command timestamp, rather than getting the current time everytime a query timestamp is needed, which was possibly buggy and inefficient.
5ef6014 to
6affe01
Compare
|
❌ Build ds-cassandra-pr-gate/PR-2132 rejected by Butler2 regressions found Found 2 new test failures
Found 4 known test failures |
|
The only new test failures are unrelated junit timeouts, so CI looks good to me. |



What is the issue
The metrics partitionsRead, rowsFiltered, rowsPreFiltered and shadowedPrimaryKeyCount don't always work as expected. Details on https://github.com/riptano/cndb/issues/15946.
What does this PR fix and why was it fixed
Replace those metrics with:
keysFetched: Number of partition/row keys that will be used to fetch rows from the base table.partitionsFetched: Number of live partitions fetched from the base table, before post-filtering and sorting. Note that currently ANN fetches one partition per index key, without any grouping of same-partition clusterings.partitionsReturned: Number of live partitions returned to the coordinator, after post-filtering and sorting.partitionTombstonesFetched: Number of deleted partitions found when reading the base table.rowsFetched: Number of live rows fetched from the base table, before post-filtering and sorting.rowsReturned: Number of live rows returned to the coordinator, after post-filtering and sorting.rowTombstonesFetched: Number deleted rows or row ranges found when reading the base table.StorageAttachedIndexSearcheris modified to use the command timestamp, rather than getting the currenttime everytime a query timestamp is needed, which was possibly buggy and inefficient.
Here are some examples on how to interpret these metrics for AA partition-aware disk format:
keysFetched>partitionsFetched: The indexed data has changed since it was indexed and there has been deletions or updates.partitionsFetched<rowsFetched: The partitions are wide, if the difference is large queries with high within-partition selectivity might behave poorly compared to later disk formats, and vice versa.rowsFetched>rowsReturned: The queries have someALLOW FILTERINGrestriction or the partition-aware nature of the index is producing filtering.And for row-aware disk formats later than AA :
keysFetched>rowsFetched: The indexed data has changed since it was indexed and there has been deletions or updates.partitionsFetched<rowsFetched: The partitions are wide, if the difference is large queries with low within-partition selectivity might behave worse than with AA, and vice versa.rowsFetched>rowsReturned: The queries have someALLOW FILTERINGrestriction that is producing filtering.