Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add setSelect(ALL_ATTRIBUTES) to retrieve snapshots #12

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

spangaer
Copy link
Contributor

By doing so, the indexes don't need to have ProjectionStrategy ALL, but KEYS_ONLY suffices. Thus, a duplicate of the payload doesn't need to be stored in the index, which is more space efficient.

This allows to make the choice between time and space efficiency, by selecting the projection strategy for the created index. Depending on the choice, setting this flag for retrieval will make either strategy work as efficient as intended.

Draft note

I'm well aware that the rename refactor and deployment processes are still ongoing. We'll adapt these commits to the new reality as it materializes but wanted to stage the contributions already.

Also FYI @coreyoconnor

@pjfanning pjfanning added this to the v1.1.0 milestone Feb 10, 2023
@coreyoconnor
Copy link
Contributor

Looks like a nice change!

Two thoughts:

  1. That comment on the tradeoff depending on how the index is configured should be copied here: https://github.com/apache/incubator-pekko-persistence-dynamodb/blob/27f9e016b2cc610e66e28b75bb7305d1e5ef4cee/src/main/scala/akka/persistence/dynamodb/query/scaladsl/DynamoDBCurrentPersistenceIdsQuery.scala#L64
  2. I think this kind of change will need a minor version bump. Depending on how the secondary index is configured, an existing user may see a difference in behavior/performance with the version that includes this.

@spangaer
Copy link
Contributor Author

We'll address point 1 upon rebase.

Point 2, while I don't mind the version bump, I don't think there will be a performance impact.

It didn't work without projecting all before. If you project all, it will keep on working in the same way it did before (inside Dynamo). Only if you don't project all with will behave differently on on the Dynamo side, but that way of working didn't work before this fix.

@spangaer spangaer marked this pull request as ready for review December 18, 2023 19:31
By doing so, the indexes don't need to have ProjectionStrategy ALL,
but KEYS_ONLY suffices. Thus, a duplicate of the payload doesn't need
to be stored in the index, which is more space efficient.

This allows to make the choice between time and space efficiency,
by selecting the projection strategy for the created index.
Depending on the choice, setting this flag for retrieval will make
either strategy work as efficient as intended.
@spangaer
Copy link
Contributor Author

I guess the build failures are unrelated to the feature.

@pjfanning
Copy link
Contributor

something weird getting cached on GitHub CI cache storage - I've cleared all the caches and will see what happens

Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@mdedetrich mdedetrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just going to quickly block this because judging from the PR description this should be an option as there are tradeoffs, i.e. speed vs storage?

@spangaer Is this correct? If by default the current behaviour should remain and a config option should be added that enables this behaviour. @coreyoconnor Mayve you can comment here as well?

@spangaer
Copy link
Contributor Author

This allows to make the choice between time and space efficiency, by selecting the projection strategy for the created index. Depending on the choice, setting this flag for retrieval will make either strategy work as efficient as intended.

It is optional. You choose which strategy to follow by means of projection setting for the Dynamo index.

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html

ALL_ATTRIBUTES - Returns all of the item attributes from the specified table or index. If you query a local secondary index, then for each matching item in the index, DynamoDB fetches the entire item from the parent table. If the index is configured to project all item attributes, then all of the data can be obtained from the local secondary index, and no fetching is required.

So if you configure your index like you had to before, it has no performance implications, but if you switch to keys-only mode, you get the space-time trade-off.

@mdedetrich
Copy link
Contributor

This allows to make the choice between time and space efficiency, by selecting the projection strategy for the created index. Depending on the choice, setting this flag for retrieval will make either strategy work as efficient as intended.

It is optional. You choose which strategy to follow by means of projection setting for the Dynamo index.

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html

ALL_ATTRIBUTES - Returns all of the item attributes from the specified table or index. If you query a local secondary index, then for each matching item in the index, DynamoDB fetches the entire item from the parent table. If the index is configured to project all item attributes, then all of the data can be obtained from the local secondary index, and no fetching is required.

So if you configure your index like you had to before, it has no performance implications, but if you switch to keys-only mode, you get the space-time trade-off.

Ah I think I am a bit confused here, just to clarify all of these settings are done server side on Amazon Dynamo and so this flag is going to make zero difference wrt behaviour unless you toggle something in AWS Dynamo?

@spangaer
Copy link
Contributor Author

Correct. The Dynamo Index configuration (projection) determines the performance characteristics, the query setting here is just an enabler to the choice, without impact to existing setups.

The change takes effect by switching ALL for KEYS_ONLY.

It's also good to have the choices as best practices on the matter tend to be context dependent.

Copy link
Contributor

@mdedetrich mdedetrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay lgtm, just a misunderstanding on my part then

@pjfanning pjfanning merged commit 8f5de84 into apache:main Oct 17, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants