-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Node ids are random not sequential #319
Conversation
We want this to avoid conflicts in (at least) two scenarios: * distributed writers, where more than one creates a new node * rebasing of commits, where multiple commits create nodes This is not strictly needed, we could "renumber" the nodes as we merge them, but it would increase complexity. The downside of this change: slightly bigger type for nodes, slightly more cloning. WARNING: this is an on-disk format breaking change, Icechunk versions with this change cannot read repositories written with previous versions.
@@ -192,7 +192,7 @@ async def test_icechunk_can_read_old_repo(): | |||
] | |||
assert sorted( | |||
[p async for p in store.list_dir("group2/group3/group4/group5/inner")] | |||
) == ["zarr.json"] | |||
) == ["c", "zarr.json"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anybody understand why I need to change this? This change makes sense, I think, but why it was passing before and not now with this apparently unrelated change?
impl private::Sealed for SnapshotTag {} | ||
impl private::Sealed for ManifestTag {} | ||
impl private::Sealed for ChunkTag {} | ||
impl private::Sealed for AttributesTag {} | ||
impl private::Sealed for NodeTag {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this 🦭 business?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows us to use a single class ObjectId<T>
but have different types for SnapshotIds, ManifestIds, etc. That way you get a compile time error if you pass a ChunkId when an SnapshotId is expected. These Tag classes are "markers" to fill in the T
in ObjectId<T>
, it's what makes them different types for different objects. Sealed is so a user cannot create their own Id types by mistake ... for example, ObjectId<u8>
is not going to work.
node.ok(), | ||
Some(NodeSnapshot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference here? They seem equivalent, and the new version feels less readable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, very unfortunate ... we need to start writing some helpers for our tests. The issue is that I can no longer predict the node_id we'll get, so I cannot use eq
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I did think that using sequential integers was weird, but didn't vocalize. Shucks.
We want this to avoid conflicts in (at least) two scenarios:
This is not strictly needed, we could "renumber" the nodes as we merge them, but it would increase complexity.
The downside of this change: slightly bigger type for nodes, slightly more cloning.
WARNING: this is an on-disk format breaking change, Icechunk versions with this change cannot read repositories written with previous versions.