Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node ids are random not sequential #319

Merged
merged 1 commit into from
Oct 24, 2024
Merged

Node ids are random not sequential #319

merged 1 commit into from
Oct 24, 2024

Conversation

paraseba
Copy link
Contributor

We want this to avoid conflicts in (at least) two scenarios:

  • distributed writers, where more than one creates a new node
  • rebasing of commits, where multiple commits create nodes

This is not strictly needed, we could "renumber" the nodes as we merge them, but it would increase complexity.

The downside of this change: slightly bigger type for nodes, slightly more cloning.

WARNING: this is an on-disk format breaking change, Icechunk versions with this change cannot read repositories written with previous versions.

We want this to avoid conflicts in (at least) two scenarios:

* distributed writers, where more than one creates a new node
* rebasing of commits, where multiple commits create nodes

This is not strictly needed, we could "renumber" the nodes as we merge
them, but it would increase complexity.

The downside of this change: slightly bigger type for nodes, slightly
more cloning.

WARNING: this is an on-disk format breaking change, Icechunk versions
with this change cannot read repositories written with previous
versions.
@paraseba paraseba requested review from mpiannucci and dcherian and removed request for mpiannucci October 23, 2024 22:21
@@ -192,7 +192,7 @@ async def test_icechunk_can_read_old_repo():
]
assert sorted(
[p async for p in store.list_dir("group2/group3/group4/group5/inner")]
) == ["zarr.json"]
) == ["c", "zarr.json"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anybody understand why I need to change this? This change makes sense, I think, but why it was passing before and not now with this apparently unrelated change?

impl private::Sealed for SnapshotTag {}
impl private::Sealed for ManifestTag {}
impl private::Sealed for ChunkTag {}
impl private::Sealed for AttributesTag {}
impl private::Sealed for NodeTag {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this 🦭 business?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows us to use a single class ObjectId<T> but have different types for SnapshotIds, ManifestIds, etc. That way you get a compile time error if you pass a ChunkId when an SnapshotId is expected. These Tag classes are "markers" to fill in the T in ObjectId<T>, it's what makes them different types for different objects. Sealed is so a user cannot create their own Id types by mistake ... for example, ObjectId<u8> is not going to work.

node.ok(),
Some(NodeSnapshot {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference here? They seem equivalent, and the new version feels less readable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, very unfortunate ... we need to start writing some helpers for our tests. The issue is that I can no longer predict the node_id we'll get, so I cannot use eq.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I did think that using sequential integers was weird, but didn't vocalize. Shucks.

@paraseba paraseba merged commit d0b9aad into main Oct 24, 2024
3 checks passed
@paraseba paraseba deleted the push-ttrruolqwzzo branch October 24, 2024 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants