Node ids are random not sequential #319

paraseba · 2024-10-23T22:21:35Z

We want this to avoid conflicts in (at least) two scenarios:

* distributed writers, where more than one creates a new node
* rebasing of commits, where multiple commits create nodes

This is not strictly needed: we could "renumber" the nodes as we merge them, but that would increase complexity.
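
As a rough illustration of the conflict described above, here is a minimal sketch, not the actual Icechunk code. The function names, the `u32` sequential id, and the 8-byte random id are all assumptions made for the example. `RandomState` is used only as a std-only stand-in for a real random number generator.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical sequential scheme: the next id depends only on the highest
/// id present in the snapshot the writer started from.
pub fn next_sequential_id(max_id_in_parent: u32) -> u32 {
    max_id_in_parent + 1
}

/// Hypothetical random scheme: 8 random bytes, chosen without any
/// coordination between writers. `RandomState` is only a convenient
/// std-only source of randomness for this sketch.
pub fn random_node_id() -> [u8; 8] {
    RandomState::new().build_hasher().finish().to_be_bytes()
}

/// Two writers start from the same parent snapshot and each create one node.
/// With sequential ids they always pick the same id.
pub fn sequential_ids_collide(max_id_in_parent: u32) -> bool {
    let writer_a = next_sequential_id(max_id_in_parent);
    let writer_b = next_sequential_id(max_id_in_parent);
    writer_a == writer_b
}
```

With the sequential scheme, merging or rebasing the two writers' work would require renumbering one of the new nodes. With random ids, a collision is possible in principle but extremely unlikely.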

The downside of this change: slightly bigger type for nodes, slightly more cloning.

WARNING: this is an on-disk format breaking change. Icechunk versions with this change cannot read repositories written with previous versions.

paraseba · 2024-10-23T22:22:46Z

icechunk-python/tests/test_can_read_old.py

@@ -192,7 +192,7 @@ async def test_icechunk_can_read_old_repo():
    ]
    assert sorted(
        [p async for p in store.list_dir("group2/group3/group4/group5/inner")]
-    ) == ["zarr.json"]
+    ) == ["c", "zarr.json"]


Does anybody understand why I need to change this? The change makes sense, I think, but why was it passing before and not now, with this apparently unrelated change?

dcherian · 2024-10-24T15:07:26Z

icechunk/src/format/mod.rs

 impl private::Sealed for SnapshotTag {}
 impl private::Sealed for ManifestTag {}
 impl private::Sealed for ChunkTag {}
 impl private::Sealed for AttributesTag {}
+impl private::Sealed for NodeTag {}


what is this 🦭 business?

This allows us to use a single generic type ObjectId<T> but have different types for SnapshotIds, ManifestIds, etc. That way you get a compile-time error if you pass a ChunkId when a SnapshotId is expected. These Tag types are "markers" to fill in the T in ObjectId<T>; they are what makes the ids different types for different objects. Sealed is there so a user cannot create their own Id types by mistake: for example, ObjectId<u8> is not going to work.
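
A minimal sketch of the pattern described in this reply, for readers who have not seen sealed marker types before. It is not the code from `icechunk/src/format/mod.rs`. The `IdTag` trait, the 8-byte payload, and the `snapshot_key` function are hypothetical names introduced for the example. Only `private::Sealed`, `SnapshotTag`, `ChunkTag`, and `ObjectId<T>` come from the discussion.

```rust
use std::marker::PhantomData;

mod private {
    // Not reachable from outside this crate, so outside code cannot
    // implement it for its own types.
    pub trait Sealed {}
}

/// Hypothetical public marker trait; it can only be implemented for types
/// that are also `Sealed`.
pub trait IdTag: private::Sealed {}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SnapshotTag;
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkTag;

impl private::Sealed for SnapshotTag {}
impl private::Sealed for ChunkTag {}
impl IdTag for SnapshotTag {}
impl IdTag for ChunkTag {}

/// One generic id type; the tag `T` exists only at the type level.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ObjectId<T: IdTag>([u8; 8], PhantomData<T>);

impl<T: IdTag> ObjectId<T> {
    pub fn new(bytes: [u8; 8]) -> Self {
        ObjectId(bytes, PhantomData)
    }

    pub fn bytes(&self) -> &[u8; 8] {
        &self.0
    }
}

pub type SnapshotId = ObjectId<SnapshotTag>;
pub type ChunkId = ObjectId<ChunkTag>;

/// Accepts only snapshot ids; passing a `ChunkId` is a compile-time error.
pub fn snapshot_key(id: &SnapshotId) -> String {
    id.bytes().iter().map(|b| format!("{:02x}", b)).collect()
}
```

In this sketch, `snapshot_key(&ChunkId::new([0; 8]))` does not compile because of the mismatched tag, and `ObjectId<u8>` is rejected because `u8` does not implement `IdTag`.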

dcherian · 2024-10-24T15:09:57Z

icechunk/src/repository.rs

            node.ok(),
-            Some(NodeSnapshot {


what's the difference here? They seem equivalent, and the new version feels less readable to me.

yeah, very unfortunate ... we need to start writing some helpers for our tests. The issue is that I can no longer predict the node_id we'll get, so I cannot use eq.
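
One possible shape for such a test helper, as a sketch only. The `NodeSnapshot` below is a simplified, hypothetical stand-in (the real struct has different fields), and `eq_ignoring_id` is an invented name. The idea is to copy the expected id onto the actual value and then fall back to the ordinary `==`.

```rust
/// Simplified, hypothetical stand-in for the real `NodeSnapshot`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeSnapshot {
    pub id: [u8; 8],
    pub path: String,
    pub user_attributes: Option<String>,
}

/// Compares two nodes while ignoring the (now random) id.
/// Overwriting the id and reusing `==` means any field added to the struct
/// later is still compared automatically.
pub fn eq_ignoring_id(actual: &NodeSnapshot, expected: &NodeSnapshot) -> bool {
    let normalized = NodeSnapshot {
        id: expected.id,
        ..actual.clone()
    };
    &normalized == expected
}
```

A test could then build the expected node with any placeholder id and assert `eq_ignoring_id(&actual, &expected)` instead of comparing with `assert_eq!`.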

dcherian

LGTM. I did think that using sequential integers was weird, but didn't vocalize. Shucks.

paraseba requested review from mpiannucci and dcherian and removed request for mpiannucci October 23, 2024 22:21

paraseba requested a review from mpiannucci October 23, 2024 22:27

dcherian reviewed Oct 24, 2024

dcherian approved these changes Oct 24, 2024

mpiannucci approved these changes Oct 24, 2024

paraseba merged commit d0b9aad into main Oct 24, 2024
3 checks passed

paraseba deleted the push-ttrruolqwzzo branch October 24, 2024 16:28
