Manifest Splitting #767
icechunk/src/format/manifest.rs
Outdated
pub struct ManifestShards(Vec<ManifestExtents>);

impl ManifestShards {
    pub fn default(ndim: usize) -> Self {
I don't like this, but it is certainly tied to ndim.
Maybe ManifestSplits is an enum to avoid this?
enum ManifestSplits {
    Single,
    Multiple(Vec<ManifestExtents>),
}
What I don't like is the empty vector. I wonder if Rust has a NonEmptyVec type; otherwise, a trick people use is:
...
Multiple { first: ManifestExtents, rest: Vec<ManifestExtents> }
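To make the `first`/`rest` suggestion concrete, here is a minimal sketch of how that shape rules out the empty vector by construction. The `ManifestExtents` definition, the `from_extents` constructor, and the `extents` iterator are hypothetical stand-ins for illustration, not code from this PR.

```rust
use std::ops::Range;

// Hypothetical stand-in for the real ManifestExtents; only its shape matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestExtents(Vec<Range<u32>>);

#[derive(Debug, Clone, PartialEq)]
pub enum ManifestSplits {
    // No explicit splits: everything goes into one manifest.
    Single,
    // At least one extent is guaranteed by the `first` field.
    Multiple { first: ManifestExtents, rest: Vec<ManifestExtents> },
}

impl ManifestSplits {
    // Hypothetical constructor: an empty input maps to `Single`,
    // so `Multiple` can never hold zero extents.
    pub fn from_extents(mut extents: Vec<ManifestExtents>) -> Self {
        if extents.is_empty() {
            ManifestSplits::Single
        } else {
            let first = extents.remove(0);
            ManifestSplits::Multiple { first, rest: extents }
        }
    }

    // Iterate over the explicit extents; `Single` yields none.
    pub fn extents(&self) -> Box<dyn Iterator<Item = &ManifestExtents> + '_> {
        match self {
            ManifestSplits::Single => Box::new(std::iter::empty()),
            ManifestSplits::Multiple { first, rest } => {
                Box::new(std::iter::once(first).chain(rest.iter()))
            }
        }
    }
}
```

The trade-off is that every consumer has to handle `first` separately or go through a helper like `extents`, which is the usual cost of encoding non-emptiness in the type.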
@@ -37,9 +33,77 @@ impl ManifestExtents {
        Self(v)
    }

    pub fn contains(&self, coord: &[u32]) -> bool {
        self.iter().zip(coord.iter()).all(|(range, that)| range.contains(that))
On writes, we need to start checking that indexes have the proper size for the metadata.
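The concern is easy to reproduce: `Iterator::zip` stops at the shorter of its two inputs, so a coordinate with too few dimensions is only checked against the leading ranges. A minimal sketch, assuming a simplified `ManifestExtents` and a hypothetical `checked_contains` that rejects mismatched lengths (neither is the PR's actual code):

```rust
use std::ops::Range;

// Simplified, hypothetical stand-in for the real ManifestExtents.
pub struct ManifestExtents(Vec<Range<u32>>);

impl ManifestExtents {
    // Same logic as the hunk above: zip silently truncates to the shorter input.
    pub fn contains(&self, coord: &[u32]) -> bool {
        self.0.iter().zip(coord.iter()).all(|(range, that)| range.contains(that))
    }

    // Hypothetical variant: returns None when the coordinate does not have
    // one index per dimension, instead of quietly ignoring the mismatch.
    pub fn checked_contains(&self, coord: &[u32]) -> Option<bool> {
        if coord.len() != self.0.len() {
            return None;
        }
        Some(self.contains(coord))
    }
}
```

With 2D extents, `contains(&[1])` only tests the first range, while `checked_contains(&[1])` surfaces the size mismatch to the caller.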
icechunk/src/repository.rs
Outdated
        Ok(())
    }

    // #[tokio::test]
ping
Co-authored-by: Sebastián Galkin <code@amisdelabc.com>
add ndim based condition (3D vs 4D) (if someone asks for it)
Minimal docs here: https://icechunk--767.org.readthedocs.build/en/767/icechunk-python/performance/
I rewrote the ERA5 manifests to put 1 year per manifest (~9000 chunks); this gets us a 3X speedup.