(refactor): remove ArraySubset unchecked methods #156
base: main
Yeah, all of these unchecked variants probably make negligible difference to perf. I'll do a benchmark when your PR is ready to go.
Definitely don't want to parameterize on the dimensionality, as that would cause massive code bloat due to monomorphisation.
…::new_with_start_end_inc_unchecked`
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #156      +/-   ##
==========================================
- Coverage   80.91%   79.98%   -0.93%
==========================================
  Files         190      190
  Lines       27106    26611     -495
==========================================
- Hits        21933    21286     -647
- Misses       5173     5325     +152
let ranges = subset
    .start()
    .iter()
    .zip(shape_out)
    .map(|(&st, sh)| st..(st + sh))
    .collect::<Vec<_>>();
let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);
I used this pattern a number of times to minimize the number of new errors we need to deal with, i.e., we have a start and a shape, either not checked previously in the code or, as here, checked above for having the same length, and then we use new_with_ranges. So the ranges creation could become a utility, but I'm not sure of the best place to put it.
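For reference, the "ranges creation" utility mentioned above might look roughly like the sketch below. The name `ranges_from_start_shape` and its `Option` return are assumptions for illustration only, not an existing zarrs function.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of zarrs): build one `start..start + shape`
/// range per dimension. Returns `None` if the two slices differ in length.
fn ranges_from_start_shape(start: &[u64], shape: &[u64]) -> Option<Vec<Range<u64>>> {
    if start.len() != shape.len() {
        return None;
    }
    Some(
        start
            .iter()
            .zip(shape)
            .map(|(&st, &sh)| st..(st + sh))
            .collect(),
    )
}
```

With something like this, the call site would reduce to building the ranges once and passing `&ranges` to `new_with_ranges`, with the length check living in a single place.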
See the main review comment
  let overlap = array_subset.overlap(&chunk_subset).unwrap(); // FIXME: unwrap
  let chunk_subset_in_array_subset =
-     unsafe { overlap.relative_to_unchecked(array_subset.start()) };
+     overlap.relative_to(array_subset.start()).unwrap();
  let array_subset_in_chunk_subset =
-     unsafe { overlap.relative_to_unchecked(chunk_subset.start()) };
+     overlap.relative_to(chunk_subset.start()).unwrap();
More unwrap in place of unchecked didn't necessarily seem like a regression. We could refactor so that all errors are handled here, but that seemed like an all-or-nothing thing and I wanted to keep this PR tight.
Don't worry about it in this PR, but I am gradually trying to hoover up the unwrap() calls in zarrs and replace them with expect() calls that clarify why they should not panic, in the same way that SAFETY docs clarify why _unchecked is okay to use.
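To illustrate the unwrap()-to-expect() idea, here is a minimal sketch. The free function `relative_to` is a hypothetical stand-in written for this example (it is not the zarrs method), and the invariant in the message is an assumption about the caller.

```rust
/// Hypothetical stand-in for a fallible `relative_to`-style operation:
/// subtract `start` from `indices` element-wise, returning `None` on a
/// length mismatch or if any subtraction would underflow.
fn relative_to(indices: &[u64], start: &[u64]) -> Option<Vec<u64>> {
    if indices.len() != start.len() {
        return None;
    }
    indices
        .iter()
        .zip(start)
        .map(|(&i, &s)| i.checked_sub(s))
        .collect()
}

/// The `expect` message documents why a panic should be impossible here,
/// much like a SAFETY comment documents an `unsafe` block.
fn overlap_offset(overlap_start: &[u64], subset_start: &[u64]) -> Vec<u64> {
    relative_to(overlap_start, subset_start)
        .expect("an overlap has the same dimensionality as the subset and never starts before it")
}
```

If the invariant is ever broken, the panic message then points at the violated assumption rather than a bare `unwrap` on `None`.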
zarrs/src/array.rs
Outdated
chunk_subset
    .bound(self.shape())
    .map_err(std::convert::Into::into)
Would be happy to learn if there was a cleaner way of going from "enum with two values into enum with many more values, including those two"
There is! Provided you have the From impl, just do Ok(chunk_subset.bound(self.shape())?)
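A small self-contained sketch of why that works: the `?` operator converts the error through `From` before returning it. The types `SubsetError` and `ArrayError` and the `bound` function below are hypothetical stand-ins for this example, not the zarrs definitions.

```rust
/// Hypothetical "small" error enum with two variants.
#[derive(Debug, PartialEq)]
enum SubsetError {
    IncompatibleDimensionality,
    OutOfBounds,
}

/// Hypothetical "large" error enum that wraps the small one.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ArrayError {
    Subset(SubsetError),
    Storage(String),
}

impl From<SubsetError> for ArrayError {
    fn from(err: SubsetError) -> Self {
        ArrayError::Subset(err)
    }
}

/// Hypothetical fallible operation returning the small error type:
/// the extent remaining between `start` and the end of the array.
fn bound(start: &[u64], array_shape: &[u64]) -> Result<Vec<u64>, SubsetError> {
    if start.len() != array_shape.len() {
        return Err(SubsetError::IncompatibleDimensionality);
    }
    if start.iter().zip(array_shape).any(|(s, a)| s >= a) {
        return Err(SubsetError::OutOfBounds);
    }
    Ok(start.iter().zip(array_shape).map(|(s, a)| a - s).collect())
}

/// `?` calls `From::from` on the error, so no explicit `map_err` is needed.
fn remaining_extent(start: &[u64], array_shape: &[u64]) -> Result<Vec<u64>, ArrayError> {
    Ok(bound(start, array_shape)?)
}
```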
Also, I'm not sure what's up with Codecov: it almost always complains about lines that were not covered before. I think we could add comments to prevent this on
Nice! This all looks pretty reasonable and the runtime cost should be negligible. Happy to merge this if you just address the few little comments.
Here is something to ponder though. I think I mentioned previously on Zulip or somewhere that the ArraySubset API could benefit from using iterators. This could simplify a lot of the code, with fewer checks and fewer unnecessary allocations. For example, the new_with_ranges and new_with_start_shape constructors could be changed to:
#[must_use]
// pub fn new_with_ranges(ranges: &[Range<u64>]) -> Self
pub fn new_with_ranges(ranges: impl IntoIterator<Item = Range<u64>>) -> Self {
    let (start, shape) = ranges
        .into_iter()
        .map(|range| (range.start, range.end.saturating_sub(range.start)))
        .unzip();
    Self { start, shape }
}
// pub fn new_with_start_shape(start: ArrayIndices, shape: ArrayShape) -> Result<Self, ...
pub fn new_with_start_shape(start_shape: impl IntoIterator<Item = (u64, u64)>) -> Self {
    let (start, shape) = start_shape.into_iter().unzip();
    Self { start, shape }
}
Now new_with_start_shape is infallible! So code like this
let ranges = subset
    .start()
    .iter()
    .zip(shape_out)
    .map(|(&st, sh)| st..(st + sh))
    .collect::<Vec<_>>();
let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);
can avoid the .collect(), or just be simplified to
use itertools::zip_eq; // NOTE: zip_eq is a sanity check, but not required if the lengths of subset/shape have been checked
let subset_contiguous_start = ArraySubset::new_with_start_shape(zip_eq(subset.start(), shape_out));
Would you like to look into that with this PR or a follow-up? Otherwise, I am also happy to tackle it.
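For anyone following along, the iterator-based constructors above can be tried in isolation with the sketch below. `Subset` is a hypothetical minimal stand-in for ArraySubset (two `Vec<u64>` fields only), so this shows the shape of the idea rather than the zarrs implementation.

```rust
use std::ops::Range;

/// Hypothetical stand-in for `ArraySubset`: a start and a shape per dimension.
#[derive(Debug, PartialEq)]
struct Subset {
    start: Vec<u64>,
    shape: Vec<u64>,
}

impl Subset {
    /// Each range contributes one dimension; an inverted range becomes empty.
    fn new_with_ranges(ranges: impl IntoIterator<Item = Range<u64>>) -> Self {
        let (start, shape) = ranges
            .into_iter()
            .map(|range| (range.start, range.end.saturating_sub(range.start)))
            .unzip();
        Self { start, shape }
    }

    /// Each `(start, shape)` pair contributes one dimension, so the two
    /// vectors always end up with the same length.
    fn new_with_start_shape(start_shape: impl IntoIterator<Item = (u64, u64)>) -> Self {
        let (start, shape) = start_shape.into_iter().unzip();
        Self { start, shape }
    }
}
```

One caveat with this design: a plain `zip` at the call site silently truncates to the shorter input, which is presumably why `zip_eq` is suggested above as a sanity check when the lengths have not already been validated.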
@@ -648,16 +648,15 @@ impl<TStorage: ?Sized + AsyncReadableStorageTraits + 'static> Array<TStorage> {
    chunk_subset.overlap(array_subset)?;

let mut output_view = unsafe {
    // SAFETY: chunks represent disjoint array subsets
This still needs a SAFETY doc
Oh, and don't stress about Codecov, but if you do know a way to ignore previously uncovered lines, that would be handy.
I'll address the comments here first and then do a bit of pondering on the iterator thought. I have some thoughts about it now from doing the PR, i.e., what can be considered "safe" and how to expose that. I'll comment once I do the clean-up here (not opposed to putting it in this PR, just want to gather my thoughts).
… into ig/arraysubset_api_audit
Re: iterators, I am not sure how infallible they are on their own with
pub fn new_with_start_shape(
    start: ArrayIndices,
    shape: ArrayShape,
) -> Self {
    if start.len() == shape.len() {
        Self { start, shape }
    } else {
        panic!("shapes don't match")
    }
}
I am not a huge fan of […] My experience was that […] So I'm not opposed to making everything iterators, but I am not sure I see it as more robust than what we have now. I think we can make clear to users maybe that
The existing
zarrs/zarrs/src/array_subset/iterators/chunks_iterator.rs
Lines 149 to 154 in cd859f2
Can become
let chunk_subset = ArraySubset::new_with_start_shape_iter(
    zip(&chunk_indices, self.chunk_shape).map(|(i, c)| (i * c, *c)),
);
This is a library, so panics are a last resort. I think the only methods that panic now are because an array / offset exceeds […] But let's just not worry about the iterator stuff in this PR. Are you happy if I merge this as-is?
Towards #52, making the API surface of ArraySubset smaller will help determine what common operations can be moved to a trait.
The two commit messages (so far) should signal what is being removed (so far), although they might have a non-trivial dragnet i.e., removing one function entailed removing others.
I haven't benchmarked this yet, but the size of the code-diff smells positive. Hopefully there's no performance knoc
While going through this PR, though, I learned about https://doc.rust-lang.org/reference/items/generics.html - we could maybe remove the dimensionality checks altogether if we parametrize the dimension? I'm not sure about this one though, would be curious to learn more
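To make the question concrete, here is a rough sketch of what parametrizing the dimension with const generics could look like. `SubsetN` is a hypothetical type invented for this example; nothing like it is proposed in this PR, and the maintainer's reply in this thread argues against the approach because each distinct `N` is monomorphised separately.

```rust
/// Hypothetical const-generic subset: start and shape are arrays of the
/// same length `N`, so a dimensionality mismatch cannot be constructed.
#[derive(Debug, PartialEq)]
struct SubsetN<const N: usize> {
    start: [u64; N],
    shape: [u64; N],
}

impl<const N: usize> SubsetN<N> {
    /// Infallible: the type system guarantees both arrays have length `N`.
    fn new(start: [u64; N], shape: [u64; N]) -> Self {
        Self { start, shape }
    }

    /// Exclusive end index per dimension; no runtime length check required.
    fn end_exc(&self) -> [u64; N] {
        let mut end = [0u64; N];
        for i in 0..N {
            end[i] = self.start[i] + self.shape[i];
        }
        end
    }
}
```

The trade-off is that every function taking a `SubsetN<N>` is compiled once per dimensionality in use, which is the code-bloat concern raised in the reply.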