Skip to content

(refactor): remove ArraySubset unchecked methods #156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

ilan-gold
Copy link

@ilan-gold ilan-gold commented Mar 4, 2025

Towards #52, making the API surface of ArraySubset smaller will help determine what common operations can be moved to a trait

The two commit messages (so far) should signal what is being removed (so far), although they might have a non-trivial dragnet i.e., removing one function entailed removing others.

I haven't benchmarked this yet, but the size of the code-diff smells positive. Hopefully there's no performance knoc

While going through this PR, though, I learned about https://doc.rust-lang.org/reference/items/generics.html - we could maybe remove the dimensionality checks altogether if we parametrize the dimension? I'm not sure about this one though, would be curious to learn more

@LDeakin
Copy link
Owner

LDeakin commented Mar 5, 2025

Yeah, all of these unchecked variants probably make negligible difference to perf. I'll do a benchmark when your PR is ready to go.

While going through this PR, though, I learned about https://doc.rust-lang.org/reference/items/generics.html - we could maybe remove the dimensionality checks altogether if we parametrize the dimension? I'm not sure about this one though, would be curious to learn more

Definitely don't want to parameterize on the dimensionality, as that would cause massive code bloat to monomorphisation.

Copy link

codecov bot commented Apr 1, 2025

Codecov Report

Attention: Patch coverage is 83.88626% with 34 lines in your changes missing coverage. Please review.

Project coverage is 79.98%. Comparing base (cd859f2) to head (a697a3b).

Files with missing lines Patch % Lines
zarrs/src/array_subset.rs 68.51% 17 Missing ⚠️
...arrs/src/array_subset/iterators/chunks_iterator.rs 81.81% 4 Missing ⚠️
...rray_to_bytes/sharding/sharding_partial_decoder.rs 90.00% 3 Missing ⚠️
zarrs/src/array/array_async_readable.rs 33.33% 2 Missing ⚠️
zarrs/src/array/array_bytes_fixed_disjoint_view.rs 88.23% 2 Missing ⚠️
...rc/array/chunk_cache/array_chunk_cache_ext_sync.rs 33.33% 2 Missing ⚠️
zarrs/src/array.rs 75.00% 1 Missing ⚠️
zarrs/src/array/array_sync_readable.rs 66.66% 1 Missing ⚠️
zarrs/src/array/array_sync_sharded_readable_ext.rs 50.00% 1 Missing ⚠️
...ay/codec/array_to_bytes/sharding/sharding_codec.rs 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #156      +/-   ##
==========================================
- Coverage   80.91%   79.98%   -0.93%     
==========================================
  Files         190      190              
  Lines       27106    26611     -495     
==========================================
- Hits        21933    21286     -647     
- Misses       5173     5325     +152     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +73 to +79
let ranges = subset
.start()
.iter()
.zip(shape_out)
.map(|(&st, sh)| st..(st + sh))
.collect::<Vec<_>>();
let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this pattern a number of times to minimize the amount of new errors we need to deal with i.e., we have a start and a shape, either not checked previously in the code or, as here, checked above for having the same length, and then we use new_with_ranges. So the ranges creation could become a utility but not sure the best place to put it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the main review comment

Comment on lines +281 to +285
let overlap = array_subset.overlap(&chunk_subset).unwrap(); // FIXME: unwrap
let chunk_subset_in_array_subset =
unsafe { overlap.relative_to_unchecked(array_subset.start()) };
overlap.relative_to(array_subset.start()).unwrap();
let array_subset_in_chunk_subset =
unsafe { overlap.relative_to_unchecked(chunk_subset.start()) };
overlap.relative_to(chunk_subset.start()).unwrap();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More unwrap in place of unchecked didn't seem like a regression necessarily...could refactor so that all errors are handled here but seemed like an all-or-nothing thing and I wanted to keep this PR tight

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about it in this PR, but I am gradually trying to hoover up the unwrap() in zarrs and replace them with expect() that clarifies why they should not panic in the same way that SAFETY docs clarify why _unchecked is okay to use.

Comment on lines 787 to 789
chunk_subset
.bound(self.shape())
.map_err(std::convert::Into::into)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be happy to learn if there was a cleaner way of going from "enum with two values into enum with many more values, including those two"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is! Provided you have the From, just do Ok(chunk_subset.bound(self.shape())?)

@ilan-gold ilan-gold marked this pull request as ready for review April 2, 2025 10:40
@ilan-gold ilan-gold requested a review from LDeakin as a code owner April 2, 2025 10:41
@ilan-gold
Copy link
Author

ilan-gold commented Apr 2, 2025

Also I'm not sure what's up with code cov - it almost always complained about lines that were not covered before. I think we could add comments to prevent this on main? Or is it fine as-is for now?

Copy link
Owner

@LDeakin LDeakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This all looks pretty reasonable and the runtime cost should be negligible. Happy to merge this if you just address the few little comments.


Here is something to ponder though. I think I mentioned previously on Zulip or somewhere that the ArraySubset API could benefit from using iterators. This could simplify a lot of the code, with fewer checks and unnecessary allocations. For example, the new_with_ranges and new_with_start_shape constructors could be changed to:

#[must_use]
// pub fn new_with_ranges(ranges: &[Range<u64>]) -> Self
pub fn new_with_ranges(ranges: impl IntoIterator<Item = Range<u64>>) -> Self {
    let (start, shape) = ranges
        .into_iter()
        .map(|range| (range.start, range.end.saturating_sub(range.start)))
        .unzip();
    Self { start, shape }
}

// pub fn new_with_start_shape(start: ArrayIndices, shape: ArrayShape) -> Result<Self, ...
pub fn new_with_start_shape(start_shape: impl IntoIterator<Item = (u64, u64)>) -> Self {
    let (start, shape) = start_shape.into_iter().unzip();
    Self { start, shape }
}

Now new_with_start_shape is infallible! So code like this

let ranges = subset
    .start()
    .iter()
    .zip(shape_out)
    .map(|(&st, sh)| st..(st + sh))
    .collect::<Vec<_>>();
let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);

can avoid the .collect(), or just be simplified to

use itertools::zip_eq; // NOTE: zip_eq is a sanity check, but not required if the length of subset/shape have been checked
let subset_contiguous_start = ArraySubset::new_with_start_shape(zip_eq(subset.start(), shape_out))

Would you like to look into that with this PR or a follow-up? Otherwise, I am also happy to tackle it.

Comment on lines 787 to 789
chunk_subset
.bound(self.shape())
.map_err(std::convert::Into::into)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is! Provided you have the From, just do Ok(chunk_subset.bound(self.shape())?)

@@ -648,16 +648,15 @@ impl<TStorage: ?Sized + AsyncReadableStorageTraits + 'static> Array<TStorage> {
chunk_subset.overlap(array_subset)?;

let mut output_view = unsafe {
// SAFETY: chunks represent disjoint array subsets
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs a SAFETY doc

Comment on lines +281 to +285
let overlap = array_subset.overlap(&chunk_subset).unwrap(); // FIXME: unwrap
let chunk_subset_in_array_subset =
unsafe { overlap.relative_to_unchecked(array_subset.start()) };
overlap.relative_to(array_subset.start()).unwrap();
let array_subset_in_chunk_subset =
unsafe { overlap.relative_to_unchecked(chunk_subset.start()) };
overlap.relative_to(chunk_subset.start()).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about it in this PR, but I am gradually trying to hoover up the unwrap() in zarrs and replace them with expect() that clarifies why they should not panic in the same way that SAFETY docs clarify why _unchecked is okay to use.

Comment on lines +73 to +79
let ranges = subset
.start()
.iter()
.zip(shape_out)
.map(|(&st, sh)| st..(st + sh))
.collect::<Vec<_>>();
let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the main review comment

@LDeakin
Copy link
Owner

LDeakin commented Apr 2, 2025

Oh and don't stress about codecov, but if you do know a way to ignore previously uncovered lines that would be handy

@ilan-gold
Copy link
Author

ilan-gold commented Apr 3, 2025

I'll address the comments here first and then do a bit of pondering on the iterator thought. I have some thoughts about it now from doing the PR i.e., what can be considered "safe" and how to expose that. I'll comment once I do the clean-up here (not opposed to putting it in this PR, just want to gather my thoughts)

@ilan-gold
Copy link
Author

ilan-gold commented Apr 3, 2025

Re: iterators, I am not sure how infallible they are on their own with zip - zip_eq is a different story although panicking isn't amazing. Also, at that point, I would just leave the API as-is in terms of what we accept and just panic if shapes don't match instead of erroring:

    pub fn new_with_start_shape(
        start: ArrayIndices,
        shape: ArrayShape,
    ) -> Self {
        if start.len() == shape.len() {
            Ok(Self { start, shape })
        } else {
            panic!("shapes don't match")
        }
    }

I am not a huge fan of impl IntoIterator<Item = (u64, u64)> representing a start + shape object, but maybe rust users are accustomed to this sort of thing. Could you elaborate on that?

My experience was that new_with_ranges helped to sidestep the various errors when they weren't necessary (good) but was not "infallible" (bad) in so far as zip is not infallible (and that is often how ranges are made). Fundamentally that is the question - how do we handle potentially mismatched-size inputs and at the same time inputs that we "know" are correct like when something is made from start and end from the same ArraySubset? Do we panic if they don't match? Error? Provide a back door that does neither (new_with_ranges in this case?)?

So I'm not opposed to making everything iterators but I am not sure I see it as more robust than what we have now. I think we can make clear to users maybe that new_with_ranges provides a "back-door" where there is no error-handling, but all other methods will check, which may be desireable. I think it's mainly a question of how "safe" we want to be.

@LDeakin
Copy link
Owner

LDeakin commented Apr 3, 2025

I am not sure how infallible they are on their own with zip - zip_eq is a different story although panicking isn't amazing

zarrs always does dimensionality checks on ArraySubsets, so it wouldn't matter it the user supplied a mismatched start/shape to such a constructor, they would get an error when they try to use it with array/codec methods. Internally, the use of zip_eq could just be a sanity check in the same way that all the _unchecked methods had debug assertions.

I am not a huge fan of impl IntoIterator<Item = (u64, u64)> representing a start + shape object, but maybe rust users are accustomed to this sort of thing. Could you elaborate on that?

The existing new_with_start_shape should probably remain for API compatibility anyway, but an iterator-based constructor is very useful in cases where iterator/s are already producing a start/shape. Example:

let start = std::iter::zip(&chunk_indices, self.chunk_shape)
.map(|(i, c)| i * c)
.collect();
let chunk_subset = unsafe {
ArraySubset::new_with_start_shape_unchecked(start, self.chunk_shape.to_vec())
};

Can become

let chunk_subset = ArraySubset::new_with_start_shape_iter(
    zip(&chunk_indices, self.chunk_shape).map(|(i, c)| (i * c, *c)),
);

Also, at that point, I would just leave the API as-is in terms of what we accept and just panic if shapes don't match instead of erroring

This is a library, so panics are a last resort. I think the only methods that panic now are because an array / offset exceeds usize::MAX (which only really impacts 32-bit platforms) or there is an allocation failure. Internal .expects() / .unwraps() are intended not to panic unless there is a developer error.

But let's just not worry about the iterator stuff this PR, are you happy if I merge this as-is?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants