(refactor): remove `ArraySubset` unchecked methods #156

ilan-gold · 2025-03-04T15:37:25Z

Towards #52: making the API surface of ArraySubset smaller will help determine which common operations can be moved to a trait.

The two commit messages (so far) should signal what is being removed (so far), although each removal can have a non-trivial dragnet, i.e., removing one function entailed removing others.

I haven't benchmarked this yet, but the size of the code diff smells positive. Hopefully there's no performance knock.

While going through this PR, though, I learned about https://doc.rust-lang.org/reference/items/generics.html. We could maybe remove the dimensionality checks altogether if we parametrize the dimension? I'm not sure about this one though, and would be curious to learn more.

LDeakin · 2025-03-05T22:42:54Z

Yeah, all of these unchecked variants probably make negligible difference to perf. I'll do a benchmark when your PR is ready to go.

> While going through this PR, though, I learned about https://doc.rust-lang.org/reference/items/generics.html - we could maybe remove the dimensionality checks altogether if we parametrize the dimension? I'm not sure about this one though, would be curious to learn more

Definitely don't want to parameterize on the dimensionality, as that would cause massive code bloat due to monomorphisation.
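
To make the trade-off concrete, here is a minimal sketch using hypothetical types (`StaticSubset` and `DynSubset` are illustrative names, not zarrs types). A const-generic dimension makes the length mismatch unrepresentable, but every distinct `N` that is used gets its own compiled copy of each generic method, whereas the runtime-dimension version is compiled once and needs a length check.

```rust
// Hypothetical illustration only: neither type is the zarrs `ArraySubset`.

/// Dimensionality fixed at compile time via a const generic parameter.
#[derive(Debug, PartialEq)]
pub struct StaticSubset<const N: usize> {
    start: [u64; N],
    shape: [u64; N],
}

impl<const N: usize> StaticSubset<N> {
    /// Infallible: `start` and `shape` cannot differ in length.
    pub fn new(start: [u64; N], shape: [u64; N]) -> Self {
        Self { start, shape }
    }

    /// Compiled separately for every `N` that appears in the program.
    pub fn num_elements(&self) -> u64 {
        self.shape.iter().product()
    }
}

/// Dimensionality known only at runtime, so it has to be checked.
#[derive(Debug, PartialEq)]
pub struct DynSubset {
    start: Vec<u64>,
    shape: Vec<u64>,
}

impl DynSubset {
    /// Returns `None` when the dimensionalities differ.
    pub fn new(start: Vec<u64>, shape: Vec<u64>) -> Option<Self> {
        (start.len() == shape.len()).then(|| Self { start, shape })
    }

    /// Compiled once, regardless of how many dimensionalities are used.
    pub fn num_elements(&self) -> u64 {
        self.shape.iter().product()
    }
}
```

With these definitions, `StaticSubset::new([0, 0], [2, 3])` and `StaticSubset::new([1, 2, 3], [4, 5, 6])` are two different types, which is where the extra generated code would come from.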

codecov · 2025-04-01T13:48:15Z

Codecov Report

Attention: Patch coverage is 83.88626% with 34 lines in your changes missing coverage. Please review.

Project coverage is 79.98%. Comparing base (cd859f2) to head (a697a3b).

Files with missing lines	Patch %	Lines
zarrs/src/array_subset.rs	68.51%	17 Missing ⚠️
...arrs/src/array_subset/iterators/chunks_iterator.rs	81.81%	4 Missing ⚠️
...rray_to_bytes/sharding/sharding_partial_decoder.rs	90.00%	3 Missing ⚠️
zarrs/src/array/array_async_readable.rs	33.33%	2 Missing ⚠️
zarrs/src/array/array_bytes_fixed_disjoint_view.rs	88.23%	2 Missing ⚠️
...rc/array/chunk_cache/array_chunk_cache_ext_sync.rs	33.33%	2 Missing ⚠️
zarrs/src/array.rs	75.00%	1 Missing ⚠️
zarrs/src/array/array_sync_readable.rs	66.66%	1 Missing ⚠️
zarrs/src/array/array_sync_sharded_readable_ext.rs	50.00%	1 Missing ⚠️
...ay/codec/array_to_bytes/sharding/sharding_codec.rs	91.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #156      +/-   ##
==========================================
- Coverage   80.91%   79.98%   -0.93%     
==========================================
  Files         190      190              
  Lines       27106    26611     -495     
==========================================
- Hits        21933    21286     -647     
- Misses       5173     5325     +152

ilan-gold · 2025-04-02T10:37:13Z

zarrs/src/array_subset/iterators/contiguous_indices_iterator.rs

+        let ranges = subset
+            .start()
+            .iter()
+            .zip(shape_out)
+            .map(|(&st, sh)| st..(st + sh))
+            .collect::<Vec<_>>();
+        let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);


I used this pattern a number of times to minimize the number of new errors we need to deal with. That is, we have a start and a shape, either not checked previously in the code or, as here, checked above for having the same length, and then we use new_with_ranges. The ranges creation could become a utility, but I'm not sure of the best place to put it.

See the main review comment

ilan-gold · 2025-04-02T10:38:40Z

zarrs/src/array/array_async_readable_writable.rs

+                let overlap = array_subset.overlap(&chunk_subset).unwrap(); // FIXME: unwrap
                let chunk_subset_in_array_subset =
-                    unsafe { overlap.relative_to_unchecked(array_subset.start()) };
+                    overlap.relative_to(array_subset.start()).unwrap();
                let array_subset_in_chunk_subset =
-                    unsafe { overlap.relative_to_unchecked(chunk_subset.start()) };
+                    overlap.relative_to(chunk_subset.start()).unwrap();


More unwrap in place of unchecked didn't necessarily seem like a regression. We could refactor so that all errors are handled here, but that seemed like an all-or-nothing thing and I wanted to keep this PR tight.

Don't worry about it in this PR, but I am gradually trying to hoover up the unwrap() in zarrs and replace them with expect() that clarifies why they should not panic in the same way that SAFETY docs clarify why _unchecked is okay to use.
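
As a rough illustration of the unwrap() to expect() idea, the sketch below uses hypothetical free functions (`relative_to` and `overlap_offset` are stand-ins, not the zarrs methods). The expect() message states the invariant that makes a panic unreachable, in the same spirit as a SAFETY comment.

```rust
/// Hypothetical helper: offset of `indices` relative to `origin`.
/// Returns `None` if the dimensionalities differ or any index precedes the origin.
pub fn relative_to(indices: &[u64], origin: &[u64]) -> Option<Vec<u64>> {
    if indices.len() != origin.len() {
        return None;
    }
    indices
        .iter()
        .zip(origin)
        .map(|(i, o)| i.checked_sub(*o))
        .collect()
}

/// Hypothetical caller that knows `overlap_start` was derived from the subset.
pub fn overlap_offset(overlap_start: &[u64], subset_start: &[u64]) -> Vec<u64> {
    // The message documents why this should never fail, like a SAFETY comment.
    relative_to(overlap_start, subset_start).expect(
        "the overlap lies within the subset, so it has the same dimensionality \
         and starts at or after the subset start",
    )
}
```

If the invariant is ever broken by a developer error, the panic message then explains which assumption failed rather than only reporting an unwrap on `None`.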

ilan-gold · 2025-04-02T10:40:46Z

zarrs/src/array.rs

+        chunk_subset
+            .bound(self.shape())
+            .map_err(std::convert::Into::into)


Would be happy to learn if there was a cleaner way of going from "enum with two values into enum with many more values, including those two"

There is! Provided you have the From, just do Ok(chunk_subset.bound(self.shape())?)
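
A self-contained sketch of that suggestion, using hypothetical error types (`SmallError`, `LargeError`) and functions rather than the real zarrs ones: once a `From` impl exists, the `?` operator performs the conversion and the explicit `map_err` is unnecessary.

```rust
/// Hypothetical "enum with two values".
#[derive(Debug, PartialEq)]
pub enum SmallError {
    DimensionalityMismatch,
    OutOfBounds,
}

/// Hypothetical larger enum that includes the small one.
#[derive(Debug, PartialEq)]
pub enum LargeError {
    Subset(SmallError),
    Storage(String),
}

impl From<SmallError> for LargeError {
    fn from(err: SmallError) -> Self {
        Self::Subset(err)
    }
}

/// Fails with the small error type.
fn check_within(indices: &[u64], shape: &[u64]) -> Result<(), SmallError> {
    if indices.len() != shape.len() {
        return Err(SmallError::DimensionalityMismatch);
    }
    if indices.iter().zip(shape).any(|(i, s)| i >= s) {
        return Err(SmallError::OutOfBounds);
    }
    Ok(())
}

/// Fails with the large error type.
pub fn checked(indices: &[u64], shape: &[u64]) -> Result<(), LargeError> {
    // `?` calls `From::from` on the error, so no `map_err` is needed.
    Ok(check_within(indices, shape)?)
}
```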

ilan-gold · 2025-04-02T10:41:22Z

Also I'm not sure what's up with code cov - it almost always complained about lines that were not covered before. I think we could add comments to prevent this on main? Or is it fine as-is for now?

LDeakin

Nice! This all looks pretty reasonable and the runtime cost should be negligible. Happy to merge this if you just address the few little comments.

Here is something to ponder though. I think I mentioned previously on Zulip or somewhere that the ArraySubset API could benefit from using iterators. This could simplify a lot of the code, with fewer checks and unnecessary allocations. For example, the new_with_ranges and new_with_start_shape constructors could be changed to:

#[must_use]
// pub fn new_with_ranges(ranges: &[Range<u64>]) -> Self
pub fn new_with_ranges(ranges: impl IntoIterator<Item = Range<u64>>) -> Self {
    let (start, shape) = ranges
        .into_iter()
        .map(|range| (range.start, range.end.saturating_sub(range.start)))
        .unzip();
    Self { start, shape }
}

// pub fn new_with_start_shape(start: ArrayIndices, shape: ArrayShape) -> Result<Self, ...
pub fn new_with_start_shape(start_shape: impl IntoIterator<Item = (u64, u64)>) -> Self {
    let (start, shape) = start_shape.into_iter().unzip();
    Self { start, shape }
}

Now new_with_start_shape is infallible! So code like this

let ranges = subset
    .start()
    .iter()
    .zip(shape_out)
    .map(|(&st, sh)| st..(st + sh))
    .collect::<Vec<_>>();
let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);

can avoid the .collect(), or just be simplified to

// NOTE: zip_eq is a sanity check, but not required if the lengths of subset/shape have already been checked
use itertools::zip_eq;
let subset_contiguous_start = ArraySubset::new_with_start_shape(zip_eq(subset.start(), shape_out));

Would you like to look into that with this PR or a follow-up? Otherwise, I am also happy to tackle it.
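
For reference, the iterator-based constructors proposed above can be sketched in a standalone form. `Subset`, `from_ranges`, `from_start_shape`, and `contiguous_start` are hypothetical names used only for this illustration; the sketch relies on the standard library alone and makes no claim about the final zarrs API.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a start + shape subset type.
#[derive(Debug, PartialEq)]
pub struct Subset {
    start: Vec<u64>,
    shape: Vec<u64>,
}

impl Subset {
    /// Each range contributes one (start, length) pair, so the two vectors
    /// always end up with the same length.
    pub fn from_ranges(ranges: impl IntoIterator<Item = Range<u64>>) -> Self {
        let (start, shape): (Vec<u64>, Vec<u64>) = ranges
            .into_iter()
            .map(|r| (r.start, r.end.saturating_sub(r.start)))
            .unzip();
        Self { start, shape }
    }

    /// Infallible for the same reason: the input is already paired up.
    pub fn from_start_shape(start_shape: impl IntoIterator<Item = (u64, u64)>) -> Self {
        let (start, shape): (Vec<u64>, Vec<u64>) = start_shape.into_iter().unzip();
        Self { start, shape }
    }
}

/// Builds a subset from separate start and shape slices without an
/// intermediate `Vec` of ranges.
pub fn contiguous_start(start: &[u64], shape_out: &[u64]) -> Subset {
    Subset::from_start_shape(start.iter().copied().zip(shape_out.iter().copied()))
}
```

Note that `contiguous_start` uses plain `zip`, which stops at the shorter input, so the caller is assumed to have checked the lengths beforehand.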

LDeakin · 2025-04-02T20:12:04Z

zarrs/src/array.rs

+        chunk_subset
+            .bound(self.shape())
+            .map_err(std::convert::Into::into)


There is! Provided you have the From, just do Ok(chunk_subset.bound(self.shape())?)

LDeakin · 2025-04-02T20:38:53Z

zarrs/src/array/array_async_readable.rs

@@ -648,16 +648,15 @@ impl<TStorage: ?Sized + AsyncReadableStorageTraits + 'static> Array<TStorage> {
                                        chunk_subset.overlap(array_subset)?;

                                    let mut output_view = unsafe {
-                                        // SAFETY: chunks represent disjoint array subsets


This still needs a SAFETY doc

LDeakin · 2025-04-02T21:32:04Z

zarrs/src/array/array_async_readable_writable.rs

+                let overlap = array_subset.overlap(&chunk_subset).unwrap(); // FIXME: unwrap
                let chunk_subset_in_array_subset =
-                    unsafe { overlap.relative_to_unchecked(array_subset.start()) };
+                    overlap.relative_to(array_subset.start()).unwrap();
                let array_subset_in_chunk_subset =
-                    unsafe { overlap.relative_to_unchecked(chunk_subset.start()) };
+                    overlap.relative_to(chunk_subset.start()).unwrap();


Don't worry about it in this PR, but I am gradually trying to hoover up the unwrap() in zarrs and replace them with expect() that clarifies why they should not panic in the same way that SAFETY docs clarify why _unchecked is okay to use.

LDeakin · 2025-04-02T21:52:17Z

zarrs/src/array_subset/iterators/contiguous_indices_iterator.rs

+        let ranges = subset
+            .start()
+            .iter()
+            .zip(shape_out)
+            .map(|(&st, sh)| st..(st + sh))
+            .collect::<Vec<_>>();
+        let subset_contiguous_start = ArraySubset::new_with_ranges(&ranges);


See the main review comment

LDeakin · 2025-04-02T22:07:18Z

Oh and don't stress about codecov, but if you do know a way to ignore previously uncovered lines that would be handy

ilan-gold · 2025-04-03T11:11:09Z

I'll address the comments here first and then do a bit of pondering on the iterator thought. I have some thoughts about it now from doing the PR i.e., what can be considered "safe" and how to expose that. I'll comment once I do the clean-up here (not opposed to putting it in this PR, just want to gather my thoughts)

ilan-gold · 2025-04-03T12:21:59Z

Re: iterators, I am not sure how infallible they are on their own with zip - zip_eq is a different story although panicking isn't amazing. Also, at that point, I would just leave the API as-is in terms of what we accept and just panic if shapes don't match instead of erroring:

    pub fn new_with_start_shape(
        start: ArrayIndices,
        shape: ArrayShape,
    ) -> Self {
        if start.len() == shape.len() {
            Self { start, shape }
        } else {
            panic!("shapes don't match")
        }
    }

I am not a huge fan of impl IntoIterator<Item = (u64, u64)> representing a start + shape object, but maybe Rust users are accustomed to this sort of thing. Could you elaborate on that?

My experience was that new_with_ranges helped to sidestep the various errors when they weren't necessary (good) but was not "infallible" (bad) insofar as zip is not infallible (and that is often how ranges are made). Fundamentally, that is the question: how do we handle potentially mismatched-size inputs and, at the same time, inputs that we "know" are correct, such as when something is made from the start and end of the same ArraySubset? Do we panic if they don't match? Error? Provide a back door that does neither (new_with_ranges in this case)?

So I'm not opposed to making everything iterators, but I am not sure I see it as more robust than what we have now. I think we can make clear to users that new_with_ranges provides a "back door" with no error handling, while all other methods will check, which may be desirable. I think it's mainly a question of how "safe" we want to be.
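
The concern about zip can be shown with a small sketch (the function names are hypothetical and only for illustration): `Iterator::zip` stops at the shorter input, so a mismatch is dropped silently unless the lengths are compared first.

```rust
/// `zip` stops at the shorter input, so a length mismatch is silently truncated.
pub fn pairs_truncating(start: &[u64], shape: &[u64]) -> Vec<(u64, u64)> {
    start.iter().copied().zip(shape.iter().copied()).collect()
}

/// Checked alternative: report the mismatch instead of truncating or panicking.
pub fn pairs_checked(start: &[u64], shape: &[u64]) -> Result<Vec<(u64, u64)>, String> {
    if start.len() != shape.len() {
        return Err(format!(
            "dimensionality mismatch: start has {} dimensions, shape has {}",
            start.len(),
            shape.len()
        ));
    }
    Ok(pairs_truncating(start, shape))
}
```

For example, `pairs_truncating(&[0, 1, 2], &[5, 6])` yields two pairs and drops the third start index, while `pairs_checked` returns an error for the same inputs.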

LDeakin · 2025-04-03T21:24:33Z

> I am not sure how infallible they are on their own with zip - zip_eq is a different story although panicking isn't amazing

zarrs always does dimensionality checks on ArraySubsets, so it wouldn't matter if the user supplied a mismatched start/shape to such a constructor; they would get an error when they try to use it with array/codec methods. Internally, the use of zip_eq could just be a sanity check in the same way that all the _unchecked methods had debug assertions.

> I am not a huge fan of impl IntoIterator<Item = (u64, u64)> representing a start + shape object, but maybe rust users are accustomed to this sort of thing. Could you elaborate on that?

The existing new_with_start_shape should probably remain for API compatibility anyway, but an iterator-based constructor is very useful in cases where iterator/s are already producing a start/shape. Example:

zarrs/zarrs/src/array_subset/iterators/chunks_iterator.rs

Lines 149 to 154 in cd859f2

    
    let start = std::iter::zip(&chunk_indices, self.chunk_shape)
        .map(|(i, c)| i * c)
        .collect();
    let chunk_subset = unsafe {
        ArraySubset::new_with_start_shape_unchecked(start, self.chunk_shape.to_vec())
    };

Can become

let chunk_subset = ArraySubset::new_with_start_shape_iter(
    zip(&chunk_indices, self.chunk_shape).map(|(i, c)| (i * c, *c)),
);

> Also, at that point, I would just leave the API as-is in terms of what we accept and just panic if shapes don't match instead of erroring

This is a library, so panics are a last resort. I think the only methods that panic now are because an array / offset exceeds usize::MAX (which only really impacts 32-bit platforms) or there is an allocation failure. Internal .expects() / .unwraps() are intended not to panic unless there is a developer error.

But let's just not worry about the iterator stuff this PR, are you happy if I merge this as-is?

ilan-gold added 2 commits March 4, 2025 11:59

(fix): remove extract_elements_unchecked

a8e73af

(chore): remove new_with_start_shape_unchecked

378402e

ilan-gold added 9 commits March 30, 2025 16:31

Merge remote-tracking branch 'upstream/main' into ig/arraysubset_api_audit

dc76e5d

(refactor): remove Chunks::chunk_indices_unchecked and `ArraySubset::new_with_start_end_inc_unchecked`

6611ac3

(chore): remove new_with_start_end_exc_unchecked

6698ce2

(refactor): no more bound_unchecked

6f77615

(chore): remove overlap_unchecked

baf204b

(refactor): remove relative_to_unchecked

9010059

(chore): remove unused method

da827f9

(refactor): go back to new_with_start_end_exc

fb8287e

(refactor): async fixes + chunks iterator does not return results

3c7a0e4

ilan-gold added 10 commits April 1, 2025 15:53

(fix): use ranges constructor

71b2a2f

(fix): remove unused functions

563f1ee

(refactor): smaller ChunkGridTraits change

17f86c5

(refactor): bring chunk_index_to_subset back to original no-error behavior

44888b9

(fix): lint + clippy hopefully

abe657e

(chore): fmt

13a2183

(chore): clippy

8661a7d

(chore): fmt 2

8608885

(fix): documented error

b48d78a

(refactor): more new_with_ranges

78c876f

ilan-gold commented Apr 2, 2025

ilan-gold marked this pull request as ready for review April 2, 2025 10:40

ilan-gold requested a review from LDeakin as a code owner April 2, 2025 10:41

Merge branch 'main' into ig/arraysubset_api_audit

6e128f7

LDeakin reviewed Apr 2, 2025

ilan-gold added 3 commits April 3, 2025 13:12

(fix): not Into map

7e4fa27

(fix): bring back safety docs

de999f7

Merge branch 'ig/arraysubset_api_audit' of github.com:ilan-gold/zarrs into ig/arraysubset_api_audit

a697a3b

(chore): fix docs

a9a3329
