fix: cleanup internal `nparray_to_*` methods #80
On second thought, I am pretty sure it is sound.
Looks good, but oof, `unsafe` is scary. That lifetime bug you fixed could easily have caused us to crash and burn.
We really should replace as many of the unsafe `numpy` and `zarrs` APIs as we can with safe alternatives.
```diff
@@ -155,16 +155,24 @@ impl CodecPipelineImpl {
 }
 
 fn py_untyped_array_to_array_object<'a>(
-    value: &Bound<'a, PyUntypedArray>,
+    value: &'a Bound<'_, PyUntypedArray>,
```
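To make the signature change concrete outside PyO3, here is a minimal plain-Rust sketch. `Binding`, `as_ptr` and `to_object` are hypothetical stand-ins for `Bound<'py, PyUntypedArray>`, its raw-pointer accessor and `py_untyped_array_to_array_object`; the sketch assumes the real function turns a raw pointer obtained from the binding into a reference, and it is not the crate's actual code.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `Bound<'py, PyUntypedArray>`: a raw pointer to
/// the underlying object plus a `'py` lifetime brand.
struct Binding<'py, T> {
    ptr: *mut T,
    _gil: PhantomData<&'py T>,
}

impl<'py, T> Binding<'py, T> {
    /// Stand-in for an accessor that hands out a raw pointer.
    fn as_ptr(&self) -> *mut T {
        self.ptr
    }
}

// Old shape (unsound, shown only as a comment): the output was tied to `'py`,
// not to the borrow of `value`:
// fn to_object_old<'py, T>(value: &Binding<'py, T>) -> &'py T

/// Fixed shape: the output lives exactly as long as the borrow `&'a Binding`.
fn to_object<'a, T>(value: &'a Binding<'_, T>) -> &'a T {
    // SAFETY (sketch): assumes `ptr` is non-null, aligned and points to a live
    // `T` for at least as long as `value` is borrowed.
    unsafe { &*value.as_ptr() }
}

/// Hypothetical usage: `obj` cannot outlive the borrow `&binding`.
fn read_through_binding() -> u32 {
    let mut storage = 42u32;
    let binding = Binding { ptr: &mut storage as *mut u32, _gil: PhantomData };
    let obj: &u32 = to_object(&binding);
    *obj
}
```

Because the `'a` on the output now comes from the input reference, rustc treats the result as a borrow of `binding` and will not allow mutable access to it while `obj` is in use.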
Oof, good catch. This is why `unsafe` is scary; this was extremely wrong.
I'm curious, is there a tl;dr for why this was wrong?
Sure! I’ll rename the lifetimes for easier understanding.
First, `Bound` is a GIL binding: a kind of reference¹ that lives as long as a certain part of our code holds Python’s GIL (Global Interpreter Lock). This guarantees that nothing in Python land tries to mutate that object while we’re accessing it. So:
- `&'x Bound<'py, PyUntypedArray>` means “a reference with lifetime `'x` to a GIL binding with lifetime `'py` to a `PyUntypedArray`”. So the reference most likely lives shorter than the GIL binding², and at most equally long.
- `&'y PyArrayObject` means “a reference with lifetime `'y` to a `PyArrayObject`”.
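Both relationships described here can be modelled in plain Rust. In the sketch below, `Gil`, `GilBound`, `with_gil`, `explicit` and `implied` are hypothetical names, no real lock is taken, and it is an approximation rather than PyO3's implementation. A higher-ranked closure bound keeps `'py`-branded values inside the region where the lock would be held, and implied bounds mean that merely writing the type `&'x GilBound<'py, T>` already forces `'py: 'x`, i.e. the reference can live at most as long as the binding.

```rust
use std::marker::PhantomData;

/// Hypothetical zero-sized proof that "the GIL is held" for the lifetime `'py`.
#[derive(Clone, Copy)]
struct Gil<'py> {
    _brand: PhantomData<&'py ()>,
}

/// Hypothetical stand-in for `Bound<'py, T>`: a value branded with `'py`.
struct GilBound<'py, T> {
    value: T,
    _gil: Gil<'py>,
}

impl<'py> Gil<'py> {
    fn bind<T>(self, value: T) -> GilBound<'py, T> {
        GilBound { value, _gil: self }
    }
}

/// Models "acquire the lock, run `f`, release the lock". `f` must work for
/// every `'py`, so `R` cannot mention `'py` and no branded value escapes.
/// (No real lock is taken in this sketch.)
fn with_gil<F, R>(f: F) -> R
where
    F: for<'py> FnOnce(Gil<'py>) -> R,
{
    f(Gil { _brand: PhantomData })
}

/// States the relationship explicitly: `'py` must outlive `'x`.
fn explicit<'x, 'py: 'x, T>(b: &'x GilBound<'py, T>) -> &'x T {
    &b.value
}

/// States no bound, yet compiles: `&'x GilBound<'py, T>` is only well-formed
/// if `'py: 'x`, so the compiler assumes that bound implicitly.
fn implied<'x, 'py, T>(b: &'x GilBound<'py, T>) -> &'x T {
    explicit(b)
}

fn sum_in_gil(items: Vec<i64>) -> i64 {
    with_gil(|gil| {
        let bound = gil.bind(items);
        implied(&bound).iter().sum::<i64>()
    })
}
```

Returning `bound` itself from the closure would be rejected, since its type mentions the closure's own `'py`.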
The previous version returned a reference `&'y PyArrayObject` with `'y: 'py`, which means that the returned reference is valid as long as the GIL binding is held. In reality, the returned reference is derived from the input reference `&'x`, and remember that this `&'x` reference is probably shorter-lived than the GIL binding. So before this fix, nothing stopped us from dropping the `&'x` reference (because as far as rustc is concerned, nothing derives from it), creating a new, mutable reference to the same object, and mutating the object through that, while other parts of our code are still allowed to read that object through the `&'y` reference. Something like:
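```rust
use std::thread;

let mut value: Bound<'py, PyUntypedArray> = ...;
let array: &'py PyArrayObject = {
    let readonly_ref = &value;
    py_untyped_array_to_array_object(readonly_ref)
}; // readonly_ref is dropped here, but `array` still claims lifetime 'py
let mut_ref = value.borrow_mut(); // illustrative: any way of getting mutable access
thread::spawn(move || {
    write_into(mut_ref); // race condition: writes through mut_ref...
});
thread::spawn(move || {
    println!("{array:?}"); // race condition: ...while this reads through array
});
```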
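With the corrected signature, rustc rejects this pattern, because `array` keeps `value` borrowed for as long as `array` is used. A minimal plain-Rust sketch of the resulting discipline, using a hypothetical `Binding` type whose `get`/`get_mut` methods mimic the fixed lifetime relationship: reads and writes can still alternate, but only when no derived shared reference is alive during a write.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `Bound<'py, T>` holding a raw pointer.
struct Binding<'py, T> {
    ptr: *mut T,
    _gil: PhantomData<&'py mut T>,
}

impl<'py, T> Binding<'py, T> {
    /// Read access tied to a shared borrow of the binding (the fixed shape).
    fn get(&self) -> &T {
        // SAFETY (sketch): assumes `ptr` is valid and that all mutation goes
        // through `get_mut`, which needs `&mut self`.
        unsafe { &*self.ptr }
    }

    /// Write access requires an exclusive borrow of the binding.
    fn get_mut(&mut self) -> &mut T {
        // SAFETY (sketch): `&mut self` guarantees no `&T` from `get` is alive.
        unsafe { &mut *self.ptr }
    }
}

fn read_modify_read() -> (u32, u32) {
    let mut storage = 1u32;
    let mut binding = Binding { ptr: &mut storage as *mut u32, _gil: PhantomData };

    let before = *binding.get(); // the shared borrow ends after this copy
    *binding.get_mut() += 1; // allowed: no shared reference is alive any more

    // Rejected by rustc (E0502) if uncommented, because `array` would still be
    // alive while `get_mut` takes an exclusive borrow:
    // let array = binding.get();
    // *binding.get_mut() += 1;
    // println!("{array}");

    (before, *binding.get())
}
```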
Footnotes
I just wanted to say thanks for the detailed explanation! It still makes my head spin, which is a good reason to avoid unsafe in my own code 😅
I approved, but maybe #80 (comment) should be addressed. If the `numpy` crate grows a safe API, we will have to think about this anyway.
Addresses #78. I don't think there is or was any unsoundness, but it doesn't hurt to get more eyes on it.
`nparray_to_slice` and `nparray_to_unsafe_cell_slice` now validate internally.
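The PR text does not say which checks `nparray_to_slice` and `nparray_to_unsafe_cell_slice` perform. As an illustration only, a "validate, then `unsafe`" conversion might look like the sketch below; `FakeArray`, `Element`, `SliceError` and `to_slice_checked` are hypothetical, and the particular checks (contiguity, item size, alignment) are assumptions about what such a conversion needs, not a description of this PR.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for the parts of a NumPy array a slice conversion
/// would inspect. Storage is a `Vec<u64>`, so it is 8-byte aligned.
struct FakeArray {
    storage: Vec<u64>,
    itemsize: usize,    // bytes per element, as a dtype would report
    c_contiguous: bool, // whether elements are laid out back to back
}

#[derive(Debug, PartialEq)]
enum SliceError {
    NotContiguous,
    ItemSizeMismatch { expected: usize, found: usize },
    Misaligned,
}

/// Hypothetical marker for element types where every bit pattern is valid.
/// `unsafe` because implementing it for e.g. `bool` would make the cast unsound.
unsafe trait Element: Copy + 'static {}
unsafe impl Element for u8 {}
unsafe impl Element for u32 {}
unsafe impl Element for f64 {}

/// Sketch of a conversion that validates before touching `unsafe`.
fn to_slice_checked<T: Element>(arr: &FakeArray) -> Result<&[T], SliceError> {
    if !arr.c_contiguous {
        return Err(SliceError::NotContiguous);
    }
    if arr.itemsize != size_of::<T>() {
        return Err(SliceError::ItemSizeMismatch { expected: size_of::<T>(), found: arr.itemsize });
    }
    let ptr = arr.storage.as_ptr() as *const u8;
    if (ptr as usize) % align_of::<T>() != 0 {
        return Err(SliceError::Misaligned);
    }
    // `itemsize` is non-zero here because it equals `size_of::<T>()`.
    let len = arr.storage.len() * size_of::<u64>() / arr.itemsize;
    // SAFETY: the checks above establish layout, size and alignment; `T: Element`
    // makes any bit pattern valid; the slice borrows `arr`, so it cannot outlive
    // the storage.
    Ok(unsafe { std::slice::from_raw_parts(ptr as *const T, len) })
}
```

Returning a `Result` keeps validation next to the single `unsafe` block, so callers cannot skip it.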