Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please consider clarifying use of unsafe in README example #433

Open
grothesque opened this issue Jul 4, 2024 · 4 comments
Open

Please consider clarifying use of unsafe in README example #433

grothesque opened this issue Jul 4, 2024 · 4 comments

Comments

@grothesque
Copy link

Thanks for providing this amazing library!

The code example given in this project's README file demonstrates the use of PyO3. I guess that it should also serve as a pedagogical model of how PyO3 should be used. After all, this is often going to be the first bit of PyO3-using code that prospective users are going to see.

Now that example features a short unsafe block without any explanation. But the usage of unsafe in Rust code means that the safety of the featured block has been carefully verified and the compiler should trust that.

In practice the usage of unsafe is quite often accompanied by a comment that explains why it is safe. This should certainly be the case in pedagogical code. I think that such a comment would be very helpful here.

Perhaps this particular use of unsafe is obvious to seasoned PyO3 users, but it certainly isn't to newbies for whom this code will often be the first contact with PyO3. Here are some question answers that may come up:

  • Why is this particular use of unsafe safe after all?
  • Are there any assumptions without which this would no longer be the case?
  • The first function of the module takes PyReadonlyArrayDyn<'py, f64> arguments. It seems that the second function could similarly take a PyReadwriteArrayDyn<'py, f64> argument (which provides a safe as_array_mut method), but it takes &Bound<'py, PyArrayDyn<f64>> instead. Is this indeed a choice and what is the reason behind it?

I'm not suggesting that a long discussion of this issue should be added to the README. Probably a short comment and a few pointers into the documentation would be enough.

@grothesque
Copy link
Author

grothesque commented Jul 4, 2024

I did some experimenting.

The function mult_py of the example can be indeed rewritten in a safe way as follows:

#[pyfn(m)]
fn mult_safe<'py>(a: f64, mut x: PyReadwriteArrayDyn<'py, f64>) {
    let x = x.as_array_mut();
    mult(a, x);
}

But this seems to come at a slight cost in terms of performance (release mode):

% python3 -m timeit -s 'import rust_python as rp; import numpy as np; a = np.identity(3)' 'rp.mult(3, a)'
1000000 loops, best of 5: 289 nsec per loop
% python3 -m timeit -s 'import rust_python as rp; import numpy as np; a = np.identity(3)' 'rp.mult_safe(3, a)'
500000 loops, best of 5: 406 nsec per loop

Is this the reason why the variant using unsafe is shown in the README example? If yes, then this deserves a comment IMHO. Otherwise, shouldn't the other variant be shown?

I wonder about the reason for the performance difference. After all the checking that prevents mutable aliasing with PyReadwriteArrayDyn is done at compile time. The following does not compile:

#[pyfn(m)]
fn mult_bad<'py>(a: f64, mut x: PyReadwriteArrayDyn<'py, f64>) {
    let y = x.as_array_mut();
    let x = x.as_array_mut();
    mult(a, x);
    mult(a, y);
}

While the following does:

#[pyfn(m)]
fn mult_dangerous<'py>(a: f64, x: &Bound<'py, PyArrayDyn<f64>>) {
    let y = unsafe { x.as_array_mut() };
    let x = unsafe { x.as_array_mut() };
    mult(a, x);
    mult(a, y);
}

@grothesque
Copy link
Author

I think I finally understand now, but this took me many hours of studying PyO3/pyo3#2885 and the like. Would a patch that introduces some hints into the documentation (notably the README file) be welcome?

Here is my current understanding (I am grateful for any corrections):
PyReadwriteArrayDyn (and other similar types in numpy::borrow) is not just a zero-runtime-cost wrapper that allows for compile-time borrow checking. In addition it uses an internal "database" (implemented as a hashtable from NumPy array data pointers to borrow flags) ` to do dynamic borrow checking. This functionality is even available via a C-API in the hope that it will be used outside of rust-numpy.

So, if the above mult_safe is called from two different Python threads on the same array, the dynamic borrow checking will work and enforce Rust's discipline. This is even true if two separate CPython extensions (both implemented in Rust and using rust-numpy) try to simultaneously access NumPy arrays that share memory.

But, and this is a huge but, this borrow checking API (to my knowledge) is not used by other extensions, most notably it's not used by NumPy itself. So it is possible to simultaneously mutate a NumPy array using just rust-numpy and pure Python/Numpy. (Just like it's possible to simultaneously mutate a NumPy array using exclusively pure Python without rust-numpy.)

Rust-numpy takes the view that this is the most that it can do:

In summary, this crate takes the position that all unchecked code - unsafe Rust, Python, C, Fortran, etc. - must be checked for correctness by its author. Safe Rust code can then rely on this correctness, but should not be able to introduce memory safety issues on its own.

I think that's OK if we can be sure that this will never trigger undefined behavior. Are we?

@grothesque
Copy link
Author

grothesque commented Jul 9, 2024

In summary, unless my understanding is flawed, the following changes seem appropriate:

(1)
In the README example, no longer opt-out unsafely and without comment from the dynamic borrow checking that this crate itself provides. I.e. replace

let x = unsafe { x.as_array_mut() };

by

let mut x = x.readwrite();
let x = x.as_array_mut();

(or by a function that takes mut x: PyReadwriteArrayDyn<'py, f64> instead of x: &Bound<'py, PyArrayDyn<f64>>.

(2)
In numpy::array::PyArray::as_array (and other similar methods there) explain clearly what needs to be respected to be able to use these methods safely: namely that some form of locking must be used to exclude illegal borrows. (Are there any cases where such locking might not be necessary?)

@adamreichold
Copy link
Member

After all the checking that prevents mutable aliasing with PyReadwriteArrayDyn is done at compile time. The following does not compile:

Note that the PyReadwriteArrayDyn "is the borrow", i.e. reifies the lifecycle of dynamically acquiring and releasing the internal borrow key. as_array_mut only applies the existing borrow to produce a mutable array view ArrayViewMut and hence acquires a mutable reference to the PyReadwriteArrayDyn so that one unique borrow cannot result in multiple mutable array views.

It is however possible to produce multiple PyReadwriteArrayDyn from a given Bound<PyArray> by calling PyArrayMethods::readwrite multiple times. This will compile but fail at runtime.

Would a patch that introduces some hints into the documentation (notably the README file) be welcome?

The current situation w.r.t. the soundness of rust-numpy's safe API is certainly precarious and to be honest, I am somewhat surprised about the lack of backlash and disagreement when I made the choice to introduce dynamic borrow checking and consider Python code trusted. Meaning that I would definitely welcome improvements and clarifications.

I am also somewhat grateful that you did all the work of following up on your questions and reading the existing documentation and code in detail. I think it would be great if this would result in improving the situation for anybody having the same questions as you had.

I think that's OK if we can be sure that this will never trigger undefined behavior. Are we?

Well, I am not aware of any way to break aliasing discipline into NumPy arrays using only safe Rust code. But there is no formal proof for that (or even a formal description of the problem). Currently, this fully depends on the construction of the borrow module.

And one can certainly trigger data races using e.g. Python code and safe Rust code, but as you said yourself, this is no worse than pure Python code which can also trigger data races and thereby undefined behaviour. This is why rust-numpy considers Python code exercising the NumPy API unsafe/trusted.

(1) In the README example, no longer opt-out unsafely and without comment from the dynamic borrow checking that this crate itself provides.

I explicitly decided against making all of the example use the safe API to show case that the escape hatch exists, but for performance reasons and because the dynamic borrow checking is not exact (just like the Rust compiler's borrow checking which is also an approximation).

But this would certainly justify a safety comment. I would be grateful if you could add one referencing aliasing discipline for the array interior and note on the escape hatch character of the API w.r.t. runtime overhead and interleaved array views.

(2) In numpy::array::PyArray::as_array (and other similar methods there) explain clearly what needs to be respected to be able to use these methods safely: namely that some form of locking must be used to exclude illegal borrows. (Are there any cases where such locking might not be necessary?)

Actually, in common situations, the mutual exclusion is a property of the combined Python and Rust code and human review is sufficient to ensure aliasing discipline. For example, functions taking a number of read-only array views and producing a new owned array as output that are called from a single Python thread are very common in scientific code. This pattern has long been used to write extensions before dynamic borrow checking was added and it working most of the time was the reason why dynamic borrow checking is really only checking and for example not locking.

I also think the safety doc comments on these methods are correct, but of course, they are extremely terse and I would certainly be glad if they were extended. Especially towards discussing that aliasing discipline has to be respected across the Rust-Python language boundary as the current wording could easily be read as being limited to the Rust side of things.

Also note that breaking aliasing discipline does not require threads, i.e. having a function

fn add_in_place(x: PyReadwriteArrayDyn<f64>, y: PyReadonlyArray<f64>) {
  x.as_array_mut() += y.as_array();
}

must also fail when called as add_in_place(x, x) from Python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants