diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 269770261b8..ec0b0b361c3 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -673,6 +673,7 @@ peps/pep-0791.rst @vstinner
 peps/pep-0792.rst @dstufft
 peps/pep-0793.rst @encukou
 peps/pep-0794.rst @brettcannon
+peps/pep-0797.rst @ZeroIntensity
 peps/pep-0798.rst @JelleZijlstra
 peps/pep-0799.rst @pablogsal
 peps/pep-0800.rst @JelleZijlstra
diff --git a/peps/pep-0797.rst b/peps/pep-0797.rst
new file mode 100644
index 00000000000..d420b2506c5
--- /dev/null
+++ b/peps/pep-0797.rst
@@ -0,0 +1,368 @@
PEP: 797
Title: Shared Object Proxies
Author: Peter Bierma
Discussions-To: Pending
Status: Draft
Type: Standards Track
Created: 08-Aug-2025
Python-Version: 3.15
Post-History: `01-Jul-2025 `__


Abstract
========

This PEP introduces a new :func:`~concurrent.interpreters.share` function to
the :mod:`concurrent.interpreters` module, which allows an arbitrary object
to be shared across interpreters using an object proxy, at the cost of reduced
efficiency in multithreaded code.

For example:

.. code-block:: python

    from concurrent import interpreters

    with open("spanish_inquisition.txt", "w") as unshareable:
        interp = interpreters.create()
        proxy = interpreters.share(unshareable)
        interp.prepare_main(file=proxy)
        interp.exec("file.write(\"I didn't expect the Spanish Inquisition\")")

Motivation
==========

Many Objects Cannot be Shared Between Subinterpreters
------------------------------------------------------

In Python 3.14, the new :mod:`concurrent.interpreters` module can be used to
create multiple interpreters in a single Python process. This works well for
code without shared state, but since one of the primary applications of
subinterpreters is to bypass the :term:`global interpreter lock`, it is
fairly common for programs to require highly complex data structures that are
not easily shareable. In turn, this limits the practicality of
subinterpreters for concurrency.

As of writing, subinterpreters can only share :ref:`a handful of types
` natively, relying on the :mod:`pickle` module
for other types. This is quite limiting, as many kinds of objects cannot be
serialized with ``pickle`` (such as the file objects returned by :func:`open`).
Additionally, serialization can be a very expensive operation, which is not
ideal for multithreaded applications.

Rationale
=========

A Fallback for Object Sharing
-----------------------------

A shared object proxy is designed to be a fallback for sharing an object
between interpreters. A shared object proxy should only be used as
a last resort for highly complex objects that cannot be serialized or shared
in any other way.

This means that even if this PEP is accepted, there is still benefit in
implementing other methods of sharing objects between interpreters.


Specification
=============

.. class:: concurrent.interpreters.SharedObjectProxy

    A proxy type that allows access to an object across multiple interpreters.
    This cannot be constructed from Python; instead, use the
    :func:`~concurrent.interpreters.share` function.


.. function:: concurrent.interpreters.share(obj)

    Wrap *obj* in a :class:`~concurrent.interpreters.SharedObjectProxy`,
    allowing it to be used in other interpreter APIs as if it were natively
    shareable.

    If *obj* is natively shareable, this function does not create a proxy and
    simply returns *obj*.
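
As an illustrative sketch of the intended behavior (the concrete values below
are examples only, not part of the specification), natively shareable objects
pass through :func:`~concurrent.interpreters.share` unchanged, while anything
else comes back wrapped in a proxy:

.. code-block:: python

    from concurrent import interpreters

    # A str is already natively shareable, so no proxy is created and
    # the original object is returned as-is.
    text = "hello"
    assert interpreters.share(text) is text

    # A dict is not natively shareable, so share() wraps it in a
    # SharedObjectProxy that can be passed to other interpreters.
    config = {"retries": 3}
    proxy = interpreters.share(config)
    assert isinstance(proxy, interpreters.SharedObjectProxy)
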


Interpreter Switching
---------------------

When interacting with the wrapped object, the proxy will switch to the
interpreter in which the object was created. This must happen for any access
to the object, such as accessing attributes or modifying the object's
:term:`reference count`. To visualize, ``foo`` in the following code is only
ever called in the main interpreter, despite being accessed in subinterpreters
through a proxy:

.. code-block:: python

    from concurrent import interpreters

    def foo():
        assert interpreters.get_current() == interpreters.get_main()

    interp = interpreters.create()
    proxy = interpreters.share(foo)
    interp.prepare_main(foo=proxy)
    interp.exec("foo()")


Multithreaded Scaling
---------------------

To switch to a wrapped object's interpreter, an object proxy must swap the
:term:`attached thread state` of the current thread, which will in turn wait
on the :term:`GIL` of the target interpreter, if it is enabled. This means that
a shared object proxy will experience contention when accessed concurrently,
but it remains useful for multicore threading, since other threads in the
interpreter are free to execute while one of them waits on the GIL of the
target interpreter.

As an example, imagine that multiple interpreters want to write a log through
a proxy for the main interpreter, but don't want to constantly wait on the log.
By accessing the proxy in a separate thread for each interpreter, the thread
performing the computation can still execute while the proxy is being accessed.

.. code-block:: python

    from concurrent import interpreters

    def write_log(message):
        print(message)

    def execute(n, write_log):
        from threading import Thread
        from queue import Queue

        log = Queue()

        # By performing this in a separate thread, 'execute' can still run
        # while the log is being accessed by the main interpreter.
        def log_queue_loop():
            while True:
                message = log.get()
                if message is None:
                    # Sentinel value: the computation is finished.
                    break
                write_log(message)

        thread = Thread(target=log_queue_loop)
        thread.start()

        for i in range(100000):
            n ** i
            log.put(f"Completed an iteration: {i}")

        # Tell the logging thread to shut down, then wait for it to finish.
        log.put(None)
        thread.join()

    proxy = interpreters.share(write_log)
    for n in range(4):
        interp = interpreters.create()
        interp.call_in_thread(execute, n, proxy)


Proxy Copying
-------------

Contrary to what one might think, a shared object proxy itself can only be used
in one interpreter, because the proxy's reference count is not thread-safe
(and thus cannot be accessed from multiple interpreters). Instead, when crossing
an interpreter boundary, a new proxy is created for the target interpreter that
wraps the same object as the original proxy.

For example, in the following code, two proxies are created, not just one:

.. code-block:: python

    from concurrent import interpreters

    interp = interpreters.create()
    foo = object()
    proxy = interpreters.share(foo)

    # The proxy crosses an interpreter boundary here. 'proxy' is *not* directly
    # sent to 'interp'. Instead, a new proxy is created for 'interp', and the
    # reference to 'foo' is merely copied. Thus, both interpreters have their
    # own proxy wrapping the same object.
    interp.prepare_main(proxy=proxy)
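
Building on the example above, here is a sketch (the ``counter`` dict and the
``increment`` helper are hypothetical, not part of the proposal) showing that
while each target interpreter receives its own proxy, every proxy operates on
the same underlying object in the owning interpreter:

.. code-block:: python

    from concurrent import interpreters

    counter = {"calls": 0}

    def increment():
        # This always runs in the main interpreter, so it sees the
        # main interpreter's 'counter' object.
        counter["calls"] += 1

    interp_a = interpreters.create()
    interp_b = interpreters.create()
    proxy = interpreters.share(increment)

    # Each prepare_main() call creates a fresh proxy inside the target
    # interpreter; both of those proxies wrap the same 'increment' function.
    interp_a.prepare_main(increment=proxy)
    interp_b.prepare_main(increment=proxy)

    interp_a.exec("increment()")
    interp_b.exec("increment()")

    # Both calls were forwarded to the main interpreter and mutated the
    # same dictionary.
    assert counter["calls"] == 2
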


Thread-local State
------------------

Accessing an object proxy will retain information stored on the current
:term:`thread state`, such as thread-local variables stored by
:class:`threading.local` and context variables stored by :mod:`contextvars`.
This allows the following case to work correctly:

.. code-block:: python

    from concurrent import interpreters
    from threading import local

    thread_local = local()
    thread_local.value = 1

    def foo():
        assert thread_local.value == 1

    interp = interpreters.create()
    proxy = interpreters.share(foo)
    interp.prepare_main(foo=proxy)
    interp.exec("foo()")

In order to retain thread-local data when accessing an object proxy, each
thread will have to keep track of the last used thread state for
each interpreter. In C, this behavior looks like this:

.. code-block:: c

    // Error checking has been omitted for brevity
    PyThreadState *tstate = PyThreadState_New(interp);

    // By swapping the current thread state to 'interp', 'tstate' will be
    // associated with 'interp' for the current thread. That means that accessing
    // a shared object proxy will use 'tstate' instead of creating its own
    // thread state.
    PyThreadState *save = PyThreadState_Swap(tstate);

    // 'save' is now the attached thread state again, and 'tstate' remains the
    // most recently used thread state for 'interp' in this thread, so shared
    // object proxies in this thread will reuse it when accessing 'interp'.
    PyThreadState_Swap(save);

In the event that no thread state exists for an interpreter in a given thread,
a shared object proxy will create its own thread state, owned by the
interpreter (meaning it will not be destroyed until interpreter finalization),
which will persist across all shared object proxy accesses in that thread. In
other words, a shared object proxy ensures that thread-local variables and
similar state will not disappear.


Memory Management
-----------------

All proxy objects hold a :term:`strong reference` to the object that they
wrap. As such, destruction of a shared object proxy may trigger destruction
of the wrapped object if the proxy holds the last reference to it, even if
the proxy belongs to a different interpreter. For example:

.. code-block:: python

    from concurrent import interpreters

    interp = interpreters.create()
    foo = object()
    proxy = interpreters.share(foo)
    interp.prepare_main(proxy=proxy)
    del proxy, foo

    # 'foo' is still alive at this point, because the proxy in 'interp' still
    # holds a reference to it. Destruction of 'interp' will then trigger the
    # destruction of 'proxy', and subsequently the destruction of 'foo'.
    interp.close()


Shared object proxies support the garbage collector protocol, but will only
traverse the object that they wrap if the garbage collection is occurring
in the wrapped object's interpreter. To visualize:

.. code-block:: python

    from concurrent import interpreters
    import gc

    proxy = interpreters.share(object())

    # This prints out [<object object at 0x...>], because the wrapped object
    # is owned by this interpreter and is therefore visible to its garbage
    # collector.
    print(gc.get_referents(proxy))

    interp = interpreters.create()
    interp.prepare_main(proxy=proxy)

    # This prints out [], because the wrapped object must be invisible to this
    # interpreter.
    interp.exec("import gc; print(gc.get_referents(proxy))")
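
To make the reference-counting behavior concrete, here is a sketch (the
``Payload`` class and the use of :mod:`weakref` are illustrative assumptions,
and immediate destruction relies on CPython's reference counting) showing that
a proxy keeps its wrapped object alive until the proxy itself goes away:

.. code-block:: python

    import weakref
    from concurrent import interpreters

    class Payload:
        pass

    obj = Payload()
    watcher = weakref.ref(obj)

    proxy = interpreters.share(obj)
    del obj

    # The proxy still holds a strong reference, so the object stays alive.
    assert watcher() is not None

    del proxy

    # The proxy held the last reference, so the object has been destroyed.
    assert watcher() is None
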


Interpreter Lifetimes
*********************

When an interpreter is destroyed, proxies wrapping objects from that
interpreter may still exist elsewhere. To prevent this from causing crashes,
an interpreter will invalidate all proxies pointing to its objects by
overwriting their wrapped object with ``None``.

To demonstrate, the following snippet first prints out ``Alive``, and then
``None`` after deleting the interpreter:

.. code-block:: python

    from concurrent import interpreters

    def test():
        from concurrent import interpreters

        class Test:
            def __str__(self):
                return "Alive"

        return interpreters.share(Test())

    interp = interpreters.create()
    wrapped = interp.call(test)
    print(wrapped)  # Alive
    interp.close()
    print(wrapped)  # None

Note that the proxy is not physically replaced (``wrapped`` in the above example
is still a ``SharedObjectProxy`` instance); instead, its wrapped object is
replaced with ``None``.


Backwards Compatibility
=======================

This PEP has no known backwards compatibility issues.

Security Implications
=====================

This PEP has no known security implications.

How to Teach This
=================

New APIs and important information about how to use them will be added to the
:mod:`concurrent.interpreters` documentation.

Reference Implementation
========================

The reference implementation of this PEP can be found
`here `_.

Rejected Ideas
==============

Directly Sharing Proxy Objects
------------------------------

The initial revision of this proposal took an approach where an instance of
:class:`~concurrent.interpreters.SharedObjectProxy` was :term:`immortal`. This
allowed proxy objects to be directly shared across interpreters, because their
reference count was thread-safe (since it never changed due to immortality).

This approach made the implementation significantly more complicated, and also
introduced a number of edge cases that would have been a burden on CPython
maintainers.

Acknowledgements
================

This PEP would not have been possible without discussion and feedback from
Eric Snow, Petr Viktorin, Kirill Podoprigora, Adam Turner, and Yury Selivanov.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.