Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: a long-standing bug in the handling of Python multiple inheritance #4762

Merged
merged 35 commits into from
Nov 8, 2023

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Jul 25, 2023

Description

The equivalent of this PR was merged as google/pybind11clif#30056 on September 12, 2023.

COPY of the google/pybind11clif#30056 PR description (for easy reference):

The core of the added test is this:

struct CppBase {};
struct CppDrvd : CppBase {};
class PC(m.CppBase):
    pass

class PPCC(PC, m.CppDrvd):
    pass

Without this PR, the runtime behavior depends on the order in which PC or PPCC are first instantiated, which can be highly surprising and potentially very difficult to debug. See #4762 (comment) for a full analysis.

However, use cases appear to be surprisingly rare (#4762 (comment)). The only known use cases are through PyCLIF.

More-or-less as a side-effect, this PR also:

  • resolves a secondary bug that forces overriding __init__ unnecessarilty.
  • fixes an improper reinterpret_cast here:
    auto *instance = reinterpret_cast<detail::instance *>(self);

Original PR description:

The core of the test is this:

struct CppBase {};
struct CppDrvd : CppBase {};
class PC(m.CppBase):
    pass

class PPCC(PC, m.CppDrvd):
    pass

pybind11 forces adding an __init__ override (which happens to not disturb PyCLIF or Boost.Python).

A wrapped PPPC instance evidently has two independent C++ objects (CppBase and CppDerived) with pybind11, while it has only one with PyCLIF and Boost.Python (CppDerived).

The nanobind metaclass refuses to create the PPCC subtype.

Suggested changelog entry:

A long-standing bug in the handling of Python multiple inheritance was fixed. See PR #4762 for the rather complex details.

@rwgk
Copy link
Collaborator Author

rwgk commented Jul 26, 2023

Logging a very surprising observation:

  • The new test has the same behavior as PyCLIF and Boost.Python if run with python3 -m pytest -n 2 test_python_multiple_inheritance.py.
-    assert d.get_base_value() == 11
+    assert d.get_base_value() == 12
     assert d.get_base_value_from_drvd() == 12
     d.reset_base_value(20)
     assert d.get_base_value() == 20
-    assert d.get_base_value_from_drvd() == 12
+    assert d.get_base_value_from_drvd() == 20
     d.reset_base_value_from_drvd(30)
-    assert d.get_base_value() == 20
+    assert d.get_base_value() == 30

pytest -n 2:

( cd /usr/local/google/home/rwgk/forked/pybind11/tests && PYTHONPATH=/usr/local/google/home/rwgk/forked/build_gcc/lib /usr/bin/python3 -m pytest -n 2 test_python_multiple_inheritance.py )

=========================================================== test session starts ============================================================
platform linux -- Python 3.11.4, pytest-7.2.1, pluggy-1.0.0+repack
C++ Info: 12.2.0 C++17 __pybind11_internals_v4_gcc_libstdcpp_cxxabi1017__ PYBIND11_SIMPLE_GIL_MANAGEMENT=False
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini
plugins: xdist-3.3.1
2 workers [2 items]
..                                                                                                                                   [100%]
============================================================ 2 passed in 0.32s =============================================================

pytest -n 1:

( cd /usr/local/google/home/rwgk/forked/pybind11/tests && PYTHONPATH=/usr/local/google/home/rwgk/forked/build_gcc/lib /usr/bin/python3 -m pytest -n 1 test_python_multiple_inheritance.py )

=========================================================== test session starts ============================================================
platform linux -- Python 3.11.4, pytest-7.2.1, pluggy-1.0.0+repack
C++ Info: 12.2.0 C++17 __pybind11_internals_v4_gcc_libstdcpp_cxxabi1017__ PYBIND11_SIMPLE_GIL_MANAGEMENT=False
rootdir: /usr/local/google/home/rwgk/forked/pybind11/tests, configfile: pytest.ini
plugins: xdist-3.3.1
1 worker [2 items]
.F                                                                                                                                   [100%]
================================================================= FAILURES =================================================================
______________________________________________________________ test_PPCCInit _______________________________________________________________
[gw0] linux -- Python 3.11.4 /usr/bin/python3

    def test_PPCCInit():
        d = PPCCInit(11)
        assert d.get_drvd_value() == 36
        d.reset_drvd_value(55)
        assert d.get_drvd_value() == 55

        # CppBase is initialized and used when CppBase methods are called, but
        # CppDrvd is used when CppDrvd methods are called.
>       assert d.get_base_value() == 12
E       assert 11 == 12
E        +  where 11 = <bound method PyCapsule.get_base_value of <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>>()
E        +    where <bound method PyCapsule.get_base_value of <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>> = <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>.get_base_value

d          = <test_python_multiple_inheritance.PPCCInit object at 0x7f3cfad79490>

test_python_multiple_inheritance.py:32: AssertionError
========================================================= short test summary info ==========================================================
FAILED test_python_multiple_inheritance.py::test_PPCCInit - assert 11 == 12
======================================================= 1 failed, 1 passed in 0.34s ========================================================

ERROR: completed_process.returncode=1

Ralf W. Grosse-Kunstleve added 7 commits July 26, 2023 11:51
```
The CXX compiler identification is Clang 17.0.0
```

```
pytypes.h:1615:23: error: identifier '_s' preceded by whitespace in a literal operator declaration is deprecated [-Werror,-Wdeprecated-literal-operator]
```

```
cast.h:1380:26: error: identifier '_a' preceded by whitespace in a literal operator declaration is deprecated [-Werror,-Wdeprecated-literal-operator]
```
```
pytypes.h:1615:12: error: missing space between '""' and suffix identifier
```
@rwgk
Copy link
Collaborator Author

rwgk commented Jul 27, 2023

Follow-on to the previous comment:

The test also passes with pytest -n 1 (or just pytest) if I move def test_PC(): after def test_PPCCInit():.

The source of the order dependence is the algorithm here:

  • PYBIND11_NOINLINE void all_type_info_populate(PyTypeObject *t, std::vector<type_info *> &bases) {
    std::vector<PyTypeObject *> check;
    for (handle parent : reinterpret_borrow<tuple>(t->tp_bases)) {
    check.push_back((PyTypeObject *) parent.ptr());
    }
    auto const &type_dict = get_internals().registered_types_py;
    for (size_t i = 0; i < check.size(); i++) {
    auto *type = check[i];
    // Ignore Python2 old-style class super type:
    if (!PyType_Check((PyObject *) type)) {
    continue;
    }
    // Check `type` in the current set of registered python types:
    auto it = type_dict.find(type);
    if (it != type_dict.end()) {
    // We found a cache entry for it, so it's either pybind-registered or has pre-computed
    // pybind bases, but we have to make sure we haven't already seen the type(s) before:
    // we want to follow Python/virtual C++ rules that there should only be one instance of
    // a common base.
    for (auto *tinfo : it->second) {
    // NB: Could use a second set here, rather than doing a linear search, but since
    // having a large number of immediate pybind11-registered types seems fairly
    // unlikely, that probably isn't worthwhile.
    bool found = false;
    for (auto *known : bases) {
    if (known == tinfo) {
    found = true;
    break;
    }
    }
    if (!found) {
    bases.push_back(tinfo);
    }
    }
    } else if (type->tp_bases) {
    // It's some python type, so keep follow its bases classes to look for one or more
    // registered types
    if (i + 1 == check.size()) {
    // When we're at the end, we can pop off the current element to avoid growing
    // `check` when adding just one base (which is typical--i.e. when there is no
    // multiple inheritance)
    check.pop_back();
    i--;
    }
    for (handle parent : reinterpret_borrow<tuple>(type->tp_bases)) {
    check.push_back((PyTypeObject *) parent.ptr());
    }
    }
    }
    }

With Python code in this order

    PC(0)
    PPCC(0)

the result of the algorithm is:

registered_types_py = {PC: {CppBase}, PPCC: {CppBase, CppDrvd}}

With Python code in this order

    PPCC(0)
    PC(0)

the result of the algorithm is:

registered_types_py = {PPCC: {CppDrvd, CppBase}, PC: {CppBase}}

For completeness, code to reproduce: d37b540

Note that the type of registered_types_py is std::unordered_map<PyTypeObject *, std::vector<type_info *>>:

I.e. the order of the keys is undetermined, which is totally fine, it doesn't matter.

However, the order of the bases for a given key is completely deterministic in both cases above.

This is a subtle but very serious flaw:

Assume some complex user code currently works as expected. It happens to call PPCC(i) first, PC(j) second. Then one day someone makes a harmless looking addition somewhere so that the order becomes PC(k), PPCC(i), PC(j). All the sudden the existing code behaves differently. This will be extremely difficult to troubleshoot.

rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Jul 28, 2023
Ralf W. Grosse-Kunstleve added 4 commits July 30, 2023 23:41
… *>()` introduced with PR pybind#2152

The `reinterpret_cast<instance *>(self)` is unsafe if `__new__` is mocked,
which was actually found in the wild: the mock returned `None` for `self`.
This was inconsequential because `inst` is currently cast straight back to
`PyObject *` to compute `all_type_info()`, which is empty if `self` is not
a pybind11 `instance`, and then `inst` is never dereferenced. However, the
unsafe detour through `instance *` is easily avoided and the updated
implementation is less prone to accidents while debugging or refactoring.
It turns out the previous commit message is incorrect, the `inst` pointer is actually dereferenced, in the `value_and_holder` ctor here:

https://github.com/pybind/pybind11/blob/f3e0602802c7840992c97f4960515777cad6a5c7/include/pybind11/detail/type_caster_base.h#L262-L263

```
259     // Main constructor for a found value/holder:
260     value_and_holder(instance *i, const detail::type_info *type, size_t vpos, size_t index)
261         : inst{i}, index{index}, type{type},
262           vh{inst->simple_layout ? inst->simple_value_holder
263                                  : &inst->nonsimple.values_and_holders[vpos]} {}
```
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 2, 2023
rwgk pushed a commit that referenced this pull request Aug 3, 2023
* Copy clang 17 compatibility fixes from PR #4762 to a separate PR.

* Add gcc:13 C++20

* Add silkeh/clang:16-bullseye C++20
@@ -203,6 +205,18 @@ def __init__(self):
assert msg(exc_info.value) == expected


@pytest.mark.parametrize(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new feature enabled by this PR, or a bug fixed along with the PR?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bug fixed along with this PR.

.def("get_base_value", &CppBase::get_base_value)
.def("reset_base_value", &CppBase::reset_base_value);

py::class_<CppDrvd, CppBase>(m, "CppDrvd")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention, do we require users to list all of the base classes in the template arg list? I thought the usual way is

py::class_, but maybe just doing that Python won't be aware the Py object should have a base class of Py.CppBase?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the convention is to list all of the base classes if we want the base class methods to surface to the Pybind class.

@rainwoodman
Copy link

Interesting diamand case.

struct CppBase {int base;};
struct CppDrvd1 : CppBase {};
struct CppDrvd2 : CppBase {};

class PCB(m.CppBase):
    pass

class PC1(m.CppDrvd1):
    pass
class PC2(m.CppDrvd2):
    pass

class PCD(PC1, PC2):   # consider banning this.
   pass

PC2().set_base

py::class<CppBase>().method('set_base')

py::class<CppDrvd1, CppBase>().method('get_base')

py::class<CppDrvd2, CppBase>().method('get_base')

@rwgk rwgk force-pushed the python_multiple_inheritance_test branch from 90e8914 to 86ca4e1 Compare November 5, 2023 09:48
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 6, 2023

@rainwoodman FYI

After our meeting reviewing this PR I decided it's worth the effort catching all ill-behaved situations (not just fix the particular situation I stumbled over).

0a9599f captures some of them, but not all.

To get all, we need to bring in the C++ bases as well. I'll try to work on that asap.

Intermediate results: I ran Google global testing with 0a9599f:

  1. just changing pybind11: the "diverging" error message did not appear anywhere.

  2. PyCLIF-pybind11: again, he "diverging" error message did not appear anywhere.

So plugging this hole could turn out to be relatively high effort for very little practical gain, but we can only be sure after putting in the effort. :-/


For my own reference:

Global testing just changing pybind11: OCL:579584090:BASE:579672844:1699221331111:c625089a

Global testing changing PyCLIF-pybind11: OCL:579673181:BASE:579726990:1699249978254:6ab60a42

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 7, 2023

Regarding 5f5fd6a:

This bug goes pretty deep: registered_types_py does not enumerate all C++ bases for pybind11-wrapped types. Therefore I had to go into type_caster_generic::load_impl<> to detect divergent bases.

The current state of this PR is definitely not meant to be for merging. It's only meant to enable global testing, to see if there are hidden problems in the wild. For that I have to move this work to the pywrapcc branch/repo, because I have to apply a similar patch in smart_holder_type_casters.h.

```
include/pybind11/detail/type_caster_base.h:795:21: error: the 'empty' method should be used to check for emptiness instead of 'size' [readability-container-size-empty,-warnings-as-errors]
                if (matching_bases.size() != 0) {
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~
                    !matching_bases.empty()
```
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 7, 2023

I have to apply a similar patch in smart_holder_type_casters.h.

That's under google/pybind11clif#30077 now.

Unit testing with all sanitizers in the Google toolchain passes.

Global testing initiated. Results will probably only be available very late today.

Note: the divergence check in type_caster_generic::load_impl<> only catches problems when a Python object is passed for a C++ base type in a function call. It would be much better to catch all divergence issues in all_type_info_populate(), but currently I don't see a way to achieve that easily.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 8, 2023

Global testing initiated. Results will probably only be available very late today.

No failures.

It would be much better to catch all divergence issues in all_type_info_populate(), but currently I don't see a way to achieve that easily.

That's the only situation we're still not sure about.

@rwgk
Copy link
Collaborator Author

rwgk commented Nov 8, 2023

Conclusion from a text chat with @rainwoodman:

I will revert the last 3 commits here (not counting merge commits), to get back to the state reviewed in a meeting with @rainwoodman last week (Oct 31), then merge.

But also: Create a new PR bringing back the 3 commits (0a9599f, 5f5fd6a, df27188), and do follow-on work separately, later.

Rationale: I think it's fair to defer work on catching "diverging" (best term I could think of) inheritance situations because I could not find any problems in the wild (Google global testing). The new PR will serve as documentation and reminder.

Copy link

@rainwoodman rainwoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rainwoodman
Copy link

I agree with your assessment: we can revisit the divergence case when there is evidence that someone actually wants to use it.

@rwgk rwgk merged commit e250155 into pybind:master Nov 8, 2023
84 checks passed
@rwgk rwgk deleted the python_multiple_inheritance_test branch November 8, 2023 20:44
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Nov 8, 2023
@rwgk
Copy link
Collaborator Author

rwgk commented Nov 8, 2023

The follow-on PR is #4928.

@henryiii henryiii changed the title Fix a long-standing bug in the handling of Python multiple inheritance fix: a long-standing bug in the handling of Python multiple inheritance Nov 15, 2023
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants