You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am getting the following crash in very specific conditions, while the Python GC is running.
I can reliably reproduce it in a pytest test suite of an (unfortunately) non-trivial code base.
Each test of the suite starts a producer, writes a few messages, then starts a multiprocessing.Manager,
then a consumer and consumes topics to the end.
The third case to run segfaults when I manually trigger the GC in a specific place (or the fourth,
when the GC eventually runs on its own).
==3552160== Invalid read of size 4
==3552160== at 0x4A28B17: __pthread_clockjoin_ex (pthread_join_common.c:43)
==3552160== by 0x4A2F96F: thrd_join@@GLIBC_2.34 (thrd_join.c:26)
==3552160== by 0x774B3DB: rd_kafka_destroy_app (in /home/dev/.local/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-55260171.so.1)
==3552160== by 0x75A86C3: Producer_dealloc (Producer.c:103)
==3552160== by 0x5CCB14: UnknownInlinedFun (object.c:2390)
==3552160== by 0x5CCB14: UnknownInlinedFun (object.h:538)
==3552160== by 0x5CCB14: UnknownInlinedFun (dictobject.c:5566)
==3552160== by 0x5CCB14: subtype_clear.lto_priv.0 (typeobject.c:1288)
==3552160== by 0x50ABCD: UnknownInlinedFun (gcmodule.c:1013)
==3552160== by 0x50ABCD: gc_collect_main.lto_priv.0 (gcmodule.c:1287)
==3552160== by 0x6303E6: gc_collect_with_callback.lto_priv.0 (gcmodule.c:1400)
==3552160== by 0x6756CF: UnknownInlinedFun (gcmodule.c:1538)
==3552160== by 0x6756CF: gc_collect.lto_priv.0 (gcmodule.c.h:110)
==3552160== by 0x53ACF6: cfunction_vectorcall_FASTCALL_KEYWORDS.lto_priv.0 (methodobject.c:443)
==3552160== by 0x53AC2B: UnknownInlinedFun (pycore_call.h:92)
==3552160== by 0x53AC2B: PyObject_Vectorcall (call.c:299)
==3552160== by 0x52B93F: _PyEval_EvalFrameDefault (ceval.c:4772)
==3552160== by 0x5855A3: UnknownInlinedFun (pycore_ceval.h:73)
==3552160== by 0x5855A3: UnknownInlinedFun (ceval.c:6435)
==3552160== by 0x5855A3: UnknownInlinedFun (call.c:393)
==3552160== by 0x5855A3: _PyObject_VectorcallTstate.lto_priv.0 (pycore_call.h:92)
==3552160== Address 0x1201f990 is not stack'd, malloc'd or (recently) free'd
==3552160==
Fatal Python error: Segmentation fault
Current thread 0x0000000004b80880 (most recent call first):
Garbage-collecting
File "/home/dev/swh-environment/swh-dataset/build/__editable__.swh.dataset-1.5.2.dev7+g4fccea4.d20240523-py3-none-any/swh/dataset/journalprocessor.py", line 348 in progress_worker
File "/usr/lib/python3.11/concurrent/futures/process.py", line 257 in _process_worker
File "/usr/lib/python3.11/multiprocessing/process.py", line 108 in run
File "/usr/lib/python3.11/multiprocessing/process.py", line 314 in _bootstrap
File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 71 in _launch
File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 19 in __init__
File "/usr/lib/python3.11/multiprocessing/context.py", line 281 in _Popen
File "/usr/lib/python3.11/multiprocessing/process.py", line 121 in start
File "/usr/lib/python3.11/concurrent/futures/process.py", line 761 in _spawn_process
File "/usr/lib/python3.11/concurrent/futures/process.py", line 750 in _launch_processes
File "/usr/lib/python3.11/concurrent/futures/process.py", line 723 in _start_executor_manager_thread
File "/usr/lib/python3.11/concurrent/futures/process.py", line 785 in submit
File "/home/dev/swh-environment/swh-dataset/build/__editable__.swh.dataset-1.5.2.dev7+g4fccea4.d20240523-py3-none-any/swh/dataset/journalprocessor.py", line 316 in run
File "/home/dev/swh-environment/swh-dataset/swh/dataset/test/test_journal_processor.py", line 170 in test_parallel_journal_processor_origin_visit_status
File "/home/dev/swh-environment/swh-dataset/swh/dataset/test/test_journal_processor.py", line 50 in newf
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/python.py", line 195 in pytest_pyfunc_call
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/python.py", line 1772 in runtest
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/runner.py", line 172 in pytest_runtest_call
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/runner.py", line 240 in <lambda>
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/runner.py", line 340 in from_call
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/runner.py", line 239 in call_and_report
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/runner.py", line 134 in runtestprotocol
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/runner.py", line 115 in pytest_runtest_protocol
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/main.py", line 339 in _main
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/main.py", line 285 in wrap_session
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
File "/home/dev/.local/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/config/__init__.py", line 174 in main
File "/home/dev/.local/lib/python3.11/site-packages/_pytest/config/__init__.py", line 197 in console_main
File "/home/dev/.local/bin/pytest", line 8 in <module>
Extension modules: confluent_kafka.cimpl, msgpack._cmsgpack, psutil._psutil_linux, psutil._psutil_posix, psycopg_c.pq, psycopg_c._psycopg, markupsafe._speedups, _brotli, simplejson._speedups, zmq.backend.cython.context, zmq.backend.cython.message, zmq.backend.cython.socket, zmq.backend.cython._device, zmq.backend.cython._poll, zmq.backend.cython._proxy_steerable, zmq.backend.cython._version, zmq.backend.cython.error, zmq.backend.cython.utils, tornado.speedups, plyvel._plyvel (total: 20)
==3552160==
==3552160== Process terminating with default action of signal 11 (SIGSEGV)
==3552160== at 0x4A28E2C: __pthread_kill_implementation (pthread_kill.c:44)
==3552160== by 0x49D9FB1: raise (raise.c:26)
==3552160== by 0x49DA04F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==3552160== by 0x4A28B16: __pthread_clockjoin_ex (pthread_join_common.c:43)
==3552160==
==3552160== HEAP SUMMARY:
==3552160== in use at exit: 29,294,834 bytes in 30,069 blocks
==3552160== total heap usage: 219,797 allocs, 189,728 frees, 522,042,917 bytes allocated
==3552160==
==3552160== LEAK SUMMARY:
==3552160== definitely lost: 4,547 bytes in 25 blocks
==3552160== indirectly lost: 12,610 bytes in 30 blocks
==3552160== possibly lost: 423,923 bytes in 214 blocks
==3552160== still reachable: 28,853,754 bytes in 29,800 blocks
==3552160== suppressed: 0 bytes in 0 blocks
==3552160== Rerun with --leak-check=full to see details of leaked memory
==3552160==
==3552160== For lists of detected and suppressed errors, rerun with: -s
==3552160== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==3552159==
==3552159== Process terminating with default action of signal 15 (SIGTERM)
==3552159== at 0x4A23E96: __futex_abstimed_wait_common64 (futex-internal.c:57)
==3552159== by 0x4A23E96: __futex_abstimed_wait_common (futex-internal.c:87)
==3552159== by 0x4A2683B: __pthread_cond_wait_common (pthread_cond_wait.c:503)
==3552159== by 0x4A2683B: pthread_cond_timedwait@@GLIBC_2.3.2 (pthread_cond_wait.c:643)
==3552159== by 0x4A2F5D8: cnd_timedwait@@GLIBC_2.34 (cnd_timedwait.c:27)
==3552159== by 0x7782B10: rd_kafka_q_serve_rkmessages (in /home/dev/.local/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-55260171.so.1)
==3552159== by 0x7743C3B: rd_kafka_consume_batch_queue (in /home/dev/.local/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-55260171.so.1)
==3552159== by 0x75A5BB6: Consumer_consume (Consumer.c:1056)
==3552159== by 0x550BBD: method_vectorcall_VARARGS_KEYWORDS.lto_priv.0 (descrobject.c:364)
==3552159== by 0x53AC2B: UnknownInlinedFun (pycore_call.h:92)
==3552159== by 0x53AC2B: PyObject_Vectorcall (call.c:299)
==3552159== by 0x52B93F: _PyEval_EvalFrameDefault (ceval.c:4772)
==3552159== by 0x5855A3: UnknownInlinedFun (pycore_ceval.h:73)
==3552159== by 0x5855A3: UnknownInlinedFun (ceval.c:6435)
==3552159== by 0x5855A3: UnknownInlinedFun (call.c:393)
==3552159== by 0x5855A3: _PyObject_VectorcallTstate.lto_priv.0 (pycore_call.h:92)
==3552159== by 0x585173: method_vectorcall.lto_priv.0 (classobject.c:59)
==3552159== by 0x56A380: UnknownInlinedFun (call.c:257)
==3552159== by 0x56A380: UnknownInlinedFun (call.c:328)
==3552159== by 0x56A380: PyObject_Call (call.c:355)
==3552159==
==3552159== HEAP SUMMARY:
==3552159== in use at exit: 29,630,515 bytes in 30,288 blocks
==3552159== total heap usage: 220,309 allocs, 190,021 frees, 522,280,354 bytes allocated
==3552159==
==3552143==
==3552143== HEAP SUMMARY:
==3552143== in use at exit: 29,243,757 bytes in 29,955 blocks
==3552143== total heap usage: 218,912 allocs, 188,957 frees, 522,173,350 bytes allocated
==3552143==
==3552143== LEAK SUMMARY:
==3552143== definitely lost: 576 bytes in 18 blocks
==3552143== indirectly lost: 0 bytes in 0 blocks
==3552143== possibly lost: 371,032 bytes in 200 blocks
==3552143== still reachable: 28,872,149 bytes in 29,737 blocks
==3552143== suppressed: 0 bytes in 0 blocks
==3552143== Rerun with --leak-check=full to see details of leaked memory
==3552143==
==3552143== For lists of detected and suppressed errors, rerun with: -s
==3552143== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==3552159== LEAK SUMMARY:
==3552159== definitely lost: 608 bytes in 19 blocks
==3552159== indirectly lost: 0 bytes in 0 blocks
==3552159== possibly lost: 676,795 bytes in 245 blocks
==3552159== still reachable: 28,953,112 bytes in 30,024 blocks
==3552159== suppressed: 0 bytes in 0 blocks
==3552159== Rerun with --leak-check=full to see details of leaked memory
==3552159==
==3552159== For lists of detected and suppressed errors, rerun with: -s
==3552159== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
How to reproduce
I'm afraid I was not able to reproduce it with a simple script.
confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): ('2.4.0', 33816576) and ('2.4.0', 33816831)
Description
I am getting the following crash in very specific conditions, while the Python GC is running.
I can reliably reproduce it in a pytest test suite of an (unfortunately) non-trivial code base.
Each test of the suite starts a producer, writes a few messages, then starts a
multiprocessing.Manager
,then a consumer and consumes topics to the end.
The third case to run segfaults when I manually trigger the GC in a specific place (or the fourth,
when the GC eventually runs on its own).
How to reproduce
I'm afraid I was not able to reproduce it with a simple script.
Clone https://gitlab.softwareheritage.org/swh/devel/swh-dataset.git , check out
f8b5c8665ac802f8b9dd15feb11dad2c2eadaee3
, then applying this patch (segfaultsalso happen without it, but it makes them more deterministic):
Installed Python packages
Checklist
Please provide the following information:
confluent-kafka-python and librdkafka version (
confluent_kafka.version()
andconfluent_kafka.libversion()
):('2.4.0', 33816576)
and('2.4.0', 33816831)
Apache Kafka broker version: none
Client configuration:
{'auto.offset.reset': 'earliest', 'bootstrap.servers': '127.0.0.1:45305', 'debug': 'all', 'enable.auto.commit': False, 'enable.partition.eof': True, 'error_cb': <function _error_cb at 0x7efc3959aa20>, 'group.id': 'test-consumer-gqizofzwretest_parallel_journal_processor', 'logger': <Logger swh.journal.client.rdkafka (DEBUG)>, 'message.max.bytes': '524288000', 'on_commit': <function _on_commit at 0x7efc3959bce0>}
Operating system: Debian 11
Provide client logs (with
'debug': '..'
as necessary): crash_log.txtProvide broker log excerpts
Critical issue
The text was updated successfully, but these errors were encountered: