[BUG]: MessageMeta.copy_dataframe() causes SIGSEGV error with certain cudf dataframes #1934

Open
2 tasks done
ashsong-nv opened this issue Oct 9, 2024 · 3 comments · May be fixed by #1945
Assignees
Labels
bug Something isn't working Needs Triage Need team to review and classify

Comments

ashsong-nv commented Oct 9, 2024

Version

24.10

Which installation method(s) does this occur on?

Source

Describe the bug.

The MessageMeta.copy_dataframe() method crashes with a SIGSEGV error when called on a cudf dataframe that meets any of the following edge-case conditions:

  1. Empty cudf dataframes converted from an empty pandas dataframe
  2. Empty cudf dataframes converted from a non-empty pandas dataframe and then filtered until empty
  3. cudf dataframes with ListDtype(object) columns that originally contained a mix of list[str] and None values, but were filtered down to just the row with the None value

The error doesn't occur when directly creating a deep copy of the dataframe, or when using MessageMeta.mutable_dataframe().
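Based on that observation, one possible stopgap is to take the copy through mutable_dataframe() and deep-copy the frame it yields. This is a minimal sketch, not a verified fix: `safe_copy_dataframe` is a hypothetical helper name, and it assumes that `mutable_dataframe()` is used as a context manager yielding the underlying dataframe and that the crash is limited to the `copy_dataframe()` path.

```python
def safe_copy_dataframe(meta):
    """Hypothetical helper: copy a MessageMeta's dataframe without copy_dataframe().

    Assumption: `meta.mutable_dataframe()` is a context manager that yields the
    underlying DataFrame, which exposes `copy(deep=True)`.
    """
    # Hold the mutable-dataframe context only for as long as the copy takes.
    with meta.mutable_dataframe() as df:
        return df.copy(deep=True)
```

Because the helper only relies on the context manager and `copy(deep=True)`, it can be swapped back to `copy_dataframe()` once the underlying bug is fixed.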

Please see attached reproducer Python script for more comprehensive tests of the various edge cases.

messagemeta_copydataframe_sigsegv_reproducer.txt

Minimum reproducible example

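import cudf
import pandas as pd

from morpheus.messages import MessageMeta

# Scenario 1: Empty cudf dataframe converted from an empty pandas dataframe
df = pd.DataFrame(columns=["a"], dtype="object")
df = cudf.from_pandas(df)
mm = MessageMeta(df)
mm.copy_dataframe()  # reported to crash with SIGSEGV

# Scenario 2: Filtered cudf dataframe that originally contained mixed `list[str]` and None values
df = cudf.DataFrame({"a": [["a"], None]})
df = df.drop(0)  # drop row 0 so only the None row remains
mm = MessageMeta(df)
mm.copy_dataframe()  # reported to crash with SIGSEGV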

Relevant log output

Logs when running in the repro script:

Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:2455271) ====
 0  /opt/conda/envs/morpheus/lib/./libucs.so.0(ucs_handle_error+0x2fd) [0x7f119806dfed]
 1  /opt/conda/envs/morpheus/lib/./libucs.so.0(+0x2a1e1) [0x7f119806e1e1]
 2  /opt/conda/envs/morpheus/lib/./libucs.so.0(+0x2a3aa) [0x7f119806e3aa]
 3  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f11ebaa0520]
 4  /opt/conda/envs/morpheus/lib/python3.10/site-packages/cudf/_lib/column.cpython-310-x86_64-linux-gnu.so(+0x4e23c) [0x7f11b2aa423c]
 5  /opt/conda/envs/morpheus/lib/python3.10/site-packages/cudf/_lib/column.cpython-310-x86_64-linux-gnu.so(+0x4f2a7) [0x7f11b2aa52a7]
 6  /workspace/external/morpheus/python/morpheus/morpheus/_lib/cudf_helpers.cpython-310-x86_64-linux-gnu.so(_Z28data_from_table_view_indexedN4cudf10table_viewEP7_objectS2_S2_S2_+0xaee) [0x7f1190ab878e]
 7  /workspace/external/morpheus/python/morpheus/morpheus/_lib/cudf_helpers.cpython-310-x86_64-linux-gnu.so(_Z31make_table_from_table_info_dataN8morpheus13TableInfoDataEP7_object+0x18a7) [0x7f1190ac27b7]
 8  /workspace/external/morpheus/build/python/morpheus/morpheus/_lib/libmorpheus.so(+0x2673ba) [0x7f1198cc93ba]
 9  /workspace/external/morpheus/build/python/morpheus/morpheus/_lib/libmorpheus.so(_ZN8morpheus25MessageMetaInterfaceProxy14get_data_frameERNS_11MessageMetaE+0x2a1) [0x7f1198bd5761]
10  /workspace/external/morpheus/python/morpheus/morpheus/_lib/messages.cpython-310-x86_64-linux-gnu.so(+0x542b8) [0x7f11909ed2b8]
11  /workspace/external/morpheus/python/morpheus/morpheus/_lib/messages.cpython-310-x86_64-linux-gnu.so(+0x43e6f) [0x7f11909dce6f]
12  /opt/conda/envs/morpheus/bin/python(+0x1445a6) [0x55fc1954e5a6]
13  /opt/conda/envs/morpheus/bin/python(_PyObject_MakeTpCall+0x26b) [0x55fc19547a6b]
14  /opt/conda/envs/morpheus/bin/python(+0x150866) [0x55fc1955a866]
15  /opt/conda/envs/morpheus/bin/python(_PyEval_EvalFrameDefault+0x4c12) [0x55fc19543142]
16  /opt/conda/envs/morpheus/bin/python(_PyFunction_Vectorcall+0x6c) [0x55fc1954ea2c]
17  /opt/conda/envs/morpheus/bin/python(_PyEval_EvalFrameDefault+0x320) [0x55fc1953e850]
18  /opt/conda/envs/morpheus/bin/python(+0x1d7c60) [0x55fc195e1c60]
19  /opt/conda/envs/morpheus/bin/python(PyEval_EvalCode+0x87) [0x55fc195e1ba7]
20  /opt/conda/envs/morpheus/bin/python(+0x20812a) [0x55fc1961212a]
21  /opt/conda/envs/morpheus/bin/python(+0x203523) [0x55fc1960d523]
22  /opt/conda/envs/morpheus/bin/python(+0x9a6f5) [0x55fc194a46f5]
23  /opt/conda/envs/morpheus/bin/python(_PyRun_SimpleFileObject+0x1ae) [0x55fc196079fe]
24  /opt/conda/envs/morpheus/bin/python(_PyRun_AnyFileObject+0x44) [0x55fc19607594]
25  /opt/conda/envs/morpheus/bin/python(Py_RunMain+0x38b) [0x55fc1960478b]
26  /opt/conda/envs/morpheus/bin/python(Py_BytesMain+0x37) [0x55fc195d51f7]
27  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f11eba87d90]
28  /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f11eba87e40]
29  /opt/conda/envs/morpheus/bin/python(+0x1cb0f1) [0x55fc195d50f1]
=================================
Segmentation fault (core dumped)
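
Frames 6 through 11 of the backtrace above sit inside Morpheus libraries, and the named ones carry Itanium-mangled C++ symbols. As a reading aid, the sketch below splits a UCX-style frame line into its parts and recovers the leading length-prefixed identifiers of a mangled name. `FRAME_RE`, `leading_names`, and `parse_frame` are hypothetical helper names, and this is deliberately not a full demangler (a tool such as c++filt handles the general case).

```python
import re

# One UCX-style frame: index, module path, optional symbol, hex offset, hex address.
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<module>\S+?)"
    r"\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-f]+)\)\s+\[(?P<addr>0x[0-9a-f]+)\]$"
)


def leading_names(mangled):
    """Return the <length><identifier> tokens at the front of a mangled name.

    Stops at the first token that is not length-prefixed, so argument types,
    substitutions, and templates are ignored.
    """
    if not mangled.startswith("_Z"):
        return [mangled] if mangled else []
    pos = 2
    if mangled[pos:pos + 1] == "N":  # 'N' opens a nested (namespaced) name
        pos += 1
    names = []
    while True:
        m = re.match(r"\d+", mangled[pos:])
        if m is None:
            break
        start = pos + m.end()
        end = start + int(m.group())
        names.append(mangled[start:end])
        pos = end
    return names


def parse_frame(line):
    """Parse one backtrace line into a dict, or return None if it is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    symbol = m.group("symbol")
    return {
        "index": int(m.group("index")),
        "module": m.group("module").rsplit("/", 1)[-1],
        "symbol": symbol,
        "name": "::".join(leading_names(symbol)) or symbol,
        "offset": int(m.group("offset"), 16),
    }


line = (
    " 9  /workspace/external/morpheus/build/python/morpheus/morpheus/_lib/libmorpheus.so"
    "(_ZN8morpheus25MessageMetaInterfaceProxy14get_data_frameERNS_11MessageMetaE+0x2a1)"
    " [0x7f1198bd5761]"
)
print(parse_frame(line)["name"])  # morpheus::MessageMetaInterfaceProxy::get_data_frame
```

Read this way, the chain is get_data_frame, then make_table_from_table_info_data, then data_from_table_view_indexed, with the fault raised two frames deeper inside cudf's column extension module.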

Logs when running in a morpheus pipeline:

PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 7 (TID 0x7fb86e7dd640) from PID 0; stack trace: ***
    @     0x7fbc4561b197 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fbc477c7520 (unknown)
    @     0x7fbbfdb3e23c (unknown)
    @     0x7fbbfdb3f2a7 (unknown)
    @     0x7fbbe106678e data_from_table_view_indexed()
    @     0x7fbbe10707b7 make_table_from_table_info_data()
    @     0x7fbbe14da3ba morpheus::CudfHelper::table_from_table_info()
    @     0x7fbbe13e6761 morpheus::MessageMetaInterfaceProxy::get_data_frame()
    @     0x7fbbe13e67f5 morpheus::MessageMetaInterfaceProxy::df_property()
    @     0x7fbbe0f9a378 (unknown)
    @     0x7fbbe0f89e6f (unknown)
    @     0x557c43599576 cfunction_call
    @     0x557c435928d3 _PyObject_MakeTpCall.localalias
    @     0x557c434ceecf property_descr_get.cold
    @     0x557c43597bf3 _PyObject_GenericGetAttrWithDict.localalias
    @     0x557c43596a55 PyObject_GetAttr.localalias
    @     0x557c4358e0aa _PyEval_EvalFrameDefault
    @     0x557c435999fc _PyFunction_Vectorcall
    @     0x557c4358e2f5 _PyEval_EvalFrameDefault
    @     0x557c435a4f78 method_vectorcall
    @     0x7fbc46127c8c _ZNSt17_Function_handlerIFN8pybind116objectES1_EZNK3mrc5pymrc12PyFuncHolderIS2_E18build_cpp_functionEONS0_8functionEEUlS1_E_E9_M_invokeERKSt9_Any_dataOS1_
    @     0x7fbc461279ca _ZNK5rxcpp6detail17specific_observerIN3mrc5pymrc14PyObjectHolderENS_8observerIS4_NS_9operators6detail3mapIS4_ZZNS3_14OperatorsProxy3mapENS3_14OnDataFunctionEENKUlNS_10observableIS4_NS_18dynamic_observableIS4_EEEEE_clESE_EUlS4_E_E12map_observerINS_10subscriberIS4_NS5_IS4_vvvvEEEEEEvvvEEvE7on_nextERKS4_
    @     0x7fbc461781ea rxcpp::subjects::detail::multicast_observer<>::on_next()
    @     0x7fbc4612582f rxcpp::subscriber<>::on_next<>()
    @     0x7fbc4615e65d mrc::node::EdgeRxSubscriber<>::await_write()
    @     0x7fbc4614d1db _ZNK5rxcpp6detail17specific_observerIN3mrc5pymrc14PyObjectHolderENS_8observerIS4_NS0_22stateless_observer_tagEZNS2_4node12RxSourceBaseIS4_EC4EvEUlS4_E_ZNS9_C4EvEUlNSt15__exception_ptr13exception_ptrEE0_vEEvE7on_nextEOS4_
    @     0x7fbc4612582f rxcpp::subscriber<>::on_next<>()
    @     0x7fbc46127f99 _ZNK5rxcpp6detail17specific_observerIN3mrc5pymrc14PyObjectHolderENS_8observerIS4_NS_9operators6detail3mapIS4_ZZNS3_14OperatorsProxy3mapENS3_14OnDataFunctionEENKUlNS_10observableIS4_NS_18dynamic_observableIS4_EEEEE_clESE_EUlS4_E_E12map_observerINS_10subscriberIS4_NS5_IS4_vvvvEEEEEEvvvEEvE7on_nextEOS4_
    @     0x7fbc46165bbb mrc::node::RxSinkBase<>::progress_engine()
    @     0x7fbc46165e37 _ZNSt17_Function_handlerIFvN5rxcpp10subscriberIN3mrc5pymrc14PyObjectHolderENS0_8observerIS4_vvvvEEEEEZNS0_18dynamic_observableIS4_E9constructINS0_7sources6detail6createIS4_ZNS2_4node10RxSinkBaseIS4_EC4EvEUlS7_E_EEEEvOT_ONSC_10tag_sourceEEUlS7_E_E9_M_invokeERKSt9_Any_dataOS7_
    @     0x7fbc461232b4 _ZNK5rxcpp9operators6detail13lift_operatorIN3mrc5pymrc14PyObjectHolderENS_18dynamic_observableIS5_EENS1_3mapIS5_ZZNS4_14OperatorsProxy3mapENS4_14OnDataFunctionEENKUlNS_10observableIS5_S7_EEE_clESC_EUlS5_E_EEE12on_subscribeINS_10subscriberIS5_NS_8observerIS5_vvvvEEEEEEvT_
    @     0x7fbc46123466 _ZNSt17_Function_handlerIFvN5rxcpp10subscriberIN3mrc5pymrc14PyObjectHolderENS0_8observerIS4_vvvvEEEEEZNS0_18dynamic_observableIS4_E9constructINS0_9operators6detail13lift_operatorIS4_SA_NSD_3mapIS4_ZZNS3_14OperatorsProxy3mapENS3_14OnDataFunctionEENKUlNS0_10observableIS4_SA_EEE_clESJ_EUlS4_E_EEEEEEvOT_ONS0_7sources10tag_sourceEEUlS7_E_E9_M_invokeERKSt9_Any_dataOS7_

Full env printout

[Paste the results of print_env.sh here, it will be hidden by default]

Other/Misc.

I originally discovered this issue while working on a Morpheus pipeline whose message payloads were converted from messy API JSON responses. The crash happened in the MonitorStage, at monitor_controller.check_df() (L195).

Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@ashsong-nv ashsong-nv added the bug Something isn't working label Oct 9, 2024
@morpheus-bot-test morpheus-bot-test bot added Needs Triage Need team to review and classify external This issue was filed by someone outside of the Morpheus team labels Oct 9, 2024
@morpheus-bot-test

Hi @ashsong-nv!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.

@ashsong-nv ashsong-nv removed the external This issue was filed by someone outside of the Morpheus team label Oct 9, 2024
Contributor

cwharris commented Oct 9, 2024

Attempted to repro with the RAPIDS 24.10 update from #1874:

Scenario 1 passes with:

Empty DataFrame
Columns: [a]
Index: []

Scenario 2 fails with:

[bfd660fbfe0d:78284:0:78284] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid:  78284) ====
 0  /home/coder/.conda/envs/cyber/lib/libucs.so.0(ucs_handle_error+0x2fd) [0x738a9c894fed]
 1  /home/coder/.conda/envs/cyber/lib/libucs.so.0(+0x2a1e1) [0x738a9c8951e1]
 2  /home/coder/.conda/envs/cyber/lib/libucs.so.0(+0x2a3aa) [0x738a9c8953aa]
 3  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x738bd068d520]
 4  /home/coder/.conda/envs/cyber/lib/python3.10/site-packages/cudf/_lib/column.cpython-310-x86_64-linux-gnu.so(+0x57dcb) [0x738bbe2e3dcb]
 5  /home/coder/.conda/envs/cyber/lib/python3.10/site-packages/cudf/_lib/column.cpython-310-x86_64-linux-gnu.so(+0x59626) [0x738bbe2e5626]
 6  /home/coder/morpheus/python/morpheus/morpheus/_lib/cudf_helpers.cpython-310-x86_64-linux-gnu.so(_Z28data_from_table_view_indexedN4cudf10table_viewEP7_objectS2_S2_S2_+0x9c6) [0x738a99748e26]
 7  /home/coder/morpheus/python/morpheus/morpheus/_lib/cudf_helpers.cpython-310-x86_64-linux-gnu.so(_Z31make_table_from_table_info_dataN8morpheus13TableInfoDataEP7_object+0x1724) [0x738a997567d4]
 8  /home/coder/morpheus/build/conda/cuda-12.5/release/python/morpheus/morpheus/_lib/libmorpheus.so(+0x26bd99) [0x738a9d936d99]
 9  /home/coder/morpheus/build/conda/cuda-12.5/release/python/morpheus/morpheus/_lib/libmorpheus.so(_ZN8morpheus25MessageMetaInterfaceProxy14get_data_frameERNS_11MessageMetaE+0x2a2) [0x738a9d83ff82]
10  /home/coder/morpheus/python/morpheus/morpheus/_lib/messages.cpython-310-x86_64-linux-gnu.so(+0x5a53e) [0x738a9967e53e]
11  /home/coder/morpheus/python/morpheus/morpheus/_lib/messages.cpython-310-x86_64-linux-gnu.so(+0x45728) [0x738a99669728]
12  python(+0x13b576) [0x58c66f4d5576]
13  python(_PyObject_MakeTpCall+0x2d3) [0x58c66f4ce8d3]
14  python(+0x147106) [0x58c66f4e1106]
15  python(_PyEval_EvalFrameDefault+0x49b5) [0x58c66f4ca2f5]
16  python(+0x1cbfac) [0x58c66f565fac]
17  python(PyEval_EvalCode+0x87) [0x58c66f565ef7]
18  python(+0x1fc23a) [0x58c66f59623a]
19  python(+0x1f76b3) [0x58c66f5916b3]
20  python(+0x96e54) [0x58c66f430e54]
21  python(_PyRun_SimpleFileObject+0x1bd) [0x58c66f58beed]
22  python(_PyRun_AnyFileObject+0x44) [0x58c66f58ba84]
23  python(Py_RunMain+0x31b) [0x58c66f588deb]
24  python(Py_BytesMain+0x37) [0x58c66f559637]
25  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x738bd0674d90]
26  /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x738bd0674e40]
27  python(+0x1bf54e) [0x58c66f55954e]
=================================
Segmentation fault (core dumped)

Contributor

cwharris commented Oct 9, 2024

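Calling MessageMeta.copy_dataframe() on each of the following dataframes suggests the problem may be specific to cudf series that hold lists of strings, though more investigation is required. The trailing comment on each line records the observed result ("nominal" meaning no crash).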

df = cudf.DataFrame({"a": cudf.Series([None], dtype=cudf.core.dtypes.ListDtype("int"))}) # nominal
df = cudf.DataFrame({"a": cudf.Series([], dtype=cudf.core.dtypes.ListDtype("string"))}) # segfault
df = cudf.DataFrame({"a": cudf.Series([None], dtype=cudf.core.dtypes.ListDtype("string"))}) # segfault
df = cudf.DataFrame({"a": cudf.Series([[]], dtype=cudf.core.dtypes.ListDtype("string"))}) # segfault
df = cudf.DataFrame({"a": cudf.Series([[None]], dtype=cudf.core.dtypes.ListDtype("string"))}) # nominal
df = cudf.DataFrame({"a": cudf.Series([["a"]], dtype=cudf.core.dtypes.ListDtype("string"))}) # nominal
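
One pattern consistent with the six cases above is that the string-list frames that segfault are exactly those whose lists hold no leaf elements at all (not even a null), while the int-list case is unaffected. The sketch below only tabulates the reported outcomes against that guess in plain Python, without touching cudf. `CASES`, `count_leaves`, and `guessed_outcome` are illustrative names, and the pattern is an observation from these six data points, not a confirmed root cause.

```python
# (rows, list element type, outcome reported in the comment above)
CASES = [
    ([None],   "int",    "nominal"),
    ([],       "string", "segfault"),
    ([None],   "string", "segfault"),
    ([[]],     "string", "segfault"),
    ([[None]], "string", "nominal"),
    ([["a"]],  "string", "nominal"),
]


def count_leaves(rows):
    """Count elements inside the lists (nulls inside a list count; null rows do not)."""
    return sum(len(row) for row in rows if row is not None)


def guessed_outcome(rows, element_type):
    """Guess: a string-list column with zero leaf elements is the crashing case."""
    if element_type == "string" and count_leaves(rows) == 0:
        return "segfault"
    return "nominal"


for rows, element_type, reported in CASES:
    # Every reported outcome should match the guess; a mismatch would refute it.
    assert guessed_outcome(rows, element_type) == reported
```

If the guess holds, it would point at how an empty string child column is handled when the table is converted back to Python, which is the area where the backtraces fault.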

Projects
Status: Review - Ready for Review
3 participants