datasets / pyarrow / aws-c-io causes crash in aws_thread_launch #1026

Closed
yifanmai opened this issue Oct 14, 2022 · 6 comments · Fixed by #1648
Labels: p2 Priority 2 (Good to have for release)

@yifanmai
Collaborator

yifanmai commented Oct 14, 2022

When running benchmark-present --conf src/benchmark/presentation/run_specs_tiny.conf --local --max-eval-instances 10 --suite v1, I got a crash with:

Fatal error condition occurred in /opt/vcpkg/buildtrees/aws-c-io/src/9e6648842a-364b708815.clean/source/event_loop.c:72: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
################################################################################
Stack trace:
################################################################################
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200af06) [0x7f4346604f06]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x20028e5) [0x7f43465fc8e5]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f27e09) [0x7f4346521e09]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) [0x7f4346605a3d]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f25948) [0x7f434651f948]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) [0x7f4346605a3d]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1ee0b46) [0x7f43464dab46]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x194546a) [0x7f4345f3f46a]
/lib/x86_64-linux-gnu/libc.so.6(+0x468d7) [0x7f43489f98d7]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7f43489f9a90]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/bin/python(+0x253608) [0x560bd9cca608]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/bin/python(+0x25363b) [0x560bd9cca63b]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/bin/python(+0x253680) [0x560bd9cca680]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/bin/python(PyRun_SimpleFileExFlags+0x274) [0x560bd9ccd604]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/bin/python(Py_RunMain+0x3a9) [0x560bd9ccda29]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/bin/python(Py_BytesMain+0x39) [0x560bd9ccdc29]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f43489d70b3]
/nlp/u/maiyifan/miniconda3/envs/crfm-benchmarking/bin/python(+0x1f9ad7) [0x560bd9c70ad7]
Aborted

Seems to be caused by https://issues.apache.org/jira/browse/ARROW-15141

@yifanmai
Collaborator Author

I'm going to try Python 3.9 to see if it makes a difference.

@yifanmai
Collaborator Author

Python 3.9 on virtualenv also breaks. Not sure when this started happening. Also, I'm on a branch rather than main.

@percyliang
Contributor

Is this still happening? Does it seem like we need to fix this before release?

@percyliang added the "p2 Priority 2 (Good to have for release)" label on Oct 23, 2022
@yifanmai
Collaborator Author

I talked to Tony and apparently this only happens on cluster machines. Also it seems to have stopped happening for me on the cluster machines. I'll close as can't repro for now.

@yuhui-zh15
Collaborator

I'm experiencing the same error on the SAIL cluster. It can be resolved by downgrading pyarrow from 9.0.0 to 6.0.1 with pip install pyarrow==6.0.1. The solution was originally proposed in huggingface/datasets#3310 (comment).

While this error does not affect anything, it is annoying to see on every run, so I am attaching the solution here for future reference.

@yifanmai reopened this on Jun 6, 2023
@yifanmai
Collaborator Author

yifanmai commented Jun 6, 2023

This seems to be worked around in pyarrow 11.0.0. Let's bump the pyarrow version.
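
For anyone who wants to check which side of this their environment is on, here is a minimal sketch. It only encodes the data points reported in this thread (9.0.0 crashed, 6.0.1 and 11.0.0 were reported fine). The function names and the labels are made up for illustration, and treating every release from 11 onward as fine is an assumption, not something verified here.

```python
from importlib import metadata


def parse_major(version: str) -> int:
    """Return the leading major version number from a string like '9.0.0'."""
    return int(version.split(".")[0])


def classify_pyarrow(version: str) -> str:
    """Classify a pyarrow version using only the reports in this thread.

    Hypothetical helper: 6.0.1 and 11.0.0 were reported fine, 9.0.0 crashed.
    Anything else is marked unverified rather than guessed.
    """
    major = parse_major(version)
    if major == 9:
        return "reported-crash"
    if major == 6:
        return "reported-ok"
    if major >= 11:
        # Assumption: later releases keep the workaround seen in 11.0.0.
        return "reported-ok"
    return "unverified"


def installed_pyarrow_version():
    """Return the installed pyarrow version string, or None if it is absent."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pyarrow_version()
    if version is None:
        print("pyarrow is not installed")
    else:
        print(f"pyarrow {version}: {classify_pyarrow(version)}")
```

If this prints reported-crash, the two options mentioned above are pip install pyarrow==6.0.1 or moving to pyarrow 11.0.0.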

@yifanmai changed the title from "Using conda causes crash in aws_thread_launch" to "datasets / pyarrow / aws-c-io causes crash in aws_thread_launch" on Jun 6, 2023
3 participants