[AIR] read_file_from_uri() print Segmentation Fault message while loading from S3 bucket #32931

Open
woshiyyya opened this issue Mar 1, 2023 · 4 comments
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks ray-team-created Ray Team created train Ray Train Related Issue

Comments

@woshiyyya
Member

woshiyyya commented Mar 1, 2023

What happened + What you expected to happen

When reading a file from an S3 bucket, the Python script runs to completion, but read_file_from_uri still prints a segmentation fault message: Segmentation fault (core dumped)

from ray.air._internal.remote_storage import read_file_from_uri
read_file_from_uri("s3://anyscale-yunxuanx-demo/checkpoint_000008/.metadata.pkl")

No SegFault:
Release 2.3 + py39
Release 2.3 + py38

SegFault:
NightlyBuild + py39
NightlyBuild + py38

So a code change made after release 2.3 probably introduced this error.

Reading from local URI works fine.

read_file_from_uri("file:///mnt/cluster_storage/test/checkpoint_000008/.metadata.pkl")

Versions / Dependencies

nightly build

Reproduction script

from ray.air._internal.remote_storage import read_file_from_uri
read_file_from_uri("s3://anyscale-yunxuanx-demo/checkpoint_000008/.metadata.pkl")
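
To compare builds without relying on the shell's message, the script can be run in a child process and its exit status inspected. This is a minimal sketch, assuming a POSIX system (where `subprocess` reports death by signal N as return code -N) and assuming the crash is reflected in the process exit status, which the shell's "core dumped" message suggests. `died_from_segfault` is a hypothetical helper name, not part of Ray.

```python
import signal
import subprocess
import sys


def died_from_segfault(code: str) -> bool:
    """Run `code` in a fresh interpreter and report whether SIGSEGV killed it."""
    # On POSIX, a child killed by signal N has returncode == -N.
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode == -signal.SIGSEGV


# Hypothetical usage with the reproduction script above. It needs Ray and
# access to the bucket, so it is left commented out here:
# repro = (
#     "from ray.air._internal.remote_storage import read_file_from_uri\n"
#     "read_file_from_uri("
#     "'s3://anyscale-yunxuanx-demo/checkpoint_000008/.metadata.pkl')\n"
# )
# print(died_from_segfault(repro))
```

Running the same check under each Ray build and Python version would give a yes/no answer per environment.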

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@woshiyyya woshiyyya added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) air labels Mar 1, 2023
@woshiyyya woshiyyya changed the title [AIR] read_file_from_uri() cause Segmentation Fault while loading from S3 bucket [AIR] read_file_from_uri() print Segmentation Fault message while loading from S3 bucket Mar 1, 2023
@justinvyu justinvyu added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 1, 2023
@scv119
Contributor

scv119 commented Mar 1, 2023

It would also be nice to show the segfault trace, which might give us hints as to why it failed.
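
One low-effort way to get a Python-level trace is the standard library's `faulthandler`, enabled here through the `-X faulthandler` interpreter option. This is only a sketch: `run_with_faulthandler` is a hypothetical helper name, and if the crash happens in native code after the interpreter has already shut down, `faulthandler` may print nothing, in which case a native debugger such as gdb is the fallback.

```python
import subprocess
import sys


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter started with `-X faulthandler`."""
    # With faulthandler enabled, a fatal signal such as SIGSEGV makes the
    # child dump the Python traceback of its threads to stderr before dying.
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


# Hypothetical usage: pass the reproduction script as `code`, then inspect
# `result.stderr` for a "Fatal Python error" report and a traceback.
```

The same effect can be had from the command line with `python -X faulthandler segfault.py`, or by calling `faulthandler.enable()` at the top of the script.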

@justinvyu
Contributor

@scv119

(gdb) run segfault.py
Starting program: /home/ray/anaconda3/bin/python segfault.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff1b96700 (LWP 11962)]
[New Thread 0x7ffff1395700 (LWP 11963)]
[New Thread 0x7fffecb94700 (LWP 11964)]
[New Thread 0x7fffe47ff700 (LWP 11988)]
[New Thread 0x7fffe3cba700 (LWP 11989)]
[Thread 0x7fffe3cba700 (LWP 11989) exited]
[New Thread 0x7fffe3cba700 (LWP 11990)]
[Thread 0x7fffe3cba700 (LWP 11990) exited]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffe68a7cb7 in Aws::Http::CurlHandleContainer::~CurlHandleContainer() () from /home/ray/anaconda3/lib/python3.9/site-packages/pyarrow/libarrow.so.1100

@justinvyu
Contributor

This might be related to apache/arrow#15054

@Yard1 Yard1 added the ray-team-created Ray Team created label Mar 22, 2023
@anyscalesam anyscalesam added train Ray Train Related Issue and removed air labels Oct 27, 2023
@anyscalesam
Contributor

@justinvyu @woshiyyya is this still happening?
