"PRD" RxNorm Historical errors out with SIGKILL message #307

Open
jrlegrand opened this issue Jul 10, 2024 · 4 comments
Labels: bug (Something isn't working)

Comments

@jrlegrand
Member

Problem Statement

The DAG runs for a while and then errors out. It doesn't do this on my local machine. Brief googling suggests it's memory-related, but it's hard to diagnose. See apache/airflow#10435.

Error message:
[2024-07-10, 11:48:05 UTC] {local_task_job.py:208} INFO - Task exited with return code Negsignal.SIGKILL

Criteria for Success

DAG runs to completion in "PRD".

Additional Information

image
@jrlegrand added the bug label on Jul 11, 2024
@jrlegrand
Member Author

This is also breaking my rxnorm_historical locally

Assuming it is a memory issue: https://medium.com/brexeng/debugging-and-preventing-memory-errors-in-python-e00be55e7cf2

It gets through all the API calls and then silently crashes before it exits the concurrent API calls function. I assume it has to do with ThreadPoolExecutor taking up too much memory somehow.
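If memory is the culprit, one thing to try is keeping only a bounded number of results alive at a time instead of collecting every response before moving on. This is a minimal sketch, not the actual DAG code: `fetch`, `batched`, `fetch_in_batches`, and `handle` are hypothetical names, and `fetch` is a stand-in for the real API call.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def fetch(url):
    # Hypothetical stand-in for the real API call; returns a small dict.
    return {"url": url, "status": "ok"}


def batched(iterable, size):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fetch_in_batches(urls, handle, max_workers=8, batch_size=100):
    # Submit one batch at a time so that at most `batch_size` results are
    # pending. Each result is passed to `handle` (for example, a function
    # that writes it to disk or a database) and is not kept afterwards.
    count = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for chunk in batched(urls, batch_size):
            # pool.map yields results in the same order as `chunk`.
            for result in pool.map(fetch, chunk):
                handle(result)
                count += 1
    return count
```

The batch size and worker count here are arbitrary; the point is only that peak memory then scales with `batch_size` rather than with the total number of calls.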

@jrlegrand
Member Author

I think something's happening in the ThreadPoolExecutor. I added some logging and it doesn't seem to get past the for loop before stalling out forever (or giving the SIGKILL message in "PRD").
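For narrowing down where the loop stalls, a sketch along these lines might help. It logs progress and traced memory as futures complete, and it puts a timeout on `as_completed` so a stall raises `TimeoutError` instead of hanging forever. The names `fetch`, `run_with_progress`, and the logger name are hypothetical, `fetch` stands in for the real API call, and `cancel_futures` assumes Python 3.9 or later. Note that `tracemalloc` only sees memory allocated by Python itself.

```python
import logging
import tracemalloc
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("rxnorm_debug")  # hypothetical logger name


def fetch(item):
    # Hypothetical stand-in for one API call.
    return item * 2


def run_with_progress(items, max_workers=4, log_every=50, timeout=60):
    tracemalloc.start()
    pool = ThreadPoolExecutor(max_workers=max_workers)
    results = []
    try:
        futures = [pool.submit(fetch, item) for item in items]
        # as_completed raises TimeoutError if not all futures have finished
        # within `timeout` seconds of this call, which surfaces a stall.
        for done, future in enumerate(as_completed(futures, timeout=timeout), 1):
            results.append(future.result())
            if done % log_every == 0 or done == len(futures):
                current, peak = tracemalloc.get_traced_memory()
                log.info(
                    "%d/%d done, traced memory: current=%d B, peak=%d B",
                    done, len(futures), current, peak,
                )
    finally:
        # Do not wait on workers here, so an error is reported right away;
        # queued futures that have not started yet are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
        tracemalloc.stop()
    return results
```

If the last log line shows memory climbing steadily before the stall, that would support the memory theory; if memory is flat and the timeout fires, a stuck request looks more likely.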

Image

@jrlegrand moved this from Todo to In Progress in SageRx Sprint Board on Jul 18, 2024
@jrlegrand
Member Author

Need to try an asynchronous approach. Look into Python's asyncio library.

I suspect that once concurrency is up and running with as many threads as it can handle, the handoff that reassigns the threads is causing it to break. It could also be a Docker issue, with the container choking on concurrent requests.
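A rough idea of what the asyncio version could look like, with a semaphore capping how many requests are in flight at once. This is only a sketch under assumptions: `fetch`, `fetch_all`, and `main` are hypothetical names, and `fetch` just yields to the event loop instead of making a real request. The real call would need an async HTTP client, or `asyncio.to_thread` wrapped around the existing blocking call.

```python
import asyncio


async def fetch(item):
    # Hypothetical stand-in for an async HTTP request.
    await asyncio.sleep(0)
    return item * 2


async def fetch_all(items, limit=10):
    # At most `limit` calls to fetch() run concurrently.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(item):
        async with semaphore:
            return await fetch(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(item) for item in items))


def main(items, limit=10):
    return asyncio.run(fetch_all(items, limit))
```

Everything runs on a single thread here, so there is no thread handoff to go wrong, and `limit` gives one knob for backing off if Docker or the API is the bottleneck.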

@jrlegrand
Member Author

@NTBTI FYI - this is the bug you ran into.

@jrlegrand mentioned this issue on Oct 4, 2024