Job stuck "running", Post run hook errored, DB connection crashed? #11027
Comments
It has been fixed in 19.2.2.
@azrdev, is this fixed for you in 19.2.2 or later?
I'll let you know as soon as our AWX is updated. Sorry, that will take a few days.
@chrismeyersfsu @Seb0042 On 19.3.0 we still see the state of our long-running jobs not being synced correctly with AWX, even though their stdout is displayed in full. The error visible in the UI has changed, though: the job is no longer stuck in "running" but fails with status
Are you using the database provisioned by the operator, or connecting to your own?
A couple more questions:
I have the same issue with fresh 19.3.0 and 19.4.0 Kubernetes installations using an external PostgreSQL 12 database. Every job that runs longer than 5 minutes ends with the same failed status, even though all the playbook runs themselves succeeded.
Job 20 log
Complete job 20 log -->
@shanemcd, sorry for the delay in answering your questions:
We use an external database.
No, this is not the case.
Our DB admins confirmed that the database does indeed close idle connections after 10 minutes.
Same here, with AWX 19.4. Both the database and AWX were deployed by the AWX operator in the same k8s namespace.
AWX does not have a feature to explicitly keep the database connection alive. Maybe there is a Django setting you can tweak to do this for you?
@chrismeyersfsu, Django has CONN_MAX_AGE and mentions the problem explicitly on https://docs.djangoproject.com/en/3.2/ref/databases/#persistent-connections. I cannot see where AWX (specifically awx_task, I guess) sets its Django config/options, which would also be the place to override CONN_MAX_AGE with a value smaller than our external DB's idle timeout. By default it's apparently set to a high value or
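For illustration, the relevant Django settings could look roughly like the sketch below. It is a hypothetical settings fragment, not AWX's shipped configuration and not a confirmed fix: the database name, user, password and host are placeholders, the 600-second cutoff is taken from the DB admins' report of a 10-minute idle timeout, and the keepalive values are illustrative assumptions. Two caveats apply. Django only enforces CONN_MAX_AGE at the start and end of an HTTP request (via `django.db.close_old_connections()`), so a long-running background process would have to call that function itself. The libpq keepalive options help only if something between AWX and PostgreSQL (a firewall or NAT) drops idle TCP sessions, not if the server itself terminates idle sessions.

```python
# Hypothetical Django settings fragment (NOT AWX's shipped configuration).
# Assumption: the external PostgreSQL server, or something in front of it,
# drops connections that have been idle for 10 minutes (600 s).
DB_IDLE_TIMEOUT = 600  # seconds, per the DB admins' report

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "awx",             # placeholder
        "USER": "awx",             # placeholder
        "PASSWORD": "change-me",   # placeholder
        "HOST": "db.example.com",  # placeholder
        "PORT": "5432",
        # Recycle persistent connections well before the idle cutoff.
        # Django checks this only at request start/end, so it does not by
        # itself protect long-running non-request processes.
        "CONN_MAX_AGE": DB_IDLE_TIMEOUT // 2,
        # libpq TCP keepalive parameters, passed through to psycopg2.connect().
        # The first probe goes out after 60 s of idleness, so an idle but healthy
        # connection produces traffic well inside the 600 s window.
        "OPTIONS": {
            "keepalives": 1,
            "keepalives_idle": 60,
            "keepalives_interval": 10,
            "keepalives_count": 5,
        },
    }
}
```

Whether either knob helps depends on what actually closes the connection, which is worth confirming with the DB admins before changing anything.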
Hello. Based on the small number of people seeing this, the problem most likely lies in your environment. If you need help troubleshooting or using AWX, try our mailing list or our IRC channel (#ansible-awx on https://libera.chat/). If, after further troubleshooting, you still think this is a bug in AWX, please open a new issue with any information you find.
Summary
I have a workflow job (WFJ) that has completed all its tasks but is stuck in the "running" state.
The output of kubectl logs shows traceback(s):
AWX version
19.2.0
Installation method
kubernetes
Modifications
yes
Ansible version
No response
Operating system
No response
Web browser
No response
Steps to reproduce
If I knew /o\
Expected results
The job should terminate.
Actual results
The job stays in the "running" state.
Additional information
Customization: our database is external to the k8s cluster.
(Maybe) related tickets: