Replies: 4 comments 13 replies
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Please find the Airflow config below: [celery] [celery_kubernetes_executor] [core] [dag_processor] [database] [scheduler] Thanks
Moved to a discussion. You are not a maintainer, so please don't open meta tasks. Discussions are supposed to be used for such questions.
@rcrchawla Thanks for sharing the logs. This looks more like an Airflow control plane issue than a Spark app issue. Here's why I say that: Could you share 3 things to confirm?
If this reproduces with a healthy DB and network, then it likely needs an Airflow fix. Otherwise it's probably infra/DB pressure.
Body
An Airflow task fails while the Spark Kubernetes app it is tracking is still running. The Spark app is long-running, typically around 1-2 hours. Many tasks run concurrently at that time, and the failures usually happen between 02:30 and 03:45 UTC.
Q) What is causing the issue?
A) The Airflow task fails while the Spark Kubernetes app is still running.
Airflow version: 3.0.4
Setup config:
2 API servers
2 workers
1 DAG processor
2 schedulers
Deployment: Helm chart on Azure Kubernetes
Please check the logs below.
Worker logs:
2026-03-10 02:33:56.191330 [info ] Task execute_workload[8cbabf91-009f-44a6-86d1-bef109c70341] succeeded in 2715.019189195242s: None [celery.app.trace]
2026-03-10 02:39:57.112078 [info ] Task finished [supervisor] duration=1723.7576029417105 exit_code=0 final_state=success
2026-03-10 02:39:57.128929 [info ] Task execute_workload[9b3f27ec-09b5-424e-8d5c-412e541f51e8] succeeded in 1723.8186896019615s: None [celery.app.trace]
2026-03-10 02:40:50.688403 [info ] Task finished [supervisor] duration=744.0669570546597 exit_code=0 final_state=success
2026-03-10 02:40:50.705538 [info ] Task execute_workload[b08ac31a-2ee7-4029-b897-753157b18475] succeeded in 744.139388079755s: None [celery.app.trace]
2026-03-10 02:42:11.649891 [info ] Task finished [supervisor] duration=756.7588595808484 exit_code=0 final_state=success
2026-03-10 02:42:11.666368 [info ] Task execute_workload[0351c271-194e-4e58-87e4-a9c224351ab1] succeeded in 756.8229349320754s: None [celery.app.trace]
2026-03-10 02:43:37.239128 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:38.119304 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:38.640468 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:39.247588 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:39.425843 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:39.618220 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:40.002999 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:40.582177 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:41.186771 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:41.510710 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:42.658853 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:43.171303 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:43.826966 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:44.330891 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:44.874859 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:44.922591 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:45.866775 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:46.194974 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:46.482845 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:46.750792 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:48.198838 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:48.462121 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:49.749467 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:50.029438 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:50.834835 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:51.334847 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:51.431052 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:51.537615 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:52.567197 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:52.967177 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:53.615078 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:54.513959 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:56.442819 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:57.527549 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:57.765172 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:57.982839 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:58.099625 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:58.534632 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:59.007106 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:59.947380 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:44:02.200313 [warning ] Failed to send heartbeat. Will be retried [supervisor] failed_heartbeats=1 max_retries=3 ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')
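One way to summarise the worker excerpt above is to tally the retry warnings by attempt number and pick out the heartbeat failures. The sketch below uses only the standard library. The function name `summarize_worker_log` and both regular expressions are assumptions fitted to the line format shown here; they are not part of Airflow.

```python
import re
from collections import Counter

# Assumed pattern for the retry warnings seen in the worker log, e.g.
# "... Starting call to '...Client.request', this is the 2nd time calling it."
RETRY_RE = re.compile(
    r"\[warning\s*\] Starting call to '(?P<fn>[^']+)', "
    r"this is the (?P<n>\d+)(?:st|nd|rd|th) time calling it"
)
# Assumed pattern for the supervisor heartbeat warning.
HEARTBEAT_RE = re.compile(r"Failed to send heartbeat.*failed_heartbeats=(?P<n>\d+)")


def summarize_worker_log(lines):
    """Count retry warnings per attempt number and collect heartbeat failure counters."""
    attempts = Counter()
    heartbeat_failures = []
    for line in lines:
        m = RETRY_RE.search(line)
        if m:
            attempts[int(m.group("n"))] += 1
            continue
        m = HEARTBEAT_RE.search(line)
        if m:
            heartbeat_failures.append(int(m.group("n")))
    return attempts, heartbeat_failures


# Three lines copied from the worker log above.
sample = [
    "2026-03-10 02:43:41.510710 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]",
    "2026-03-10 02:43:42.658853 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]",
    "2026-03-10 02:44:02.200313 [warning ] Failed to send heartbeat. Will be retried [supervisor] failed_heartbeats=1 max_retries=3 ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')",
]
attempts, heartbeat_failures = summarize_worker_log(sample)
print(dict(attempts), heartbeat_failures)
```

Run over the full worker excerpt, this counts ten warnings for each of attempts 1 through 4, followed by one heartbeat failure. That may point to roughly ten in-flight requests to the API server retrying together rather than one slow task, but it is only a reading of the log, not a confirmed diagnosis.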
API server logs:
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=155023 state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local current_pid=81402 state=running ti_id=019cd578-f8c1-7125-9906-ef64229dbba5
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local current_pid=86154 state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
INFO: 10.10.12.52:40870 - "GET /api/v2/version HTTP/1.1" 200 OK
INFO: 10.10.12.52:40880 - "GET /api/v2/version HTTP/1.1" 200 OK
2026-03-10 02:45:23 [debug ] Processing heartbeat hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=151395 ti_id=019cd542-0d47-7d93-a021-0cc2c9de7344
2026-03-10 02:45:23 [debug ] Refreshed token issued to Task [airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 valid_left=73
2026-03-10 02:45:23 [debug ] Refreshed token issued to Task [airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 valid_left=73
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Processing heartbeat hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=155023 ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
[2026-03-10T02:45:23.575+0000] {exceptions.py:77} ERROR - Error with id 9zBmdizJ
File "/home/airflow/.local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/home/airflow/.local/lib/python3.12/site-packages/starlette/routing.py", line 75, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line 302, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line 213, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py", line 474, in decorator
response = await self._convert_endpoint_response_to_version(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py", line 520, in _convert_endpoint_response_to_version
response_or_response_body: Union[FastapiResponse, object] = await run_in_threadpool(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/starlette/concurrency.py", line 38, in run_in_threadpool
return await anyio.to_thread.run_sync(func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2476, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 967, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/schema_generation.py", line 515, in __call__
return self._original_callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/api_fastapi/execution_api/routes/xcoms.py", line 419, in set_xcom
session.flush()
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
self._flush(objects)
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3588, in _flush
with util.safe_reraise():
^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3549, in _flush
flush_context.execute()
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
rec.execute(self)
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
util.preloaded.orm_persistence.save_obj(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
_emit_insert_statements(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1097, in _emit_insert_statements
c = connection._execute_20(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
self._handle_dbapi_exception(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
util.raise_(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 179, in execute
res = self._query(mogrified_query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 330, in _query
db.query(q)
File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/connections.py", line 280, in query
_mysql.connection.query(self, query)
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=65618 state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=151858 state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
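To see which heartbeats reached the API server but were never logged as updated, the two debug messages above can be paired by `ti_id`. This is a rough sketch under an assumption: that a "Processing heartbeat" line followed by a "Heartbeat updated" line for the same `ti_id` is the success path. The function name `unacknowledged_heartbeats` and the regular expressions are hypothetical helpers, not Airflow code.

```python
import re

# Assumed patterns for the two debug messages in the API server log.
PROCESSING_RE = re.compile(r"Processing heartbeat .*ti_id=(?P<ti>[0-9a-f-]{36})")
UPDATED_RE = re.compile(r"Heartbeat updated .*ti_id=(?P<ti>[0-9a-f-]{36})")


def unacknowledged_heartbeats(lines):
    """Map ti_id -> line number of a 'Processing heartbeat' with no later 'Heartbeat updated'."""
    pending = {}
    for lineno, line in enumerate(lines, start=1):
        m = PROCESSING_RE.search(line)
        if m:
            # Remember the first unanswered heartbeat for this task instance.
            pending.setdefault(m.group("ti"), lineno)
            continue
        m = UPDATED_RE.search(line)
        if m:
            # An update clears any pending heartbeat for the same task instance.
            pending.pop(m.group("ti"), None)
    return pending


# Three lines copied from the API server log above.
sample = [
    "2026-03-10 02:45:23 [debug ] Processing heartbeat hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=151395 ti_id=019cd542-0d47-7d93-a021-0cc2c9de7344",
    "2026-03-10 02:45:23 [debug ] Processing heartbeat hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=155023 ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3",
    "2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3",
]
pending = unacknowledged_heartbeats(sample)
print(pending)
```

On the excerpt above only `019cd542-0d47-7d93-a021-0cc2c9de7344` is left pending. The excerpt is truncated, so that alone proves little; a longer capture around the failure window would be needed to draw any conclusion.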
What you think should happen instead?
The Airflow task should run to completion without failing while the Spark app is still running.