Commit 88a8eff

karenbraganz and RNHTTR authored
Remove reference to undead tasks from documentation (apache#43536)
--------- Co-authored-by: Ryan Hatter <25823361+RNHTTR@users.noreply.github.com>
1 parent a14aedb commit 88a8eff

4 files changed: 41 additions and 47 deletions

airflow/jobs/scheduler_job_runner.py

Lines changed: 2 additions & 2 deletions
@@ -2017,14 +2017,14 @@ def _purge_zombies(self, zombies: list[tuple[TI, str]], *, session: Session) ->
 f"Task did not emit heartbeat within time limit ({self._zombie_threshold_secs} "
 "seconds) and will be terminated. "
 "See https://airflow.apache.org/docs/apache-airflow/"
-"stable/core-concepts/tasks.html#zombie-undead-tasks"
+"stable/core-concepts/tasks.html#zombie-tasks"
 ),
 )
 )
 self.log.error(
 "Detected zombie job: %s "
 "(See https://airflow.apache.org/docs/apache-airflow/"
-"stable/core-concepts/tasks.html#zombie-undead-tasks)",
+"stable/core-concepts/tasks.html#zombie-tasks)",
 request,
 )
 self.job.executor.send_callback(request)

docs/apache-airflow/core-concepts/tasks.rst

Lines changed: 10 additions & 45 deletions
@@ -167,55 +167,20 @@ These can be useful if your code has extra knowledge about its environment and w
 
 .. _concepts:zombies:
 
-Zombie/Undead Tasks
--------------------
+Zombie Tasks
+------------
 
-No system runs perfectly, and task instances are expected to die once in a while. Airflow detects two kinds of task/process mismatch:
+No system runs perfectly, and task instances are expected to die once in a while.
 
-* *Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
-(e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
-periodically, clean them up, and either fail or retry the task depending on its settings. Tasks can become zombies for
-many reasons, including:
+*Zombie tasks* are ``TaskInstances`` stuck in a ``running`` state despite their associated jobs being inactive
+(e.g. their process did not send a recent heartbeat as it got killed, or the machine died). Airflow will find these
+periodically, clean them up, and either fail or retry the task depending on its settings. Tasks can become zombies for
+many reasons, including:
 
-* The Airflow worker ran out of memory and was OOMKilled.
-* The Airflow worker failed its liveness probe, so the system (for example, Kubernetes) restarted the worker.
-* The system (for example, Kubernetes) scaled down and moved an Airflow worker from one node to another.
+* The Airflow worker ran out of memory and was OOMKilled.
+* The Airflow worker failed its liveness probe, so the system (for example, Kubernetes) restarted the worker.
+* The system (for example, Kubernetes) scaled down and moved an Airflow worker from one node to another.
 
-* *Undead tasks* are tasks that are *not* supposed to be running but are, often caused when you manually edit Task
-Instances via the UI. Airflow will find them periodically and terminate them.
-
-
-Below is the code snippet from the Airflow scheduler that runs periodically to detect zombie/undead tasks.
-
-.. exampleinclude:: /../../airflow/jobs/scheduler_job_runner.py
-:language: python
-:start-after: [START find_and_purge_zombies]
-:end-before: [END find_and_purge_zombies]
-
-
-The explanation of the criteria used in the above snippet to detect zombie tasks is as below:
-
-1. **Task Instance State**
-
-Only task instances in the RUNNING state are considered potential zombies.
-
-2. **Job State and Heartbeat Check**
-
-Zombie tasks are identified if the associated job is not in the RUNNING state or if the latest heartbeat of the job is
-earlier than the calculated time threshold (limit_dttm). The heartbeat is a mechanism to indicate that a task or job is
-still alive and running.
-
-3. **Job Type**
-
-The job associated with the task must be of type ``LocalTaskJob``.
-
-4. **Queued by Job ID**
-
-Only tasks queued by the same job that is currently being processed are considered.
-
-These conditions collectively help identify running tasks that may be zombies based on their state, associated job
-state, heartbeat status, job type, and the specific job that queued them. If a task meets these criteria, it is
-considered a potential zombie, and further actions, such as logging and sending a callback request, are taken.
 
 Reproducing zombie tasks locally
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
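
Note for readers of this diff: the documentation removed in this hunk listed four criteria for spotting a zombie (task instance state, job state and heartbeat, job type, and queued-by job ID). The sketch below restates those four checks in plain Python. It is an illustration only, not the scheduler's query: ``Job``, ``TaskInstance``, and ``find_zombies`` are hypothetical stand-ins, and the 300-second default threshold is an assumed value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Job:  # hypothetical stand-in, not Airflow's Job model
    id: int
    job_type: str
    state: str
    latest_heartbeat: datetime


@dataclass
class TaskInstance:  # hypothetical stand-in, not Airflow's TaskInstance model
    task_id: str
    state: str
    job: Job
    queued_by_job_id: int


def find_zombies(task_instances, scheduler_job_id, now, threshold_secs=300):
    """Return the task instances that meet all four criteria from the removed docs."""
    # Heartbeats older than this cutoff count as missed (assumed threshold).
    limit_dttm = now - timedelta(seconds=threshold_secs)
    return [
        ti
        for ti in task_instances
        # 1. Only RUNNING task instances are candidates.
        if ti.state == "running"
        # 2. The job is not running, or its last heartbeat is too old.
        and (ti.job.state != "running" or ti.job.latest_heartbeat < limit_dttm)
        # 3. The job must be a LocalTaskJob.
        and ti.job.job_type == "LocalTaskJob"
        # 4. The task was queued by the scheduler job doing the check.
        and ti.queued_by_job_id == scheduler_job_id
    ]
```

A task whose job is still running and heartbeating inside the threshold is left alone; only a stale heartbeat or a non-running job, combined with the other three checks, marks it as a zombie.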
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+/*!
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+document.addEventListener("DOMContentLoaded", function () {
+  const redirects = {
+    "zombie-undead-tasks": "zombie-tasks",
+  };
+  const fragment = window.location.hash.substring(1);
+  if (redirects[fragment]) {
+    window.location.hash = redirects[fragment];
+  }
+});
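
Note for readers of this diff: the script above maps the old anchor to the new one in the browser. The same mapping can be restated outside the browser, for example to check stored links. The sketch below is a minimal Python equivalent under stated assumptions: ``REDIRECTS`` and ``redirect_fragment`` are hypothetical names that are not part of this commit, and only the single mapping shown in the script is included.

```python
from urllib.parse import urlsplit, urlunsplit

# Same old-anchor -> new-anchor mapping as the script in this commit.
REDIRECTS = {"zombie-undead-tasks": "zombie-tasks"}


def redirect_fragment(url: str) -> str:
    """Return url with its fragment rewritten if it is a known old anchor."""
    parts = urlsplit(url)
    new_fragment = REDIRECTS.get(parts.fragment)
    if new_fragment is None:
        # Unknown or missing fragment: leave the URL untouched.
        return url
    return urlunsplit(parts._replace(fragment=new_fragment))
```

As in the script, a URL whose fragment is not in the mapping is returned unchanged.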

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -363,6 +363,7 @@ def _get_rst_filepath_from_path(filepath: pathlib.Path):
 "administration-and-deployment/logging-monitoring/advanced-logging-configuration.html",
 "howto/docker-compose/index.html",
 ]
+html_js_files.append("redirects.js")
 if PACKAGE_NAME.startswith("apache-airflow-providers"):
 manual_substitutions_in_generated_html = ["example-dags.html", "operators.html", "index.html"]
 if PACKAGE_NAME == "docker-stack":
