Description
Apache Airflow version: 1.10.12
Kubernetes version: v1.17.6
Environment:
- OS: CentOS 7
Context:
KubernetesPodOperators are launched dynamically within a subdag:
The number of tasks n (gps_statistics_0 .... gps_statistics_n) varies with each execution date.
What happened:
As you can see in the previous image, all execution dates seem to have launched a maximum of 16 executors, but this is not the case. In fact, the webserver only shows the tasks with the same names as those in the last execution date (here, tasks gps_statistics_0 .... gps_statistics_n).
(This also means that if I decided to suffix task names with their execution date, I would only see tasks for the last execution date.)
Now if I go to Browse -> Task Instances and filter for task id gps_statistics_17, for instance, which seems not to exist, I actually find task instances with this name. This proves that those tasks exist and have been executed.

Sadly it is not just a display issue, because if I try to access one of the gps_statistics_17 task instances, I get the following error:

Logs are not available either.
Furthermore, you can see in the first image that one execution date had only 6 instances. At that moment, tasks with an id > 6 were not available. However, when the next execution starts with n >= 16, tasks with an id > 6 and < n are back, available to see and click.
So the tasks exist somewhere and just seem to be unavailable during executions where n is lower than their id.
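To make the behaviour described above concrete, here is a minimal pure-Python model of what I observe. It is only a sketch of my understanding, not Airflow's actual code: the function names (`build_task_ids`, `visible_instances`), the sample rows, and the dates are all hypothetical, and I am assuming the webserver only displays task instances whose task id is in the most recently parsed DAG definition.

```python
from collections import defaultdict

# Hypothetical stand-in for rows stored in the metadata database:
# (execution_date, task_id, state). Dates and states are made up.
stored_task_instances = [
    ("2020-10-01", "gps_statistics_0", "success"),
    ("2020-10-01", "gps_statistics_17", "failed"),
    ("2020-10-02", "gps_statistics_0", "success"),
]


def build_task_ids(n):
    """Task ids defined by the subdag when it is parsed with n tasks."""
    return {f"gps_statistics_{i}" for i in range(n)}


def visible_instances(stored, current_task_ids):
    """Keep only instances whose task_id is in the current DAG definition.

    This models the assumed filtering, grouped by execution date.
    """
    visible = defaultdict(list)
    for execution_date, task_id, state in stored:
        if task_id in current_task_ids:
            visible[execution_date].append((task_id, state))
    return dict(visible)


# Latest parse defines only 6 tasks: the failed gps_statistics_17 is hidden.
hidden_view = visible_instances(stored_task_instances, build_task_ids(6))

# A later parse defines 18 tasks: gps_statistics_17 is visible again.
full_view = visible_instances(stored_task_instances, build_task_ids(18))
```

In this model the row for gps_statistics_17 never leaves the stored list; it only disappears from the view while the current definition has fewer tasks, which matches what I see in the UI.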
Why it is important:
It is not possible to monitor or access failed tasks in order to correct them. Consequently, users risk losing results for failed execution dates:
- It is not possible to know how many tasks failed
- It is not possible to rerun failed tasks
- It is not possible to know how many tasks were executed
Why does this happen, and is there a way to prevent it?
Regards
