Add ExecuteCallback support to AwsLambdaExecutor #63035
SameerMesiah97 wants to merge 3 commits into apache:main
Conversation
Force-pushed 91cc6df to b1496cd.
Tagging you here as requested for tracking.
ferruzzi left a comment:
Good start, left some comments and suggestions. Thanks for taking this on!
Review comments (resolved) on:
providers/amazon/src/airflow/providers/amazon/aws/executors/aws_lambda/utils.py
providers/amazon/src/airflow/providers/amazon/aws/executors/aws_lambda/lambda_executor.py
o-nikolas left a comment:
Thanks for the contribution! This looks like a good start 🙂
But these are the bare minimum changes. Please be sure to clean up your code before submitting a PR. Make sure types are correct and all comments, variables, and method names are updated. It takes a long time for maintainers to review code, so we want to use that time wisely.
```
    data = json.loads(ser_task_key)
    task_key = TaskInstanceKey.from_dict(data)
except Exception:
    # If that task fails to deserialize, we should just skip it.
    self.log.exception(
        "Task failed to be adopted because the key could not be deserialized"
    )
    continue
# Callback workloads use string keys.
task_key = ser_task_key
```
@ferruzzi can callbacks be adopted? If not, then we don't need these changes here, or if anything, only to discard callbacks (we probably don't want to log for each one).

I have kept the functional aspects of try_adopt_task_instances the same as in my first attempt for now. Once @ferruzzi confirms whether or not this needs to handle callbacks (even if it is just logging exceptions), I will adjust this accordingly.

Sorry for the delay. I don't believe they support adoption yet, but that might/should/could be added in the future (not this PR, obviously).

@ferruzzi So you are okay with keeping it as is? Or revert it to the state before the implementation with exception logging removed?
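To make the adoption question concrete, here is a minimal standalone sketch of the key-routing logic under discussion. The `TaskInstanceKey` stand-in and the `classify_key` helper are illustrative only, not the executor's actual code: JSON dicts deserialize into task keys, while anything that fails to parse is treated as a callback's plain string key.

```python
import json
from typing import NamedTuple, Union


class TaskInstanceKey(NamedTuple):
    # Simplified stand-in for airflow's TaskInstanceKey, for illustration only.
    dag_id: str
    task_id: str
    run_id: str
    try_number: int

    @classmethod
    def from_dict(cls, data: dict) -> "TaskInstanceKey":
        return cls(**data)


def classify_key(ser_task_key: str) -> Union[TaskInstanceKey, str]:
    """Return a TaskInstanceKey for task workloads, or the raw string
    for callback workloads (whose keys are plain identifiers)."""
    try:
        data = json.loads(ser_task_key)
        return TaskInstanceKey.from_dict(data)
    except (ValueError, TypeError):
        # Not JSON, or not a dict of the expected fields: treat it as a
        # callback workload key and hand it back unchanged.
        return ser_task_key
```

With a routing step like this, adoption code could silently discard the string keys (since callbacks do not support adoption yet) instead of logging an exception per callback.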
```
@@ -411,7 +467,7 @@ def process_queue(self, queue_url: str):
    task_key = self.running_tasks[ser_task_key]
except KeyError:
```
Shouldn't all these mentions of task(s) in variables and comments be updated?

Where appropriate I have changed 'task' to 'workload' in variable names, method signatures, comments, docstrings, and log messages. However, some variables I thought it best to leave as is, because it would be misleading to change them. For example, ser_task_key is derived from the value whose key name is task_key in the JSON payload, which constitutes the SQS message. It would not make sense to change ser_task_key to ser_workload_key just for the sake of consistency.

That's interesting, yeah. I suppose that key should really be workload_key in the return schema for the lambda executor. It should still continue to accept task, but we should try to migrate towards using workload there. But I agree that is out of scope for this PR 🙂
```
if task_key:
    if return_code == 0:
-       self.success(task_key)
+       self.success(task_key)  # type: ignore[arg-type]
```
Why are we just ignoring this instead of getting the typing right?
Here and just below as well.

I have removed the `type: ignore` in favour of using cast. Now, arguably, it's another way of doing the same thing, but self.success and self.fail are defined in BaseExecutor (which I am not going to touch in this PR), so this limits what can be done to satisfy typing whilst keeping the code explicit. If you have an alternative suggestion in mind, I am all ears.
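To illustrate the trade-off in that exchange, here is a hedged sketch contrasting `typing.cast` with `isinstance` narrowing. The types are simplified and `report_success` is a hypothetical stand-in for `BaseExecutor.success`, not the real method.

```python
from typing import NamedTuple, Union, cast


class TaskInstanceKey(NamedTuple):
    # Simplified stand-in for airflow's TaskInstanceKey.
    dag_id: str
    task_id: str
    run_id: str
    try_number: int


WorkloadKey = Union[TaskInstanceKey, str]  # task key vs. callback id string


def report_success(key: TaskInstanceKey) -> str:
    # Hypothetical stand-in for BaseExecutor.success(), which expects a
    # TaskInstanceKey argument.
    return f"success: {key.task_id}"


def finish_with_cast(key: WorkloadKey) -> str:
    # cast() documents the "this is a task key here" assumption for the type
    # checker with no runtime effect; unlike `# type: ignore`, it keeps the
    # rest of the expression type-checked.
    return report_success(cast(TaskInstanceKey, key))


def finish_with_narrowing(key: WorkloadKey) -> str:
    # isinstance() narrows the union and adds a genuine runtime check,
    # at the cost of an explicit branch for the callback case.
    if isinstance(key, TaskInstanceKey):
        return report_success(key)
    return f"callback finished: {key}"
```

Either form satisfies the type checker without touching BaseExecutor; the `isinstance` variant is safer when the key's type is not guaranteed by surrounding control flow.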
```
# Add the serialized task key as the info, this will be assigned on the ti as the external_executor_id
self.running_state(task_key, ser_task_key)

def sync_running_tasks(self):
```
Should this method be renamed?
```
@@ -263,7 +313,7 @@ def execute_async(self, key: TaskInstanceKey, command: CommandType, queue=None,

def attempt_task_runs(self):

def execute_async(
    self,
    key: TaskInstanceKey | str,
```
This means the PR is not currently reviewable. I have converted it to draft until I can get it into an acceptable state.

Wonderful, thanks!
Force-pushed 8ad7f59 to 4ec2290.
I have just pushed my latest changes. Here is a brief summary of the last commit:
ferruzzi left a comment:
I only got about halfway through this pass. Lots of improvements! I left a few more thoughts.
Review comment (resolved) on:
providers/amazon/src/airflow/providers/amazon/aws/executors/aws_lambda/lambda_executor.py
```
def _process_workloads(self, workload_items: Sequence[workloads.All]) -> None:
    from airflow.executors import workloads
```
I agree with Niko. I know this was existing code, but that was a miss in an earlier review; w is a terrible variable name. Can you fix it while you are in here please?

Can you be more specific about what you are agreeing on with Niko? I am not sure what action is required from me on this comment besides changing the variable name?

Sorry I wasn't clear, the variable name was what I was referring to. I see that at least some of it was existing code, but we should clean that up, not just fix your new changes.

Fixed in latest commit.
Review comments (resolved) on:
providers/amazon/src/airflow/providers/amazon/aws/executors/aws_lambda/lambda_executor.py
```
 self.log_task_event(
     event="lambda invoke failure",
-    ti_key=task_key,
+    ti_key=workload_key,
```
Non-blocking: Looks like we missed this parameter name in base executor... we'll have to get that later.

When a callback workload exceeds max submit attempts, log_task_event is called with ti_key=workload_key. For callbacks, this key is a string UUID, not a TaskInstanceKey named tuple, which will cause errors since Log(task_instance=...) expects a TaskInstanceKey. Is this a problem, @ferruzzi?

Yup, good catch. I think the cleanest solution for now will be to add an optional workload parameter to Log() which will work for either type, and a note that we need to clean that up in the future. Don't remove the task_instance parameter since that will break things, but add the new workload and use it here.
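A minimal sketch of the mismatch under discussion, assuming a simplified `TaskInstanceKey`. The `describe_workload_key` helper is hypothetical (the fix the thread settles on is an optional `workload` parameter on `Log()`), but it shows how a logging path can accept either key type without blowing up on a callback's string UUID.

```python
from typing import NamedTuple, Union


class TaskInstanceKey(NamedTuple):
    # Simplified stand-in for airflow's TaskInstanceKey.
    dag_id: str
    task_id: str
    run_id: str
    try_number: int


def describe_workload_key(key: Union[TaskInstanceKey, str]) -> str:
    """Hypothetical helper: produce a log-safe description for either a
    task's TaskInstanceKey or a callback's string UUID key."""
    if isinstance(key, TaskInstanceKey):
        return f"task {key.dag_id}.{key.task_id} run={key.run_id} try={key.try_number}"
    # Callback workloads carry a plain string identifier.
    return f"callback {key}"
```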
Review comment (resolved) on:
providers/amazon/src/airflow/providers/amazon/aws/executors/aws_lambda/utils.py
Extend AwsLambdaExecutor to support ExecuteCallback workloads introduced in Airflow 3.2. Callback workloads are queued using the callback identifier and processed alongside task workloads. Both workload types are serialized and dispatched to AWS Lambda using the Task SDK runtime entrypoint `python -m airflow.sdk.execution_time.execute_workload`. Update executor lifecycle methods to support string-based callback identifiers while preserving JSON-serialized TaskInstanceKey handling for task workloads. Add unit tests covering callback workload execution, queue override propagation, and adoption of callback workloads using string-based external_executor_id values.
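The key-derivation scheme this summary describes can be sketched as follows. The `ExecuteTask`/`ExecuteCallback` dataclasses are simplified stand-ins for the Task SDK workload types, and `workload_key` is an illustrative helper, not the executor's actual method: task workloads key on a JSON-serialized `TaskInstanceKey`, callbacks on their identifier string.

```python
import json
from dataclasses import dataclass


@dataclass
class ExecuteTask:
    # Simplified stand-in; the real workload carries far more state.
    ti_key: dict  # serialized TaskInstanceKey fields


@dataclass
class ExecuteCallback:
    id: str  # callback identifier, used directly as the workload key


def workload_key(workload) -> str:
    """Derive the executor-side key: a JSON-serialized TaskInstanceKey for
    tasks, the callback id string for callbacks."""
    if isinstance(workload, ExecuteTask):
        return json.dumps(workload.ti_key, sort_keys=True)
    if isinstance(workload, ExecuteCallback):
        return workload.id
    raise TypeError(f"unsupported workload type: {type(workload).__name__}")
```

Because both key shapes flow through the same tracking dicts, downstream code must tolerate the `TaskInstanceKey | str` union, which is exactly what the review comments above wrestle with.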
…typing to align with the workload-based executor model.
Force-pushed 4ec2290 to 4760635.
Description

This change adds support for callback workloads (`ExecuteCallback`) to the `AwsLambdaExecutor`. Previously, the executor only supported task workloads (`ExecuteTask`). This update extends the executor to accept, queue, and execute callback workloads alongside task workloads. The executor now maintains a `queued_callbacks` structure and updates `queue_workload()` to register `ExecuteCallback` workloads using the callback identifier (`callback.id`) as the workload key.

The workload processing flow has been updated to support callbacks throughout the executor lifecycle. `_process_workloads()` now handles both `ExecuteTask` and `ExecuteCallback` workloads, dispatching them through the same execution pathway. `execute_async()` has been extended to serialize both workload types and forward them to the Lambda runtime using `python -m airflow.sdk.execution_time.execute_workload`. Additionally, `attempt_task_runs()` and related task tracking logic have been updated to support string-based workload identifiers for callbacks while maintaining JSON-serialized `TaskInstanceKey` identifiers for task workloads.

Rationale

The ExecutorCallback framework introduced in Airflow 3.2 allows executors to execute synchronous callbacks on worker infrastructure. Executors must therefore be able to accept and dispatch `ExecuteCallback` workloads in addition to task workloads. `AwsLambdaExecutor` delegates execution to the Task SDK runtime by invoking `airflow.sdk.execution_time.execute_workload` inside the Lambda environment. As reflected in the direction of PR #62645, the expectation is that handling of both task and callback execution will ultimately occur within the Task SDK runtime rather than inside individual executors. Forwarding callback workloads to the same runtime entrypoint used for tasks aligns the Lambda executor with this model.

Tests

Added unit tests covering callback workload execution, queue override propagation via `callback.data["queue"]`, and adoption of callback workloads using string-based `external_executor_id` values.

Documentation

Docstrings have been updated in `AwsLambdaExecutor` to describe support for both task and callback workloads.

Backwards Compatibility

This change preserves existing behavior for task workloads. Task workloads continue to use JSON-serialized `TaskInstanceKey` identifiers for executor tracking and scheduler adoption. Callback workloads use their callback identifier string as the workload key.

Related: #62887