
Fix SFN.client.start_execution() idempotency #9397

Merged
bblommers merged 8 commits into getmoto:master from chriselion:celion/fix-state_machine.start_execution-idempotency on Nov 13, 2025

@chriselion (Contributor) commented Oct 23, 2025

Resolves #9394

When an execution with the same name is found, start_execution() now compares the inputs:

  • If the inputs are different, an ExecutionAlreadyExists exception is raised (same as before)
  • If the inputs are the same, the original execution is returned instead of a new one being created.
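The two rules above can be sketched as a toy in-memory model. This is not moto's code: ToyStateMachine, ToyExecution, the local ExecutionAlreadyExists class, and the ARN layout are all hypothetical. The sketch compares the inputs as raw strings, which is only one of the two definitions of "equal" discussed in this PR.

```python
from dataclasses import dataclass, field


class ExecutionAlreadyExists(Exception):
    """Hypothetical local stand-in for the SFN ExecutionAlreadyExists error."""


@dataclass
class ToyExecution:
    arn: str
    execution_input: str  # raw input string, exactly as the caller sent it


@dataclass
class ToyStateMachine:
    arn: str
    # Execution name -> ToyExecution. Simplification: this sketch does not
    # model execution status or workflow type.
    executions: dict = field(default_factory=dict)

    def start_execution(self, name: str, execution_input: str = "{}") -> ToyExecution:
        existing = self.executions.get(name)
        if existing is not None:
            if existing.execution_input == execution_input:
                # Same name and same input: hand back the original execution.
                return existing
            # Same name but different input: reject, as before the fix.
            raise ExecutionAlreadyExists(name)
        # Hypothetical ARN layout, for illustration only.
        execution = ToyExecution(arn=f"{self.arn}:{name}", execution_input=execution_input)
        self.executions[name] = execution
        return execution


sm = ToyStateMachine(arn="arn:example:stateMachine:demo")
first = sm.start_execution("run1", '{"a": "b"}')
retry = sm.start_execution("run1", '{"a": "b"}')
print(first is retry)  # True
try:
    sm.start_execution("run1", '{"a": "c"}')
except ExecutionAlreadyExists as err:
    print(f"rejected: {err}")  # rejected: run1
```

Keying the lookup on the name alone keeps the duplicate check cheap; the input comparison only runs when a name collision is found.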

I updated the existing test case to pass an input, and added another test to check that duplicating the name and input is idempotent.


I'm not sure how we should define "equal" inputs here. StateMachine.start_execution() saves the json-decoded input on the Execution:

execution = Execution(
    region_name=region_name,
    account_id=account_id,
    state_machine_name=self.name,
    execution_name=execution_name,
    state_machine_arn=self.arn,
    execution_input=json.loads(execution_input),

even though the field is annotated as a str:
class Execution:
    def __init__(
        self,
        region_name: str,
        account_id: str,
        state_machine_name: str,
        execution_name: str,
        state_machine_arn: str,
        execution_input: str,

Since the original input is lost, I'm comparing the json-decoded values.
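To make the question concrete, here is a standard-library check using the same two inputs as the test case in this PR. They decode to equal dicts but are different strings, so the two candidate definitions of "equal" disagree on them.

```python
import json

input1 = '{"a": "b", "c": "d"}'
input2 = '{"c": "d", "a": "b"}'

# Comparing decoded values treats the two inputs as the same...
decoded_equal = json.loads(input1) == json.loads(input2)
# ...while comparing the raw strings treats them as different.
raw_equal = input1 == input2

print(decoded_equal, raw_equal)  # True False
```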

I can do a little more digging to see what AWS actually does (although if you have some intuition, that would save me some time). If we need to check for exact string equality instead, that will take some more changes.

Comment on lines 576 to 582
    input='{"a": "b", "c": "d"}',
)
#
execution_two = client.start_execution(
    stateMachineArn=sm["stateMachineArn"],
    name="execution_name",
    input='{"c": "d", "a": "b"}',
@chriselion (Contributor Author) commented:

Here's a more concrete example of what I meant in the description - are these inputs equal or not?

codecov bot commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 93.02%. Comparing base (443beaf) to head (1966805).
⚠️ Report is 11 commits behind head on master.

Files with missing lines               Patch %   Lines
moto/stepfunctions/parser/models.py    66.66%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #9397   +/-   ##
=======================================
  Coverage   93.02%   93.02%           
=======================================
  Files        1293     1293           
  Lines      116181   116187    +6     
=======================================
+ Hits       108081   108087    +6     
  Misses       8100     8100           
Flag          Coverage Δ
servertests   28.82% <9.09%> (-0.01%) ⬇️
unittests     93.00% <90.90%> (+<0.01%) ⬆️


@bblommers (Collaborator) commented Oct 23, 2025

Hi @chriselion , thanks for raising the PR!

> although if you have some intuition, that would save me some time

No, I'm afraid not - I would (also) have to do a bunch of manual testing to determine the exact behavior.

This does look like a tricky scenario, so it does require some more research IMO.

I'd be happy to do so, but I can't guarantee when that would happen. Let me know what you prefer.

(Edit: also marking this PR as a draft, to make it clear it's not quite ready yet.)

@bblommers bblommers marked this pull request as draft October 23, 2025 08:07
@chriselion (Contributor Author) commented Oct 23, 2025

I ran this script against AWS:

import json
import os

import boto3
from botocore.exceptions import ClientError

definition = {
  "StartAt": "Wait",
  "States": {
    "Wait": {
      "Type": "Wait",
      "Seconds": 60,
      "End": True
    }
  },
  "QueryLanguage": "JSONata"
}

input1 = '{"a": "b", "c": "d"}'
input2 = '{"c": "d", "a": "b"}'
execution_name = "testExecution"
# The two inputs are equivalent JSON but different strings.
assert json.loads(input1) == json.loads(input2)


client = boto3.client("stepfunctions")

print("creating state machine")
resp = client.create_state_machine(
    name="motoRepro1",
    definition=json.dumps(definition),
    # Fails fast with a KeyError if ROLE_ARN is not set.
    roleArn=os.environ["ROLE_ARN"]
)
state_machine_arn = resp["stateMachineArn"]

# 1. Start the initial execution.
try:
    execution_response = client.start_execution(
        stateMachineArn=state_machine_arn,
        name=execution_name,
        input=input1,
    )
    execution_arn = execution_response["executionArn"]
    print("Initial execution successful")
except ClientError:
    print("Initial execution failed")

# 2. Same name, byte-identical input.
try:
    identical_response = client.start_execution(
        stateMachineArn=state_machine_arn,
        name=execution_name,
        input=input1,
    )
    print("Identical execution successful")
    print(f"{(execution_arn==identical_response["executionArn"])=}")
except ClientError:
    print("Identical execution failed")

# 3. Same name, equivalent JSON with a different key order.
try:
    equivalent_response = client.start_execution(
        stateMachineArn=state_machine_arn,
        name=execution_name,
        input=input2,
    )
    print("Equivalent execution successful")
    print(f"{(execution_arn==equivalent_response["executionArn"])=}")
except ClientError:
    print("Equivalent execution failed")

print("deleting state machine")
client.delete_state_machine(stateMachineArn=state_machine_arn)

which produces this output:

creating state machine
Initial execution successful
Identical execution successful
(execution_arn==identical_response["executionArn"])=True
Equivalent execution failed
deleting state machine

So AWS does not treat equivalent JSON with a different key order as the same input. I'll update Execution to store the string form of execution_input (which is what the type hint indicates anyway). Hopefully I can get to that today.

@chriselion (Contributor Author) commented:

I tried this tonight, but it's more complicated than I expected. I removed the json.loads() call in start_execution() and figured I'd need to remove the corresponding json.dumps() here:

"input": json.dumps(execution.execution_input),

but that ended up breaking a lot of tests.
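For context on why the decoded form loses information: re-encoding a decoded value with json.dumps() does not necessarily reproduce the caller's original string, so an input stored only in decoded form cannot be echoed back verbatim. A standard-library illustration, with an arbitrary sample input:

```python
import json

original = '{"c":1,"a":2}'
round_tripped = json.dumps(json.loads(original))

# json.dumps() uses ", " and ": " separators by default, so the whitespace
# changes even though dict insertion order (and hence key order) is preserved.
print(round_tripped)  # {"c": 1, "a": 2}
print(round_tripped == original)  # False
```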

 self.state_machine = state_machine
 self._cloud_watch_logging_session = cloud_watch_logging_session
-self.input_data = input_data
+self.input_data = json.loads(input_data)
@chriselion (Contributor Author) commented Nov 10, 2025:

Now input_data is saved as decoded JSON, while execution_input stays a str. Previously the decoding happened in StepFunctionsParserBackend.start_execution(), so both were decoded.
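The split described above (a verbatim string for comparisons, a decoded value for code that works with the data) can be sketched in one hypothetical class. ExecutionRecord and matches() are illustrative names, not moto's, and moto's actual classes may divide these fields differently.

```python
import json


class ExecutionRecord:
    """Hypothetical sketch: keep the caller's raw input and a decoded copy."""

    def __init__(self, execution_name: str, execution_input: str) -> None:
        self.execution_name = execution_name
        # Raw string, kept verbatim for idempotency checks and for echoing back.
        self.execution_input = execution_input
        # Decoded value, for code that needs to work with the JSON data itself.
        self.input_data = json.loads(execution_input)

    def matches(self, execution_name: str, execution_input: str) -> bool:
        # Exact string comparison, consistent with the AWS experiment earlier
        # in this thread (reordered keys were rejected).
        return (
            self.execution_name == execution_name
            and self.execution_input == execution_input
        )


record = ExecutionRecord("testExecution", '{"a": "b", "c": "d"}')
print(record.input_data["a"])  # b
print(record.matches("testExecution", '{"a": "b", "c": "d"}'))  # True
print(record.matches("testExecution", '{"c": "d", "a": "b"}'))  # False
```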

@chriselion (Contributor Author) commented Nov 10, 2025

@bblommers - I finally had a chance to come back to this, and I think I got everything working. Part of my initial confusion was that there are two Execution classes, and I had only fixed one of them.

Several tests are failing because of the issue you noted in #9459; I'll merge the fix into my branch when it lands. (Edit: merged master after #9460.)

@chriselion chriselion marked this pull request as ready for review November 10, 2025 05:55
@bblommers (Collaborator) left a comment

This looks great - thank you for investigating this and for contributing the fix @chriselion!

@bblommers bblommers added this to the 5.1.17 milestone Nov 13, 2025
@bblommers bblommers merged commit 98e327f into getmoto:master Nov 13, 2025
65 checks passed
@chriselion chriselion deleted the celion/fix-state_machine.start_execution-idempotency branch November 13, 2025 22:46
Linked issue: SFN.Client.start_execution() doesn't follow same idempotency guarantees as boto3