Map state add tolerated failure #282
Changes from all commits
b5e79c6
6a4041c
6ffb6e9
6c32fc9
8115aa9
f516eaf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# frozen_string_literal: true | ||
|
||
module Floe | ||
class Workflow | ||
module States | ||
module RetryCatchMixin | ||
def find_retrier(error) | ||
self.retry.detect { |r| r.match_error?(error) } | ||
end | ||
|
||
def find_catcher(error) | ||
self.catch.detect { |c| c.match_error?(error) } | ||
end | ||
|
||
def retry_state!(context, error) | ||
retrier = find_retrier(error["Error"]) if error | ||
return if retrier.nil? | ||
|
||
# If a different retrier is hit reset the context | ||
if !context["State"].key?("RetryCount") || context["State"]["Retrier"] != retrier.error_equals | ||
context["State"]["RetryCount"] = 0 | ||
context["State"]["Retrier"] = retrier.error_equals | ||
end | ||
|
||
context["State"]["RetryCount"] += 1 | ||
|
||
return if context["State"]["RetryCount"] > retrier.max_attempts | ||
|
||
wait_until!(context, :seconds => retrier.sleep_duration(context["State"]["RetryCount"])) | ||
context.next_state = context.state_name | ||
context.output = error | ||
logger.info("Running state: [#{long_name}] with input [#{context.json_input}] got error[#{context.json_output}]...Retry - delay: #{wait_until(context)}") | ||
true | ||
end | ||
|
||
def catch_error!(context, error) | ||
catcher = find_catcher(error["Error"]) if error | ||
return if catcher.nil? | ||
|
||
context.next_state = catcher.next | ||
context.output = catcher.result_path.set(context.input, error) | ||
logger.info("Running state: [#{long_name}] with input [#{context.json_input}]...CatchError - next state: [#{context.next_state}] output: [#{context.json_output}]") | ||
Comment on lines +37 to +42
catcher and retrier always seemed like the same thing. You try to match it, and if it matches, then you set the next_state / output.
They are similar for sure, but I'm not sure they are the same thing.
ignore ^ I can play with this refactor later |
||
|
||
true | ||
end | ||
|
||
def fail_workflow!(context, error) | ||
# next_state is nil, and will be set to nil again in super | ||
# keeping in here for completeness | ||
context.next_state = nil | ||
context.output = error | ||
logger.error("Running state: [#{long_name}] with input [#{context.json_input}]...Complete workflow - output: [#{context.json_output}]") | ||
end | ||
end | ||
end | ||
end | ||
end |
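The retry bookkeeping in `retry_state!` (reset the counter when a different retrier matches, then stop once `max_attempts` is exceeded) can be exercised in isolation. The sketch below is not Floe code: `Retrier` is a stand-in `Struct` and `retry?` is a hypothetical helper that copies only the counting logic, leaving out `wait_until!`, logging, and the context object.

```ruby
# Stand-in for a retrier; only the fields used by the counting logic are modelled.
Retrier = Struct.new(:error_equals, :max_attempts)

# Hypothetical helper mirroring the RetryCount / Retrier bookkeeping in retry_state!.
# `state` plays the role of context["State"].
def retry?(state, retrier)
  # If a different retrier is hit, reset the counter.
  if !state.key?("RetryCount") || state["Retrier"] != retrier.error_equals
    state["RetryCount"] = 0
    state["Retrier"] = retrier.error_equals
  end

  state["RetryCount"] += 1
  state["RetryCount"] <= retrier.max_attempts
end

state = {}
timeout = Retrier.new(["States.Timeout"], 2)
failed  = Retrier.new(["States.TaskFailed"], 1)

# Three timeouts in a row, then a different error that matches another retrier.
results = [timeout, timeout, timeout, failed].map { |r| retry?(state, r) }
# results == [true, true, false, true]
```

The third attempt is refused because the count (3) exceeds `max_attempts` (2); the fourth is allowed because switching retriers resets the count to zero before it is incremented.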
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,16 @@ | ||
# frozen_string_literal: true | ||
|
||
require_relative "input_output_mixin" | ||
require_relative "non_terminal_mixin" | ||
require_relative "retry_catch_mixin" | ||
Comment on lines +3 to +5
Feels like these belong with the other
Yes, it is there as well, but keeping those alphabetical means this fails to resolve the constant.
I like making the require not alphabetical.
I like the require being in the place that needs it and not trying to solve a dependency graph in floe.rb 😆
huh. We have been putting all requires up front in Do we want to get away from that? Wondering if InputOutputMixin and the others do not belong in Does look like these 3 mixins are used only by states, so alternatively we can just move the mixins up front.
require_relative "floe/workflow/state"
require_relative "floe/workflow/states/input_output_mixin"
require_relative "floe/workflow/states/non_terminal_mixin"
# require_relative "floe/workflow/states/*_mixin"
require_relative "floe/workflow/states/choice"
require_relative "floe/workflow/states/fail"
require_relative "floe/workflow/states/map"
require_relative "floe/workflow/states/parallel"
require_relative "floe/workflow/states/pass"
require_relative "floe/workflow/states/succeed"
require_relative "floe/workflow/states/task"
require_relative "floe/workflow/states/wait"
No, I don't think we need to stop doing that, I think of |
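For context on the "fails to resolve the constant" point: `include` is evaluated while the class body runs, so the mixin constant has to exist before the file defining the class is loaded. A minimal sketch, using made-up names (`EarlyTask`, `LateMixin`, `OrderedTask`) rather than Floe's files:

```ruby
# Referencing a mixin before it is defined fails while the class body is evaluated.
begin
  class EarlyTask
    include LateMixin # LateMixin has not been defined (or required) yet
  end
rescue NameError => e
  load_error = e.message
end

# Once the mixin is defined first, the same include works.
module LateMixin
  def retryable?
    true
  end
end

class OrderedTask
  include LateMixin
end
```

Either approach discussed in the thread (requiring the mixin from the file that includes it, or listing the mixins ahead of the states in one central file) guarantees this ordering.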
||
|
||
module Floe | ||
class Workflow | ||
module States | ||
class Task < Floe::Workflow::State | ||
include InputOutputMixin | ||
include NonTerminalMixin | ||
include RetryCatchMixin | ||
|
||
attr_reader :credentials, :end, :heartbeat_seconds, :next, :parameters, | ||
:result_selector, :resource, :timeout_seconds, :retry, :catch, | ||
|
@@ -82,54 +87,6 @@ def success?(context) | |
runner.success?(context.state["RunnerContext"]) | ||
end | ||
|
||
def find_retrier(error) | ||
self.retry.detect { |r| r.match_error?(error) } | ||
end | ||
|
||
def find_catcher(error) | ||
self.catch.detect { |c| c.match_error?(error) } | ||
end | ||
|
||
def retry_state!(context, error) | ||
retrier = find_retrier(error["Error"]) if error | ||
return if retrier.nil? | ||
|
||
# If a different retrier is hit reset the context | ||
if !context["State"].key?("RetryCount") || context["State"]["Retrier"] != retrier.error_equals | ||
context["State"]["RetryCount"] = 0 | ||
context["State"]["Retrier"] = retrier.error_equals | ||
end | ||
|
||
context["State"]["RetryCount"] += 1 | ||
|
||
return if context["State"]["RetryCount"] > retrier.max_attempts | ||
|
||
Comment on lines -94 to -106
Always wanted the retrier / catcher to act just like a You ask - do you have a retrier/catcher for me?
ignore ^ - I can play with this later |
||
wait_until!(context, :seconds => retrier.sleep_duration(context["State"]["RetryCount"])) | ||
context.next_state = context.state_name | ||
context.output = error | ||
logger.info("Running state: [#{long_name}] with input [#{context.json_input}] got error[#{context.json_output}]...Retry - delay: #{wait_until(context)}") | ||
true | ||
end | ||
|
||
def catch_error!(context, error) | ||
catcher = find_catcher(error["Error"]) if error | ||
return if catcher.nil? | ||
|
||
context.next_state = catcher.next | ||
context.output = catcher.result_path.set(context.input, error) | ||
logger.info("Running state: [#{long_name}] with input [#{context.json_input}]...CatchError - next state: [#{context.next_state}] output: [#{context.json_output}]") | ||
|
||
true | ||
end | ||
|
||
def fail_workflow!(context, error) | ||
# next_state is nil, and will be set to nil again in super | ||
# keeping in here for completeness | ||
context.next_state = nil | ||
context.output = error | ||
logger.error("Running state: [#{long_name}] with input [#{context.json_input}]...Complete workflow - output: [#{context.json_output}]") | ||
end | ||
|
||
def parse_error(output) | ||
return if output.nil? | ||
return output if output.kind_of?(Hash) | ||
|
What is supposed to happen if both count and percentage are given (or is that not allowed)? I can think of percentages where you want, say, the minimum of 50% or 3. In that case, I think you need to check both clauses, and then return, as opposed to bailing out on the first one. (This could have fallen through with the flip from failed? to success?)
Specs tend to say one vs. the other. In the branch that I had with all the error checks, I only allow one or the other. Also, you should not be able to state Next and End at the same time. But in the short term, we've been letting these cases slide.
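A stricter "one or the other" rule like the one described for that branch could look roughly like this. Everything here is hypothetical: `InvalidStateError` and `validate_exclusive!` are illustrative names, not part of Floe, and this PR itself does not enforce exclusivity.

```ruby
# Hypothetical error class and helper, for illustration only.
class InvalidStateError < StandardError; end

# Raises if more than one of the given fields is present in the state payload.
def validate_exclusive!(payload, *fields)
  given = fields.select { |f| payload.key?(f) }
  return if given.size <= 1

  raise InvalidStateError, "fields are mutually exclusive: #{given.join(", ")}"
end

# Passes: only one threshold is given.
validate_exclusive!({"ToleratedFailureCount" => 2},
                    "ToleratedFailureCount", "ToleratedFailurePercentage")
```

Calling `validate_exclusive!({"Next" => "A", "End" => true}, "Next", "End")` would raise `InvalidStateError`.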
From the states language spec:
There's an additional nuance here that I'm wondering if we need to code explicitly (not 100% sure). The spec says So if 0 or 100 are specified, then those have defined meanings, and I'm concerned that silly floating point math might let those fall through the cracks? I wonder if we should have at least some tests for those specific values.
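One way to sidestep the floating point concern is to compare cross-multiplied integers instead of dividing. This is only a sketch: `percentage_exceeded?` is a made-up helper, and it assumes the percentage is an Integer, that 0 means no failures are tolerated, and that 100 means all failures are tolerated.

```ruby
# Hypothetical helper: true when the share of failed items is strictly above the
# tolerated percentage. Uses integer arithmetic only, so there is no rounding.
def percentage_exceeded?(num_failed, total, tolerated_percentage)
  return false if total.zero?

  # num_failed / total > tolerated_percentage / 100, cross-multiplied
  num_failed * 100 > tolerated_percentage * total
end

percentage_exceeded?(1, 10, 0)    # => true  (0 tolerates nothing)
percentage_exceeded?(10, 10, 100) # => false (100 tolerates everything)
percentage_exceeded?(1, 3, 33)    # => true  (100 > 99)
```

These boundary values are the kind of cases the tests suggested above would pin down.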
Another very strange question but is it possible for total to be 0 here or does some earlier check avoid that? Asking because this is a potential divide by zero here.
Right this will report a failure if either threshold is hit
Technically this will return early if total is zero because num_failed will also be zero, but I can add an additional / explicit total.zero? above
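For reference, here is how division by zero behaves in Ruby, plus the kind of explicit guard mentioned above. `failure_percentage` is a hypothetical helper, not the PR's code.

```ruby
# Integer division by zero raises...
integer_result =
  begin
    0 / 0
  rescue ZeroDivisionError => e
    e.class
  end

# ...while Float division by zero quietly yields NaN, and comparisons with NaN are false.
float_result = 0.0 / 0
nan_exceeds  = float_result > 25

# Hypothetical helper with the explicit guard.
def failure_percentage(num_failed, total)
  return 0.0 if total.zero?

  100.0 * num_failed / total
end
```

So depending on whether the operands are Integers or Floats, an unguarded zero total either raises or silently compares as "not exceeded"; the explicit `total.zero?` check makes the intent visible either way.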
The way it's coded though, I'm not sure it does? Taking a (very) contrived example, if we had 4 items, 2 failures, threshold count of 2, and the threshold % of 25%, then the current code will return success true, but should return false.
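The contrived example can be worked through with two small sketches. Neither is the PR's actual code: `success_first_clause?` is a hypothetical reconstruction of the "bail out on the first clause" pattern being described, and `success_either_threshold?` is one possible way to check both thresholds before returning.

```ruby
# Hypothetical early-return version: the first configured threshold decides.
def success_first_clause?(num_failed, total, count: nil, percentage: nil)
  return num_failed <= count if count
  return 100 * num_failed <= percentage * total if percentage

  num_failed.zero?
end

# Hypothetical check-both version: fails if either configured threshold is exceeded.
def success_either_threshold?(num_failed, total, count: nil, percentage: nil)
  return num_failed.zero? if count.nil? && percentage.nil?
  return false if count && num_failed > count
  return false if percentage && 100 * num_failed > percentage * total

  true
end

# 4 items, 2 failures, tolerated count 2, tolerated percentage 25
success_first_clause?(2, 4, count: 2, percentage: 25)     # => true  (2 <= 2, percentage never checked)
success_either_threshold?(2, 4, count: 2, percentage: 25) # => false (50% > 25%)
```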
Okay, I added an explicit check for ToleratedFailurePercentage == 100 (interestingly, the spec says it is an integer, not a float, so that made == 100 easier; I did &.to_i in the initialize). I think all concerns here are covered.
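As a quick illustration of what `&.to_i` does to the possible inputs (a sketch only; `tolerated_failure_percentage` is a made-up helper, not the PR's initializer):

```ruby
# Hypothetical helper: coerce the field to an Integer, but leave a missing field as nil.
def tolerated_failure_percentage(payload)
  payload["ToleratedFailurePercentage"]&.to_i
end

tolerated_failure_percentage({})                                    # => nil
tolerated_failure_percentage("ToleratedFailurePercentage" => 100)   # => 100
tolerated_failure_percentage("ToleratedFailurePercentage" => 100.0) # => 100
tolerated_failure_percentage("ToleratedFailurePercentage" => 99.9)  # => 99
```

The safe-navigation operator keeps "not specified" distinguishable from 0, and `to_i` truncates rather than rounds, so a fractional input such as 99.9 becomes 99 rather than 100.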
Oh yeah, this did flip when going from failed? to success?