-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASSET_FAILED_TO_MATERIALIZE event type #28093
base: master
Are you sure you want to change the base?
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
b1a961b
to
fdc9f20
Compare
("asset_key", AssetKey), | ||
("partition", Optional[str]), | ||
("reason", AssetFailedToMaterializeReason), | ||
("error", Optional[SerializableErrorInfo]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see us maybe wanting some additional fields in the future, like which upstream asset failed, but i think we can add that when needed
@staticmethod | ||
def build_asset_failed_to_materialize_event( | ||
job_name: str, | ||
step_key: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
step_key is deprecated, and i think i should be able to get the step_handle when these events are emitted. I see other fns in the class take a step_key and were written more recently than step_key was deprecated. So just want to double check what is the right thing to do here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to emit these in the event log consumer daemon, I think you can have pipeline failure events that trigger the emitting of these failure events. In that case, I'm pretty sure you will not have a step_key value to pass in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the planned event always has a step key though, which is ultimately what we are using right? But maybe in the airlift case we will be emitting these when there was not a step key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
step_key being deprecated is news to me. seems like we would need to add step_handle to the planned event to be able to use step_handle here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is maybe wrong then? looks like it was set as deprecated 5 years ago https://github.com/dagster-io/dagster/blame/80059632d37819bbf9eaa8b0b74ed2f16ab76b35/python_modules/dagster/dagster/_core/events/__init__.py#L448
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe in the airlift case we will be emitting these when there was not a step key
good point. I can makestep_key
optional
Deploy preview for dagit-core-storybook ready! ✅ Preview Built with commit a7e5c7e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good.
If we end up storing these events, we'll need to handle this event type in the graphql layer.
@staticmethod | ||
def build_asset_failed_to_materialize_event( | ||
job_name: str, | ||
step_key: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to emit these in the event log consumer daemon, I think you can have pipeline failure events that trigger the emitting of these failure events. In that case, I'm pretty sure you will not have a step_key value to pass in here.
COMPUTE_FAILED = "COMPUTE_FAILED" | ||
UPSTREAM_COMPUTE_FAILED = "UPSTREAM_COMPUTE_FAILED" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's also 'run succeeded without materializing' and 'run was canceled without materializing' right? (which as i understand it is treated like a skip and doesn't turn the asset red, to match current behavior)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. i was going to add the other events in a second pass once i had gotten the daemon set up for run failures just to break things up a bit. but no reason i can't add the reason types now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took a stab at adding the different reasons i could think of. I also added two lists to split the reasons into the ones that are like failure-failures and ones that are when an asset failed to materialize in an expected way (user canceled a run, optional assets). I don't have to merge those lists in, it's more for trying to communicate the difference between the reasons.
Basically it's a rough attempt to split apart how the status for an asset in a run should affect the "global" status we display for the asset. For the skipped types, i think we would ideally be able to fall back and show the status of the latest run where the status of the asset is not a skipped type. Not sure how we can do that though with having the "skipped" and "failed" reasons all in one event
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally sure about splitting out user terminated vs system/external terminated. my thinking here is that if a user cancels or deletes a run, then the assets that didn't materialize shouldn't be in a failed state. This is the behavior today. But if a run is killed (by some external process or we cancel it because it runs too long) that should get surfaced as a failure. The thing that's unclear to me is if we are able to split out the run canceled events like that? from a given cancel event are we able to see if it was user initiated or not?
34ced85
to
cae661f
Compare
69c8444
to
78b50b2
Compare
## Summary & Motivation ## How I Tested These Changes ## Changelog > Insert changelog entry or delete this section.
78b50b2
to
d0263ad
Compare
Summary & Motivation
Adds the new
ASSET_FAILED_TO_MATERIALIZE
event type. Nothing emits or reads this event yet.internal pr https://github.com/dagster-io/internal/pull/14177
How I Tested These Changes
simple unit test to confirm that the new event is written to and read from the event log