Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASSET_FAILED_TO_MATERIALIZE event type #28093

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

jamiedemaria
Copy link
Contributor

@jamiedemaria jamiedemaria commented Feb 27, 2025

Summary & Motivation

Adds the new ASSET_FAILED_TO_MATERIALIZE event type. Nothing emits or reads this event yet.

internal pr https://github.com/dagster-io/internal/pull/14177

How I Tested These Changes

simple unit test to confirm that the new event is written to and read from the event log

Copy link
Contributor Author

jamiedemaria commented Feb 27, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@jamiedemaria jamiedemaria changed the title event type declaration PLANNED_ASSET_MATERIALIZATION_FAILURE event type Feb 27, 2025
@jamiedemaria jamiedemaria changed the title PLANNED_ASSET_MATERIALIZATION_FAILURE event type [wip] PLANNED_ASSET_MATERIALIZATION_FAILURE event type Feb 27, 2025
@jamiedemaria jamiedemaria changed the title [wip] PLANNED_ASSET_MATERIALIZATION_FAILURE event type [wip] ASSET_FAILED_TO_MATERIALIZE event type Mar 4, 2025
@jamiedemaria jamiedemaria force-pushed the jamie/asset-failure-event branch 3 times, most recently from b1a961b to fdc9f20 Compare March 5, 2025 21:02
@jamiedemaria jamiedemaria mentioned this pull request Mar 5, 2025
("asset_key", AssetKey),
("partition", Optional[str]),
("reason", AssetFailedToMaterializeReason),
("error", Optional[SerializableErrorInfo]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see us maybe wanting some additional fields in the future, like which upstream asset failed, but i think we can add that when needed

@staticmethod
def build_asset_failed_to_materialize_event(
job_name: str,
step_key: str,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

step_key is deprecated, and i think i should be able to get the step_handle when these events are emitted. I see other fns in the class take a step_key and were written more recently than step_key was deprecated. So just want to double check what is the right thing to do here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to emit these in the event log consumer daemon, I think you can have pipeline failure events that trigger the emitting of these failure events. In that case, I'm pretty sure you will not have a step_key value to pass in here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the planned event always has a step key though, which is ultimately what we are using right? But maybe in the airlift case we will be emitting these when there was not a step key

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

step_key being deprecated is news to me. seems like we would need to add step_handle to the planned event to be able to use step_handle here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe in the airlift case we will be emitting these when there was not a step key

good point. I can makestep_key optional

@jamiedemaria jamiedemaria marked this pull request as ready for review March 5, 2025 21:26
@jamiedemaria jamiedemaria requested review from gibsondan and prha March 5, 2025 21:26
@jamiedemaria jamiedemaria changed the title [wip] ASSET_FAILED_TO_MATERIALIZE event type ASSET_FAILED_TO_MATERIALIZE event type Mar 5, 2025
Copy link

github-actions bot commented Mar 6, 2025

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-ex6zt9cqj-elementl.vercel.app
https://jamie-asset-failure-event.core-storybook.dagster-docs.io

Built with commit a7e5c7e.
This pull request is being automatically deployed with vercel-action

Copy link
Member

@prha prha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good.

If we end up storing these events, we'll need to handle this event type in the graphql layer.

@staticmethod
def build_asset_failed_to_materialize_event(
job_name: str,
step_key: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to emit these in the event log consumer daemon, I think you can have pipeline failure events that trigger the emitting of these failure events. In that case, I'm pretty sure you will not have a step_key value to pass in here.

Comment on lines 1592 to 1593
COMPUTE_FAILED = "COMPUTE_FAILED"
UPSTREAM_COMPUTE_FAILED = "UPSTREAM_COMPUTE_FAILED"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's also 'run succeeded without materializing' and 'run was canceled without materializing' right? (which as i understand it is treated like a skip and doesn't turn the asset red, to match current behavior)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. i was going to add the other events in a second pass once i had gotten the daemon set up for run failures just to break things up a bit. but no reason i can't add the reason types now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took a stab at adding the different reasons i could think of. I also added two lists to split the reasons into the ones that are like failure-failures and ones that are when an asset failed to materialize in an expected way (user canceled a run, optional assets). I don't have to merge those lists in, it's more for trying to communicate the difference between the reasons.

Basically it's a rough attempt to split apart how the status for an asset in a run should affect the "global" status we display for the asset. For the skipped types, i think we would ideally be able to fall back and show the status of the latest run where the status of the asset is not a skipped type. Not sure how we can do that though with having the "skipped" and "failed" reasons all in one event

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally sure about splitting out user terminated vs system/external terminated. my thinking here is that if a user cancels or deletes a run, then the assets that didn't materialize shouldn't be in a failed state. This is the behavior today. But if a run is killed (by some external process or we cancel it because it runs too long) that should get surfaced as a failure. The thing that's unclear to me is if we are able to split out the run canceled events like that? from a given cancel event are we able to see if it was user initiated or not?

@jamiedemaria jamiedemaria force-pushed the jamie/asset-failure-event branch from 34ced85 to cae661f Compare March 7, 2025 18:40
@jamiedemaria jamiedemaria requested a review from gibsondan March 7, 2025 19:15
@jamiedemaria jamiedemaria force-pushed the jamie/asset-failure-event branch 2 times, most recently from 69c8444 to 78b50b2 Compare March 11, 2025 19:55
@jamiedemaria jamiedemaria force-pushed the jamie/asset-failure-event branch from 78b50b2 to d0263ad Compare March 12, 2025 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants