Renaming intermediary nodes to input nodes #1052

zechigan · 2024-07-22T16:17:32Z

Hello!

Suppose I have a DAG that does the following:

|observations| -> winsorsized_observations -> normalized_winsorsized_observations -> ...

Given an external input observations, the node winsorsized_observations winsorizes it, and normalized_winsorsized_observations then normalizes the result.

Something I thought could be good to have is the ability to rename an intermediary node back to the input name. That is, I rename normalized_winsorsized_observations back to observations, and any child nodes then know that observations no longer refers to the external input observations:

# From this
def regression(normalized_winsorsized_observations: pd.DataFrame) -> pd.DataFrame:
    ...


# To this
def regression(observations: pd.DataFrame) -> pd.DataFrame:
    ...

Obviously, this could be achieved by executing two DAGs separately. But I would like to know whether being able to somehow splice the two DAGs is a good idea and, if it isn't, which Hamilton principles it goes against.
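
For concreteness, the two-DAG workaround could look roughly like the sketch below. It is plain Python rather than Hamilton code: run_dag is a hypothetical helper that only mimics the idea of wiring parameter names to function names, lists of floats stand in for pd.DataFrame, and the clipping bounds and the mean are placeholders for the real winsorization and regression.

```python
import inspect
from typing import Any, Callable, Dict, List


def run_dag(funcs: List[Callable], inputs: Dict[str, Any], target: str) -> Any:
    """Hypothetical mini resolver: each function is a node named after itself,
    and each parameter name refers to another node or to an external input."""
    nodes = {f.__name__: f for f in funcs}
    cache = dict(inputs)

    def resolve(name: str) -> Any:
        if name not in cache:
            fn = nodes[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return resolve(target)


# Stage 1: preprocessing DAG.
def winsorsized_observations(observations: List[float]) -> List[float]:
    # Placeholder winsorization: clip to the assumed bounds [0, 10].
    return [min(max(x, 0.0), 10.0) for x in observations]


def normalized_winsorsized_observations(
    winsorsized_observations: List[float],
) -> List[float]:
    # Placeholder normalization: divide by the maximum value.
    top = max(winsorsized_observations)
    return [x / top for x in winsorsized_observations]


# Stage 2: modelling DAG, which again uses the short name `observations`.
def regression(observations: List[float]) -> float:
    # Placeholder for a real regression: just the mean.
    return sum(observations) / len(observations)


raw = [-5.0, 5.0, 20.0]
cleaned = run_dag(
    [winsorsized_observations, normalized_winsorsized_observations],
    {"observations": raw},
    "normalized_winsorsized_observations",
)
# The output of stage 1 is passed to stage 2 under the input name.
result = run_dag([regression], {"observations": cleaned}, "regression")
```

The rename happens only at the boundary between the two runs, which is why a single DAG cannot express it directly: within one DAG, observations would have to be both the external input and a computed node.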

Thank you!

skrawcz (Maintainer) · 2024-07-22T22:39:35Z

@zechigan thanks for the question.

Yes, there are a few ways to do this, and we also have two open issues related to this, #922 and #701 (and also #1045, so that it's clearer what the options are).

Let me quickly recap a few ways you could try to get at this:

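Use @pipe, e.g. something like:

import pandas as pd
from hamilton.function_modifiers import pipe, step

def raw_regression(...) -> pd.DataFrame:
    # code to load
    return df

# _winsorsized_observations and _normalized_winsorsized_observations are
# assumed to be helper functions defined elsewhere; the steps run in order.
@pipe(
    step(_winsorsized_observations),
    step(_normalized_winsorsized_observations),
)
def regression(raw_regression: pd.DataFrame) -> pd.DataFrame:
    return raw_regression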

Use @subdag.
Use Parallelizable/Collect if appropriate.
Use some combination of the above.

Without knowing more about your context, it's hard to say what would work best for you. There are also more decorators that could help (e.g. @parameterize).

My suggestion is to watch the YouTube video I did and see if that helps you. If not, let's chat / add more here to determine how we can make it better :)

References:

Slides from meetup on this - https://github.com/skrawcz/talks/files/14657471/Hamilton.March.2024.Meetup.pdf
YouTube video explaining some of this (see the deep dive section; it walks through some of the options above)
Notebook that was walked through in the meetup
