Add resolved_kwargs to data_saver and data_loader tags #1136

Open
Riezebos opened this issue Sep 12, 2024 · 4 comments

@Riezebos
Contributor

Is your feature request related to a problem? Please describe.
When I have a built dataflow I would like to be able to see which paths are entered in @load_from and @save_to.

Describe the solution you'd like
After executing the dataflow I can see the paths in the results, but I'd like to be able to see them without executing the dataflow.

Some metadata is already being written to tags: https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/function_modifiers/adapters.py#L578

I tested adding the following line there:

                "hamilton.data_saver.kwargs": resolved_kwargs,

Then I tried running examples/parallelism/star_counting/run.py with the dr.execute statement replaced by:

    node = next(
        node for node in dr.list_available_variables() if node.name == "save.unique_stargazers"
    )
    print(node.as_dict()["tags"])

This gives the output I was hoping for:

{'hamilton.data_saver': True, 'hamilton.data_saver.sink': 'csv', 'hamilton.data_saver.classname': 'PandasCSVWriter', 'hamilton.data_saver.kwargs': {'path': 'unique_stargazers.csv'}}
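The same lookup could be generalized to collect the kwargs of every saver node at once. The following is only a minimal sketch: it assumes each node exposes `name` and `tags` attributes, and it relies on the `hamilton.data_saver.kwargs` tag proposed above, which is not part of released Hamilton. `FakeNode` and `saver_kwargs_by_node` are hypothetical names used so the example runs without a driver.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable

# Tag key proposed in this issue; it is NOT part of released Hamilton.
SAVER_KWARGS_TAG = "hamilton.data_saver.kwargs"


@dataclass
class FakeNode:
    """Hypothetical stand-in for the objects from dr.list_available_variables()."""

    name: str
    tags: Dict[str, Any] = field(default_factory=dict)


def saver_kwargs_by_node(nodes: Iterable[Any]) -> Dict[str, Dict[str, Any]]:
    """Map each data-saver node name to the kwargs recorded in its tags."""
    return {
        node.name: node.tags[SAVER_KWARGS_TAG]
        for node in nodes
        # Only keep nodes marked as savers that carry the proposed tag.
        if node.tags.get("hamilton.data_saver") and SAVER_KWARGS_TAG in node.tags
    }


nodes = [
    FakeNode("unique_stargazers"),  # regular node, no saver tags
    FakeNode(
        "save.unique_stargazers",
        {
            "hamilton.data_saver": True,
            "hamilton.data_saver.sink": "csv",
            "hamilton.data_saver.classname": "PandasCSVWriter",
            SAVER_KWARGS_TAG: {"path": "unique_stargazers.csv"},
        },
    ),
]
print(saver_kwargs_by_node(nodes))
```

With a real driver, `nodes` would presumably be replaced by `dr.list_available_variables()`.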

Describe alternatives you've considered
Maybe a custom DataLoader and DataSaver that store the arguments they were initialized with?

@skrawcz
Collaborator

skrawcz commented Sep 12, 2024

@Riezebos thanks for the issue. This sounds similar to another conversation @elijahbenizzy and @vograno were having about exposing bound values...

Question on your intended user experience. To confirm, it seems you'd be happy getting this via the node object you have above?

@Riezebos
Contributor Author

Riezebos commented Sep 12, 2024

Yes, for me that would be great!

If I try to think of a potentially better UX, disregarding how the driver and tags are currently implemented, it might look something like:

node = dr.get_node("save.unique_stargazers") # or a dictionary, but a way to get a node by name without iterating over them
if node.data_saver and node.data_saver.name == "csv":
    print(node.data_saver.kwargs)

But adding it to the tags that are already implemented would be a great solution in my opinion :)
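Until something like the hypothetical `dr.get_node` above exists, access by name can be approximated by building a dictionary once from the node list. This is only a sketch under the assumption that nodes expose `name` and `tags` attributes; `FakeNode` and `index_nodes_by_name` are illustrative names, not Hamilton APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class FakeNode:
    """Hypothetical stand-in for the objects from dr.list_available_variables()."""

    name: str
    tags: Dict[str, Any] = field(default_factory=dict)


def index_nodes_by_name(nodes: Iterable[Any]) -> Dict[str, Any]:
    """Build a name -> node mapping so later lookups need no iteration."""
    index: Dict[str, Any] = {}
    for node in nodes:
        if node.name in index:
            # Fail loudly rather than silently overwriting an earlier node.
            raise ValueError(f"duplicate node name: {node.name}")
        index[node.name] = node
    return index


nodes_by_name = index_nodes_by_name(
    [
        FakeNode("unique_stargazers"),
        FakeNode("save.unique_stargazers", {"hamilton.data_saver.sink": "csv"}),
    ]
)
node = nodes_by_name["save.unique_stargazers"]
if node.tags.get("hamilton.data_saver.sink") == "csv":
    print(node.name, node.tags)
```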

@elijahbenizzy
Collaborator

OK, adding in: I think this makes sense. Having non-iteration access is good; mind opening another issue for that?

For this, I think it makes sense to add these as "attributes" and mix them in with this concept: #1129.

Then we can attach the kwargs (as you did). These will be the unresolved kwargs (e.g. still containing source). We can probably also attach the same information at runtime via metadata, e.g. by adding a materializer_metadata field to the materialized metadata of every materializer, containing all the kwargs we have.
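To illustrate what unresolved kwargs might look like once attached to a node, here is a rough sketch. It does not use Hamilton's own classes: `LiteralArg`, `SourceArg`, and `describe_unresolved_kwargs` are hypothetical stand-ins, and the output shape is an assumption, not a proposed format.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class LiteralArg:
    """Hypothetical stand-in for a kwarg bound to a fixed value."""

    value: Any


@dataclass(frozen=True)
class SourceArg:
    """Hypothetical stand-in for a kwarg that refers to an upstream node."""

    node_name: str


def describe_unresolved_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Render kwargs as plain data without resolving upstream references."""
    described: Dict[str, Dict[str, Any]] = {}
    for key, arg in kwargs.items():
        if isinstance(arg, SourceArg):
            # The value is only known at execution time, so record the node name.
            described[key] = {"kind": "source", "node": arg.node_name}
        elif isinstance(arg, LiteralArg):
            described[key] = {"kind": "literal", "value": arg.value}
        else:
            # Plain Python values are treated as literals.
            described[key] = {"kind": "literal", "value": arg}
    return described


print(
    describe_unresolved_kwargs(
        {"path": SourceArg("output_path"), "index": LiteralArg(False), "sep": ","}
    )
)
```

A path given as a literal would show up directly, while a path taken from an upstream node would only show the node it depends on.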

@Riezebos
Contributor Author

Regarding the non-iteration access, I created another issue: #1138

3 participants