Visualise `Pipeline` objects #1993

astrojuanlu · 2024-07-21T10:29:19Z

AS A Kedro user
I WANT TO visualise Pipeline objects directly in notebooks
SO THAT

I don't need the full Kedro Framework structure (a requirement for %run_viz)

I can interactively visualise Pipeline objects while I am creating them

Originally #1459, extra context in #1833 (comment) reproduced below:

I am showcasing Kedro concepts on a notebook without creating a full-fledged project. Took https://github.com/ibis-project/kedro-ibis-tutorial/blob/main/03%20-%20First%20Steps%20with%20Kedro.ipynb as inspiration, and adapted it to Spark and Databricks (will try to publish that soon).

However, since there is no Kedro Framework project, there is no way I can visualise my pipelines, even though I have a Pipeline object perfectly defined:

It would be insanely awesome if I could do KedroViz().visualize(pipe).show() or something like that, without ever needing to set-up a Kedro project.


yury-fedotov · 2024-07-23T11:39:56Z

@astrojuanlu interesting use case. Have you seen a lot that users define pipelines in notebooks or import them to there?

I thought vast majority of notebook usage is to do catalog.load("something") and then some EDA. While all pipeline definition is in .py files.

astrojuanlu · 2024-07-25T19:38:16Z

Have you seen a lot that users define pipelines in notebooks

I have not, and probably the reason is that traditionally Kedro had taken sort of an anti-notebook stance. We evolved that in 2023, for example by writing https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html

I've personally found it very handy to explain things to data scientists with notebooks when teaching. See for example https://github.com/ibis-project/kedro-ibis-tutorial/blob/main/03%20-%20First%20Steps%20with%20Kedro.ipynb, recording (very well received) or https://github.com/astrojuanlu/kedro-databricks-demo/blob/main/First%20Steps%20with%20Kedro%20on%20Databricks.ipynb (essentially the same thing, but with a ManagedTableDataset connecting to DBX UC). Being able to visualise the pipelines there directly would be awesome I think.

or import them to there?

We launched a feature earlier this year to do something like that https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks.html#load-node-line-magic it's for nodes rather than full pipelines though.

I thought vast majority of notebook usage is to do catalog.load("something") and then some EDA.

That's our impression too yes (and in fact I do that all the time). So this issue would be about taking that one little step further.

astrojuanlu · 2024-09-06T08:23:37Z

A user just asked about this.

astrojuanlu · 2024-09-06T08:37:46Z

(And it had nothing to do with notebooks)

KikiCS · 2024-09-06T10:01:20Z

Hello, I add some context for my use-case after sending a message on Slack.
Kedro viz diagrams are very useful for non-technical people wanting to get a high-level view of the data pipeline.
While documenting models in my company internal Notion, I thought including a kedro viz diagram would be super useful, as well as generating a new one every time a change to the pipeline is released.
I got the idea when I saw that Notion shows diagrams written in Mermaid, but I don't know and haven't checked if kedro viz is based on Mermaid under the hood.
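
As a rough illustration of the Mermaid idea (nothing in this thread confirms that Kedro-Viz can emit Mermaid), a pipeline's structure could be written out as Mermaid flowchart text. The sketch below assumes the pipeline has already been reduced to plain `(node_name, inputs, outputs)` tuples; `to_mermaid` and that tuple layout are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical, simplified description of a pipeline node:
# (node name, input dataset names, output dataset names).
NodeSpec = Tuple[str, List[str], List[str]]


def to_mermaid(nodes: Iterable[NodeSpec]) -> str:
    """Render node specs as Mermaid flowchart text (dataset --> node --> dataset).

    Names are used directly as Mermaid ids, so this sketch only handles
    names made of letters, digits and underscores.
    """
    lines = ["flowchart TD"]
    for name, inputs, outputs in nodes:
        for dataset in inputs:
            lines.append(f"    {dataset} --> {name}")
        for dataset in outputs:
            lines.append(f"    {name} --> {dataset}")
    return "\n".join(lines)


example = [
    ("preprocess", ["raw_data"], ["clean_data"]),
    ("train", ["clean_data", "params"], ["model"]),
]
mermaid_text = to_mermaid(example)
print(mermaid_text)
```

The resulting text could be pasted into any tool that renders Mermaid code blocks.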

astrojuanlu · 2024-10-14T08:54:16Z

Prior art: #1668 (comment)

ravi-kumar-pilla · 2025-01-15T19:50:40Z

Hi @astrojuanlu ,

I did an experimental implementation and it seems to be feasible 💯. I haven't tested the complex parts yet, but simple pipelines seem achievable with some limitations. I will do some more testing before documenting the limitations.

Thank you

astrojuanlu · 2025-01-16T07:44:46Z

Fantastic @ravi-kumar-pilla ! So #2241 basically launches a Viz server and then embeds that as an iframe, right?

Do you think it's feasible to do this using only the frontend React component, without a server? To reduce overhead and have better control of what's presented. For example, it would be nice if the left toolbar, the node filter area, and the other toolbar weren't even displayed.

ravi-kumar-pilla · 2025-01-16T16:05:54Z

Fantastic @ravi-kumar-pilla ! So #2241 basically launches a Viz server and then embeds that as an iframe, right?

You are right.

Do you think it's feasible to do this using only the frontend React component, without a server? To reduce overhead and have better control of what's presented. For example, it would be nice if the left toolbar, the node filter area, and the other toolbar weren't even displayed.

Yes, I am thinking about this as well. We can either inject a config header and hide parts of Viz or, as you said, go entirely with the React component. I am exploring this too and will post an update. Thank you

ravi-kumar-pilla · 2025-01-22T00:25:51Z

Hi @astrojuanlu ,

I tried using KedroViz directly in HTML but we do not bundle KedroViz to be used directly via a CDN link (or I could not find a way to use the package that way). I tried locally and there seems to be some compatibility issues. I reached out to @Huongg and she will have a look at the issue. For now, I tried adding configuration on top of starting the server. This would be a first attempt at this feature. We can improve the performance at later stages.

If the bundling approach takes time, I would suggest we go with the run_server approach and give users the ability to configure what they see in Viz (for example, hiding everything except the flowchart view could be the default). I have a PR which implements that (it needs some polishing but works well). Let me know what you think.

Screenshot after configuring only flowchart view :

cc: @rashidakanchwala

Thank you

astrojuanlu · 2025-01-22T11:55:17Z

Thanks a lot @ravi-kumar-pilla !

I tried using KedroViz directly in HTML but we do not bundle KedroViz to be used directly via a CDN link (or I could not find a way to use the package that way).

Indeed, I don't see a bundled version in https://cdn.jsdelivr.net/npm/@quantumblack/kedro-viz/ What would be the cost of doing it?

I tried locally and there seems to be some compatibility issues.

Could you describe them a bit more?

I know the UI would look the same in either case but probably the DX is going to be much better if we avoid the server. A server needs to allocate a port, needs the proper Python dependencies installed, etc. I think we need to continue exploring the feasibility of doing a JS-only solution.

astrojuanlu · 2025-01-23T10:56:43Z

Yesterday we briefly discussed this.

@ravi-kumar-pilla clarified that with the current proposal (#2241), even if we do use only the frontend, the user would still need to install Kedro Viz anyway.

Logging my current understanding of the situation:

From https://github.com/kedro-org/kedro-viz-standalone/blob/main/src/App.js, all that's needed is to go from a kedro.pipeline.Pipeline object to a JSON representation that resembles https://github.com/kedro-org/kedro-viz/blob/e418ecd/src/utils/data/spaceflights.mock.json

However, this is easier said than done. For starters, I couldn't find a schema that defines what properties are expected in that JSON - although they can be derived from other places. The constructor suggests that there are 4 mandatory ones

kedro-viz/src/components/app/app.js

Lines 89 to 93 in e418ecd

    
    PropTypes.shape({
      edges: PropTypes.array.isRequired,
      layers: PropTypes.array,
      nodes: PropTypes.array.isRequired,
      tags: PropTypes.array,

but actually the response returned by the API has a few more

kedro-viz/package/kedro_viz/api/rest/responses/pipelines.py

Lines 203 to 209 in e418ecd

    
    nodes: List[NodeAPIResponse]
    edges: List[GraphEdgeAPIResponse]
    layers: List[str]
    tags: List[NamedEntityAPIResponse]
    pipelines: List[NamedEntityAPIResponse]
    modular_pipelines: ModularPipelinesTreeAPIResponse
    selected_pipeline: str

(this is a Pydantic model)
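
To make that shape easier to see, here is a simplified stand-in written with standard-library dataclasses instead of Pydantic. The field names come from the snippet above. Everything else is an assumption: the element types are flattened to plain dicts and strings, the `id`/`name`/`source`/`target` keys are guesses, and `GraphSketch` is a hypothetical name rather than a Kedro-Viz class.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class GraphSketch:
    """Simplified stand-in for the API response; element types are assumed."""

    nodes: List[Dict[str, Any]]
    edges: List[Dict[str, str]]
    layers: List[str] = field(default_factory=list)
    tags: List[Dict[str, str]] = field(default_factory=list)
    pipelines: List[Dict[str, str]] = field(default_factory=list)
    modular_pipelines: Dict[str, Any] = field(default_factory=dict)
    # Assumed default; the real value is the id of the selected pipeline.
    selected_pipeline: str = "__default__"


graph = GraphSketch(
    nodes=[{"id": "a", "name": "a"}, {"id": "b", "name": "b"}],
    edges=[{"source": "a", "target": "b"}],
)
payload = json.dumps(asdict(graph), indent=2)
print(payload)
```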

This, in turn, is generated here

kedro-viz/package/kedro_viz/api/rest/responses/pipelines.py

Lines 212 to 238 in e418ecd

    
    def get_pipeline_response(
        pipeline_id: Union[str, None] = None,
    ) -> Union[GraphAPIResponse, JSONResponse]:
        """API response for `/api/pipelines/pipeline_id`."""
        if pipeline_id is None:
            pipeline_id = data_access_manager.get_default_selected_pipeline().id
        if not data_access_manager.registered_pipelines.has_pipeline(pipeline_id):
            return JSONResponse(status_code=404, content={"message": "Invalid pipeline ID"})
        modular_pipelines_tree = (
            data_access_manager.create_modular_pipelines_tree_for_registered_pipeline(
                pipeline_id
            )
        )
        return GraphAPIResponse(
            nodes=data_access_manager.get_nodes_for_registered_pipeline(pipeline_id),
            edges=data_access_manager.get_edges_for_registered_pipeline(pipeline_id),
            tags=data_access_manager.tags.as_list(),
            layers=data_access_manager.get_sorted_layers_for_registered_pipeline(
                pipeline_id
            ),
            pipelines=data_access_manager.registered_pipelines.as_list(),
            modular_pipelines=modular_pipelines_tree,
            selected_pipeline=pipeline_id,
        )

which gets populated here

kedro-viz/package/kedro_viz/data_access/managers.py

Line 145 in 214ee8f

def add_pipeline(self, registered_pipeline_id: str, pipeline: KedroPipeline):

In other words: the logic to transform a Python pipeline into the expected JSON structure is complex as it stands now.
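
For a sense of what the simplest version of that transformation involves, the sketch below builds `nodes` and `edges` from a list of task descriptions. It is deliberately naive and rests on several assumptions: the input is plain tuples rather than a `kedro.pipeline.Pipeline`, the `id`/`name`/`type` and `source`/`target` keys are guesses at the expected JSON, and layers and tags are left empty. `build_graph` is a hypothetical helper, not part of Kedro or Kedro-Viz.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical input: (task name, input dataset names, output dataset names).
TaskSpec = Tuple[str, List[str], List[str]]


def build_graph(tasks: Iterable[TaskSpec]) -> Dict[str, Any]:
    """Build a minimal nodes/edges structure from task descriptions."""
    nodes: Dict[str, Dict[str, str]] = {}
    edges: List[Dict[str, str]] = []
    for name, inputs, outputs in tasks:
        nodes[name] = {"id": name, "name": name, "type": "task"}
        for dataset in inputs:
            # Register each dataset once, the first time it is seen.
            nodes.setdefault(dataset, {"id": dataset, "name": dataset, "type": "data"})
            edges.append({"source": dataset, "target": name})
        for dataset in outputs:
            nodes.setdefault(dataset, {"id": dataset, "name": dataset, "type": "data"})
            edges.append({"source": name, "target": dataset})
    return {"nodes": list(nodes.values()), "edges": edges, "layers": [], "tags": []}


graph = build_graph(
    [
        ("preprocess", ["raw_data"], ["clean_data"]),
        ("train", ["clean_data"], ["model"]),
    ]
)
print(len(graph["nodes"]), "nodes,", len(graph["edges"]), "edges")
```

The real backend also handles parameters, modular pipelines, layers and dataset metadata, which is where most of the complexity described above comes from.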

I think this is taking me again to kedro-org/kedro#4363, which has several use cases, possibly including this one.

In the meantime, as part of the spike @ravi-kumar-pilla could you keep exploring the bundling issues just in case? And describe what you've found in the meantime.

ravi-kumar-pilla · 2025-01-23T14:59:22Z

Hi @astrojuanlu ,

Thank you for the comment. You are 💯 correct on -

In other words: the logic to transform a Python pipeline into the expected JSON structure is complex as it stands now.

In the meantime, as part of the spike @ravi-kumar-pilla could you keep exploring the bundling issues just in case? And describe what you've found in the meantime.

Regarding the bundling, I have resolved the issue and the bundle can be used directly in HTML. However, the generated HTML works well in the browser but has issues in Jupyter notebooks. I will try fixing the issues today (some of them are around the window object, which is different from the browser window).

Once this is done, I will try to work out the bare minimum requirements needed to get this working. As you mentioned in the comment, since we generate the JSON via Kedro-Viz, we need to install kedro-viz. Instead of the complex viz backend.

On a side note, if we could use to_json() of kedro to generate the pipeline JSON and the frontend could interpret it, the Viz Jupyter notebook experience would be much better.

Thank you

ravi-kumar-pilla · 2025-01-24T06:12:07Z

Hi @astrojuanlu ,

I am able to display KedroViz using the bundle approach inside notebook (pending additional testing), but we can see a demo implementation in the PR.

Documenting the current approach and local testing methodology for reference.

Current Approach:

In KedroViz we will have a class KedroVizNotebook which exposes an api/method visualize. Below is the method definition -

# [TODO: will add options to display certain parts of viz, for now the default is only chart view. 
# We can also add more customization if needed]
def visualize(self, pipeline: Pipeline, catalog: DataCatalog = None, embed_in_notebook=True):

* Internally we load and populate our kedroViz backend repositories
* Instantiate a dummy catalog (i.e., all datasets are of type MemoryDataset)
* Get the json required for the html
* Inject the json data to the html template and save the html to a file .viz/viz_jupyter_exploration.html (filename is configurable)
* Display an iframe pointing to the saved html file
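
The last three steps can be sketched with the standard library alone. This is not the PR's implementation: the template contents, the `window.__VIZ_DATA__` variable and the `render_viz_html` and `save_and_embed` helpers are all hypothetical, and the bundle URL is the local development one from the pre-requisites in this comment.

```python
import json
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template: how the real bundle reads its data is not shown here.
HTML_TEMPLATE = Template("""<!DOCTYPE html>
<html>
  <body>
    <div id="root"></div>
    <script>window.__VIZ_DATA__ = $data;</script>
    <script src="$bundle_url"></script>
  </body>
</html>
""")


def render_viz_html(graph: dict, bundle_url: str) -> str:
    """Inject the pipeline JSON into the HTML template."""
    # Escape "</" so a string inside the data cannot close the <script> tag.
    data = json.dumps(graph).replace("</", "<\\/")
    return HTML_TEMPLATE.substitute(data=data, bundle_url=bundle_url)


def save_and_embed(html: str, path: Path, height: int = 600) -> str:
    """Write the page to disk and return iframe markup pointing at it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return f'<iframe src="{path.as_posix()}" width="100%" height="{height}"></iframe>'


graph = {"nodes": [{"id": "a"}], "edges": []}
html = render_viz_html(graph, "http://localhost:8000/kedroViz.bundle.js")
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".viz" / "viz_jupyter_exploration.html"
    iframe = save_and_embed(html, target)
    saved = target.read_text(encoding="utf-8")
```

In a notebook the returned markup would be passed to `IPython.display.HTML`, and the path would need to be one the notebook server can serve, which is where the hidden-folder configuration discussed below comes in.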

Pre-requisites:

Installations required: Kedro, Kedro-Viz, jupyter notebook
To test locally we need to create a bundle using webpack. The PR has the webpack config. From root dir of kedroViz, execute

# Assuming you have webpack from package.json of kedroViz. 
# If not already installed do npm install webpack

npx webpack --mode development

# This will create a viz bundle. The bundle needs to be served for now as it is not published
# Use a local server for publishing. Navigate to the bundle folder `/dist` and run

python -m http.server 8000

# Make sure http://localhost:8000/kedroViz.bundle.js is accessible

Custom jupyter config (Needs discussion)

jupyter notebook --generate-config

# Go to config file path and add at end of the file
c.ContentsManager.allow_hidden = True

NOTE: The html content is currently saved to a file which is placed under .viz folder. This needs jupyter config to be updated, as by default the notebook cannot access hidden folders.

Testing Current Approach:

# In case of demo_project
cd demo_project
kedro jupyter notebook

# Run each cell present in demo-project/viz_jupyter_test.ipynb of the PR or 
# Instantiate your pipeline and execute below code in the jupyter cell
from kedro_viz.launchers.experimental_viz import KedroVizNotebook
KedroVizNotebook().visualize(pipe)

Other approaches:
a. Using KedroViz run_server with the pipeline information. This needs us to start a process which runs a uvicorn server, serving a FastAPI app specifically for notebook users. There might be some delay as we start a process and get uvicorn running.
b. Using the html content text directly, i.e., display(HTML(html_text)). I faced issues with this approach (blank cell, window object not recognized, etc.). If anyone has experience with this, it would be great to explore, as we would not be creating an extra file.
c. Creating a temp file which gets deleted after jupyter session. Somehow the notebook could not access these files, again not sure if I was missing some config.
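
For comparison, the cost of approach (a) can be seen in a stand-in built from the standard library only. The sketch below is not the Kedro-Viz `run_server` code: it serves a directory from a background thread with `http.server` rather than uvicorn and FastAPI, and lets the operating system pick a free port by binding to port 0. `serve_directory` is a hypothetical helper.

```python
import functools
import tempfile
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def serve_directory(directory: Path) -> ThreadingHTTPServer:
    """Serve `directory` on a free local port from a daemon thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.html").write_text("<h1>viz placeholder</h1>", encoding="utf-8")
    server = serve_directory(Path(tmp))
    port = server.server_address[1]
    url = f"http://127.0.0.1:{port}/index.html"
    # Bypass any configured proxy for the local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        body = response.read().decode("utf-8")
    server.shutdown()
    server.server_close()
```

Even this small version has to manage a port, a thread and a shutdown step, which is the overhead the file-based and inline approaches avoid.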

Questions:
i. Can we have the html file generated in the user's cwd where they launched the notebook ?
ii. If not (a), is it fine to ask the user to update the jupyter config to allow hidden files discovery, so that we always save it to .viz ?
iii. Do you know a way to directly display html without saving the file ?

Some questions related to testing and expectations from MVP or first draft:

iv. Since this was a spike, I did not test complex pipelines but I hope the approach works. Can we assume testing the demo_project size pipeline a success for first draft ?
v. Could you please let me know other env (i.e., databricks etc) to test this on ?
vi. For first draft, what are the expectations in case the above approach works ?

Next steps:

There is an unpolished implementation in the PR, which needs modifications based on the discussion outcome here
Publish the kedro-viz.bundle.js to npm which can be referred via CDN directly in the html text.

cc: @rashidakanchwala

Thank you

astrojuanlu · 2025-01-24T13:43:05Z

Thanks a lot for the update @ravi-kumar-pilla !

b. Using the html content text directly, i.e., display(HTML(html_text)). I faced issues with this approach (blank cell, window object not recognized, etc.). If anyone has experience with this, it would be great to explore, as we would not be creating an extra file.

That's what I had in mind (this or the _repr_html_ method). Definitely it would be preferable to not create an extra file (which, IIUC, requires the custom Jupyter config)
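
A minimal sketch of that idea, assuming the full page is already available as a string: an object whose `_repr_html_` method returns an `<iframe>` with the page inlined through the `srcdoc` attribute, so no file is written. `InlineViz` is a hypothetical name, and whether the Kedro-Viz bundle runs correctly inside such an iframe is untested here.

```python
import html


class InlineViz:
    """Hypothetical wrapper that renders a full HTML page inside a notebook cell."""

    def __init__(self, page_html: str, height: int = 600) -> None:
        self.page_html = page_html
        self.height = height

    def _repr_html_(self) -> str:
        # Jupyter calls this method to get the rich HTML representation.
        # The page is escaped so it can sit inside the srcdoc attribute.
        escaped = html.escape(self.page_html, quote=True)
        return (
            f'<iframe srcdoc="{escaped}" width="100%" '
            f'height="{self.height}" style="border:0"></iframe>'
        )


viz = InlineViz("<html><body><p>pipeline goes here</p></body></html>")
rendered = viz._repr_html_()
print(rendered)
```

Leaving `viz` as the last expression of a cell would make Jupyter call `_repr_html_` automatically, and the iframe keeps the page's scripts and `window` object separate from the notebook's own.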

Can we assume testing the demo_project size pipeline a success for first draft ?

Yes!

astrojuanlu · 2025-01-27T15:10:54Z

In the interest of time-boxing this effort and shipping incremental improvements towards the final goal, for now let's focus on having the Webpack bundle introduced in #2241 be part of the normal Kedro Viz release flow.

In parallel, we can show the current PoC to users to gather their early feedback.

ravi-kumar-pilla · 2025-01-27T15:16:47Z

In the interest of time-boxing this effort and shipping incremental improvements towards the final goal, for now let's focus on having the Webpack bundle introduced in #2241 be part of the normal Kedro Viz release flow.

In parallel, we can show the current PoC to users to gather their early feedback.

Sounds good @astrojuanlu . I will introduce the bundling into the current workflow.

astrojuanlu · 2025-02-03T09:24:39Z

Bringing part of the discussion on #2256 here:

Looks like there are some present challenges with the bundling approach #2256 (comment)

@rashidakanchwala commented:

We need to evaluate maintainability, effort vs. impact, and alignment with future Kedro-Viz developments. Given that the second half of 2025 will focus on the Pipeline Editor, which will be a major architectural shift, I think we should take a step back and plan PS sessions to align on the best direction for Kedro-Viz.

Next Steps, can we do a PS session:
* Original problem - Kedro-Viz in Notebooks
  
  * Reviewing the user feedback we’ve received so far on the high fidelity prototypes
  * Understand what we can do to release an MVP soon

* UMD Bundling
  
  * What additional benefits would UMD bundling bring beyond notebook integration? Any cons to this?

And also another session on:
* Broader Kedro-Viz Architecture
  
  * Flowchart Rendering as a separate Library
  * Separate the logic of Kedro project --> structure json pipeline in another package ([Spin off pipeline inspection to separate package kedro#4363](https://github.com/kedro-org/kedro/issues/4363))
  * Pros/cons of the above two?
  * Would this help future-proof Kedro-Viz, especially for the Pipeline Editor?

ravi-kumar-pilla · 2025-02-04T17:09:56Z

Hi @astrojuanlu ,

I had a discussion with Rashida and we agreed on shipping this feature as experimental using the production bundle. Here is what we will do to ship the feature in the next release -

* Create a folder umd in the current kedro-viz GH repository
* Upload the production bundles kedro-viz.production.min.js and vendors.production.min.js to the umd folder
* Automate the process of bundling and updating the bundles via make release, i.e., add a step to update this folder when we do a new release
* Use these bundles in the backend of our NotebookVisualizer class and create a html template which will be displayed in the notebook cell.

Let me know what you think of the approach.

Thank you

astrojuanlu · 2025-02-04T17:16:04Z

@ravi-kumar-pilla Let's proceed 👍🏼

astrojuanlu added this to Kedro-Viz Jul 21, 2024

astrojuanlu moved this to Backlog in Kedro-Viz Jul 21, 2024

astrojuanlu added the Enhancement label Jul 21, 2024

rashidakanchwala moved this from Backlog to Inbox in Kedro-Viz Jul 29, 2024

astrojuanlu changed the title ~~Visualise Pipeline objects directly in notebooks~~ Visualise Pipeline objects Sep 6, 2024

rashidakanchwala added the Technical Design label Sep 9, 2024

rashidakanchwala moved this from Inbox to Backlog in Kedro-Viz Sep 9, 2024

rashidakanchwala assigned ravi-kumar-pilla Nov 7, 2024

rashidakanchwala moved this from Backlog to Todo in Kedro-Viz Jan 13, 2025

astrojuanlu moved this from Todo to In Progress in Kedro-Viz Jan 13, 2025

ravi-kumar-pilla linked a pull request Jan 15, 2025 that will close this issue

Visualize pipeline objects in notebook #2241

Open

5 tasks

astrojuanlu mentioned this issue Jan 24, 2025

%run_viz Jupyter line magic inconsistency #1823

Open

1 task

astrojuanlu added this to Kedro 🔶 Jan 28, 2025

astrojuanlu moved this to In Progress in Kedro 🔶 Jan 28, 2025

astrojuanlu mentioned this issue Feb 3, 2025

Add UMD bundle for Kedro-Viz #2256

Closed

5 tasks
