Visualize pipeline objects in notebook #2241

Open: wants to merge 54 commits into base: main

Conversation

ravi-kumar-pilla
Contributor

@ravi-kumar-pilla ravi-kumar-pilla commented Jan 15, 2025

Description

Resolves #1993

NOTE: The bundle URL will be updated once #2268 is merged

Development notes

  • Created a class NotebookVisualizer with a show method responsible for visualizing Kedro-Viz in a notebook using the ESM bundle
  • Added load_data_for_notebook_users and load_and_populate_data_for_notebook_users methods to kedro-viz -> integrations -> notebook -> data_loader.py
  • Added initialize and reset methods to data_access_manager for reuse
  • Added a few utility functions and classes
  • Updated the release notes, tests, and GCP load balancer doc link
  • Removed a few GraphQL checks from the workflow because they were failing

QA notes

  • All tests should pass
  • For manual testing, open a Jupyter notebook and try:
from kedro.pipeline import pipeline, node

def dummy(ds1):
    return ds1

n0 = node(dummy, 'flights', 'processed_flights')
dummy_pipe = pipeline([n0])

from kedro_viz.integrations.notebook import NotebookVisualizer
NotebookVisualizer(dummy_pipe).show()

image

  • You can also test the demo_project pipelines; try:
from kedro_viz.integrations.notebook import NotebookVisualizer
from demo_project.pipeline_registry import register_pipelines
demo_pipe = register_pipelines()

# Since globalNavigation depends on localStorage, the option is not working.

NotebookVisualizer(
    pipeline=demo_pipe,
    options={
        "display": {
            "expandPipelinesBtn": False,
            "exportBtn": False,
            "labelBtn": False,
            "layerBtn": False,
            "metadataPanel": True,
            "miniMap": False,
            "sidebar": False,
            "zoomToolbar": False,
        },
        "expandAllPipelines": False,
        "behaviour": {
            "reFocus": False,
        },
        "theme": "dark",
        "width": "100%",
        "height": "600px",
    },
).show()

Testing results:

JupyterLab:

image

Databricks:

image

Marimo:

image

VS Code:

image

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: ravi_kumar_pilla <ravi_kumar_pilla@mckinsey.com>
…at/umd-viz-bundle

…at/umd-viz-bundle

Member

@astrojuanlu astrojuanlu left a comment

The API does fulfill my needs, I'd say 👍🏼 Shall we wait until #2268 is merged so that we can do proper QA? I would like to try it on JupyterLab/Jupyter Notebook, VS Code notebooks, marimo, and Databricks.

…at/viz-pipe

@astrojuanlu

This comment was marked as outdated.

@ravi-kumar-pilla
Contributor Author

@astrojuanlu for security reasons (access to localStorage), the ESM bundle does not allow globalNavigation to be True. This is not a blocker for our use case, but if it becomes an issue we can fix it later or go with UMD. I restored globalNavigation and it should work now.

On databricks:

image

@astrojuanlu
Member

The last commit fixed things, it seems.

VS Code

This is how it looks for me, and I have a big screen (1512 x 982). Any chance we can hide the logs? And maybe make the area a bit smaller?

image

Jupyter Notebook

Same thing

image

And I'm seeing some warnings and errors in the terminal:

[E 2025-02-11 18:31:21.922 ServerApp] Uncaught exception in write_error
    Traceback (most recent call last):
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/tornado/web.py", line 1788, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/tornado/web.py", line 269, in _unimplemented_method
        raise HTTPError(405)
    tornado.web.HTTPError: HTTP 405: Method Not Allowed
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/extension/handler.py", line 29, in get_template
        template = cast(Template, self.settings[env].get_template(name))  # type:ignore[attr-defined]
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/environment.py", line 1016, in get_template
        return self._load_template(name, globals)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/environment.py", line 975, in _load_template
        template = self.loader.load(self, name, self.make_globals(globals))
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/loaders.py", line 126, in load
        source, filename, uptodate = self.get_source(environment, name)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/loaders.py", line 209, in get_source
        raise TemplateNotFound(
    jinja2.exceptions.TemplateNotFound: '405.html' not found in search path: '/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/notebook/templates'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/base/handlers.py", line 740, in write_error
        html = self.render_template("%s.html" % status_code, **ns)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/extension/handler.py", line 93, in render_template
        template = cast(Template, self.get_template(name))  # type:ignore[attr-defined]
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/extension/handler.py", line 32, in get_template
        return cast(Template, super().get_template(name))  # type:ignore[misc]
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/base/handlers.py", line 662, in get_template
        return self.settings["jinja2_env"].get_template(name)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/environment.py", line 1016, in get_template
        return self._load_template(name, globals)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/environment.py", line 975, in _load_template
        template = self.loader.load(self, name, self.make_globals(globals))
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/loaders.py", line 126, in load
        source, filename, uptodate = self.get_source(environment, name)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/loaders.py", line 209, in get_source
        raise TemplateNotFound(
    jinja2.exceptions.TemplateNotFound: '405.html' not found in search paths: '/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server', '/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/templates'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/tornado/web.py", line 1298, in send_error
        self.write_error(status_code, **kwargs)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/base/handlers.py", line 742, in write_error
        html = self.render_template("error.html", **ns)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jupyter_server/extension/handler.py", line 98, in render_template
        return cast(str, template.render(**ns))
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/environment.py", line 1295, in render
        self.environment.handle_exception()
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/environment.py", line 942, in handle_exception
        raise rewrite_traceback_stack(source=source)
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/notebook/templates/error.html", line 1, in top-level template code
        <!doctype html><html><head><meta charset="utf-8"><title>{% block title %}{{page_title | e}}{% endblock %}</title>{% block favicon %}<link rel="shortcut icon" type="image/x-icon" href="/static/favicons/favicon.ico">{% endblock %}<script defer="defer" src="{{page_config.fullStaticUrl}}/main.407246dd27aed8010549.js?v=407246dd27aed8010549"></script></head><body class="jp-ThemedContainer">{% block stylesheet %}<style>/* disable initial hide */
      File "/Users/juan_cano/Projects/QuantumBlackLabs/tmp/spaceflights/.venv/lib/python3.10/site-packages/jinja2/environment.py", line 490, in getattr
        return getattr(obj, attribute)
    jinja2.exceptions.UndefinedError: 'page_config' is undefined
[W 2025-02-11 18:31:21.960 JupyterNotebookApp] 405 OPTIONS /notebooks/srcdoc/api/deploy-viz-metadata (@127.0.0.1) 76.06ms referer=None

marimo

This is where it looks best, interestingly enough. Still, maybe the area is too big.

image

@ravi-kumar-pilla
Contributor Author

Hi @astrojuanlu, wow! Thanks for the quick test and feedback. I can check whether we can:

  1. Customize the height and width by accepting them as user options (pretty much doable)
  2. Hide the logs (need to check on this)
  3. Address the errors on the console: a few seem to come from .venv (not sure they are related to the bundle), but there will be some security errors related to localStorage, which seem unavoidable for the moment.

I will fix 1 and 2 for now.
Thank you
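
For item 1, one way user-supplied options could be layered over defaults is a recursive merge, so that a caller can override only "height" or a single "display" flag. This is a minimal sketch, not the PR's implementation: DEFAULT_OPTIONS and merge_options are hypothetical names, and the default values are only loosely modelled on the options in the PR description.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical defaults, loosely modelled on the options shown in the PR description.
DEFAULT_OPTIONS: Dict[str, Any] = {
    "display": {"sidebar": False, "miniMap": False, "metadataPanel": True},
    "theme": "dark",
    "width": "100%",
    "height": "600px",
}


def merge_options(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `overrides` applied recursively on top of `defaults`."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing wholesale.
            merged[key] = merge_options(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


user_options = {"height": "400px", "display": {"miniMap": True}}
options = merge_options(DEFAULT_OPTIONS, user_options)
```

Because the merge works on copies, the defaults stay untouched between calls.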

Contributor

@rashidakanchwala rashidakanchwala left a comment

Hi @ravi-kumar-pilla,

Does it make sense to separate how we:

  • Load Kedro-Viz from a Kedro project via a FastAPI server
  • Load Kedro-Viz in a notebook by generating JSON and bundling it using ESM

Currently, notebook-related functions are in data_loader.py and server.py, making these files larger and somewhat out of place. Would it be better to create a new folder under integrations called notebooks and move the visualizer and loader files there for better separation?

Let me know your thoughts!

@@ -3,6 +3,9 @@
import sys
import warnings

# alias to ease Notebook visualization import
from .launchers.notebook_visualizer import NotebookVisualizer
Contributor

This should probably not live here but in integrations/notebook/__init__.py; then users can do

from kedro_viz.integrations.notebook import NotebookVisualizer

Contributor Author

I like the idea. I don't have a strong opinion on this. I wanted users to get the NotebookVisualizer class easily, but I can move it too.

Contributor

This is an experimental feature, and we know that only a very small percentage of users run Kedro-Viz in notebooks—especially since run_viz was broken for months! If we import it here, it will load every time a user runs Kedro-Viz, even when notebooks aren’t involved, which isn’t ideal.
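
If a top-level alias were ever wanted without the import cost described above, one general Python option is a module-level `__getattr__` (PEP 562), which defers the import until the name is first accessed. The sketch below is only an illustration of that technique, not something proposed in this PR: `demo_pkg` and `_LAZY_ATTRS` are hypothetical, and a standard-library class stands in for NotebookVisualizer so the example runs without Kedro-Viz installed.

```python
import importlib
import types

# Hypothetical mapping: public name -> (module path, attribute name).
# A stdlib target is used so the sketch runs without Kedro-Viz installed.
_LAZY_ATTRS = {"JSONDecoder": ("json.decoder", "JSONDecoder")}


def _lazy_getattr(name):
    """Module-level __getattr__ (PEP 562): import the target on first access."""
    try:
        module_path, attr = _LAZY_ATTRS[name]
    except KeyError:
        raise AttributeError(f"module has no attribute {name!r}") from None
    return getattr(importlib.import_module(module_path), attr)


# Stand-in for a package __init__.py; in a real package you would define
# `def __getattr__(name): ...` directly at module level instead.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__getattr__ = _lazy_getattr

decoder_cls = demo_pkg.JSONDecoder  # the import happens here, not at package import
```

Moving the class under integrations/notebook, as suggested in this thread, avoids the problem without this indirection.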

    else:
        notebook_user_pipeline = {"__default__": notebook_user_pipeline}

    return catalog, notebook_user_pipeline, session_store, stats_dict  # type: ignore[return-value]
Contributor

Can we make session_store and stats_dict optional parameters in populate_data? We're already removing session_store in the ET removal PR, and stats_dict doesn’t seem essential.

Not sure if we should do the same for catalog—do we actually need it in the notebook visualizer?

Contributor Author

Hi @rashidakanchwala, I think DataCatalog has to stay non-optional, because we call add_catalog in the following steps and expect the catalog to be present. For this PR, I am not changing populate_data, as that would introduce further changes in DataAccessManager, conftest, and other test cases. It would be better to handle this in another ticket, the one that involves moving these methods out of server.py as we discussed. wdyt?
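
For the follow-up ticket, the shape being discussed could look roughly like the sketch below: the catalog stays required, while session_store and stats_dict default to None. This is a hypothetical stand-in (populate_data_sketch is an invented name and the body is illustrative only), not the Kedro-Viz populate_data implementation.

```python
from typing import Any, Dict, Optional


def populate_data_sketch(
    catalog: Any,
    pipelines: Dict[str, Any],
    session_store: Optional[Any] = None,
    stats_dict: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for populate_data; not the Kedro-Viz implementation."""
    if catalog is None:
        # The catalog stays mandatory, mirroring the reasoning in this thread.
        raise ValueError("catalog is required")
    populated: Dict[str, Any] = {"catalog": catalog, "pipelines": pipelines}
    if session_store is not None:
        populated["session_store"] = session_store
    # Fall back to an empty dict so downstream code can look up stats safely.
    populated["stats"] = stats_dict if stats_dict is not None else {}
    return populated


# Notebook-style call: no session store and no stats are supplied.
result = populate_data_sketch(catalog={}, pipelines={"__default__": []})
```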

from typing import Dict, Optional, Tuple, Union, cast

from kedro import __version__
from kedro.framework.session.store import BaseSessionStore
Contributor

Do we need session.store, or will we remove it in another PR?

Contributor Author

This is done for type checking; we can remove it once session_store is removed completely after the ET removal.

    def _load_viz_data(self) -> Optional[Any]:
        """Load pipeline and catalog data for visualization."""
        load_and_populate_data_for_notebook_users(self.pipeline, self.catalog)
        return get_kedro_project_json_data()
Contributor

Are we completely sure that load_and_populate_data_for_notebook_users has finished executing before calling get_kedro_project_json_data()? Do we need any asynchronous handling here?

Contributor Author

load_and_populate_data_for_notebook_users does not make any async call that needs to be awaited. Everything is synchronous.
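
The point can be illustrated with two plain functions sharing state: a synchronous call returns only after its body has run, so the second call always sees the populated data. The names below (_store, load_and_populate, get_json_data) are hypothetical stand-ins, not the Kedro-Viz functions.

```python
_store = {}


def load_and_populate(pipelines):
    """Synchronous stand-in: returns only after the store has been filled."""
    _store["pipelines"] = dict(pipelines)


def get_json_data():
    """Reads whatever the previous call put in the store."""
    return {"pipelines": sorted(_store["pipelines"])}


load_and_populate({"__default__": object()})
data = get_json_data()  # runs strictly after load_and_populate has returned
```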

        html_content = self.generate_html(
            json_to_visualize, self.options, self.js_url
        )
        iframe_content = self._wrap_in_iframe(
Contributor

Do we need to do this afterwards? Can we not add the height/width in generate_html itself?

Contributor Author

I am not sure I understand the question, but the height and width are used to customize the iframe size. Do you think we should customize the root div instead, or both the iframe and the root div?
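
To make the distinction concrete, here is a minimal sketch of sizing the iframe while the inner root div simply fills it. The diff only shows the name _wrap_in_iframe, not its body, so wrap_in_iframe below is a hypothetical stand-in; embedding via the srcdoc attribute is an assumption.

```python
import html


def wrap_in_iframe(html_content: str, width: str = "100%", height: str = "600px") -> str:
    """Embed a full HTML page in an iframe via the `srcdoc` attribute."""
    # Escape the page so quotes and angle brackets survive inside the attribute value.
    escaped = html.escape(html_content, quote=True)
    # Escape the size values too, since they also end up inside an attribute.
    style = f"width: {html.escape(width, quote=True)}; height: {html.escape(height, quote=True)}; border: none;"
    return f'<iframe srcdoc="{escaped}" style="{style}"></iframe>'


# The inner root div fills whatever size the surrounding iframe is given.
page = '<div id="root" style="width: 100%; height: 100%;"></div>'
iframe = wrap_in_iframe(page, width="80%", height="400px")
```

With this split, the user-facing width and height only need to be applied in one place, on the iframe.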

Development

Successfully merging this pull request may close these issues.

Visualise Pipeline objects
3 participants