Visualize pipeline objects in notebook #2241

Open
wants to merge 54 commits into
base: main
Changes from 42 commits
dc31929
initial draft
ravi-kumar-pilla Jan 15, 2025
d3448dc
adding window config for jupyter users
ravi-kumar-pilla Jan 22, 2025
7755e11
working draft
ravi-kumar-pilla Jan 23, 2025
7c264dd
working final draft
ravi-kumar-pilla Jan 28, 2025
bf08766
working final draft
ravi-kumar-pilla Jan 28, 2025
4483342
clean window pollution
ravi-kumar-pilla Jan 29, 2025
fd532ee
working draft with 2 approaches
ravi-kumar-pilla Jan 29, 2025
563182f
initial bundle draft
ravi-kumar-pilla Jan 29, 2025
e8f7249
update webpack
ravi-kumar-pilla Jan 29, 2025
8b66fec
testing webpack
ravi-kumar-pilla Jan 30, 2025
72bcc28
ignore babel for umd
ravi-kumar-pilla Jan 30, 2025
32632d0
testing with published bundle
ravi-kumar-pilla Jan 30, 2025
d3f9c21
tested bundle
ravi-kumar-pilla Jan 30, 2025
c148a55
merge bundle PR
ravi-kumar-pilla Jan 30, 2025
f7e10a1
optimization code added
ravi-kumar-pilla Jan 31, 2025
5031722
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jan 31, 2025
06fc82d
add optimization to prod bundle
ravi-kumar-pilla Jan 31, 2025
8298408
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Feb 5, 2025
6e5511b
add umd to repo
ravi-kumar-pilla Feb 5, 2025
b49109b
v10.3.0
ravi-kumar-pilla Feb 5, 2025
fd379f7
push umd bundle
ravi-kumar-pilla Feb 5, 2025
d962a9b
remove additional commits
ravi-kumar-pilla Feb 5, 2025
7ad4be9
remove additional commits
ravi-kumar-pilla Feb 5, 2025
47e2b4b
add release note
ravi-kumar-pilla Feb 5, 2025
199a34c
merge main
ravi-kumar-pilla Feb 5, 2025
45dd808
add umd bundle
ravi-kumar-pilla Feb 5, 2025
4f69d7e
testing esm module
ravi-kumar-pilla Feb 5, 2025
27cfd9d
add esm ref
ravi-kumar-pilla Feb 5, 2025
0485f45
add esm
ravi-kumar-pilla Feb 5, 2025
508beaa
test with esm
ravi-kumar-pilla Feb 6, 2025
fac1b3c
add esm draft
ravi-kumar-pilla Feb 6, 2025
ffe1657
add esm ref
ravi-kumar-pilla Feb 6, 2025
08f74e8
clean bundle config
ravi-kumar-pilla Feb 6, 2025
4cf8635
fix lint and format checks
ravi-kumar-pilla Feb 6, 2025
be2d9d8
temp remove gql checks
ravi-kumar-pilla Feb 6, 2025
450a695
fix lint
ravi-kumar-pilla Feb 6, 2025
6f4fcc3
fix lint
ravi-kumar-pilla Feb 7, 2025
cdc2d7a
fix tests
ravi-kumar-pilla Feb 7, 2025
847bc95
fix doc test
ravi-kumar-pilla Feb 7, 2025
6ff45ed
merge main
ravi-kumar-pilla Feb 10, 2025
1115c34
add granularity to notebook visualizer
ravi-kumar-pilla Feb 10, 2025
05ce8de
structured notebook visualizer
ravi-kumar-pilla Feb 11, 2025
fb73d92
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Feb 11, 2025
28a228f
updated js link
ravi-kumar-pilla Feb 11, 2025
884752c
fix lint
ravi-kumar-pilla Feb 11, 2025
67cfa5f
restore global navigation
ravi-kumar-pilla Feb 11, 2025
bb29abf
add default globalNavigation
ravi-kumar-pilla Feb 11, 2025
f956451
fix cache deprecation
ravi-kumar-pilla Feb 11, 2025
14c6c7a
fix based on comments
ravi-kumar-pilla Feb 11, 2025
8d79192
address PR comments
ravi-kumar-pilla Feb 12, 2025
b494af5
remove unused import
ravi-kumar-pilla Feb 12, 2025
8678052
remove test notebook
ravi-kumar-pilla Feb 12, 2025
84f1e07
fix lint
ravi-kumar-pilla Feb 12, 2025
e7b5239
address PR comments2
ravi-kumar-pilla Feb 13, 2025
5 changes: 1 addition & 4 deletions .github/workflows/lint.yml
Expand Up @@ -37,9 +37,6 @@ jobs:

      - name: Run security scan
        run: make security-scan

      - name: Verify GraphQL schema is up to date
        run: make schema-check

      - name: Run Python formatters and linters
        run: make format-check lint-check
7 changes: 0 additions & 7 deletions Makefile
Expand Up @@ -39,13 +39,6 @@ lint-check:
	mypy --config-file=package/mypy.ini package/kedro_viz package/features
	mypy --disable-error-code abstract --config-file=package/mypy.ini package/tests

schema-fix:
	strawberry export-schema --app-dir=package kedro_viz.api.graphql.schema > src/apollo/schema.graphql
	graphqlviz src/apollo/schema.graphql | dot -Tpng -o .github/img/schema.graphql.png

schema-check:
	strawberry export-schema --app-dir=package kedro_viz.api.graphql.schema | diff src/apollo/schema.graphql -

secret-scan:
	trufflehog --max_depth 1 --exclude_path trufflehog-ignore.txt .

Expand Down
2 changes: 2 additions & 0 deletions RELEASE.md
Expand Up @@ -9,6 +9,8 @@

## Major features and improvements

- Visualize pipeline objects in notebook. (#2241)

Check warning on line 12 in RELEASE.md (GitHub Actions / vale)

[vale] RELEASE.md#L12: [Kedro-viz.ukspelling] In general, use UK English spelling instead of 'Visualize'.

## Bug fixes and other changes

- Fix `%run_viz` using old process in jupyter notebook. (#2267)
Expand Down
3 changes: 3 additions & 0 deletions package/kedro_viz/__init__.py
Expand Up @@ -3,6 +3,9 @@
import sys
import warnings

# alias to ease Notebook visualization import
from .launchers.notebook_visualizer import NotebookVisualizer
Contributor

This should probably not live here but in integrations/notebook/__init__.py, so that users can do

from kedro_viz.integrations.notebook import NotebookVisualizer

Contributor Author

I like the idea. I don't have a strong opinion on this. I wanted users to get the NotebookVisualizer class easily, but I can move it too.

Contributor

This is an experimental feature, and we know that only a very small percentage of users run Kedro-Viz in notebooks—especially since run_viz was broken for months! If we import it here, it will load every time a user runs Kedro-Viz, even when notebooks aren’t involved, which isn’t ideal.
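
One way to keep a convenient top-level name without paying the import cost on every start-up is a module-level `__getattr__` hook (PEP 562). The sketch below is not part of this PR: `make_lazy_package` is a hypothetical helper, and a standard-library class stands in for `NotebookVisualizer` so the snippet runs on its own.

```python
import importlib
import types


def make_lazy_package(name, lazy_attrs):
    """Return a module object that imports the listed attributes on first access.

    `lazy_attrs` maps an attribute name to the dotted path of the module that
    defines it. Hypothetical helper, for illustration only.
    """
    package = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(lazy_attrs[attr]), attr)
        setattr(package, attr, value)  # cache so the hook runs once per name
        return value

    package.__getattr__ = __getattr__
    return package


# Stand-in: pretend `OrderedDict` is the heavy, notebook-only class.
demo_pkg = make_lazy_package("demo_pkg", {"OrderedDict": "collections"})
loaded_before = "OrderedDict" in vars(demo_pkg)  # nothing imported yet
lazy_class = demo_pkg.OrderedDict                # first access triggers the import
loaded_after = "OrderedDict" in vars(demo_pkg)   # now cached on the module
```

In a real package the same effect would come from defining `def __getattr__(name)` directly in the package's `__init__.py`; whether that is preferable to moving the import into a subpackage, as suggested above, is a design choice for the maintainers.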


__version__ = "10.2.0"


Expand Down
8 changes: 8 additions & 0 deletions package/kedro_viz/data_access/managers.py
Expand Up @@ -52,6 +52,10 @@ class DataAccessManager:
    """Centralised interface for the rest of the application to interact with data repositories."""

    def __init__(self):
        self._initialize_fields()

    def _initialize_fields(self):
        """Initialize or reset all instance variables."""
        self.catalog = CatalogRepository()
        self.nodes = GraphNodesRepository()
        self.registered_pipelines = RegisteredPipelinesRepository()
Expand All @@ -72,6 +76,10 @@ def __init__(self):
        self.tracking_datasets = TrackingDatasetsRepository()
        self.dataset_stats = {}

    def reset_fields(self):
        """Reset all instance variables."""
        self._initialize_fields()

    def set_db_session(self, db_session_class: sessionmaker):
        """Set db session on repositories that need it."""
        self.runs.set_db_session(db_session_class)
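
The reset pattern in this hunk can be illustrated in isolation. The sketch below assumes a simplified, hypothetical `Manager` class with two plain containers in place of the real repositories; it only shows why routing both `__init__` and `reset_fields` through one initializer keeps a shared, module-level instance from carrying state between notebook cells.

```python
from collections import defaultdict


class Manager:
    """Hypothetical stand-in for a manager that owns several repositories."""

    def __init__(self):
        self._initialize_fields()

    def _initialize_fields(self):
        # Single source of truth for the default state.
        self.nodes = {}
        self.edges = defaultdict(set)

    def reset_fields(self):
        # Rebuild every field so no state survives from an earlier run.
        self._initialize_fields()


manager = Manager()  # module-level instance, shared across notebook cells

# First "cell": populate some state.
manager.nodes["a"] = "first cell"
manager.edges["a"].add("b")

# Second "cell": start from a clean slate before populating again.
manager.reset_fields()
```

Because the defaults live in one method, a field added later only has to be declared once to be covered by both construction and reset.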
Expand Down
27 changes: 26 additions & 1 deletion package/kedro_viz/integrations/kedro/data_loader.py
Expand Up @@ -7,7 +7,7 @@
import logging
import sys
from pathlib import Path
-from typing import Any, Dict, Optional, Set, Tuple
+from typing import Any, Dict, Optional, Set, Tuple, Union, cast
from unittest.mock import patch

from kedro import __version__
Expand Down Expand Up @@ -113,6 +113,31 @@ def _load_data_helper(
return catalog, pipelines_dict, session_store, stats_dict


def load_data_for_notebook_users(
    notebook_pipeline: Union[Pipeline, Dict[str, Pipeline]],
    notebook_catalog: Optional[DataCatalog],
) -> Tuple[DataCatalog, Dict[str, Pipeline], BaseSessionStore, Dict]:
    """Load data from a notebook user's pipeline"""
    # Create a dummy data catalog with all datasets as memory datasets
    catalog = DataCatalog() if notebook_catalog is None else notebook_catalog
    session_store = None
    stats_dict: Dict = {}

    notebook_user_pipeline = notebook_pipeline

    # create a default pipeline if a dictionary of pipelines are sent
    if isinstance(notebook_user_pipeline, dict):
        notebook_user_pipeline = {
            "__default__": notebook_user_pipeline["__default__"]
            if "__default__" in notebook_user_pipeline
            else cast(Pipeline, sum(notebook_user_pipeline.values()))
        }
    else:
        notebook_user_pipeline = {"__default__": notebook_user_pipeline}

    return catalog, notebook_user_pipeline, session_store, stats_dict  # type: ignore[return-value]
Contributor

Can we make session_store and stats_dict optional parameters in populate_data? We're already removing session_store in the ET removal PR, and stats_dict doesn’t seem essential.

Not sure if we should do the same for catalog: do we actually need it in the notebook visualizer?

Contributor Author

Hi @rashidakanchwala, I think DataCatalog needs to stay non-optional, since we call add_catalog in the following steps and expect a catalog to be present. For this PR I am not changing populate_data, as that would require further changes in DataAccessManager, conftest and other test cases. It would be better to handle this in a separate ticket that moves these methods out of server.py, as we discussed. WDYT?
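
For readers following the thread, the pipeline normalisation performed by `load_data_for_notebook_users` can be sketched without Kedro installed. `FakePipeline` and `normalise` below are hypothetical names; the stand-in simply concatenates node names on `+`, which is an assumption made for brevity and not how a real Kedro `Pipeline` combines nodes.

```python
from typing import List


class FakePipeline:
    """Hypothetical stand-in for a pipeline object that supports `+` and `sum()`."""

    def __init__(self, nodes: List[str]):
        self.nodes = list(nodes)

    def __add__(self, other: "FakePipeline") -> "FakePipeline":
        return FakePipeline(self.nodes + other.nodes)

    def __radd__(self, other):
        # sum() starts from 0, so 0 + pipeline must return the pipeline itself.
        if other == 0:
            return self
        return NotImplemented


def normalise(pipeline):
    """Return a {"__default__": pipeline} mapping for either input shape."""
    if isinstance(pipeline, dict):
        if "__default__" in pipeline:
            default = pipeline["__default__"]
        else:
            # No explicit default: combine every registered pipeline.
            default = sum(pipeline.values())
        return {"__default__": default}
    return {"__default__": pipeline}


combined = normalise({"a": FakePipeline(["n1"]), "b": FakePipeline(["n2"])})
single_pipeline = FakePipeline(["n3"])
wrapped = normalise(single_pipeline)
```

One edge case worth noting: `sum()` over an empty dictionary returns `0`, so an empty mapping would not produce a pipeline object in this sketch.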



def load_data(
project_path: Path,
env: Optional[str] = None,
Expand Down
155 changes: 155 additions & 0 deletions package/kedro_viz/launchers/notebook_visualizer.py
@@ -0,0 +1,155 @@
import json
import uuid
from typing import Any, Dict, Optional, Union

from IPython.display import HTML, display
from kedro.io.data_catalog import DataCatalog
from kedro.pipeline import Pipeline

from kedro_viz.api.rest.responses.pipelines import get_kedro_project_json_data
from kedro_viz.server import load_and_populate_data_for_notebook_users
from kedro_viz.utils import merge_dicts

DEFAULT_VIZ_OPTIONS = {
    "display": {
        "expandPipelinesBtn": False,
        "exportBtn": False,
        "globalNavigation": False,
        "labelBtn": False,
        "layerBtn": False,
        "metadataPanel": False,
        "miniMap": False,
        "sidebar": False,
        "zoomToolbar": False,
    },
    "expandAllPipelines": False,
    "behaviour": {
        "reFocus": False,
    },
    "theme": "dark",
}

DEFAULT_JS_URL = "https://cdn.jsdelivr.net/gh/kedro-org/kedro-viz@feat/esm-viz-bundle/esm/kedro-viz.production.mjs"


class NotebookVisualizer:
    """Represent a Kedro-Viz visualization instance in a notebook"""

    def __init__(
        self,
        pipeline: Union[Pipeline, Dict[str, Pipeline]],
        catalog: Optional[DataCatalog] = None,
        options: Optional[Dict[str, Any]] = None,
        js_url: Optional[str] = None,
    ):
        """
        Initialize NotebookVisualizer.

        Args:
            pipeline: Kedro pipeline(s) to visualize.
            catalog: Kedro data catalog.
            options: Visualization options.
                (Ref: https://github.com/kedro-org/kedro-viz/blob/main/README.npm.md#configure-kedro-viz-with-options)
            js_url: Optional URL for the Kedro-Viz JS bundle.

        Returns:
            A new ``NotebookVisualizer`` instance.
        """
        self.pipeline = pipeline
        self.catalog = catalog
        self.options = (
            DEFAULT_VIZ_OPTIONS
            if options is None
            else merge_dicts(DEFAULT_VIZ_OPTIONS, options)
        )
        self.js_url = js_url or DEFAULT_JS_URL

    def _load_viz_data(self) -> Optional[Any]:
        """Load pipeline and catalog data for visualization."""
        load_and_populate_data_for_notebook_users(self.pipeline, self.catalog)
        return get_kedro_project_json_data()

    @staticmethod
    def generate_html(
        json_to_visualize: Optional[Any],
        options: Dict[str, Any] = DEFAULT_VIZ_OPTIONS,
        js_url: str = DEFAULT_JS_URL,
    ) -> str:
        """Generate HTML markup for Kedro-Viz.

        Args:
            json_to_visualize: Kedro project pipeline data as a json object.
            options: Visualization options.
            js_url: Optional URL for the Kedro-Viz JS bundle.

        Returns:
            The HTML markup template as a string
        """
        unique_id = uuid.uuid4().hex[:8]  # To isolate container for each cell execution
        json_data_str = json.dumps(json_to_visualize)
        options_str = json.dumps(options)

        html_content = (
            r"""<!DOCTYPE html>
        <html lang='en'>
        <head>
            <meta charset='UTF-8'>
            <meta name='viewport' content='width=device-width, initial-scale=1.0'>
            <title>Kedro-Viz</title>
        </head>
        <body>
            <div id=kedro-viz-"""
            + unique_id
            + """ style='height: 600px'></div>
            <script type="module">
                import { KedroViz, React, createRoot } from '"""
            + js_url
            + """';
                const viz_container = document.getElementById('kedro-viz-"""
            + unique_id
            + """');

                if (createRoot && viz_container) {
                    const viz_root = createRoot(viz_container);
                    viz_root.render(
                        React.createElement(KedroViz, {
                            data: """
            + json_data_str
            + """,
                            options: """
            + options_str
            + """
                        })
                    );
                }
            </script>
        </body>
        </html>"""
        )

        return html_content

    @staticmethod
    def _wrap_in_iframe(html_content: str) -> str:
        """Wrap the HTML content in an iframe.

        Args:
            html_content: The HTML markup template as a string for visualization

        Returns:
            A string containing html markup embedded in an iframe
        """
        sanitized_content = html_content.replace('"', "&quot;")
        return f"""<iframe srcdoc="{sanitized_content}" style="width:100%; height:600px; border:none;" sandbox="allow-scripts"></iframe>"""

    def show(self) -> None:
        """Display Kedro-Viz in a notebook."""
        try:
            json_to_visualize = self._load_viz_data()
            html_content = self.generate_html(
                json_to_visualize, self.options, self.js_url
            )
            iframe_content = self._wrap_in_iframe(html_content)
            display(HTML(iframe_content))
        except Exception as exc:  # noqa: BLE001
            display(HTML(f"<strong>Error: {str(exc)}</strong>"))
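
A note on the `srcdoc` escaping in `_wrap_in_iframe`: an attribute value is decoded for character references, so replacing only the double quote changes any payload that already contains an ampersand sequence such as `&amp;`. The sketch below contrasts the quote-only replacement with the standard library's `html.escape`; `escape_for_attribute` is a hypothetical alternative and not what this PR implements, and `html.unescape` is used only as an approximation of browser attribute decoding.

```python
import html


def escape_quotes_only(content: str) -> str:
    """Mirror of the quote-only replacement used in `_wrap_in_iframe`."""
    return content.replace('"', "&quot;")


def escape_for_attribute(content: str) -> str:
    """Hypothetical alternative: escape &, <, > and both quote characters."""
    return html.escape(content, quote=True)


# A payload that already contains text resembling a character reference.
payload = '{"label": "R&amp;D"}'

# html.unescape approximates how an attribute value is decoded.
decoded_quotes_only = html.unescape(escape_quotes_only(payload))
decoded_full = html.unescape(escape_for_attribute(payload))

round_trips_quotes_only = decoded_quotes_only == payload
round_trips_full = decoded_full == payload
```

With the quote-only approach the embedded `&amp;` is decoded to `&` and the payload no longer round-trips, while escaping the ampersand first preserves it. Whether pipeline data can contain such sequences in practice is not established by the diff.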
20 changes: 19 additions & 1 deletion package/kedro_viz/server.py
Expand Up @@ -2,7 +2,7 @@
for Kedro pipeline visualisation."""

from pathlib import Path
-from typing import Any, Dict, Optional
+from typing import Any, Dict, Optional, Union

from kedro.framework.session.store import BaseSessionStore
from kedro.io import DataCatalog
Expand Down Expand Up @@ -44,6 +44,24 @@ def populate_data(
data_access_manager.add_pipelines(pipelines)


def load_and_populate_data_for_notebook_users(
    notebook_pipeline: Union[Pipeline, Dict[str, Pipeline]],
    notebook_catalog: Optional[DataCatalog],
):
    """Loads pipeline data and populates Kedro Viz Repositories for a notebook user"""
    catalog, pipelines, session_store, stats_dict = (
        kedro_data_loader.load_data_for_notebook_users(
            notebook_pipeline, notebook_catalog
        )
    )

    # make each cell independent
    data_access_manager.reset_fields()

    # Creates data repositories which are used by Kedro Viz Backend APIs
    populate_data(data_access_manager, catalog, pipelines, session_store, stats_dict)


def load_and_populate_data(
path: Path,
env: Optional[str] = None,
Expand Down
16 changes: 15 additions & 1 deletion package/kedro_viz/utils.py
@@ -1,7 +1,7 @@
"""Transcoding related utility functions."""

import hashlib
-from typing import Tuple
+from typing import Any, Tuple

TRANSCODING_SEPARATOR = "@"

Expand Down Expand Up @@ -57,3 +57,17 @@ def _strip_transcoding(element: str) -> str:
def is_dataset_param(dataset_name: str) -> bool:
    """Return whether a dataset is a parameter"""
    return dataset_name.lower().startswith("params:") or dataset_name == "parameters"


def merge_dicts(dict_one: dict[str, Any], dict_two: dict[str, Any]) -> dict[str, Any]:
    """Utility to merge two dictionaries"""
    import copy

    merged = copy.deepcopy(dict_one)

    for key, value in dict_two.items():
        if isinstance(value, dict) and key in merged:
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged
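
To show what the recursive merge does with nested visualisation options, here is a small standalone sketch. The function body repeats the logic of `merge_dicts` from the diff so the snippet runs on its own; the `defaults` and `user_options` values are made up for illustration.

```python
import copy


def merge_dicts(dict_one, dict_two):
    """Same logic as the helper in the diff, repeated so the snippet is standalone."""
    merged = copy.deepcopy(dict_one)
    for key, value in dict_two.items():
        if isinstance(value, dict) and key in merged:
            # Recurse so sibling keys in the nested dict are kept.
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative values only.
defaults = {"display": {"sidebar": False, "miniMap": False}, "theme": "dark"}
user_options = {"display": {"sidebar": True}, "theme": "light"}

merged = merge_dicts(defaults, user_options)
# merged == {"display": {"sidebar": True, "miniMap": False}, "theme": "light"}
```

A plain `{**defaults, **user_options}` would replace the whole `display` dictionary and drop `miniMap`; the deep copy also leaves `defaults` untouched, which matters when the defaults are a shared module-level constant.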
42 changes: 42 additions & 0 deletions package/tests/test_data_access/test_managers.py
@@ -1,3 +1,4 @@
from collections import defaultdict
from typing import Dict

import networkx as nx
Expand All @@ -11,9 +12,18 @@
from kedro_viz.constants import DEFAULT_REGISTERED_PIPELINE_ID, ROOT_MODULAR_PIPELINE_ID
from kedro_viz.data_access.managers import DataAccessManager
from kedro_viz.data_access.repositories.catalog import CatalogRepository
from kedro_viz.data_access.repositories.graph import GraphNodesRepository
from kedro_viz.data_access.repositories.modular_pipelines import (
ModularPipelinesRepository,
)
from kedro_viz.data_access.repositories.registered_pipelines import (
RegisteredPipelinesRepository,
)
from kedro_viz.data_access.repositories.runs import RunsRepository
from kedro_viz.data_access.repositories.tags import TagsRepository
from kedro_viz.data_access.repositories.tracking_datasets import (
TrackingDatasetsRepository,
)
from kedro_viz.integrations.utils import UnavailableDataset
from kedro_viz.models.flowchart.edge import GraphEdge
from kedro_viz.models.flowchart.named_entities import Tag
Expand All @@ -29,6 +39,38 @@ def identity(x):
return x


class TestDataAccessManager:
    def test_manager_initialize_fields(self, data_access_manager: DataAccessManager):
        """Test that all instance variables are correctly initialized."""
        assert isinstance(data_access_manager.catalog, CatalogRepository)
        assert isinstance(data_access_manager.nodes, GraphNodesRepository)
        assert isinstance(
            data_access_manager.registered_pipelines, RegisteredPipelinesRepository
        )
        assert isinstance(data_access_manager.tags, TagsRepository)
        assert isinstance(data_access_manager.modular_pipelines, defaultdict)
        assert isinstance(data_access_manager.edges, defaultdict)
        assert isinstance(data_access_manager.node_dependencies, defaultdict)
        assert isinstance(data_access_manager.runs, RunsRepository)
        assert isinstance(
            data_access_manager.tracking_datasets, TrackingDatasetsRepository
        )
        assert data_access_manager.dataset_stats == {}

    def test_manager_reset_fields(self, data_access_manager: DataAccessManager):
        """Test that reset_fields correctly reinitializes the instance variables."""
        # Modify fields to non-default values
        data_access_manager.catalog = None
        data_access_manager.dataset_stats = {"test_key": "test_value"}

        data_access_manager.reset_fields()

        # Assert fields are reset to default
        assert isinstance(data_access_manager.catalog, CatalogRepository)
        assert isinstance(data_access_manager.dataset_stats, dict)
        assert data_access_manager.dataset_stats == {}


class TestAddCatalog:
def test_add_catalog(
self,
Expand Down