Semantic model perf lab #426

KayUnkroth · 2025-01-30T05:45:43Z

Functions to provision and run a Semantic Model Perf Lab for continuous testing and ad-hoc investigations.

eisber · 2025-01-30T07:13:31Z

src/sempy_labs/perf_lab/_sample_lab.py

+        return str(self._properties)
+
+
+def _get_or_create_workspace(


should move to common utils.

Following up offline with Michael.

Moving this to _helper_functions.py

eisber · 2025-01-30T07:14:12Z

src/sempy_labs/perf_lab/_sample_lab.py

+    create_abfss_path,
+)
+
+class PropertyBag:


why not just dict?

Yeah, dict is much easier. Replaced.
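For reference, a minimal sketch of what the plain dict gives for free. The property names below are hypothetical and not taken from the PR; the point is only that default lookups and a readable string dump need no wrapper class.

```python
# Hypothetical lab properties; the keys are illustrative, not from the PR.
lab_properties = {"workspace": "Perf Lab", "capacity_id": None}

# dict.get returns a default for a missing key, so no custom accessor is needed.
lakehouse = lab_properties.get("lakehouse", "SampleLakehouse")

# str() on a dict already yields a readable dump of all properties.
summary = str(lab_properties)
```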

eisber · 2025-01-30T07:15:17Z

src/sempy_labs/perf_lab/_sample_lab.py

+            raise ValueError("For new workspaces, the workspace parameter must be string, not a Guid. Please provide a workspace name.")
+        except ValueError:
+            # OK, it's not a Guid. But also make sure the workspace parameter isn't empty.
+            if workspace == "" or workspace is None:


just use if workspace:

https://stackoverflow.com/questions/9573244/how-to-check-if-the-string-is-empty-in-python

Thank you. Changed throughout.

eisber · 2025-01-30T07:15:39Z

src/sempy_labs/perf_lab/_sample_lab.py

+        print(f"{icons.green_dot} Workspace '{workspace_name}' created.")
+        return (workspace,workspace_id)
+
+def _get_or_create_lakehouse(


move to utility file

Following up offline with Michael.

Moving this to _helper_functions.py

eisber · 2025-01-30T07:16:07Z

src/sempy_labs/perf_lab/_sample_lab.py

+    """
+
+    # Treat empty strings as None.
+    if lakehouse == "":


not sure why this is needed.

No longer in the sample_lab file.

eisber · 2025-01-30T07:22:33Z

src/sempy_labs/perf_lab/_simulated_etl.py

+
+        try:
+            (target_workspace_name, target_workspace_id) = resolve_workspace_name_and_id(workspace=target_workspace)
+        except:


catch specific exception, otherwise we mask errors

Thanks. Replaced with (WorkspaceNotFoundException, ValueError).
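A sketch of the narrowed handler, under stated assumptions: the exception class and `resolve_workspace_name_and_id` below are local stand-ins so the snippet runs on its own, and `try_resolve` is a hypothetical wrapper. Only the shape of the `except` clause reflects the change described above.

```python
class WorkspaceNotFoundException(Exception):
    """Stand-in for the real exception so this sketch is self-contained."""


def resolve_workspace_name_and_id(workspace):
    """Hypothetical stub that mimics a lookup failing in two expected ways."""
    known = {"Perf Lab": "00000000-0000-0000-0000-000000000001"}
    if not workspace:
        raise ValueError("workspace must not be empty")
    if workspace not in known:
        raise WorkspaceNotFoundException(workspace)
    return workspace, known[workspace]


def try_resolve(workspace):
    """Return (name, id), or None when the workspace cannot be resolved."""
    try:
        return resolve_workspace_name_and_id(workspace=workspace)
    except (WorkspaceNotFoundException, ValueError):
        # Only the two expected failures are handled here;
        # any other exception propagates instead of being masked.
        return None
```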

eisber · 2025-01-30T07:23:11Z

src/sempy_labs/perf_lab/_simulated_etl.py

+
+    return spark.createDataFrame(rows, schema=schema).dropDuplicates()
+
+def _get_min_max_keys(


move to util file

Following up offline with Michael.

Michael already has a _get_column_aggregate utility function which does essentially the same thing, but it doesn't take a table path as a parameter and only returns a single int value. For now, let's leave it here. Happy to move later if OK with Michael.

eisber · 2025-01-30T07:23:59Z

src/sempy_labs/perf_lab/_table_diagnostics.py

+def get_storage_table_column_segments(
+    test_cycle_definitions: DataFrame,
+    tables_info: DataFrame
+)->DataFrame:


run black to get formatting

Done! File reformatted.

eisber · 2025-01-30T07:24:14Z

src/sempy_labs/perf_lab/_table_diagnostics.py

+
+        try:
+            (target_workspace_name, target_workspace_id) = resolve_workspace_name_and_id(workspace=target_workspace)
+        except:


catch specific exception

Fixed (WorkspaceNotFoundException, ValueError)

eisber · 2025-01-30T07:27:53Z

src/sempy_labs/perf_lab/_test_cycle.py

+    ]
+
+
+def _get_test_definitions(


the dataframe seems small, so I'd suggest not to use a pandas dataframe.
since you want to subsequently persist it (and maybe re-hydrate), I think a separate class would make it more convenient.

Agreed. Classes are more user friendly and less error prone. Also let me hard-code the fields that the test definitions must have. But I do want to keep this flexible so that users can add more fields as needed for their specific cases. For example, a query Category can be useful for analysis but it's not strictly needed for the perf lab. Long story short, added TestDefinition and TestSuite classes that feature the standard fields but also persist and load any additional fields the user may want to have.
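A rough sketch of the idea, not the PR's actual classes: the field names `query_id` and `query_text`, the `extra` dict, and JSON as the persistence format are all assumptions made for illustration. It shows fixed standard fields alongside user-defined ones that survive a save and load round trip.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TestDefinition:
    # Hypothetical standard fields that every definition must carry.
    query_id: str
    query_text: str
    # Any additional user-defined fields, e.g. a query Category.
    extra: dict = field(default_factory=dict)

    def to_dict(self):
        # Flatten standard and extra fields into one record for persistence.
        return {"query_id": self.query_id, "query_text": self.query_text, **self.extra}

    @classmethod
    def from_dict(cls, data):
        data = dict(data)  # copy so the caller's dict is not mutated
        return cls(
            query_id=data.pop("query_id"),
            query_text=data.pop("query_text"),
            extra=data,  # whatever remains is treated as user-defined
        )


@dataclass
class TestSuite:
    definitions: list = field(default_factory=list)

    def to_json(self):
        return json.dumps([d.to_dict() for d in self.definitions])

    @classmethod
    def from_json(cls, text):
        return cls([TestDefinition.from_dict(d) for d in json.loads(text)])
```

For example, `TestSuite.from_json(suite.to_json())` returns a suite equal to the original, including any extra fields.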


eisber · 2025-03-04T07:36:53Z

src/sempy_labs/perf_lab/_sample_lab.py

+    olEPs = response.json().get("oneLakeEndpoints")
+    dfsEP = olEPs.get("dfsEndpoint")
+
+    start_expr = "let\n\tdatabase = "


multiline string using """ ... """ to improve readability
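A minimal sketch of the suggestion, using only the fragment visible in the diff (the rest of the expression is not shown there and is not reproduced here). The variable names `escaped` and `multiline` are illustrative.

```python
# Escaped style, as in the diff: newlines and tabs are hidden in escape sequences.
escaped = "let\n\tdatabase = "

# Triple-quoted style: line breaks are written as real line breaks,
# so the layout of the expression is visible in the source.
multiline = """let
\tdatabase = """
```

Both variables hold the same string, so the change affects readability only.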

KayUnkroth · 2025-03-04T20:12:53Z

Closing this big PR for a series of small PRs, as discussed offline.

KayUnkroth added 6 commits January 25, 2025 22:48

Functions to provision a sample perf lab.

d23497b

Additional perf lab modules.

75aae3b

table_diagnostics

6e27311

Test cycle functions

109de39

run_test_cycle description

4aa5196

Perf Lab.ipynb added

f6d4c24

eisber requested changes Jan 30, 2025


KayUnkroth added 12 commits February 3, 2025 14:07

Merge branch 'microsoft:main' into main

ae5672f

Explicit _refresh_test_models

8645b8d

Merge branch 'main' of https://github.com/KayUnkroth/semantic-link-labs

259f655

UpdateTableCallback included in perf_lab package

baf2bf4

fewer tom.model.SaveChanges()

f95ea1b

ExecutionTracker added

7948218

_get_test_cycle_id

63d0a08

Merge branch 'microsoft:main' into main

9055a5b

Merge branch 'microsoft:main' into main

158cbcf

PropertyBag replaced with dict

85f86a2

TestDefinitions DataFrames converted to TestDefinition and TestSuite …

bf652a2

…classes

AdventureWorksDW Delta table generator

9c578b2

eisber reviewed Mar 4, 2025


KayUnkroth closed this Mar 4, 2025



