
Agent skills for Databricks Assistant #1025

Open
grusin-db wants to merge 11 commits into main from f/agent_skills

Conversation


@grusin-db grusin-db commented Feb 6, 2026

PR: Add install_skills() -- deploy DQX agent skills to Databricks Assistant

Branch: f/agent_skills → main | 3 commits | +2109 lines


Key Changes

1. New install_skills() command (src/databricks/labs/dqx/skills.py)

One-liner to deploy DQX agent skills into a Databricks workspace so that Databricks Assistant can use them.

from databricks.labs import dqx

dqx.install_skills()                        # → /Users/{me}/.assistant/skills/dqx/
dqx.install_skills("/Shared/skills/dqx")    # custom folder

Reads bundled SKILL.md files and example scripts from package resources via importlib.resources, then uploads them with WorkspaceClient.workspace.upload().

2. Bundled skill content (src/databricks/labs/dqx/resources/skills/)

Six SKILL.md files covering the full DQX surface, structured for the assistant's skill discovery:

| Skill | Scope |
| --- | --- |
| dqx/SKILL.md | Root: install, quick-start, routing to sub-skills |
| checks/SKILL.md | Check authoring, apply, split, reference DataFrames |
| checks/row-level/SKILL.md | 30+ row-level functions with full parameter reference (required vs optional) |
| checks/dataset-level/SKILL.md | 10+ dataset-level functions with full parameter reference |
| checks/custom/SKILL.md | SQL expressions, Python row/dataset checks |
| profiling/SKILL.md | DQProfiler + manual SQL profiling |

Each SKILL.md includes a parameter table at the top that explicitly marks required vs optional arguments to prevent the assistant from generating invalid check definitions (e.g. calling is_in_range without both min_limit and max_limit).
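For instance, the parameter table makes clear that `is_in_range` needs both limits. A hedged illustration of the failure mode being guarded against; the argument spelling (`column`, `min_limit`, `max_limit`) is assumed here and should be verified against the DQX reference:

```python
# A well-formed is_in_range check: both min_limit and max_limit present.
valid_check = {
    "criticality": "error",
    "check": {
        "function": "is_in_range",
        "arguments": {"column": "age", "min_limit": 0, "max_limit": 120},
    },
}

# The invalid variant the parameter tables are meant to stop an assistant
# from generating: max_limit is missing.
invalid_check = {
    "criticality": "error",
    "check": {
        "function": "is_in_range",
        "arguments": {"column": "age", "min_limit": 0},
    },
}

required = {"min_limit", "max_limit"}
print(required <= valid_check["check"]["arguments"].keys())    # True
print(required <= invalid_check["check"]["arguments"].keys())  # False
```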

3. Runnable examples (checks/examples/)

21 self-contained Python scripts (one per check category) bundled as package resources and deployed alongside skills:

  • 01–09: row-level checks (null, list, comparison, range, regex, datetime, network, SQL, complex types)
  • 10–16: dataset-level checks (unique, aggregation, FK, compare, freshness, schema, SQL query)
  • 17–19: custom checks (SQL window, Python row, Python dataset)
  • 20–21: profiling

All SKILL.md files cross-reference relevant examples.

4. Package wiring (__init__.py, resources/__init__.py)

install_skills is exported from databricks.labs.dqx.__init__, so both import styles work:

from databricks.labs.dqx import install_skills
# or
from databricks.labs import dqx; dqx.install_skills()

No pyproject.toml build changes needed -- hatchling already includes non-Python files under src/.


⚠️ Breaking Changes

None. This is purely additive.

Examples

Databricks AI Assistant with skills loaded

(screenshots omitted)

Unskilled Databricks AI Assistant

(screenshot omitted)


github-actions bot commented Feb 6, 2026

All commits in PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits


github-actions bot commented Feb 6, 2026

✅ 550/550 passed, 3 flaky, 35 skipped, 8h44m42s total

Flaky tests:

  • 🤪 test_profiler (1m24.894s)
  • 🤪 test_quality_checker (1m6.999s)
  • 🤪 test_profile_with_custom_folder (1m23.335s)

Running from acceptance #4028


codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.60%. Comparing base (435459f) to head (0a0733c).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/databricks/labs/dqx/skills.py | 0.00% | 26 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1025      +/-   ##
==========================================
- Coverage   91.24%   90.60%   -0.64%     
==========================================
  Files          64       99      +35     
  Lines        6703     8971    +2268     
==========================================
+ Hits         6116     8128    +2012     
- Misses        587      843     +256     

☔ View full report in Codecov by Sentry.

Copilot AI left a comment

Pull request overview

Adds a new install_skills() helper to deploy bundled DQX “skills” (SKILL.md + runnable examples) into a Databricks workspace folder for Databricks Assistant discovery.

Changes:

  • Introduces src/databricks/labs/dqx/skills.py with install_skills() that uploads package-bundled skill content to the workspace.
  • Bundles a skills tree under src/databricks/labs/dqx/resources/skills/ including SKILL.md docs and runnable example scripts.
  • Updates exports/docs/perf artifacts to reflect new skills/examples and benchmark additions.

Reviewed changes

Copilot reviewed 32 out of 33 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/perf/.benchmarks/baseline.json | Updates stored benchmark baselines to include new benchmark cases. |
| src/databricks/labs/dqx/skills.py | Adds install_skills() implementation that discovers packaged resources and uploads them to the workspace. |
| src/databricks/labs/dqx/resources/skills/dqx/SKILL.md | Root skill entrypoint for DQX assistant discovery and routing to sub-skills. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/SKILL.md | Documents authoring/applying checks and references examples. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/row-level/SKILL.md | Adds row-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/dataset-level/SKILL.md | Adds dataset-level checks reference + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/custom/SKILL.md | Adds custom checks guidance + examples index. |
| src/databricks/labs/dqx/resources/skills/dqx/profiling/SKILL.md | Adds profiling skill content and runnable example references. |
| src/databricks/labs/dqx/resources/skills/dqx/checks/examples/*.py | Adds runnable example scripts to be deployed alongside skills. |
| src/databricks/labs/dqx/__init__.py | Exports install_skills from the package root. |
| docs/dqx/docs/reference/benchmarks.mdx | Updates benchmark documentation tables with new entries. |
| demo.ipynb | Adds an executable demo notebook showing dqx.install_skills() usage. |
Comments suppressed due to low confidence (6)

src/databricks/labs/dqx/skills.py:1

  • Converting importlib.resources.files(...) to a filesystem Path via Path(str(...)) will break when the package is installed as a zip/pex/egg (resources aren’t guaranteed to exist on disk), and rglob/read_bytes won’t work reliably. Keep the object as an importlib.resources.abc.Traversable and either (a) recursively walk it via iterdir() and use Traversable.read_bytes(), or (b) use importlib.resources.as_file() around the directory Traversable to obtain a temporary real path before doing rglob().
  • ws.workspace.mkdirs(target_dir) is invoked for every file, which can add a lot of redundant API calls for larger skill trees. Track which directories have already been created (e.g., a set[str]) and only call mkdirs the first time each directory is encountered.
  • install_skills() adds new behavior that is sensitive to environment (current user lookup, resource discovery, workspace upload). Please add unit tests that mock WorkspaceClient to verify: (1) default folder resolution uses the current user, (2) the correct target paths are computed for nested resources, and (3) uploads are invoked with overwrite enabled.


Comment on lines +42 to +43

__all__ = ["install_skills"]
Copilot AI commented Feb 6, 2026

Defining __all__ as ["install_skills"] changes from databricks.labs.dqx import * behavior to only export install_skills, which can be an unintended breaking change for users relying on star-imports. Consider removing __all__ entirely, or appending to an existing/exported list (e.g., include previously intended public names) rather than restricting exports to only this symbol.

Suggested change
__all__ = ["install_skills"]

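The star-import effect can be demonstrated with a throwaway module (module and helper names here are invented for illustration):

```python
import sys
import types

# Stand-in for a package that previously exposed several public names and
# then gains __all__ = ["install_skills"].
mod = types.ModuleType("dqx_demo")
exec(
    "def install_skills(): pass\n"
    "def existing_public_helper(): pass\n"
    "__all__ = ['install_skills']\n",
    mod.__dict__,
)
sys.modules["dqx_demo"] = mod

ns = {}
exec("from dqx_demo import *", ns)
print("install_skills" in ns)          # True: listed in __all__
print("existing_public_helper" in ns)  # False: hidden from star-imports
```

Regular `from dqx_demo import existing_public_helper` still works either way; only the wildcard form is restricted by `__all__`.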
demo.ipynb Outdated
Comment on lines +18 to +30
"execution_count": 2,
"id": "c234d4f6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
Copilot AI commented Feb 6, 2026

The committed notebook output includes a local filesystem path and a user email address, which are sensitive and typically should not be checked in. Clear notebook outputs before committing (and consider removing machine-specific setup like sys.path.insert(...) or replacing it with a more portable approach) so the repo doesn’t leak developer environment details.

Suggested change
"execution_count": 2,
"id": "c234d4f6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/grzegorz.rusin/repos/dqx/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"execution_count": null,
"id": "c234d4f6",
"metadata": {},
"outputs": [],

demo.ipynb Outdated
"output_type": "stream",
"text": [
"\u001b[90m12:38:26\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[databricks.sdk] Using Databricks Metadata Service authentication\u001b[0m\n",
"\u001b[90m12:38:27\u001b[0m \u001b[1m\u001b[32m INFO\u001b[0m \u001b[1m[d.l.dqx.skills] Deploying skills to: /Users/grzegorz.rusin@databricks.com/.assistant/skills/dqx\u001b[0m\n",
Copilot AI commented Feb 6, 2026

The committed notebook output includes a local filesystem path and a user email address, which are sensitive and typically should not be checked in. Clear notebook outputs before committing (and consider removing machine-specific setup like sys.path.insert(...) or replacing it with a more portable approach) so the repo doesn’t leak developer environment details.

@mwojtyczka (Contributor) left a comment

This is really cool!

Feedback:

  • Please sign the commits.
  • We need to reduce the maintenance effort: these examples largely duplicate what is already in the docs. Let's find a better way to manage this, e.g. store the examples in one place and source both the docs and the skills from it, and document the examples in each check function. The general DQX usage examples are also already in the docs, so this content should be centralized.

@mwojtyczka mwojtyczka changed the title F/agent skills Agent skills for Databricks Assistant Feb 8, 2026
@mwojtyczka (Contributor) left a comment

Great addition — bundled SKILL.md files are well-structured and the runnable examples are a nice touch. A few correctness issues in skills.py worth fixing, plus a gap between the PR description and the actual diff.

Missing __init__.py export: The PR description states that install_skills is exported from databricks.labs.dqx.__init__ and that from databricks.labs import dqx; dqx.install_skills() works. But src/databricks/labs/dqx/__init__.py is not in this diff — the function is only reachable via from databricks.labs.dqx.skills import install_skills. Either the description is aspirational and __init__.py still needs updating, or the claim should be removed.

folder: Target workspace folder. Defaults to
``/Users/{current_user}/.assistant/skills/dqx``.
"""
ws = WorkspaceClient()

WorkspaceClient() is not injectable — blocks unit testing

Constructing the client inside the function means tests must either hit a real Databricks workspace or monkey-patch the constructor. The rest of the codebase (e.g. DQEngine, SettingsManager) accepts ws as a constructor argument precisely to allow mocking.

Suggested signature:

def install_skills(folder: str | None = None, ws: WorkspaceClient | None = None) -> None:
    if ws is None:
        ws = WorkspaceClient()

This keeps the zero-argument convenience call while enabling install_skills(ws=mock_ws) in tests.

user_name = ws.current_user.me().user_name
folder = f"/Users/{user_name}/.assistant/skills/dqx"

skills_root = Path(str(files("databricks.labs.dqx.resources") / "skills" / "dqx"))

Path(str(Traversable)) + rglob breaks for zip-packaged installs

files(...) returns an importlib.resources.abc.Traversable, not necessarily a real filesystem Path. Converting it with Path(str(...)) works for editable installs and extracted wheels, but silently fails (or raises NotADirectoryError) when the package is imported from inside a zip archive (e.g., a compiled .egg or certain CI environments).

For zip-safe resource access the idiomatic approach is to use Traversable APIs throughout:

from importlib.resources import files

skills_root = files("databricks.labs.dqx.resources").joinpath("skills").joinpath("dqx")

def _rglob_traversable(node):
    for child in node.iterdir():
        if child.is_dir():
            yield from _rglob_traversable(child)
        else:
            yield child

deploy_files = sorted(_rglob_traversable(skills_root), key=lambda f: str(f))

This avoids the Path conversion entirely and works correctly regardless of how the package is packaged.


for src_file in deploy_files:
relative = src_file.relative_to(skills_root)
target_path = f"{folder}/{relative}"

Path object in f-string uses OS-native separators — breaks on Windows

relative is a pathlib.Path. On Windows, str(relative) produces backslash-separated paths (e.g., checks\examples\01_row_null_empty.py), so the target workspace path becomes /.../checks\examples\01_row_null_empty.py, which the Databricks API will reject.

Use relative.as_posix() to always get forward slashes:

target_path = f"{folder}/{relative.as_posix()}"
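The failure mode can be reproduced on any OS with `PureWindowsPath`, which models Windows path semantics without needing a Windows machine (the file name below is one of the bundled examples):

```python
from pathlib import PureWindowsPath

rel = PureWindowsPath("checks", "examples", "01_row_null_empty.py")

# str() uses the flavour's native separator: backslashes on Windows.
print(str(rel))        # checks\examples\01_row_null_empty.py

# as_posix() always yields forward slashes, safe for workspace API paths.
print(rel.as_posix())  # checks/examples/01_row_null_empty.py
```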

target_path = f"{folder}/{relative}"
target_dir = "/".join(target_path.split("/")[:-1])

ws.workspace.mkdirs(target_dir)

mkdirs is called once per file instead of once per unique directory

With 25 example files across ~5 directories, this makes 25 mkdirs API calls when only 5 are needed. mkdirs is idempotent so it doesn't produce incorrect results, but it generates unnecessary network round-trips.

Collect the unique parent directories before the loop:

unique_dirs = {f"{folder}/{f.relative_to(skills_root).as_posix()}".rsplit("/", 1)[0] for f in deploy_files}
for d in sorted(unique_dirs):
    ws.workspace.mkdirs(d)

for src_file in deploy_files:
    ...
    ws.workspace.upload(target_path, ...)

or more simply, track which directories have already been created in a set inside the loop.
