Conversation

@Michspirit99
Owner

Summary

Adds two new sample categories (pytest and Robot Framework BDD) demonstrating how AI agents powered by the Copilot SDK can be integrated into established test frameworks. Bumps the sample count from 15 to 17.

New Samples

🧪 pytest AI Validation (pytest_ai_validation.py)

  • 4 test scenarios: code generation, bug detection, structured JSON output, AI-as-judge
  • Deterministic assertions (ast.parse, json.loads) combined with AI validation
  • copilot_session pytest fixture for lifecycle management
  • Runs standalone (python samples/pytest_ai_validation.py) or with pytest -v
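The deterministic half of that validation needs no SDK at all. A minimal sketch of the pattern, where the helper names are illustrative rather than the sample's actual API:

```python
import ast
import json

def assert_valid_python(code: str) -> None:
    """Deterministic check: the response must parse as Python source."""
    try:
        ast.parse(code)
    except SyntaxError as e:
        raise AssertionError(f"Generated code is not valid Python: {e}")

def assert_valid_json_object(text: str, required_keys: set[str]) -> dict:
    """Deterministic check: the response must be a JSON object with given keys."""
    data = json.loads(text)
    assert isinstance(data, dict), f"Expected JSON object, got {type(data).__name__}"
    missing = required_keys - data.keys()
    assert not missing, f"Missing keys: {sorted(missing)}"
    return data

# Validate a (hypothetical) model response deterministically before any AI judging
assert_valid_python("def add(a, b):\n    return a + b")
parsed = assert_valid_json_object('{"name": "Ada", "age": 36}', {"name", "age"})
```

The AI-as-judge step would then run only on responses that already pass these cheap, deterministic gates.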

🤖 Robot Framework BDD (robot_copilot_library.py + copilot_bdd.robot)

  • BDD/Gherkin syntax: Given I have a Copilot session / When I ask Copilot to generate code / Then the code should be valid Python
  • Python keyword library wrapping the Copilot SDK for Robot Framework
  • 4 BDD test cases: code generation, bug detection, JSON output, concept explanation
  • Runs standalone (python samples/robot_copilot_library.py) or with robot samples/copilot_bdd.robot
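Robot Framework maps keyword names like `Ask Copilot` onto Python methods of a library class. A minimal, SDK-free sketch of that keyword-library pattern (class, method, and response text here are illustrative stand-ins, not the sample's real implementation):

```python
class CopilotLibrary:
    """Minimal Robot Framework keyword library sketch.

    Robot resolves keywords by name: "Ask Copilot" -> ask_copilot(),
    "Response Should Contain" -> response_should_contain().
    """
    ROBOT_LIBRARY_SCOPE = "SUITE"  # one instance shared across the suite

    def __init__(self):
        self._last_response = ""

    def ask_copilot(self, prompt: str) -> str:
        # The real library would call the Copilot SDK; stubbed for illustration.
        self._last_response = f"def answer():\n    return {prompt!r}"
        return self._last_response

    def response_should_contain(self, text: str) -> None:
        # Keywords signal failure to Robot by raising AssertionError.
        assert text in self._last_response, f"{text!r} not in last response"

lib = CopilotLibrary()
lib.ask_copilot("hello")
lib.response_should_contain("hello")
```

In a `.robot` file the same calls read as `Ask Copilot    hello` followed by `Response Should Contain    hello`.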

Other Changes

  • requirements.txt: Added pytest, pytest-asyncio, robotframework
  • README.md: Updated 15 → 17 samples, added "AI-Enhanced Testing" section
  • scripts/run_agent_scenarios.py: Updated E2E runner to include new samples
  • copilot_bdd.robot: .robot file with Given/When/Then syntax (4 test cases)

Local Test Results

```text
pytest_ai_validation.py:    4/4 passed (code gen, bug detection, JSON schema, AI-as-judge)
robot_copilot_library.py:   3/3 BDD scenarios passed (code gen, bug detection, JSON output)
```

Key Patterns Demonstrated

| Pattern | pytest Sample | Robot Framework Sample |
| --- | --- | --- |
| AI-as-judge (one AI validates another) | | |
| Deterministic + AI assertions | | |
| BDD/Gherkin syntax | | |
| Keyword-driven AI testing | | |
| Test framework fixture integration | ✅ (copilot_session) | ✅ (Suite Setup/Teardown) |
| Standalone + framework execution | | |

- Reposition as THE comprehensive Python SDK sample collection
- Highlight E2E proof (15/15 passing) as key differentiator
- Better comparison table vs typical SDK examples
- Clearer value proposition upfront
- Add CI status and E2E badges
- Emphasize free tier (gpt-5-mini) compatibility
- Reorganize samples into clear categories
- Add acknowledgments for GitHub Copilot SDK
- pytest_ai_validation.py: AI-enhanced pytest with 4 test scenarios
  (code generation, bug detection, structured JSON, AI-as-judge)
- robot_copilot_library.py: Robot Framework keyword library wrapping
  Copilot SDK + standalone BDD runner (3 Gherkin scenarios)
- copilot_bdd.robot: BDD test suite with Given/When/Then syntax for
  AI agent testing (code gen, code review, JSON output, explanations)
- Update requirements.txt with pytest, pytest-asyncio, robotframework
- Update README: 15 -> 17 samples, add AI-Enhanced Testing section
- Update E2E runner to include new samples in scenario suite
Copilot AI review requested due to automatic review settings February 9, 2026 17:29

Copilot AI left a comment


Pull request overview

Adds new “AI-enhanced testing” sample scripts showing how to integrate Copilot SDK–powered agent validation into established test frameworks (pytest + Robot Framework BDD), and wires them into docs and the E2E scenario runner.

Changes:

  • Added a pytest-based AI validation sample with reusable async scenarios and optional pytest integration.
  • Added a Robot Framework keyword library + .robot BDD suite demonstrating Given/When/Then AI testing.
  • Updated requirements, README catalog/claims, and the E2E scenario runner to account for the new samples.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| scripts/run_agent_scenarios.py | Adds reporting of .robot files to the E2E scenario summary. |
| samples/robot_copilot_library.py | New Robot Framework keyword library + standalone BDD-style runner. |
| samples/pytest_ai_validation.py | New pytest AI-validation sample + standalone runner. |
| samples/copilot_bdd.robot | New Robot Framework BDD suite consuming the keyword library. |
| requirements.txt | Adds pytest/pytest-asyncio/robotframework dependencies. |
| README.md | Updates sample catalog and adds AI-Enhanced Testing section + badges/claims. |


```python
import ast
import json
import re
import sys
```

Copilot AI Feb 9, 2026


sys is imported but never used, which will fail the repo's ruff check CI step. Remove the unused import (or use it explicitly if needed).

Suggested change

```diff
-import sys
```
Comment on lines +196 to +214
```python
    def json_should_be_valid(self) -> dict:
        """Assert that the last response is valid JSON and return the parsed dict."""
        try:
            data = json.loads(self._last_response)
            return data
        except json.JSONDecodeError as e:
            raise AssertionError(
                f"Invalid JSON: {e}\n\nRaw:\n{self._last_response[:300]}"
            )

    def json_should_have_keys(self, *keys):
        """Assert that the parsed JSON contains all specified keys.

        Example (Robot):
            JSON Should Have Keys    name    age    email
        """
        data = self.json_should_be_valid()
        missing = set(keys) - set(data.keys())
        if missing:
```

Copilot AI Feb 9, 2026


json_should_be_valid() is annotated to return dict but returns whatever json.loads() yields, and json_should_have_keys() assumes data.keys() exists. If the model returns a JSON array/string, this will raise AttributeError instead of a clear assertion failure. Assert isinstance(data, dict) (and improve the error message) before using .keys().

Owner Author


@copilot open a new pull request to apply changes based on this feedback

Comment on lines +229 to +235
```python
# Report .robot files (run via robot_copilot_library.py standalone)
for robot_file in sorted(samples_dir.glob("*.robot")):
    results.append(ScenarioResult(
        robot_file.stem,
        True,
        "SKIP - Run via: robot samples/copilot_bdd.robot (BDD scenarios tested through robot_copilot_library.py)"
    ))
```

Copilot AI Feb 9, 2026


Adding .robot files to results increases the scenario count reported by this runner (e.g., 17 .py samples + 1 .robot entry). This will desync the README’s “17/17” E2E badge/transcript unless those are updated too. Consider excluding .robot from the scenario count here, or update the README/badge to match the new total.

Copilot uses AI. Check for mistakes.
```python
    )
    response = await session.send_and_wait({"prompt": prompt})
    result = response.data.content.strip()
    passed = result.upper().startswith("PASS")
```

Copilot AI Feb 9, 2026


ai_judge() asks for EXACT 'PASS'/'FAIL' on the first line, but the parser accepts any response starting with "PASS" (e.g., "PASS ✅" or "PASSING"). Parse the first line strictly (splitlines()[0] == 'PASS'/'FAIL') and treat anything else as a test failure to keep the oracle deterministic.

Suggested change

```diff
-passed = result.upper().startswith("PASS")
+first_line = result.splitlines()[0].strip() if result else ""
+passed = first_line == "PASS"
```

Comment on lines +258 to +264
```python
    total = len(SCENARIOS)
    print(f"Results: {passed} passed, {failed} failed out of {total}")
    if failed == 0:
        print("\n✅ All AI validation tests passed!")
    else:
        print("\n⚠️ Some tests failed (AI responses are non-deterministic)")
```

Copilot AI Feb 9, 2026


In standalone mode, failures only increment failed and print a warning, but main() always exits successfully. Since scripts/run_agent_scenarios.py treats “no exception” as PASS, this can produce false-green E2E results. Return an exit code and/or raise SystemExit(1) when failed > 0.
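A minimal sketch of the fix this comment describes, with stand-in scenario results; the real sample would end with `sys.exit(main())` so the E2E runner observes the failure:

```python
def main() -> int:
    """Run scenarios and propagate failures via the process exit code."""
    results = {"code_gen": True, "bug_detection": True}  # stand-in outcomes
    failed = sum(1 for ok in results.values() if not ok)
    if failed:
        print(f"⚠️ {failed} scenario(s) failed")
        return 1  # non-zero exit: the E2E runner now sees a real failure
    print("✅ All scenarios passed")
    return 0

exit_code = main()  # in the sample: sys.exit(main())
```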

Comment on lines 25 to 26
```python
import sys
from pathlib import Path
```

Copilot AI Feb 9, 2026


sys and Path are imported but never used, which will fail the repo's ruff check CI step. Remove the unused imports (or use them explicitly).

Suggested change

```diff
-import sys
-from pathlib import Path
```

```python
    if failed == 0:
        print("\n✅ All BDD scenarios passed!")
    else:
        print("\n⚠️ Some scenarios failed")
```

Copilot AI Feb 9, 2026


In standalone mode, scenario failures only increment failed and print, but main() always completes successfully. Because the E2E runner marks “no exception” as PASS, this can hide failing BDD scenarios. Return a non-zero exit code and/or raise SystemExit(1) when failed > 0.

Suggested change

```diff
 print("\n⚠️ Some scenarios failed")
+raise SystemExit(1)
```

```markdown
[![Python 3.12+](https://img.shields.io/badge/Python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![Copilot SDK](https://img.shields.io/badge/Copilot_SDK-Technical_Preview-green.svg)](https://github.com/github/copilot-sdk)
[![CI Status](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/ci.yml/badge.svg)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/ci.yml)
[![E2E Proof](https://img.shields.io/badge/E2E-17%2F17%20passing-brightgreen)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```

Copilot AI Feb 9, 2026


The E2E badge hard-codes 17/17, but scripts/run_agent_scenarios.py now also reports .robot files as scenarios, which will change the total. Update the badge (or adjust the runner) so the displayed E2E count matches the workflow output.

Suggested change

```diff
-[![E2E Proof](https://img.shields.io/badge/E2E-17%2F17%20passing-brightgreen)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
+[![E2E Proof](https://img.shields.io/badge/E2E-passing-brightgreen)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```


Copilot AI commented Feb 9, 2026

@Michspirit99 I've opened a new pull request, #2, to work on those changes. Once the pull request is ready, I'll request review from you.

Michspirit99 and others added 2 commits February 9, 2026 09:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>