Conversation

@Michspirit99
Owner

Summary

Adds two new sample categories (pytest and Robot Framework BDD) demonstrating how AI agents powered by the Copilot SDK can be integrated into established test frameworks. Bumps the sample count from 15 to 17.

New Samples

🧪 pytest AI Validation (pytest_ai_validation.py)

  • 4 test scenarios: code generation, bug detection, structured JSON output, AI-as-judge
  • Deterministic assertions (ast.parse, json.loads) combined with AI validation
  • copilot_session pytest fixture for lifecycle management
  • Runs standalone (python samples/pytest_ai_validation.py) or with pytest -v
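The deterministic half of that validation needs no SDK at all. A minimal sketch of the pattern, where the helper names are illustrative rather than the sample's actual API:

```python
import ast
import json

def assert_valid_python(code: str) -> None:
    """Deterministic check: the response must parse as Python source."""
    try:
        ast.parse(code)
    except SyntaxError as e:
        raise AssertionError(f"Generated code is not valid Python: {e}")

def assert_valid_json_object(text: str, required_keys: set[str]) -> dict:
    """Deterministic check: the response must be a JSON object with given keys."""
    data = json.loads(text)
    assert isinstance(data, dict), f"Expected JSON object, got {type(data).__name__}"
    missing = required_keys - data.keys()
    assert not missing, f"Missing keys: {sorted(missing)}"
    return data

# Validate a (hypothetical) model response deterministically before any AI judging
assert_valid_python("def add(a, b):\n    return a + b")
parsed = assert_valid_json_object('{"name": "Ada", "age": 36}', {"name", "age"})
```

The AI-as-judge step would then run only on responses that already pass these cheap, deterministic gates.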

🤖 Robot Framework BDD (robot_copilot_library.py + copilot_bdd.robot)

  • BDD/Gherkin syntax: Given I have a Copilot session / When I ask Copilot to generate code / Then the code should be valid Python
  • Python keyword library wrapping the Copilot SDK for Robot Framework
  • 4 BDD test cases: code generation, bug detection, JSON output, concept explanation
  • Runs standalone (python samples/robot_copilot_library.py) or with robot samples/copilot_bdd.robot
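Robot Framework maps keyword names like `Ask Copilot` onto Python methods of a library class. A minimal, SDK-free sketch of that keyword-library pattern (class, method, and response text here are illustrative stand-ins, not the sample's real implementation):

```python
class CopilotLibrary:
    """Minimal Robot Framework keyword library sketch.

    Robot resolves keywords by name: "Ask Copilot" -> ask_copilot(),
    "Response Should Contain" -> response_should_contain().
    """
    ROBOT_LIBRARY_SCOPE = "SUITE"  # one instance shared across the suite

    def __init__(self):
        self._last_response = ""

    def ask_copilot(self, prompt: str) -> str:
        # The real library would call the Copilot SDK; stubbed for illustration.
        self._last_response = f"def answer():\n    return {prompt!r}"
        return self._last_response

    def response_should_contain(self, text: str) -> None:
        # Keywords signal failure to Robot by raising AssertionError.
        assert text in self._last_response, f"{text!r} not in last response"

lib = CopilotLibrary()
lib.ask_copilot("hello")
lib.response_should_contain("hello")
```

In a `.robot` file the same calls read as `Ask Copilot    hello` followed by `Response Should Contain    hello`.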

Other Changes

  • requirements.txt: Added pytest, pytest-asyncio, robotframework
  • README.md: Updated 15 → 17 samples, added "AI-Enhanced Testing" section
  • scripts/run_agent_scenarios.py: Updated E2E runner to include new samples
  • copilot_bdd.robot: .robot file with Given/When/Then syntax (4 test cases)

Local Test Results

```text
pytest_ai_validation.py:    4/4 passed (code gen, bug detection, JSON schema, AI-as-judge)
robot_copilot_library.py:   3/3 BDD scenarios passed (code gen, bug detection, JSON output)
```

Key Patterns Demonstrated

| Pattern | pytest Sample | Robot Framework Sample |
| --- | --- | --- |
| AI-as-judge (one AI validates another) | | |
| Deterministic + AI assertions | | |
| BDD/Gherkin syntax | | |
| Keyword-driven AI testing | | |
| Test framework fixture integration | ✅ (copilot_session) | ✅ (Suite Setup/Teardown) |
| Standalone + framework execution | | |

- Reposition as THE comprehensive Python SDK sample collection
- Highlight E2E proof (15/15 passing) as key differentiator
- Better comparison table vs typical SDK examples
- Clearer value proposition upfront
- Add CI status and E2E badges
- Emphasize free tier (gpt-5-mini) compatibility
- Reorganize samples into clear categories
- Add acknowledgments for GitHub Copilot SDK
- pytest_ai_validation.py: AI-enhanced pytest with 4 test scenarios
  (code generation, bug detection, structured JSON, AI-as-judge)
- robot_copilot_library.py: Robot Framework keyword library wrapping
  Copilot SDK + standalone BDD runner (3 Gherkin scenarios)
- copilot_bdd.robot: BDD test suite with Given/When/Then syntax for
  AI agent testing (code gen, code review, JSON output, explanations)
- Update requirements.txt with pytest, pytest-asyncio, robotframework
- Update README: 15 -> 17 samples, add AI-Enhanced Testing section
- Update E2E runner to include new samples in scenario suite
Copilot AI review requested due to automatic review settings February 9, 2026 17:29

Copilot AI left a comment


Pull request overview

Adds new “AI-enhanced testing” sample scripts showing how to integrate Copilot SDK–powered agent validation into established test frameworks (pytest + Robot Framework BDD), and wires them into docs and the E2E scenario runner.

Changes:

  • Added a pytest-based AI validation sample with reusable async scenarios and optional pytest integration.
  • Added a Robot Framework keyword library + .robot BDD suite demonstrating Given/When/Then AI testing.
  • Updated requirements, README catalog/claims, and the E2E scenario runner to account for the new samples.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| scripts/run_agent_scenarios.py | Adds reporting of .robot files to the E2E scenario summary. |
| samples/robot_copilot_library.py | New Robot Framework keyword library + standalone BDD-style runner. |
| samples/pytest_ai_validation.py | New pytest AI-validation sample + standalone runner. |
| samples/copilot_bdd.robot | New Robot Framework BDD suite consuming the keyword library. |
| requirements.txt | Adds pytest/pytest-asyncio/robotframework dependencies. |
| README.md | Updates sample catalog and adds AI-Enhanced Testing section + badges/claims. |


```python
import ast
import json
import re
import sys
```

Copilot AI Feb 9, 2026


sys is imported but never used, which will fail the repo's ruff check CI step. Remove the unused import (or use it explicitly if needed).

Suggested change

```diff
-import sys
```
Comment on lines +196 to +214
```python
    def json_should_be_valid(self) -> dict:
        """Assert that the last response is valid JSON and return the parsed dict."""
        try:
            data = json.loads(self._last_response)
            return data
        except json.JSONDecodeError as e:
            raise AssertionError(
                f"Invalid JSON: {e}\n\nRaw:\n{self._last_response[:300]}"
            )

    def json_should_have_keys(self, *keys):
        """Assert that the parsed JSON contains all specified keys.

        Example (Robot):
            JSON Should Have Keys    name    age    email
        """
        data = self.json_should_be_valid()
        missing = set(keys) - set(data.keys())
        if missing:
```

Copilot AI Feb 9, 2026


json_should_be_valid() is annotated to return dict but returns whatever json.loads() yields, and json_should_have_keys() assumes data.keys() exists. If the model returns a JSON array/string, this will raise AttributeError instead of a clear assertion failure. Assert isinstance(data, dict) (and improve the error message) before using .keys().

Owner Author


@copilot open a new pull request to apply changes based on this feedback

Comment on lines +229 to +235
```python
# Report .robot files (run via robot_copilot_library.py standalone)
for robot_file in sorted(samples_dir.glob("*.robot")):
    results.append(ScenarioResult(
        robot_file.stem,
        True,
        "SKIP - Run via: robot samples/copilot_bdd.robot (BDD scenarios tested through robot_copilot_library.py)"
    ))
```

Copilot AI Feb 9, 2026


Adding .robot files to results increases the scenario count reported by this runner (e.g., 17 .py samples + 1 .robot entry). This will desync the README’s “17/17” E2E badge/transcript unless those are updated too. Consider excluding .robot from the scenario count here, or update the README/badge to match the new total.

Copilot uses AI. Check for mistakes.
```python
    )
    response = await session.send_and_wait({"prompt": prompt})
    result = response.data.content.strip()
    passed = result.upper().startswith("PASS")
```

Copilot AI Feb 9, 2026


ai_judge() asks for EXACT 'PASS'/'FAIL' on the first line, but the parser accepts any response starting with "PASS" (e.g., "PASS ✅" or "PASSING"). Parse the first line strictly (splitlines()[0] == 'PASS'/'FAIL') and treat anything else as a test failure to keep the oracle deterministic.

Suggested change

```diff
-passed = result.upper().startswith("PASS")
+first_line = result.splitlines()[0].strip() if result else ""
+passed = first_line == "PASS"
```

Comment on lines +258 to +264
```python
    total = len(SCENARIOS)
    print(f"Results: {passed} passed, {failed} failed out of {total}")
    if failed == 0:
        print("\n✅ All AI validation tests passed!")
    else:
        print("\n⚠️ Some tests failed (AI responses are non-deterministic)")
```

Copilot AI Feb 9, 2026


In standalone mode, failures only increment failed and print a warning, but main() always exits successfully. Since scripts/run_agent_scenarios.py treats “no exception” as PASS, this can produce false-green E2E results. Return an exit code and/or raise SystemExit(1) when failed > 0.
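A minimal sketch of the fix this comment describes, with stand-in scenario results; the real sample would end with `sys.exit(main())` so the E2E runner observes the failure:

```python
def main() -> int:
    """Run scenarios and propagate failures via the process exit code."""
    results = {"code_gen": True, "bug_detection": True}  # stand-in outcomes
    failed = sum(1 for ok in results.values() if not ok)
    if failed:
        print(f"⚠️ {failed} scenario(s) failed")
        return 1  # non-zero exit: the E2E runner now sees a real failure
    print("✅ All scenarios passed")
    return 0

exit_code = main()  # in the sample: sys.exit(main())
```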

Comment on lines 25 to 26
```python
import sys
from pathlib import Path
```

Copilot AI Feb 9, 2026


sys and Path are imported but never used, which will fail the repo's ruff check CI step. Remove the unused imports (or use them explicitly).

Suggested change

```diff
-import sys
-from pathlib import Path
```

```python
    if failed == 0:
        print("\n✅ All BDD scenarios passed!")
    else:
        print("\n⚠️ Some scenarios failed")
```

Copilot AI Feb 9, 2026


In standalone mode, scenario failures only increment failed and print, but main() always completes successfully. Because the E2E runner marks “no exception” as PASS, this can hide failing BDD scenarios. Return a non-zero exit code and/or raise SystemExit(1) when failed > 0.

Suggested change

```diff
 print("\n⚠️ Some scenarios failed")
+raise SystemExit(1)
```

```markdown
[![Python 3.12+](https://img.shields.io/badge/Python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![Copilot SDK](https://img.shields.io/badge/Copilot_SDK-Technical_Preview-green.svg)](https://github.com/github/copilot-sdk)
[![CI Status](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/ci.yml/badge.svg)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/ci.yml)
[![E2E Proof](https://img.shields.io/badge/E2E-17%2F17%20passing-brightgreen)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```

Copilot AI Feb 9, 2026


The E2E badge hard-codes 17/17, but scripts/run_agent_scenarios.py now also reports .robot files as scenarios, which will change the total. Update the badge (or adjust the runner) so the displayed E2E count matches the workflow output.

Suggested change

```diff
-[![E2E Proof](https://img.shields.io/badge/E2E-17%2F17%20passing-brightgreen)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
+[![E2E Proof](https://img.shields.io/badge/E2E-passing-brightgreen)](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```


Copilot AI commented Feb 9, 2026

@Michspirit99 I've opened a new pull request, #2, to work on those changes. Once the pull request is ready, I'll request review from you.

Michspirit99 and others added 2 commits February 9, 2026 09:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>