Skip to content

Conversation

@NISH1001
Copy link
Collaborator

@NISH1001 NISH1001 commented Dec 22, 2025

Summary

This PR adds support for source_context within the guardrail system to provide the Risk Agent with behavioral information about the specific agent or tool being evaluated. It also introduces a debug mode for the Risk Agent to log the exact messages being sent to the LLM during criteria generation.

Details

  1. Enhanced Guardrail Input: Added source_context to the GuardrailInput schema to allow passing specific constraints or behavioral expectations of the content source.
  2. Context Extraction Logic: Implemented _get_source_context in the decorators module. This helper extracts context from classes or instances with a priority order: explicit overrides, the description attribute (if it is a string), and finally the docstring.
  3. Decorator Updates: Updated the guardrail decorator and apply_guardrails function to accept and process the source_context.
  4. Risk Agent Integration: Modified RiskAgent to inject "Source (Agent) Behaviour" into the system prompt when source_context is provided. This ensures the LLM has enough context to generate more accurate risk criteria.
  5. Debug Logging: Added a debug flag to the RiskAgent that prints the full list of messages sent to the LLM. This is useful for verifying prompt injection and behavioral context during development.

Checks

  • Tested Changes
  • Stakeholder Approval

- Agent or tool descirption and other behavioral context is added,
  especially to risk agent
@NISH1001 NISH1001 changed the title Enhance/guardrail agent context Support behavioral context in Risk Agent and enhance guardrail decorators Dec 22, 2025
@github-actions
Copy link

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 132
  • Coverage: 77%

Branch: enhance/guardrail-agent-context
PR: #300
Commit: ef3dc20

📋 Full coverage report and logs are available in the workflow run.

…ration

- Update RISK_SYSTEM_PROMPT with two-step process (context understanding → criteria generation)
- Add _default_source_context_message() method with clear producer context formatting
- Change source_context injection from generic label to structured system message
- Add debug logging for total criteria count across risk categories

The prompt now instructs the LLM to use source context to generate MORE RELEVANT
criteria rather than skipping evaluation entirely. This ensures consistent risk
evaluation while tailoring criteria to the producer's domain.

Co-Authored-By: Tigran Tchrakian <tigran@Tigrans-MacBook-Pro.local>
@github-actions
Copy link

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 133
  • Coverage: 77%

Branch: enhance/guardrail-agent-context
PR: #300
Commit: 9eb3a8d

📋 Full coverage report and logs are available in the workflow run.

@NISH1001 NISH1001 deployed to integration December 22, 2025 16:12 — with GitHub Actions Active
@github-actions
Copy link

✅ Tests passed

📊 Test Results

  • Passed: 549
  • Failed: 0
  • Skipped: 23
  • Warnings: 134
  • Coverage: 77%

Branch: enhance/guardrail-agent-context
PR: #300
Commit: 63e4955

📋 Full coverage report and logs are available in the workflow run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants