A multi-agent system for deep research on HCI topics, featuring orchestrated agents, safety guardrails, and LLM-as-a-Judge evaluation.
This template provides a starting point for building a multi-agent research assistant system. The system uses multiple specialized agents to:
- Plan research tasks
- Gather evidence from academic papers and web sources
- Synthesize findings into coherent responses
- Evaluate quality and verify accuracy
- Ensure safety through guardrails
```
.
├── src/
│   ├── agents/                  # Agent implementations
│   │   ├── base_agent.py        # Base agent class
│   │   ├── planner_agent.py     # Task planning agent
│   │   ├── researcher_agent.py  # Evidence gathering agent
│   │   ├── critic_agent.py      # Quality verification agent
│   │   └── writer_agent.py      # Response synthesis agent
│   ├── guardrails/              # Safety guardrails
│   │   ├── safety_manager.py    # Main safety coordinator
│   │   ├── input_guardrail.py   # Input validation
│   │   └── output_guardrail.py  # Output validation
│   ├── tools/                   # Research tools
│   │   ├── web_search.py        # Web search integration
│   │   ├── paper_search.py      # Academic paper search
│   │   └── citation_tool.py     # Citation formatting
│   ├── evaluation/              # Evaluation system
│   │   ├── judge.py             # LLM-as-a-Judge implementation
│   │   └── evaluator.py         # System evaluator
│   ├── ui/                      # User interfaces
│   │   ├── cli.py               # Command-line interface
│   │   └── streamlit_app.py     # Web interface
│   └── orchestrator.py          # Agent orchestration
├── data/
│   └── example_queries.json     # Example test queries
├── logs/                        # Log files (created at runtime)
├── outputs/                     # Evaluation results (created at runtime)
├── config.yaml                  # System configuration
├── requirements.txt             # Python dependencies
├── .env.example                 # Environment variables template
└── main.py                      # Main entry point
```
- Python 3.9 or higher
- uv package manager (recommended) or pip
- Virtual environment
uv is a fast Python package installer and resolver. Install it first:
```bash
# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Alternative: Using pip
pip install uv
```

Clone the repository and navigate to the project directory:
```bash
cd is-492-assignment-3
```

**Option A: Using uv (recommended, much faster)**
```bash
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate   # On macOS/Linux
# OR
.venv\Scripts\activate      # On Windows

# Install dependencies
uv pip install -r requirements.txt
```

**Option B: Using standard pip**
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate   # On macOS/Linux
# OR
venv\Scripts\activate      # On Windows

# Install dependencies
pip install -r requirements.txt
```

Before committing any code, set up pre-commit hooks to prevent API key leaks:
```bash
# Quick setup - installs hooks and runs security checks
./scripts/install-hooks.sh

# Or manually
pre-commit install
```

This will automatically scan for hardcoded API keys and secrets before each commit. See SECURITY_SETUP.md for full details.
Copy the example environment file:
```bash
cp .env.example .env
```

Edit `.env` and add your API keys:
```
# Required: At least one LLM API
GROQ_API_KEY=your_groq_api_key_here
# OR
OPENAI_API_KEY=your_openai_api_key_here

# Recommended: At least one search API
TAVILY_API_KEY=your_tavily_api_key_here
# OR
BRAVE_API_KEY=your_brave_api_key_here

# Optional: For academic paper search
SEMANTIC_SCHOLAR_API_KEY=your_key_here
```

- Groq (Recommended for students): https://console.groq.com - Free tier available
- OpenAI: https://platform.openai.com - Paid, requires credits
- Tavily: https://www.tavily.com - Student free quota available
- Brave Search: https://brave.com/search/api
- Semantic Scholar: https://www.semanticscholar.org/product/api - Free tier available
Edit config.yaml to customize your system:
- Choose your research topic
- Configure agent prompts (see below)
- Set model preferences (Groq vs OpenAI)
- Define safety policies
- Configure evaluation criteria
You can customize agent behavior by setting the system_prompt in config.yaml:
```yaml
agents:
  planner:
    system_prompt: |
      You are an expert research planner specializing in HCI.
      Focus on recent publications and seminal works.
      After creating the plan, say "PLAN COMPLETE".
```

**Important:** Custom prompts must include handoff signals:
- Planner: Must include "PLAN COMPLETE"
- Researcher: Must include "RESEARCH COMPLETE"
- Writer: Must include "DRAFT COMPLETE"
- Critic: Must include "APPROVED - RESEARCH COMPLETE" or "NEEDS REVISION"
Leave `system_prompt: ""` (empty) to use the default prompts.
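For reference, here is a minimal sketch of how an orchestrator might check for these handoff signals in an agent's output. The signal strings come from the list above; the `HANDOFF_SIGNALS` mapping and helper function are illustrative, not part of the template:

```python
# Illustrative mapping from agent name to the string(s) that mark its turn done.
HANDOFF_SIGNALS = {
    "planner": ["PLAN COMPLETE"],
    "researcher": ["RESEARCH COMPLETE"],
    "writer": ["DRAFT COMPLETE"],
    "critic": ["APPROVED - RESEARCH COMPLETE", "NEEDS REVISION"],
}

def has_handoff_signal(agent_name: str, message: str) -> bool:
    """Return True if the agent's message contains one of its completion signals."""
    return any(signal in message for signal in HANDOFF_SIGNALS.get(agent_name, []))
```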
This template provides the structure - you need to implement the core functionality. Here's what needs to be done:
- **Implement Agent Logic** (in `src/agents/`)
  - Complete `planner_agent.py` - Integrate an LLM to break down queries (see the sketch after this list)
  - Complete `researcher_agent.py` - Integrate search APIs (Tavily, Semantic Scholar)
  - Complete `critic_agent.py` - Implement quality evaluation logic
  - Complete `writer_agent.py` - Implement synthesis with proper citations
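As a starting point, here is a minimal sketch of what the planner's LLM call could look like, assuming the `openai` Python SDK pointed at Groq's OpenAI-compatible endpoint. The prompt text, function name, and model id are illustrative, not the template's actual interface:

```python
import os
from openai import OpenAI  # Groq exposes an OpenAI-compatible endpoint

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

PLANNER_PROMPT = (
    "You are an expert research planner specializing in HCI. "
    "Break the user's query into 3-5 concrete research sub-tasks. "
    'After creating the plan, say "PLAN COMPLETE".'
)

def plan(query: str, model: str = "llama-3.3-70b-versatile") -> str:
    """Ask the LLM to decompose a research query into sub-tasks."""
    response = client.chat.completions.create(
        model=model,  # example model id; pick one from your provider's list
        messages=[
            {"role": "system", "content": PLANNER_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```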
- **Implement Tools** (in `src/tools/`)
  - Complete `web_search.py` - Integrate Tavily or Brave API
  - Complete `paper_search.py` - Integrate Semantic Scholar API (see the sketch after this list)
  - Complete `citation_tool.py` - Implement APA citation formatting
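For `paper_search.py`, a minimal sketch against the Semantic Scholar Graph API using `requests`; the function name and the set of requested fields are illustrative choices:

```python
import os
import requests

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def search_papers(query: str, limit: int = 5) -> list[dict]:
    """Query the Semantic Scholar Graph API for papers matching `query`."""
    headers = {}
    api_key = os.environ.get("SEMANTIC_SCHOLAR_API_KEY")
    if api_key:  # the API also works without a key, at a lower rate limit
        headers["x-api-key"] = api_key
    response = requests.get(
        S2_SEARCH_URL,
        params={
            "query": query,
            "limit": limit,
            "fields": "title,abstract,year,url,authors",
        },
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("data", [])
```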
Choose your preferred framework to implement the multi-agent system. The assignment template currently uses AutoGen, but you may use another framework if you prefer (e.g., LangGraph or CrewAI).
- Update `orchestrator.py` - Integrate your chosen framework (a framework-agnostic sketch follows this list)
- Implement the workflow: plan → research → write → critique → revise
- Add error handling
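If you want a framework-agnostic picture of the control flow before committing to AutoGen or LangGraph, a sketch like the following captures the plan → research → write → critique → revise loop. Every name and signature here is illustrative; swap in your framework's agent objects:

```python
from typing import Callable

# Each "agent" is modeled as a callable returning text.
Agent = Callable[..., str]

MAX_REVISIONS = 3  # cap so a strict critic cannot loop forever

def run_pipeline(query: str, planner: Agent, researcher: Agent,
                 writer: Agent, critic: Agent) -> str:
    """Sketch of the plan -> research -> write -> critique -> revise workflow."""
    plan = planner(query)                 # output ends with "PLAN COMPLETE"
    evidence = researcher(plan)           # output ends with "RESEARCH COMPLETE"
    draft = writer(query, evidence)       # output ends with "DRAFT COMPLETE"

    for _ in range(MAX_REVISIONS):
        verdict = critic(draft)
        if "APPROVED - RESEARCH COMPLETE" in verdict:
            return draft
        # "NEEDS REVISION": feed the critique back to the writer and retry
        draft = writer(query, evidence, verdict)
    return draft  # best effort once the revision budget is spent
```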
- **Implement Guardrails** (in `src/guardrails/`)
  - Choose a framework: Guardrails AI or NeMo Guardrails
  - Define safety policies in `safety_manager.py`
  - Implement input validation in `input_guardrail.py`
  - Implement output validation in `output_guardrail.py`
  - Set up safety event logging (see the sketch after this list)
- **Implement LLM-as-a-Judge** (in `src/evaluation/`)
  - Complete `judge.py` - Integrate an LLM API for judging
  - Define evaluation rubrics for each criterion
  - Implement score parsing and aggregation (see the sketch after this list)
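A minimal sketch of the judging flow: send a rubric to the LLM, then parse and aggregate its scores. The criteria below are placeholders; define the rubric that matches your evaluation criteria in `config.yaml`:

```python
import json
import re

RUBRIC = """Rate the answer on each criterion from 1 (poor) to 5 (excellent):
- accuracy: are claims supported by the cited evidence?
- completeness: does it address every part of the query?
- citation_quality: are sources real, relevant, and properly attributed?
Respond with JSON only, e.g. {"accuracy": 4, "completeness": 3, "citation_quality": 5}."""

def parse_scores(judge_reply: str) -> dict[str, int]:
    """Extract the JSON score object from the judge's reply, tolerating extra prose."""
    match = re.search(r"\{.*\}", judge_reply, re.DOTALL)
    if match is None:
        raise ValueError("Judge reply contained no JSON scores")
    return {k: int(v) for k, v in json.loads(match.group(0)).items()}

def aggregate(scores: dict[str, int]) -> float:
    """Unweighted mean; swap in per-criterion weights if your rubric needs them."""
    return sum(scores.values()) / len(scores)
```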
- **Create Test Dataset**
  - Add more test queries to `data/example_queries.json` (illustrative example after this list)
  - Define expected outputs or ground truths where possible
  - Cover different query types and topics
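The exact schema is up to you; match whatever structure `data/example_queries.json` already uses in the template. One possible shape, generated from Python:

```python
import json

# Illustrative schema only: id, query text, and topics the answer should cover.
queries = [
    {
        "id": "q1",
        "query": "What does recent HCI research say about dark patterns in mobile UIs?",
        "expected_topics": ["dark patterns", "deceptive design", "mobile interfaces"],
    },
    {
        "id": "q2",
        "query": "Summarize evaluation methods for VR locomotion techniques.",
        "expected_topics": ["virtual reality", "locomotion", "user studies"],
    },
]

with open("data/example_queries.json", "w", encoding="utf-8") as f:
    json.dump(queries, f, indent=2)
```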
- **Complete UI** (choose one or both)
  - Finish the CLI implementation in `src/ui/cli.py`
  - Finish the web UI in `src/ui/streamlit_app.py` (minimal sketch after this list)
  - Display agent traces clearly
  - Show citations and sources
  - Indicate safety events
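A minimal Streamlit sketch to get the web UI started; the `run_pipeline` import is hypothetical and should point at whatever entry point your orchestrator actually exposes:

```python
import streamlit as st

# Hypothetical import: replace with your orchestrator's real entry point.
from src.orchestrator import run_pipeline

st.title("HCI Research Assistant")

query = st.text_input("Enter a research question")
if st.button("Run") and query:
    with st.spinner("Agents working..."):
        result = run_pipeline(query)
    st.subheader("Answer")
    st.markdown(result)
    with st.expander("Agent trace"):
        st.write("Per-agent steps, citations, and safety events go here.")
```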
Run the CLI:

```bash
python main.py --mode cli
```

Run the web interface:

```bash
python main.py --mode web
# OR directly:
streamlit run src/ui/streamlit_app.py
```

Run the evaluation:

```bash
python main.py --mode evaluate
```

This will:
- Load test queries from `data/example_queries.json`
- Run each query through your system
- Evaluate outputs using LLM-as-a-Judge
- Generate a report in `outputs/`
Run tests (if you create them):
```bash
pytest tests/
```

Resources:

- uv Documentation - Fast Python package installer
- AutoGen Documentation
- LangGraph Documentation
- Guardrails AI
- NeMo Guardrails
- Tavily API
- Semantic Scholar API