A multi-agent system for deep research on HCI topics, featuring orchestrated agents, safety guardrails, and LLM-as-a-Judge evaluation.
This template provides a starting point for building a multi-agent research assistant system. The system uses multiple specialized agents to:
- Plan research tasks
- Gather evidence from academic papers and web sources
- Synthesize findings into coherent responses
- Evaluate quality and verify accuracy
- Ensure safety through guardrails
```
.
├── src/
│   ├── agents/                  # Agent implementations
│   │   ├── base_agent.py        # Base agent class
│   │   ├── planner_agent.py     # Task planning agent
│   │   ├── researcher_agent.py  # Evidence gathering agent
│   │   ├── critic_agent.py      # Quality verification agent
│   │   └── writer_agent.py      # Response synthesis agent
│   ├── guardrails/              # Safety guardrails
│   │   ├── safety_manager.py    # Main safety coordinator
│   │   ├── input_guardrail.py   # Input validation
│   │   └── output_guardrail.py  # Output validation
│   ├── tools/                   # Research tools
│   │   ├── web_search.py        # Web search integration
│   │   ├── paper_search.py      # Academic paper search
│   │   └── citation_tool.py     # Citation formatting
│   ├── evaluation/              # Evaluation system
│   │   ├── judge.py             # LLM-as-a-Judge implementation
│   │   └── evaluator.py         # System evaluator
│   ├── ui/                      # User interfaces
│   │   ├── cli.py               # Command-line interface
│   │   └── streamlit_app.py     # Web interface
│   └── orchestrator.py          # Agent orchestration
├── data/
│   └── example_queries.json     # Example test queries
├── logs/                        # Log files (created at runtime)
├── outputs/                     # Evaluation results (created at runtime)
├── config.yaml                  # System configuration
├── requirements.txt             # Python dependencies
├── .env.example                 # Environment variables template
└── main.py                      # Main entry point
```
- Python 3.9 or higher
- uv package manager (recommended) or pip
- Virtual environment
uv is a fast Python package installer and resolver. Install it first:
```bash
# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Alternative: Using pip
pip install uv
```

Clone the repository and navigate to the project directory:
```bash
cd is-492-assignment-3
```

**Option A: Using uv (recommended, much faster)**
```bash
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate   # On macOS/Linux
# OR
.venv\Scripts\activate      # On Windows

# Install dependencies
uv pip install -r requirements.txt
```

**Option B: Using standard pip**
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate   # On macOS/Linux
# OR
venv\Scripts\activate      # On Windows

# Install dependencies
pip install -r requirements.txt
```

Before committing any code, set up pre-commit hooks to prevent API key leaks:
```bash
# Quick setup - installs hooks and runs security checks
./scripts/install-hooks.sh

# Or manually
pre-commit install
```

This will automatically scan for hardcoded API keys and secrets before each commit. See SECURITY_SETUP.md for full details.
Copy the example environment file:
```bash
cp .env.example .env
```

Edit `.env` and add your API keys:
```
# Required: At least one LLM API
GROQ_API_KEY=your_groq_api_key_here
# OR
OPENAI_API_KEY=your_openai_api_key_here

# Recommended: At least one search API
TAVILY_API_KEY=your_tavily_api_key_here
# OR
BRAVE_API_KEY=your_brave_api_key_here

# Optional: For academic paper search
SEMANTIC_SCHOLAR_API_KEY=your_key_here
```

- Groq (Recommended for students): https://console.groq.com - Free tier available
- OpenAI: https://platform.openai.com - Paid, requires credits
- Tavily: https://www.tavily.com - Student free quota available
- Brave Search: https://brave.com/search/api
- Semantic Scholar: https://www.semanticscholar.org/product/api - Free tier available
Edit config.yaml to customize your system:
- Choose your research topic
- Configure agent prompts (see below)
- Set model preferences (Groq vs OpenAI)
- Define safety policies
- Configure evaluation criteria
You can customize agent behavior by setting the system_prompt in config.yaml:
```yaml
agents:
  planner:
    system_prompt: |
      You are an expert research planner specializing in HCI.
      Focus on recent publications and seminal works.
      After creating the plan, say "PLAN COMPLETE".
```

**Important:** Custom prompts must include handoff signals:
- Planner: Must include "PLAN COMPLETE"
- Researcher: Must include "RESEARCH COMPLETE"
- Writer: Must include "DRAFT COMPLETE"
- Critic: Must include "APPROVED - RESEARCH COMPLETE" or "NEEDS REVISION"
Leave `system_prompt: ""` (empty) to use the default prompts.
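For reference, here is a minimal sketch of how an orchestrator might check for these handoff signals in an agent's output. The signal strings come from the list above; the `HANDOFF_SIGNALS` mapping and helper function are illustrative, not part of the template:

```python
# Illustrative mapping from agent name to the string(s) that mark its turn done.
HANDOFF_SIGNALS = {
    "planner": ["PLAN COMPLETE"],
    "researcher": ["RESEARCH COMPLETE"],
    "writer": ["DRAFT COMPLETE"],
    "critic": ["APPROVED - RESEARCH COMPLETE", "NEEDS REVISION"],
}

def has_handoff_signal(agent_name: str, message: str) -> bool:
    """Return True if the agent's message contains one of its completion signals."""
    return any(signal in message for signal in HANDOFF_SIGNALS.get(agent_name, []))
```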
This template provides the structure - you need to implement the core functionality. Here's what needs to be done:
- **Implement Agent Logic** (in `src/agents/`)
  - Complete `planner_agent.py` - Integrate an LLM to break down queries (see the sketch after this list)
  - Complete `researcher_agent.py` - Integrate search APIs (Tavily, Semantic Scholar)
  - Complete `critic_agent.py` - Implement quality evaluation logic
  - Complete `writer_agent.py` - Implement synthesis with proper citations
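As a starting point, here is a minimal sketch of what the planner's LLM call could look like, assuming the `openai` Python SDK pointed at Groq's OpenAI-compatible endpoint. The prompt text, function name, and model id are illustrative, not the template's actual interface:

```python
import os
from openai import OpenAI  # Groq exposes an OpenAI-compatible endpoint

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

PLANNER_PROMPT = (
    "You are an expert research planner specializing in HCI. "
    "Break the user's query into 3-5 concrete research sub-tasks. "
    'After creating the plan, say "PLAN COMPLETE".'
)

def plan(query: str, model: str = "llama-3.3-70b-versatile") -> str:
    """Ask the LLM to decompose a research query into sub-tasks."""
    response = client.chat.completions.create(
        model=model,  # example model id; pick one from your provider's list
        messages=[
            {"role": "system", "content": PLANNER_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```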
- **Implement Tools** (in `src/tools/`)
  - Complete `web_search.py` - Integrate Tavily or Brave API
  - Complete `paper_search.py` - Integrate Semantic Scholar API (see the sketch after this list)
  - Complete `citation_tool.py` - Implement APA citation formatting
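For `paper_search.py`, a minimal sketch against the Semantic Scholar Graph API using `requests`; the function name and the set of requested fields are illustrative choices:

```python
import os
import requests

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def search_papers(query: str, limit: int = 5) -> list[dict]:
    """Query the Semantic Scholar Graph API for papers matching `query`."""
    headers = {}
    api_key = os.environ.get("SEMANTIC_SCHOLAR_API_KEY")
    if api_key:  # the API also works without a key, at a lower rate limit
        headers["x-api-key"] = api_key
    response = requests.get(
        S2_SEARCH_URL,
        params={
            "query": query,
            "limit": limit,
            "fields": "title,abstract,year,url,authors",
        },
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("data", [])
```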
Choose your preferred framework to implement the multi-agent system. The assignment template currently uses AutoGen, but you may use another framework if you prefer (e.g., LangGraph or CrewAI).
- Update `orchestrator.py` - Integrate your chosen framework (a framework-agnostic sketch follows this list)
- Implement the workflow: plan → research → write → critique → revise
- Add error handling
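If you want a framework-agnostic picture of the control flow before committing to AutoGen or LangGraph, a sketch like the following captures the plan → research → write → critique → revise loop. Every name and signature here is illustrative; swap in your framework's agent objects:

```python
from typing import Callable

# Each "agent" is modeled as a callable returning text.
Agent = Callable[..., str]

MAX_REVISIONS = 3  # cap so a strict critic cannot loop forever

def run_pipeline(query: str, planner: Agent, researcher: Agent,
                 writer: Agent, critic: Agent) -> str:
    """Sketch of the plan -> research -> write -> critique -> revise workflow."""
    plan = planner(query)                 # output ends with "PLAN COMPLETE"
    evidence = researcher(plan)           # output ends with "RESEARCH COMPLETE"
    draft = writer(query, evidence)       # output ends with "DRAFT COMPLETE"

    for _ in range(MAX_REVISIONS):
        verdict = critic(draft)
        if "APPROVED - RESEARCH COMPLETE" in verdict:
            return draft
        # "NEEDS REVISION": feed the critique back to the writer and retry
        draft = writer(query, evidence, verdict)
    return draft  # best effort once the revision budget is spent
```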
- **Implement Guardrails** (in `src/guardrails/`)
  - Choose a framework: Guardrails AI or NeMo Guardrails
  - Define safety policies in `safety_manager.py`
  - Implement input validation in `input_guardrail.py`
  - Implement output validation in `output_guardrail.py`
  - Set up safety event logging (see the sketch after this list)
- **Implement LLM-as-a-Judge** (in `src/evaluation/`)
  - Complete `judge.py` - Integrate an LLM API for judging
  - Define evaluation rubrics for each criterion
  - Implement score parsing and aggregation (see the sketch after this list)
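A minimal sketch of the judging flow: send a rubric to the LLM, then parse and aggregate its scores. The criteria below are placeholders; define the rubric that matches your evaluation criteria in `config.yaml`:

```python
import json
import re

RUBRIC = """Rate the answer on each criterion from 1 (poor) to 5 (excellent):
- accuracy: are claims supported by the cited evidence?
- completeness: does it address every part of the query?
- citation_quality: are sources real, relevant, and properly attributed?
Respond with JSON only, e.g. {"accuracy": 4, "completeness": 3, "citation_quality": 5}."""

def parse_scores(judge_reply: str) -> dict[str, int]:
    """Extract the JSON score object from the judge's reply, tolerating extra prose."""
    match = re.search(r"\{.*\}", judge_reply, re.DOTALL)
    if match is None:
        raise ValueError("Judge reply contained no JSON scores")
    return {k: int(v) for k, v in json.loads(match.group(0)).items()}

def aggregate(scores: dict[str, int]) -> float:
    """Unweighted mean; swap in per-criterion weights if your rubric needs them."""
    return sum(scores.values()) / len(scores)
```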
- **Create Test Dataset**
  - Add more test queries to `data/example_queries.json` (illustrative example after this list)
  - Define expected outputs or ground truths where possible
  - Cover different query types and topics
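The exact schema is up to you; match whatever structure `data/example_queries.json` already uses in the template. One possible shape, generated from Python:

```python
import json

# Illustrative schema only: id, query text, and topics the answer should cover.
queries = [
    {
        "id": "q1",
        "query": "What does recent HCI research say about dark patterns in mobile UIs?",
        "expected_topics": ["dark patterns", "deceptive design", "mobile interfaces"],
    },
    {
        "id": "q2",
        "query": "Summarize evaluation methods for VR locomotion techniques.",
        "expected_topics": ["virtual reality", "locomotion", "user studies"],
    },
]

with open("data/example_queries.json", "w", encoding="utf-8") as f:
    json.dump(queries, f, indent=2)
```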
- **Complete UI** (choose one or both)
  - Finish the CLI implementation in `src/ui/cli.py`
  - Finish the web UI in `src/ui/streamlit_app.py` (minimal sketch after this list)
  - Display agent traces clearly
  - Show citations and sources
  - Indicate safety events
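A minimal Streamlit sketch to get the web UI started; the `run_pipeline` import is hypothetical and should point at whatever entry point your orchestrator actually exposes:

```python
import streamlit as st

# Hypothetical import: replace with your orchestrator's real entry point.
from src.orchestrator import run_pipeline

st.title("HCI Research Assistant")

query = st.text_input("Enter a research question")
if st.button("Run") and query:
    with st.spinner("Agents working..."):
        result = run_pipeline(query)
    st.subheader("Answer")
    st.markdown(result)
    with st.expander("Agent trace"):
        st.write("Per-agent steps, citations, and safety events go here.")
```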
Run the CLI:

```bash
python main.py --mode cli
```

Run the web interface:

```bash
python main.py --mode web
# OR directly:
streamlit run src/ui/streamlit_app.py
```

Run the evaluation:

```bash
python main.py --mode evaluate
```

This will:
- Load test queries from `data/example_queries.json`
- Run each query through your system
- Evaluate outputs using LLM-as-a-Judge
- Generate a report in `outputs/`
Run tests (if you create them):
```bash
pytest tests/
```

Resources:

- uv Documentation - Fast Python package installer
- AutoGen Documentation
- LangGraph Documentation
- Guardrails AI
- NeMo Guardrails
- Tavily API
- Semantic Scholar API