
🧠 Make your agents learn from experience. Based on the Agentic Context Engineering (ACE) framework.


kayba-ai/agentic-context-engine



Agentic Context Engine (ACE)

Python 3.11+ | License: MIT

AI agents that get smarter with every task 🧠

Agentic Context Engine learns from your agent's successes and failures. Just plug in and watch your agents improve.

Star ⭐️ this repo if you find it useful!


🤖 LLM Quickstart

  1. Point your favorite coding agent (Cursor, Claude Code, Codex, etc.) to Agents.md
  2. Prompt away!

✋ Quick Start

1. Install

pip install ace-framework

2. Set Your API Key

export OPENAI_API_KEY="your-api-key"
# Or use Claude, Gemini, or 100+ other providers

3. Create Your First ACE Agent

from ace import LiteLLMClient, Generator, Reflector, Curator, Playbook

# Initialize with any LLM
client = LiteLLMClient(model="gpt-4o-mini")
generator = Generator(client)
reflector = Reflector(client)
curator = Curator(client)
playbook = Playbook()

# Teach your agent through examples
# (See examples/ folder for complete training patterns)

# Now it can solve new problems with learned strategies
result = generator.generate(
    question="Give me the seahorse emoji",
    context="",
    playbook=playbook
)
print(result.final_answer)  # Agent applies learned strategies

That's it! As it trains on examples, your agent keeps learning and improving. 🎉


Why Agentic Context Engine (ACE)?

AI agents make the same mistakes repeatedly.

ACE enables agents to learn from execution feedback (what works and what doesn't) and to improve continuously.
No training data, no fine-tuning: just automatic improvement.

Clear Benefits

  • 📈 20-35% Better Performance: Proven improvements on complex tasks
  • 🧠 Self-Improving: Agents get smarter with each task
  • 🔄 No Context Collapse: Preserves valuable knowledge over time
  • 🚀 100+ LLM Providers: Works with OpenAI, Anthropic, Google, and more
  • 📊 Production Observability: Built-in Opik integration for enterprise monitoring

Demos

🌊 The Seahorse Emoji Challenge

A challenge where LLMs often hallucinate that a seahorse emoji exists (it doesn't). This demo shows ACE learning from its own mistakes in real time as it works through this infamous challenge.


In this example:

  • Round 1: The agent incorrectly outputs 🐴 (horse emoji)
  • Self-Reflection: ACE reflects without any external feedback
  • Round 2: With learned strategies from ACE, the agent successfully realizes there is no seahorse emoji

Try it yourself:

python examples/kayba_ace_test.py

How does Agentic Context Engine (ACE) work?

Based on the ACE research framework from Stanford & SambaNova.

ACE uses three specialized roles that work together:

  1. 🎯 Generator - Executes tasks using learned strategies from the playbook
  2. 🔍 Reflector - Analyzes what worked and what didn't after each execution
  3. 📝 Curator - Updates the playbook with new strategies based on reflection
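To make the division of labor concrete, here is a toy sketch of one learning round in plain Python. It is not ACE's API: `ToyPlaybook`, `toy_generator`, `toy_reflector`, `toy_curator` and `run_round` are hypothetical names, and each role is a hard-coded function instead of an LLM call, so it runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyPlaybook:
    # Hypothetical stand-in for the Playbook: just a list of strategy strings
    strategies: list[str] = field(default_factory=list)


def toy_generator(question: str, playbook: ToyPlaybook) -> str:
    # Generator: answers using whatever strategies the playbook currently holds
    if any("seahorse" in s for s in playbook.strategies):
        return "There is no seahorse emoji."
    return "horse"  # the round-1 mistake from the demo


def toy_reflector(answer: str, feedback: str) -> str:
    # Reflector: turns feedback on the answer into a lesson (empty if nothing to learn)
    if feedback == "incorrect":
        return f"'{answer}' was wrong; check whether a seahorse emoji exists first."
    return ""


def toy_curator(lesson: str, playbook: ToyPlaybook) -> None:
    # Curator: adds the lesson to the playbook only if it is new
    if lesson and lesson not in playbook.strategies:
        playbook.strategies.append(lesson)


def run_round(question: str, playbook: ToyPlaybook, grade: Callable[[str], str]) -> str:
    answer = toy_generator(question, playbook)  # 1. execute
    feedback = grade(answer)                    # feedback from the task environment
    lesson = toy_reflector(answer, feedback)    # 2. reflect
    toy_curator(lesson, playbook)               # 3. curate
    return answer


def grade(answer: str) -> str:
    # Hypothetical environment check for this one question
    return "correct" if "no seahorse" in answer else "incorrect"


playbook = ToyPlaybook()
print(run_round("Give me the seahorse emoji", playbook, grade))  # horse
print(run_round("Give me the seahorse emoji", playbook, grade))  # There is no seahorse emoji.
```

The second round succeeds only because the first round's lesson was stored in the playbook; nothing about the "model" itself changed.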

ACE helps your agent internalize:

  • ✅ Successes → Extract patterns that work
  • ❌ Failures → Learn what to avoid
  • 🔧 Tool usage → Discover which tools work best for which tasks
  • 🎯 Edge cases → Remember rare scenarios and how to handle them
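One way to picture how successes and failures are recorded: each playbook entry keeps helpful and harmful counts that the Reflector's verdicts update. The sketch below assumes that representation; `Bullet`, `apply_tags` and `to_avoid` are hypothetical names, and the library's own entry fields may differ.

```python
from dataclasses import dataclass


@dataclass
class Bullet:
    # Hypothetical playbook entry; the field names are illustrative
    content: str
    helpful: int = 0
    harmful: int = 0


def apply_tags(bullets: dict[str, Bullet], tags: dict[str, str]) -> None:
    """Record one round of Reflector verdicts: 'helpful', 'harmful' or 'neutral'."""
    for bullet_id, label in tags.items():
        bullet = bullets.get(bullet_id)
        if bullet is None:
            continue  # verdict for an unknown entry: ignore it
        if label == "helpful":
            bullet.helpful += 1
        elif label == "harmful":
            bullet.harmful += 1
        # 'neutral' leaves both counters unchanged


def to_avoid(bullets: dict[str, Bullet]) -> list[str]:
    # Entries that hurt more often than they helped become "what to avoid"
    return [b.content for b in bullets.values() if b.harmful > b.helpful]


bullets = {
    "b1": Bullet("Verify an emoji exists in Unicode before naming it"),
    "b2": Bullet("Answer from memory without checking"),
}
apply_tags(bullets, {"b1": "helpful", "b2": "harmful"})
apply_tags(bullets, {"b1": "neutral", "b2": "harmful"})
print(to_avoid(bullets))  # ['Answer from memory without checking']
```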

The magic happens in the Playbook, a living document of strategies that evolves with experience.
Key innovation: all learning happens in context through incremental updates. There is no fine-tuning and no training data, and you get complete transparency into what your agent learned.

---
config:
  look: neo
  theme: neutral
---
flowchart LR
    Playbook[("`**📚 Playbook**<br>(Evolving Context)<br><br>•Strategy Bullets<br> ✓ Helpful strategies <br>✗ Harmful patterns <br>○ Neutral observations`")]
    Start(["**📝Query** <br>User prompt or question"]) --> Generator["**⚙️Generator** <br>Executes task using playbook"]
    Generator --> Reflector
    Playbook -. Provides Context .-> Generator
    Environment["**🌐 Task Environment**<br>Evaluates answer<br>Provides feedback"] -- Feedback+ <br>Optional Ground Truth --> Reflector
    Reflector["**🔍 Reflector**<br>Analyzes what was helpful/harmful and provides feedback"]
    Reflector --> Curator["**📝 Curator**<br>Produces improvement deltas"]
    Curator --> DeltaOps["**🔀Merger** <br>Updates the playbook with deltas"]
    DeltaOps -- Incremental<br>Updates --> Playbook
    Generator <--> Environment
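The Curator → Merger step in the diagram can be sketched as applying small delta operations to the playbook instead of rewriting it. The `DeltaOp` type, its `ADD`/`UPDATE`/`REMOVE` operation names and the `merge` function are assumptions for illustration, not the library's actual delta format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaOp:
    # Hypothetical delta: what to do, to which entry, and with what text
    op: str  # "ADD", "UPDATE" or "REMOVE"
    bullet_id: str
    content: str = ""


def merge(playbook: dict[str, str], deltas: list[DeltaOp]) -> dict[str, str]:
    """Return a new playbook with the deltas applied; all other entries are copied as-is."""
    updated = dict(playbook)
    for d in deltas:
        if d.op == "ADD":
            updated.setdefault(d.bullet_id, d.content)  # never overwrite an existing entry
        elif d.op == "UPDATE":
            if d.bullet_id in updated:
                updated[d.bullet_id] = d.content
        elif d.op == "REMOVE":
            updated.pop(d.bullet_id, None)
        else:
            raise ValueError(f"unknown delta op: {d.op!r}")
    return updated


before = {
    "b1": "Check Unicode before naming an emoji.",
    "b2": "Prefer short answers.",
}
after = merge(before, [
    DeltaOp("ADD", "b3", "Say so plainly when something does not exist."),
    DeltaOp("UPDATE", "b2", "Prefer short, direct answers."),
])
print(after["b1"])              # Check Unicode before naming an emoji.
print(len(before), len(after))  # 2 3
```

Because untouched entries are copied verbatim, earlier lessons survive each update; this is one way to read the "No Context Collapse" point above.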

Installation Options

# Basic installation
pip install ace-framework

# With demo support (browser automation)
pip install "ace-framework[demos]"

# With LangChain support
pip install "ace-framework[langchain]"

# With local model support
pip install "ace-framework[transformers]"

# With all features
pip install "ace-framework[all]"

# Development
pip install "ace-framework[dev]"

# Development from source (contributors) - UV Method (10-100x faster)
git clone https://github.com/kayba-ai/agentic-context-engine
cd agentic-context-engine
uv sync

# Development from source (contributors) - Traditional Method
git clone https://github.com/kayba-ai/agentic-context-engine
cd agentic-context-engine
pip install -e .

Configuration

ACE works with any LLM provider through LiteLLM:

from ace import LiteLLMClient

# OpenAI
client = LiteLLMClient(model="gpt-4o")

# Anthropic Claude
client = LiteLLMClient(model="claude-3-5-sonnet-20241022")

# Google Gemini
client = LiteLLMClient(model="gemini-pro")

# Ollama (local)
client = LiteLLMClient(model="ollama/llama2")

# With fallbacks for reliability
client = LiteLLMClient(
    model="gpt-4",
    fallbacks=["claude-3-haiku", "gpt-3.5-turbo"]
)
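Conceptually, `fallbacks` means: try the primary model, and on failure move down the list. The sketch below shows only that idea, with a hypothetical `call_model` function standing in for a real provider call; `complete_with_fallbacks` and `fake_call` are illustrative names, not LiteLLM's implementation.

```python
from typing import Callable


def complete_with_fallbacks(
    prompt: str,
    models: list[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order and return (model_used, reply) from the first success."""
    errors: list[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    # Hypothetical provider call in which the primary model is unavailable
    if model == "gpt-4":
        raise TimeoutError("primary unavailable")
    return f"{model} answered: {prompt}"


print(complete_with_fallbacks("2+2?", ["gpt-4", "claude-3-haiku", "gpt-3.5-turbo"], fake_call))
# ('claude-3-haiku', 'claude-3-haiku answered: 2+2?')
```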

Observability with Opik

ACE includes built-in Opik integration for production monitoring and debugging.

Quick Start

# Install with Opik support
pip install ace-framework opik

# Set your Opik API key (or use local deployment)
export OPIK_API_KEY="your-api-key"
export OPIK_PROJECT_NAME="ace-project"

What Gets Tracked

When Opik is available, ACE automatically logs:

  • Generator: Input questions, reasoning, and final answers
  • Reflector: Error analysis and bullet classifications
  • Curator: Playbook updates and delta operations
  • Playbook Evolution: Changes to strategies over time

Viewing Traces

# Opik tracing is automatic - just run your ACE code normally
from ace import Generator, Reflector, Curator, Playbook
from ace.llm_providers import LiteLLMClient

llm_client = LiteLLMClient(model="gpt-4o-mini")
playbook = Playbook()

# All role interactions are automatically tracked
generator = Generator(llm_client)
output = generator.generate(
    question="What is 2+2?",
    context="Show your work",
    playbook=playbook
)
# View traces at https://www.comet.com/opik or your local Opik instance

Graceful Degradation

If Opik is not installed or configured, ACE continues to work normally without tracing. No code changes needed.
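This behavior follows a common optional-dependency pattern: check whether a package can be imported, and fall back to a no-op when it cannot. The sketch below is a generic illustration of that pattern, not ACE's code; `optional_tracer` is a hypothetical helper, `not_a_real_tracing_package` is a deliberately missing module name, and the `print` call stands in for a real tracing call.

```python
import functools
import importlib.util
from typing import Any, Callable


def optional_tracer(module_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Return a tracing decorator if `module_name` is importable, else a no-op decorator."""
    if importlib.util.find_spec(module_name) is None:
        # Optional dependency missing: leave functions untouched, no tracing
        return lambda fn: fn

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            print(f"[trace] {fn.__name__}{args}")  # stand-in for a real tracing call
            return fn(*args, **kwargs)
        return wrapper

    return decorator


@optional_tracer("not_a_real_tracing_package")  # hypothetical, not installed
def answer(question: str) -> str:
    return f"answer to {question}"


print(answer("2+2?"))  # answer to 2+2?
```

The calling code is identical whether or not the optional package exists, which is what "no code changes needed" amounts to.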


📊 Benchmarks

Evaluate ACE performance with scientific rigor using our comprehensive benchmark suite.

Quick Benchmark

# Compare baseline vs ACE on any benchmark
uv run python scripts/run_benchmark.py simple_qa --limit 50 --compare

# Run with proper train/test split (prevents overfitting)
uv run python scripts/run_benchmark.py finer_ord --limit 100

# Baseline evaluation (no ACE learning)
uv run python scripts/run_benchmark.py hellaswag --limit 50 --skip-adaptation

Available Benchmarks

| Benchmark | Description | Domain |
|---|---|---|
| simple_qa | Question Answering (SQuAD) | General |
| finer_ord | Financial Named Entity Recognition | Finance |
| mmlu | Massive Multitask Language Understanding | General Knowledge |
| hellaswag | Commonsense Reasoning | Common Sense |
| arc_easy / arc_challenge | AI2 Reasoning Challenge | Reasoning |

Evaluation Modes

  • ACE Mode: Train/test split with learning (shows true generalization)
  • Baseline Mode: Direct evaluation without learning (--skip-adaptation)
  • Comparison Mode: Side-by-side baseline vs ACE (--compare)

The benchmark system prevents overfitting with automatic 80/20 train/test splits and provides overfitting analysis to ensure honest metrics.
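A minimal sketch of what an 80/20 train/test split and a simple overfitting check might look like, assuming a seeded shuffle. `train_test_split`, `overfitting_gap` and the default seed are hypothetical, and `scripts/run_benchmark.py` may split and analyze results differently.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def train_test_split(
    samples: Sequence[T], train_fraction: float = 0.8, seed: int = 42
) -> tuple[list[T], list[T]]:
    """Shuffle with a fixed seed (so runs are reproducible), then cut into train/test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def overfitting_gap(train_accuracy: float, test_accuracy: float) -> float:
    # A large positive gap suggests the playbook memorized training items
    # rather than learning strategies that transfer to unseen ones.
    return train_accuracy - test_accuracy


train, test = train_test_split(range(50))
print(len(train), len(test))                   # 40 10
print(round(overfitting_gap(0.90, 0.75), 2))  # 0.15
```

Learning happens only on the train partition, and reported accuracy comes from the held-out test partition.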

→ Full Benchmark Documentation


Documentation


Contributing

We love contributions! Check out our Contributing Guide to get started.


Acknowledgment

Based on the ACE paper and inspired by Dynamic Cheatsheet.

If you use ACE in your research, please cite:

@article{zhang2025ace,
  title={Agentic Context Engineering},
  author={Zhang and others},
  journal={arXiv:2510.04618},
  year={2025}
}

⭐ Star this repo if you find it useful!
Built with ❤️ by Kayba and the open-source community.
