Commit a5e8267

Merge pull request #48 from attach-dev/dev
Release 0.3.7 – Quota, Usage Metering & Parity patch
2 parents 8984814 + 70992db commit a5e8267

21 files changed: +1174 / -166 lines

.env.example

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ WEAVIATE_URL=http://localhost:8081
 MAX_TOKENS_PER_MIN=60000
 QUOTA_ENCODING=cl100k_base
 
+# Metering Option (null, prometheus, openmeter)
+USAGE_METERING=null
+
 # Development: Auth0 credentials for dev_login script
 # AUTH0_DOMAIN=your-domain.auth0.com
 # AUTH0_CLIENT=your-client-id
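
A minimal sketch of how a reader of `USAGE_METERING` might normalize the value, assuming the three options listed in the comment above and the `USAGE_BACKEND` alias mentioned in this commit's README change. The function name `select_backend`, the precedence of `USAGE_METERING` over the alias, and the silent fallback to `null` for unknown values are assumptions made for illustration; the commit's own `usage.factory._select_backend` is not shown on this page and may behave differently.

```python
import os
from typing import Mapping, Optional

# Options taken from the .env.example comment in this commit.
VALID_BACKENDS = {"null", "prometheus", "openmeter"}


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the metering backend name, defaulting to "null".

    Hypothetical helper: the precedence and the fallback are assumptions.
    """
    env = os.environ if env is None else env
    raw = env.get("USAGE_METERING") or env.get("USAGE_BACKEND") or "null"
    name = raw.strip().lower()
    # Unknown values are treated as "metering disabled" in this sketch.
    return name if name in VALID_BACKENDS else "null"
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the process environment.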

AGENTS.md

Lines changed: 15 additions & 0 deletions
@@ -13,3 +13,18 @@ This repo is used for the Attach Gateway service. Follow these guidelines for co
 
 ## Development Tools
 - Code should be formatted with `black` and imports sorted with `isort`.
+
+## 🔒 Memory & /mem/events are **read-only**
+
+> **Do not touch any memory-related code.**
+
+* **Off-limits files / symbols**
+  * `mem/**`
+  * `main.py` → the `/mem/events` route and **all** `MemoryEvent` logic
+  * Any Weaviate queries, inserts, or schema
+
+* PRs that change, remove, or “refactor” these areas **will be rejected**.
+  Only work on the explicitly assigned task (e.g. billing hooks).
+
+* If your change needs to interact with memory, open an issue first and wait
+  for maintainer approval.

README.md

Lines changed: 54 additions & 8 deletions
@@ -234,36 +234,82 @@ curl -X POST /v1/logs \
 # => HTTP/1.1 202 Accepted
 ```
 
+## Usage hooks
+
+Emit token usage metrics for every request. Choose a backend via
+`USAGE_METERING` (alias `USAGE_BACKEND`):
+
+```bash
+export USAGE_METERING=prometheus # or null
+```
+
+A Prometheus counter `attach_usage_tokens_total{user,direction,model}` is
+exposed for Grafana dashboards.
+Set `USAGE_METERING=null` (the default) to disable metering entirely.
+
+> **⚠️ Usage hooks depend on the quota middleware.**
+> Make sure `MAX_TOKENS_PER_MIN` is set (any positive number) so the
+> `TokenQuotaMiddleware` is enabled; the middleware is what records usage
+> events that feed Prometheus.
+
+```bash
+# Enable usage tracking (set any reasonable limit)
+export MAX_TOKENS_PER_MIN=60000
+export USAGE_METERING=prometheus
+```
+
+#### OpenMeter (Stripe / ClickHouse)
+
+```bash
+# No additional dependencies needed - uses direct HTTP API
+export MAX_TOKENS_PER_MIN=60000 # Required: enables quota middleware
+export USAGE_METERING=openmeter # Required: activates OpenMeter backend
+export OPENMETER_API_KEY=your-api-key-here # Required: API authentication
+export OPENMETER_URL=https://openmeter.cloud # Optional: defaults to https://openmeter.cloud
+```
+
+Events are sent directly to OpenMeter's HTTP API and are processed by the LLM tokens meter for billing integration with Stripe.
+
+> **⚠️ All three variables are required for OpenMeter to work:**
+> - `MAX_TOKENS_PER_MIN` enables the quota middleware that records usage events
+> - `USAGE_METERING=openmeter` activates the OpenMeter backend
+> - `OPENMETER_API_KEY` provides authentication to OpenMeter's API
+
+The gateway gracefully falls back to `NullUsageBackend` if any required variable is missing.
+
+### Scraping metrics
+
+```bash
+curl -H "Authorization: Bearer $JWT" http://localhost:8080/metrics
+```
+
 ## Token quotas
 
 Attach Gateway can enforce per-user token limits. Install the optional
-dependency with `pip install attach-gateway[quota]` and set
+dependency with `pip install attach-dev[quota]` and set
 `MAX_TOKENS_PER_MIN` in your environment to enable the middleware. The
 counter defaults to the `cl100k_base` encoding; override with
 `QUOTA_ENCODING` if your model uses a different tokenizer. The default
 in-memory store works in a single process and is not shared between
 workers—requests retried across processes may be double-counted. Use Redis
 for production deployments.
+If `tiktoken` is missing, a byte-count fallback is used which counts about
+four times more tokens than the `cl100k` tokenizer – install `tiktoken` in
+production.
 
 ### Enable token quotas
 
 ```bash
 # Optional: Enable token quotas
 export MAX_TOKENS_PER_MIN=60000
-pip install tiktoken # or pip install attach-gateway[quota]
+pip install tiktoken # or pip install attach-dev[quota]
 ```
 
 To customize the tokenizer:
 ```bash
 export QUOTA_ENCODING=cl100k_base # default
 ```
 
-## Roadmap
-
-* **v0.2** — Protected‑resource metadata endpoint (OAuth 2.1), enhanced DID resolvers.
-* **v0.3** — Token‑exchange (RFC 8693) for on‑behalf‑of delegation.
-* **v0.4** — Attach Store v1 (Git‑style, policy guards).
-
 ---
 
 ## License
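
The README text added in this commit mentions a byte-count fallback when `tiktoken` is missing. A minimal sketch of that idea follows, assuming the fallback simply counts UTF-8 bytes; the function names `byte_count_fallback` and `count_tokens` are hypothetical and are not taken from the commit, whose quota middleware is not shown on this page.

```python
def byte_count_fallback(text: str) -> int:
    """Approximate a token count by counting UTF-8 bytes (assumed fallback)."""
    return len(text.encode("utf-8"))


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken if it is installed, else fall back to bytes."""
    try:
        import tiktoken  # optional dependency
    except ImportError:
        # Overcounts relative to a real tokenizer, so quotas trip earlier.
        return byte_count_fallback(text)
    return len(tiktoken.get_encoding(encoding_name).encode(text))
```

Because a byte count is always at least as large as the number of characters, this kind of fallback errs toward rejecting requests early, which fits the README's advice to install `tiktoken` in production.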

attach/__init__.py

Lines changed: 12 additions & 3 deletions
@@ -4,11 +4,20 @@
 Add OIDC SSO, agent-to-agent handoff, and pluggable memory to any Python project.
 """
 
-__version__ = "0.2.2"
+__version__ = "0.3.7"
 __author__ = "Hammad Tariq"
 __email__ = "hammad@attach.dev"
 
-# Clean imports - no sys.path hacks needed since everything will be in the wheel
-from .gateway import create_app, AttachConfig
+# Remove this line that causes early failure:
+# from .gateway import create_app, AttachConfig
+
+# Optional: Add lazy import for convenience
+def create_app(*args, **kwargs):
+    from .gateway import create_app as _real
+    return _real(*args, **kwargs)
+
+def AttachConfig(*args, **kwargs):
+    from .gateway import AttachConfig as _real
+    return _real(*args, **kwargs)
 
 __all__ = ["create_app", "AttachConfig", "__version__"]
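
The lazy-import wrappers in this hunk can be sketched in isolation as follows. This is an illustration only: it uses the standard-library `fractions` module as a stand-in for `.gateway`, and the helper name `lazy_call` is hypothetical.

```python
import importlib


def lazy_call(module_name: str, attr: str):
    """Return a function that imports `module_name` only when first called."""

    def _wrapper(*args, **kwargs):
        # The import happens here, at call time, not when the package loads.
        target = getattr(importlib.import_module(module_name), attr)
        return target(*args, **kwargs)

    _wrapper.__name__ = attr
    return _wrapper


# Stand-in for `AttachConfig`: looks callable like the class, but is a function.
Fraction = lazy_call("fractions", "Fraction")
half = Fraction(1, 2)
```

One consequence of this pattern is that the wrapper is a plain function, so `isinstance(obj, Fraction)` (or, by analogy, `isinstance(obj, AttachConfig)`) raises `TypeError` instead of performing a class check; callers that need the real class would have to import it from the submodule directly.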

attach/__main__.py

Lines changed: 30 additions & 5 deletions
@@ -2,7 +2,7 @@
 CLI entry point - replaces the need for main.py in wheel
 """
 import uvicorn
-from .gateway import create_app
+import click
 
 def main():
     """Run Attach Gateway server"""
@@ -13,17 +13,42 @@ def main():
     except ImportError:
         pass # python-dotenv not installed, that's OK for production
 
-    import click
-
     @click.command()
     @click.option("--host", default="0.0.0.0", help="Host to bind to")
     @click.option("--port", default=8080, help="Port to bind to")
     @click.option("--reload", is_flag=True, help="Enable auto-reload")
     def cli(host: str, port: int, reload: bool):
-        app = create_app()
-        uvicorn.run(app, host=host, port=port, reload=reload)
+        try:
+            # Import here AFTER .env is loaded and CLI is parsed
+            from .gateway import create_app
+            app = create_app()
+            uvicorn.run(app, host=host, port=port, reload=reload)
+        except RuntimeError as e:
+            _friendly_exit(e)
+        except Exception as e: # unexpected crash
+            click.echo(f"❌ Startup failed: {e}", err=True)
+            raise click.Abort()
 
     cli()
 
+def _friendly_exit(err):
+    """Convert RuntimeError to clean user message."""
+    err_str = str(err)
+
+    if "OPENMETER_API_KEY" in err_str:
+        msg = (f"❌ {err}\n\n"
+               "💡 Fix:\n"
+               " export OPENMETER_API_KEY=\"sk_live_...\"\n"
+               " (or) export USAGE_METERING=null # to disable metering\n\n"
+               "📖 See README.md for complete setup")
+    else:
+        msg = (f"❌ {err}\n\n"
+               "💡 Required environment variables:\n"
+               " export OIDC_ISSUER=\"https://your-domain.auth0.com/\"\n"
+               " export OIDC_AUD=\"your-api-identifier\"\n\n"
+               "📖 See README.md for complete setup instructions")
+
+    raise click.ClickException(msg)
+
 if __name__ == "__main__":
     main()

attach/gateway.py

Lines changed: 58 additions & 13 deletions
@@ -3,24 +3,33 @@
 """
 
 import os
+from contextlib import asynccontextmanager
 from typing import Optional
 
 import weaviate
 from fastapi import APIRouter, FastAPI, HTTPException, Request
+from fastapi.middleware.cors import CORSMiddleware
+from starlette.middleware.base import BaseHTTPMiddleware
 from pydantic import BaseModel
 
 from a2a.routes import router as a2a_router
-
-# Clean relative imports
-from auth import verify_jwt
 from auth.oidc import _require_env
-
-# from logs import router as logs_router
+import logs
+logs_router = logs.router
 from mem import get_memory_backend
 from middleware.auth import jwt_auth_mw
-from middleware.quota import TokenQuotaMiddleware
 from middleware.session import session_mw
 from proxy.engine import router as proxy_router
+from usage.factory import _select_backend, get_usage_backend
+from usage.metrics import mount_metrics
+from utils.env import int_env
+
+# Guard TokenQuotaMiddleware import (matches main.py pattern)
+try:
+    from middleware.quota import TokenQuotaMiddleware
+    QUOTA_AVAILABLE = True
+except ImportError: # optional extra not installed
+    QUOTA_AVAILABLE = False
 
 # Import version from parent package
 from . import __version__
@@ -49,7 +58,7 @@ async def get_memory_events(request: Request, limit: int = 10):
             return {"data": {"Get": {"MemoryEvent": []}}}
 
         result = (
-            client.query.get("MemoryEvent", ["timestamp", "role", "content"])
+            client.query.get("MemoryEvent", ["timestamp", "event", "user", "state"])
             .with_additional(["id"])
             .with_limit(limit)
             .with_sort([{"path": ["timestamp"], "order": "desc"}])
@@ -97,6 +106,21 @@ class AttachConfig(BaseModel):
     auth0_client: Optional[str] = None
 
 
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    """Manage application lifespan - startup and shutdown."""
+    # Startup
+    backend_selector = _select_backend()
+    app.state.usage = get_usage_backend(backend_selector)
+    mount_metrics(app)
+
+    yield
+
+    # Shutdown
+    if hasattr(app.state.usage, 'aclose'):
+        await app.state.usage.aclose()
+
+
 def create_app(config: Optional[AttachConfig] = None) -> FastAPI:
     """
     Create a FastAPI app with Attach Gateway functionality
@@ -127,17 +151,38 @@ def create_app(config: Optional[AttachConfig] = None) -> FastAPI:
         title="Attach Gateway",
         description="Identity & Memory side-car for LLM engines",
         version=__version__,
+        lifespan=lifespan,
+    )
+
+    @app.get("/auth/config")
+    async def auth_config():
+        return {
+            "domain": config.auth0_domain,
+            "client_id": config.auth0_client,
+            "audience": config.oidc_audience,
+        }
+
+    # Add middleware in correct order (CORS outer-most)
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=["http://localhost:9000", "http://127.0.0.1:9000"],
+        allow_methods=["*"],
+        allow_headers=["*"],
+        allow_credentials=True,
     )
+
+    # Only add quota middleware if available and explicitly configured
+    limit = int_env("MAX_TOKENS_PER_MIN", 60000)
+    if QUOTA_AVAILABLE and limit is not None:
+        app.add_middleware(TokenQuotaMiddleware)
 
-    # Add middleware
-    app.middleware("http")(jwt_auth_mw)
-    app.middleware("http")(session_mw)
-    app.add_middleware(TokenQuotaMiddleware)
+    app.add_middleware(BaseHTTPMiddleware, dispatch=jwt_auth_mw)
+    app.add_middleware(BaseHTTPMiddleware, dispatch=session_mw)
 
     # Add routes
-    app.include_router(a2a_router)
+    app.include_router(a2a_router, prefix="/a2a")
     app.include_router(proxy_router)
-    # app.include_router(logs_router)
+    app.include_router(logs_router)
     app.include_router(mem_router)
 
     # Setup memory backend
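
The `lifespan` context manager added to `gateway.py` follows a startup / `yield` / shutdown shape. The sketch below reproduces that shape with the standard library only, so it can be run without FastAPI; `FakeUsageBackend`, `run_once`, and the `SimpleNamespace` stand-in for the app object are assumptions made for illustration and are not part of the commit.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakeUsageBackend:
    """Hypothetical backend exposing the optional async `aclose` hook."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # Startup: attach the backend to shared application state.
    app.state.usage = FakeUsageBackend()
    yield
    # Shutdown: close the backend only if it supports closing.
    if hasattr(app.state.usage, "aclose"):
        await app.state.usage.aclose()


async def run_once():
    app = SimpleNamespace(state=SimpleNamespace())  # stand-in for FastAPI()
    async with lifespan(app):
        closed_while_running = app.state.usage.closed
    return closed_while_running, app.state.usage.closed


result = asyncio.run(run_once())
```

The `hasattr` check mirrors the diff: backends without an `aclose` method, such as a no-op backend, need no cleanup, so the shutdown step skips them.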
File renamed without changes.
