A modular, token-aware data sampler that uses an LLM agent to choose sampling strategies for structured and unstructured data. This repo demonstrates how to turn the sampler into a FastAPI SaaS app.
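To make the agent-driven strategy choice concrete, here is a minimal sketch of what a pluggable adapter could look like. All names (`DatasetInfo`, `AgentAdapter`, `RuleBasedAdapter`, `pick_strategy`, and the strategy strings) are hypothetical and not taken from this repo. The rule-based class stands in for a local adapter; a Bedrock or OpenAI adapter would implement the same method with a model call.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class DatasetInfo:
    """Hypothetical summary of a dataset that is handed to the agent."""
    kind: str        # "structured" or "unstructured"
    est_tokens: int  # rough token count of the full dataset


class AgentAdapter(Protocol):
    """Hypothetical adapter interface; any backend can implement it."""

    def choose_strategy(self, info: DatasetInfo, token_budget: int) -> str: ...


class RuleBasedAdapter:
    """Local adapter that applies fixed rules instead of calling an LLM."""

    def choose_strategy(self, info: DatasetInfo, token_budget: int) -> str:
        # If everything fits in the budget, no sampling is needed.
        if info.est_tokens <= token_budget:
            return "full"
        # Otherwise pick a strategy based on the data kind.
        return "stratified_rows" if info.kind == "structured" else "chunk_sample"


def pick_strategy(adapter: AgentAdapter, info: DatasetInfo, token_budget: int) -> str:
    """Delegate the decision to whichever adapter is plugged in."""
    return adapter.choose_strategy(info, token_budget)
```

Because `pick_strategy` depends only on the protocol, swapping backends would not require changes to the calling code.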
Features
- Structured (Parquet/CSV) profiling via DuckDB
- Unstructured text chunking + optional semantic sampling (sentence-transformers)
- Pluggable agent adapter (Bedrock/OpenAI/local)
- Token estimation, budget enforcement, and fallbacks
- Async FastAPI endpoints + background tasks
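As an illustration of text chunking, token estimation, budget enforcement, and a fallback, the sketch below uses only the standard library. It is not this repo's implementation. The function names are hypothetical, and the "about 4 characters per token" heuristic is an assumption standing in for a real tokenizer.

```python
import random


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


def chunk_text(text: str, chunk_chars: int = 400) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def sample_within_budget(chunks: list[str], token_budget: int, seed: int = 0) -> list[str]:
    """Visit chunks in random order, keep those that fit, return them in document order."""
    if token_budget <= 0:
        return []
    rng = random.Random(seed)
    order = list(range(len(chunks)))
    rng.shuffle(order)
    kept, used = [], 0
    for idx in order:
        cost = estimate_tokens(chunks[idx])
        if used + cost <= token_budget:
            kept.append(idx)
            used += cost
    if not kept and chunks:
        # Fallback: no chunk fits, so truncate the first chunk to the budget.
        return [chunks[0][: token_budget * 4]]
    return [chunks[i] for i in sorted(kept)]
```

A seeded `random.Random` keeps the sample reproducible across runs. Semantic sampling (for example with sentence-transformers embeddings) could replace the random ordering without changing the budget check.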
Getting started (dev)
- Copy environment variables into .env.
- Build & run:
  docker build -t sampler-app .
  docker run -p 8000:8000 --env-file .env sampler-app
- Open docs: http://localhost:8000/docs
Recommended next steps
- Add persistent storage (Postgres), API key auth, and Prometheus metrics.
- Add unit & integration tests.
License: MIT