Agentic Data Sampler — token-aware, LLM-driven sampler for structured (Parquet/CSV) and unstructured (text) data. FastAPI SaaS-ready with DuckDB profiling, optional semantic sampling (sentence-transformers), and pluggable agent adapters (Bedrock/OpenAI/mock). Budget-aware, reproducible sampling with robust fallbacks, clear pydantic schemas.

bhanushakya2004/Agentic-Data-Sampler

Agentic Data Sampler — FastAPI SaaS Prototype

A modular, token-aware data sampler that uses an LLM agent to decide structured & unstructured sampling strategies. This repo demonstrates turning the sampler into a FastAPI SaaS app.
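As a rough illustration of the "LLM agent decides the strategy" idea, here is a minimal sketch of what a pluggable adapter could look like. Every name in it (`SamplingPlan`, `AgentAdapter`, `MockAgent`, `choose_plan`, and the `row_count` / `avg_tokens_per_row` profile keys) is hypothetical and not taken from this repo's code. The mock agent stands in for a real Bedrock or OpenAI backend so the example runs without credentials.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SamplingPlan:
    """Hypothetical plan returned by an agent; field names are illustrative."""
    strategy: str       # e.g. "random", "stratified", "semantic"
    sample_size: int    # number of rows or chunks to keep
    seed: int = 0       # fixed seed keeps sampling reproducible


class AgentAdapter(Protocol):
    """Interface a backend (Bedrock, OpenAI, local) would implement."""

    def plan(self, profile: dict, token_budget: int) -> SamplingPlan: ...


class MockAgent:
    """Deterministic stand-in that needs no network access or API keys."""

    def plan(self, profile: dict, token_budget: int) -> SamplingPlan:
        rows = int(profile.get("row_count", 0))
        tokens_per_row = max(1, int(profile.get("avg_tokens_per_row", 1)))
        affordable = token_budget // tokens_per_row  # rows that fit the budget
        return SamplingPlan(strategy="random", sample_size=min(rows, affordable))


def choose_plan(agent: AgentAdapter, profile: dict, token_budget: int) -> SamplingPlan:
    """Ask the agent for a plan; fall back to a small random sample on failure."""
    try:
        return agent.plan(profile, token_budget)
    except Exception:
        # Assumed fallback: at most 100 rows, chosen without consulting the agent.
        rows = int(profile.get("row_count", 0))
        return SamplingPlan(strategy="random", sample_size=min(100, rows))
```

Because `AgentAdapter` is a `Protocol`, any class with a matching `plan` method can be passed to `choose_plan` without inheriting from it, which is one simple way to keep backends swappable.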

Features

  • Structured (Parquet/CSV) profiling via DuckDB
  • Unstructured text chunking + optional semantic sampling (sentence-transformers)
  • Pluggable agent adapter (Bedrock/OpenAI/local)
  • Token estimation, budget enforcement, and fallbacks
  • Async FastAPI endpoints + background tasks
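To make the token estimation and budget enforcement bullet concrete, the following is a minimal sketch, not the repo's implementation. It assumes a crude heuristic of about 4 characters per token in place of a real tokenizer, and the function names `estimate_tokens` and `sample_within_budget` are made up for illustration. A seeded shuffle keeps the sample reproducible.

```python
import random


def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token (an assumption, not a tokenizer)."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def sample_within_budget(chunks: list[str], token_budget: int, seed: int = 0):
    """Pick chunks in seeded random order, skipping any that would exceed the budget.

    Returns the chosen chunks in their original order plus the tokens used.
    """
    rng = random.Random(seed)          # local RNG, so global state is untouched
    order = list(range(len(chunks)))
    rng.shuffle(order)

    picked, used = [], 0
    for i in order:
        cost = estimate_tokens(chunks[i])
        if used + cost <= token_budget:  # enforce the budget per chunk
            picked.append(i)
            used += cost

    picked.sort()                      # restore document order
    return [chunks[i] for i in picked], used
```

Calling `sample_within_budget(chunks, token_budget=25)` on five 40-character chunks would keep two of them (about 20 estimated tokens), and repeating the call with the same seed returns the same selection.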

Getting started (dev)

  1. Copy environment variables into .env.

  2. Build & run:

       docker build -t sampler-app .
       docker run -p 8000:8000 --env-file .env sampler-app

  3. Open docs: http://localhost:8000/docs
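Once the container from the steps above is running, the available routes can also be listed from a script. This sketch relies on FastAPI's default behaviour of serving the OpenAPI schema at `/openapi.json`, and assumes the app has not changed or disabled that path. The helper names `list_operations` and `fetch_schema` are illustrative, not part of this repo.

```python
import json
from urllib.request import urlopen

# HTTP verbs that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> list[str]:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    ops = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for method in sorted(item):
            if method in HTTP_METHODS:  # skip keys such as "parameters"
                ops.append(f"{method.upper()} {path}")
    return ops


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema from a running instance (assumes the default URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

With the server up, `print("\n".join(list_operations(fetch_schema())))` prints one line per endpoint.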

Recommended next steps

  • Add persistent storage (Postgres), API key auth, and Prometheus metrics.
  • Add unit & integration tests.

License: MIT
