
Nexora: Data Explorer Agent

Agentic dataset discovery, profiling, and visualization. From query to analysis‑ready data in minutes.


Overview

Nexora is an AI‑native “data universe explorer” that turns a plain‑English query into curated, profiled, and visualizable datasets. It runs an agentic workflow to search Kaggle, download relevant columnar files (CSV/XLSX/JSON), profile them with pandas, and generate safe matplotlib plots via LLM codegen, all surfaced through a clean FastAPI backend and a React + Three.js frontend.


🎥 Demo

A short walkthrough is available in Nexora.mp4.

🚀 Features

  • Agentic pipeline (LangGraph + LangChain) orchestrating: search → download → profile → describe → plot
  • Targeted discovery via Tavily + Kaggle API (columnar-first, dedup, file size caps)
  • Profiling with pandas: row/column counts, dtype map, missingness; task fit inference
  • Sandboxed plotting: GPT‑4o/4o‑mini → matplotlib, headless (Agg) in a restricted Python REPL, returns base64 PNG
  • Durable storage: SQLite (WAL) with idempotent upserts and sensible indexes
  • Production-friendly FastAPI with CORS for local dev
  • Frontend: React/Vite + Three.js interactive results, instant previews, one‑click exports
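The sandboxed plotting feature can be sketched in a few lines (a minimal illustration, not the repo's actual plot_agent implementation; the restricted-builtins set shown here is an assumption):

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display server
import matplotlib.pyplot as plt

def run_plot_code(code: str) -> str:
    """Execute generated plotting code in a restricted namespace and
    return the current figure as a base64-encoded PNG string."""
    # Only expose a minimal builtins set plus plt; the real restriction
    # policy is an assumption, not the repo's actual sandbox rules.
    safe_globals = {
        "__builtins__": {"len": len, "range": range, "min": min, "max": max},
        "plt": plt,
    }
    exec(code, safe_globals)
    buf = io.BytesIO()
    plt.gcf().savefig(buf, format="png", bbox_inches="tight")
    plt.close("all")
    return base64.b64encode(buf.getvalue()).decode("ascii")

png_b64 = run_plot_code("plt.bar(['a', 'b'], [3, 5])\nplt.title('demo')")
print(png_b64[:8])  # iVBORw0K (start of the PNG magic bytes, base64-encoded)
```

Running under the Agg backend means no window is ever opened, and stripping `__builtins__` keeps the generated code away from `open`, `__import__`, and friends.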

⚙️ Tech Stack

  • Backend: FastAPI, LangGraph, LangChain, pandas, matplotlib, SQLite
  • Tooling/Integrations: Tavily, Kaggle API, python‑dotenv
  • Frontend: React, Vite, Three.js, React Router

🧰 Prerequisites

  • Python 3.12+
  • Node.js 18+
  • Kaggle credentials configured (~/.kaggle/kaggle.json)
  • API keys in environment (.env):
    • OPEN_API_KEY (OpenAI)
    • TAVILY_API_KEY

🔧 Quickstart

  1. Backend setup

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn backend.main:app --host 127.0.0.1 --port 8000
```

  2. Frontend setup (in frontend/)

```bash
cd frontend
npm install
npm run dev
```

Visit the app at http://127.0.0.1:5173.
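With the backend up on port 8000, a first request can be issued straight from Python (the `{"query": ...}` request body is an assumption; check backend/main.py for the actual schema):

```python
import json
import urllib.request

# Hypothetical request body for the /run-agent endpoint.
payload = json.dumps({"query": "air quality datasets"}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8000/run-agent",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the backend running, this kicks off the full pipeline:
#     with urllib.request.urlopen(req, timeout=600) as resp:
#         print(resp.read())
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8000/run-agent
```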


🧠 How It Works

  1. Search: The agent queries Tavily for Kaggle dataset links and ranks for columnar relevance.
  2. Download: Datasets are fetched via Kaggle API with safe filenames and size limits.
  3. Profile: pandas computes rows/cols, dtypes, and missingness with resilient CSV/Excel/JSON parsing.
  4. Describe: An LLM writes a concise dataset description and infers task fit (classification/regression/etc.).
  5. Plot: GPT‑4o‑mini generates matplotlib code executed headless in a restricted Python REPL; images are returned as base64.
  6. Persist + Serve: Metadata stored in SQLite (WAL); FastAPI exposes endpoints for the frontend.
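The search → download → profile → describe portion of the flow can be sketched as a plain-Python pipeline (the real orchestration uses a LangGraph state machine; node behavior and state keys here are purely illustrative):

```python
# Each node takes the accumulated state dict and returns an enriched copy.
# All keys and return values below are illustrative stand-ins.

def search(state: dict) -> dict:
    return {**state, "urls": [f"kaggle://{state['query']}/1"]}

def download(state: dict) -> dict:
    return {**state, "files": ["data.csv"]}

def profile(state: dict) -> dict:
    return {**state, "profile": {"rows": 100, "cols": 5}}

def describe(state: dict) -> dict:
    return {**state, "description": f"Dataset for: {state['query']}"}

PIPELINE = [search, download, profile, describe]

def run_agent(query: str) -> dict:
    state = {"query": query}
    for node in PIPELINE:  # LangGraph follows graph edges instead of a list
        state = node(state)
    return state

result = run_agent("housing prices")
print(sorted(result))  # ['description', 'files', 'profile', 'query', 'urls']
```

The advantage of the graph formulation over this linear loop is that failed nodes can branch (e.g. retry the search with a reformulated query) instead of aborting the run.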

🧱 Architecture

Frontend (React/Vite + Three.js)
   │
   ▼
FastAPI (backend/main.py)
   ├─ Agent pipeline (backend/agent.py)
   │   ├─ Tavily search → Kaggle download → pandas profile → LLM describe
   │   └─ LangGraph state machine orchestration
   ├─ Plot agent (backend/plot_agent.py)
   │   └─ GPT‑4o‑mini → matplotlib in restricted PythonREPL (Agg)
   └─ DB layer (backend/db.py, SQLite WAL)
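The DB layer's durable-storage idea (WAL mode plus idempotent upserts) looks roughly like this; the schema and column names are illustrative, not the actual ones in backend/db.py:

```python
import sqlite3

conn = sqlite3.connect(":memory:")       # the repo uses a file-backed DB
conn.execute("PRAGMA journal_mode=WAL")  # WAL: readers don't block the writer
conn.execute(
    """CREATE TABLE IF NOT EXISTS datasets (
           source_url TEXT PRIMARY KEY,
           title      TEXT,
           rows       INTEGER
       )"""
)
conn.execute("CREATE INDEX IF NOT EXISTS idx_datasets_rows ON datasets(rows)")

def upsert_dataset(source_url: str, title: str, rows: int) -> None:
    # ON CONFLICT makes repeated pipeline runs idempotent: re-profiling a
    # dataset updates the existing row instead of inserting a duplicate.
    conn.execute(
        """INSERT INTO datasets (source_url, title, rows) VALUES (?, ?, ?)
           ON CONFLICT(source_url) DO UPDATE SET
               title = excluded.title, rows = excluded.rows""",
        (source_url, title, rows),
    )
    conn.commit()

upsert_dataset("https://www.kaggle.com/datasets/foo/bar", "Foo", 100)
upsert_dataset("https://www.kaggle.com/datasets/foo/bar", "Foo v2", 120)
print(conn.execute("SELECT COUNT(*), MAX(rows) FROM datasets").fetchone())  # (1, 120)
```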

Key endpoints:

  • POST /run-agent — Run the pipeline for a query
  • GET /datasets — List profiled datasets
  • GET /dataset?source_url= — Get one dataset + files
  • GET /file-preview — Sample rows for preview
  • GET /download-file — Download a specific file
  • GET /download-dataset-zip — Zip all available files
  • POST /plot/suggestions — Heuristic plot prompts
  • POST /plot/generate — LLM‑generated matplotlib plot
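Several of the GET endpoints take a source_url query parameter whose value contains ':' and '/', so it must be percent-encoded when building request URLs. A small hypothetical helper (the `endpoint` function is not part of the repo):

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8000"  # backend address from the quickstart

def endpoint(path: str, **params: str) -> str:
    """Build a full URL for one of the GET endpoints listed above."""
    return f"{BASE}{path}?{urlencode(params)}" if params else f"{BASE}{path}"

# urlencode percent-encodes the embedded ':' and '/' characters:
url = endpoint("/dataset", source_url="https://www.kaggle.com/datasets/foo/bar")
print(url)
# http://127.0.0.1:8000/dataset?source_url=https%3A%2F%2Fwww.kaggle.com%2Fdatasets%2Ffoo%2Fbar
```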

📦 Environment Variables

Create a .env at the repo root:

OPEN_API_KEY=sk-...
TAVILY_API_KEY=tvly-...

Kaggle setup: ensure ~/.kaggle/kaggle.json exists and is readable.
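Before launching the backend, it can help to fail fast on missing keys. The key names below match this section; the helper itself is illustrative, not part of the repo:

```python
import os

REQUIRED_KEYS = ("OPEN_API_KEY", "TAVILY_API_KEY")

def missing_keys(env=os.environ):
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Example with a fake environment dict:
print(missing_keys({"OPEN_API_KEY": "sk-test"}))  # ['TAVILY_API_KEY']
```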


🤝 Contributing

PRs and issues are welcome. Please open an issue to discuss significant changes.


📄 License

MIT © 2025 Riyanshi Bohra — see LICENSE
