From 91b339420cc2154f34ce13cc90b67e4407d726cc Mon Sep 17 00:00:00 2001
From: Gemini Agent
Date: Thu, 19 Feb 2026 17:17:35 +0100
Subject: [PATCH] Add CLAUDE.md and repoint agent/gemini symlinks

Create a dedicated CLAUDE.md with project conventions, git workflow
(PR-based), tech stack, testing instructions, and common operations.
Repoint agent.md and gemini.md symlinks from README.md to CLAUDE.md.
---
 CLAUDE.md | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 agent.md  |   2 +-
 gemini.md |   2 +-
 3 files changed, 103 insertions(+), 2 deletions(-)
 create mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..caee0b3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,101 @@
+# AI Knowledge Base - Agent Instructions
+
+## Project Overview
+
+AI-powered knowledge base seeded from Confluence, with Slack bot interface, RAG pipeline, and Graphiti/Neo4j knowledge graph.
+
+- **GCP Project**: `ai-knowledge-base-42`, region `us-central1`
+- **Architecture doc**: `docs/ARCHITECTURE.md` (read this first)
+- **ADRs**: `docs/adr/` (design decisions and rationale)
+
+## Git Workflow
+
+- **NEVER push directly to `main`** — always create a feature branch and open a PR
+- Branch naming: `feature/`, `fix/`, `chore/`, `refactor/` prefixes
+- CI/CD runs on every push: unit tests, build, deploy staging, e2e tests, deploy production
+- PRs require passing CI before merge
+
+```bash
+git checkout -b feature/my-change
+# ... make changes ...
+git push -u origin feature/my-change
+gh pr create --title "Add my change" --body "..."
+```
+
+## Tech Stack
+
+| Component | Technology | Notes |
+|-----------|------------|-------|
+| Knowledge Graph | Neo4j 5.26 + Graphiti-core | Source of truth for all knowledge data |
+| Metadata DB | SQLite + SQLAlchemy 2.0 async (NullPool) | Page sync, checkpoints, feedback |
+| LLM | Gemini 2.5 Flash (Vertex AI) | Entity extraction, answer generation |
+| Embeddings | Vertex AI text-embedding-005 | 768-dim vectors |
+| Bot | Slack Bolt (HTTP mode) | Primary user interface |
+| Infra | Cloud Run (bot, jobs) + GCE (Neo4j) | Terraform in `deploy/terraform/` |
+
+## Key Conventions
+
+### Code
+
+- Python 3.11, async/await for I/O
+- Type hints on all function signatures
+- `pydantic-settings` for config — all settings via environment variables, no hardcoded values
+- SQLite uses NullPool (connections close immediately) and WAL mode
+- Checkpoint writes use raw `aiosqlite` (bypass SQLAlchemy to avoid lock contention)
+- Tests in `tests/` — unit, integration, e2e subdirectories
+
+### Pipeline
+
+- CLI: `python -m knowledge_base.cli pipeline`
+- Steps: download (Confluence) -> parse (chunks) -> index (Graphiti)
+- `ConfluenceDownloader(index_to_graphiti=False)` in pipeline — Step 3 handles indexing
+- Checkpoints persisted to GCS FUSE after every batch (see ADR-0010)
+- Resume is automatic — already-indexed chunks are skipped on restart
+
+### Docker Images
+
+Two separate images — must rebuild both when code changes:
+- `Dockerfile.jobs` -> `jobs:latest` (pipeline, background tasks)
+- `Dockerfile.slack` -> `slack-bot:latest` (Slack bot service)
+
+### Environments
+
+- **Staging**: Neo4j at `bolt+s://neo4j.staging.keboola.dev:443`
+- **Production**: Neo4j at `bolt://10.0.0.27:7687` (internal VPC)
+- Always test on staging first before production
+- Use `--dry-run` when available
+
+### Terraform
+
+- Located in `deploy/terraform/`
+- `google-beta` provider required for GCS FUSE volumes
+- Run `terraform plan` before `terraform apply`
+- State locks can go stale — use `terraform force-unlock` if needed
+
+## Testing
+
+```bash
+# Unit + integration tests
+python -m pytest tests/ -v
+
+# E2E tests (needs staging secrets)
+./scripts/setup-e2e-env.sh
+set -a && source .env.e2e && set +a
+python -m pytest tests/e2e/ -v
+```
+
+## Common Operations
+
+```bash
+# Check pipeline job
+gcloud run jobs executions list --job=sync-pipeline --region=us-central1 --project=ai-knowledge-base-42 --limit=5
+
+# Check staging sync
+gcloud run jobs executions list --job=confluence-sync-staging --region=us-central1 --project=ai-knowledge-base-42 --limit=5
+
+# Build and push jobs image
+printf 'steps:\n  - name: "gcr.io/cloud-builders/docker"\n    args: ["build", "-t", "us-central1-docker.pkg.dev/ai-knowledge-base-42/knowledge-base/jobs:latest", "-f", "deploy/docker/Dockerfile.jobs", "."]\nimages:\n  - "us-central1-docker.pkg.dev/ai-knowledge-base-42/knowledge-base/jobs:latest"\n' | gcloud builds submit --config /dev/stdin --project=ai-knowledge-base-42 .
+
+# Build and push slack-bot image
+printf 'steps:\n  - name: "gcr.io/cloud-builders/docker"\n    args: ["build", "-t", "us-central1-docker.pkg.dev/ai-knowledge-base-42/knowledge-base/slack-bot:latest", "-f", "deploy/docker/Dockerfile.slack", "."]\nimages:\n  - "us-central1-docker.pkg.dev/ai-knowledge-base-42/knowledge-base/slack-bot:latest"\n' | gcloud builds submit --config /dev/stdin --project=ai-knowledge-base-42 .
+```
diff --git a/agent.md b/agent.md
index 42061c0..681311e 120000
--- a/agent.md
+++ b/agent.md
@@ -1 +1 @@
-README.md
\ No newline at end of file
+CLAUDE.md
\ No newline at end of file
diff --git a/gemini.md b/gemini.md
index 42061c0..681311e 120000
--- a/gemini.md
+++ b/gemini.md
@@ -1 +1 @@
-README.md
\ No newline at end of file
+CLAUDE.md
\ No newline at end of file
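
Note on the SQLite conventions recorded in CLAUDE.md (WAL mode, connections that close immediately, and a resume that skips already-indexed chunks): the following is a minimal sketch of how those three ideas fit together, not the project's actual checkpoint code. It swaps in the standard-library `sqlite3` module for `aiosqlite` so it runs without dependencies, and the `checkpoints` table and the `init_db`, `mark_indexed`, and `pending_chunks` helpers are hypothetical names introduced only for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def open_connection(db_path: str) -> sqlite3.Connection:
    """Open a short-lived connection with WAL journaling enabled."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def init_db(db_path: str) -> None:
    """Create the (hypothetical) checkpoint table if it does not exist."""
    with closing(open_connection(db_path)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints (chunk_id TEXT PRIMARY KEY)"
        )
        conn.commit()


def mark_indexed(db_path: str, chunk_ids: list[str]) -> None:
    """Record a batch of indexed chunks, then close the connection at once."""
    with closing(open_connection(db_path)) as conn:
        conn.executemany(
            "INSERT OR IGNORE INTO checkpoints (chunk_id) VALUES (?)",
            [(chunk_id,) for chunk_id in chunk_ids],
        )
        conn.commit()


def pending_chunks(db_path: str, chunk_ids: list[str]) -> list[str]:
    """Return the chunks that still need indexing, preserving input order."""
    with closing(open_connection(db_path)) as conn:
        rows = conn.execute("SELECT chunk_id FROM checkpoints").fetchall()
    done = {row[0] for row in rows}
    return [chunk_id for chunk_id in chunk_ids if chunk_id not in done]


def demo() -> list[str]:
    """Simulate a restart after two of three chunks were already indexed."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "checkpoints.db")
        init_db(db_path)
        mark_indexed(db_path, ["chunk-1", "chunk-2"])
        return pending_chunks(db_path, ["chunk-1", "chunk-2", "chunk-3"])


if __name__ == "__main__":
    print(demo())
```

Opening and closing a connection per operation is roughly what NullPool does for a SQLAlchemy engine: no connection is held between operations, so a long-running job does not keep a lock on the database file while it is busy elsewhere.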