feat: full hackathon pipeline refactor (milestones 1-3) + CLI polish #80

Merged
madjin merged 36 commits into main from milestone-1-dry-foundations
Mar 1, 2026

@madjin
Member

@madjin madjin commented Feb 28, 2026

Overview

Full implementation of the Hackathon Pipeline Refactor across all 3 milestones, plus follow-on CLI polish. The refactor's goal: eliminate a 3,395-line god file, deduplicate config/vote-weight/DB access scattered across 6+ files, and create a unified clanktank CLI entry point.


Milestone 1 — DRY Foundations + Script Consolidation

  • hackathon/backend/config.py (new) — single source for all os.getenv() reads, shared sqlite3 context manager, vote weight calculation; eliminates 6-file duplication of HACKATHON_DB_PATH and 3x copy-pasted vote weight logic
  • hackathon/backend/http_client.py (new) — shared retry session with 30s timeout ceiling for all external API calls (OpenRouter, GitHub, Helius, Perplexity had no timeout or retry)
  • Archived dead code: standalone_generator/ and add_required_constraints.py moved to hackathon/tmp/
  • Merged research CLI into hackathon_manager — research logic stays as library in research.py, manager gains --research subcommand
  • Created hackathon/__main__.py — master CLI dispatcher; pyproject.toml gains clanktank script entry point
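
A minimal sketch of what the shared connection helper from Milestone 1 might look like. Only the names `get_connection` and `HACKATHON_DB_PATH` and the 30s timeout come from this PR; the default path, the signature, and the commit/rollback behaviour are assumptions for illustration, not the actual contents of config.py.

```python
import os
import sqlite3
from contextlib import contextmanager

# Assumed default; the real default in config.py is not shown in this PR.
HACKATHON_DB_PATH = os.getenv("HACKATHON_DB_PATH", "data/hackathon.db")


@contextmanager
def get_connection(db_path=None, timeout=30.0):
    """Yield a sqlite3 connection with a busy timeout and name-addressable rows."""
    conn = sqlite3.connect(db_path or HACKATHON_DB_PATH, timeout=timeout)
    conn.row_factory = sqlite3.Row  # rows support row["column"] access
    try:
        yield conn
        conn.commit()    # commit only if the caller's block succeeded
    except Exception:
        conn.rollback()  # undo partial writes, then re-raise
        raise
    finally:
        conn.close()


# Usage against an in-memory database so the sketch runs anywhere.
with get_connection(":memory:") as conn:
    print(conn.execute("SELECT 1 AS ok").fetchone()["ok"])
```

Routing every script through one helper like this is what makes a timeout fix apply everywhere at once, instead of per file.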

Milestone 2 — Split app.py Into Router Modules

  • hackathon/backend/models.py (new) — all 15+ inline Pydantic models extracted from app.py
  • hackathon/backend/routes/auth.py (new) — Discord OAuth + session routes (~350 lines)
  • hackathon/backend/routes/voting.py (new) — Solana vote collection + WebSocket prize pool (~650 lines)
  • hackathon/backend/routes/submissions.py (new) — submission CRUD, scoring, leaderboard, image upload (~1,000 lines)
  • app.py reduced from 3,395 → ~300 lines — composition root only; all tests pass, OpenAPI schema unchanged
  • generate_static_data() extracted to hackathon/scripts/generate_static_data.py

Milestone 3 — Performance Fixes + DB Convergence

  • Fixed N+1 query in list_submissions — per-submission score query inside loop replaced with CTE precompute (query count: N+1 → 2)
  • Deduplicated leaderboard/stats queries — shared helper functions replace repeated SQL in versioned + latest endpoints
  • Migrated all scripts to config.get_connection(): collect_votes.py, discord_bot.py, upload_youtube.py, generate_episode.py now use a single DB access pattern
  • discord_bot.py gains 30s DB timeout (was infinite — real bug fix)
  • Removed unused SQLAlchemy import from populate_prize_pool.py
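
To illustrate the N+1 fix in list_submissions, here is a small self-contained sketch of the CTE pre-aggregation approach. The table and column names (`submissions`, `scores`, `weighted_total`) and the judge names are hypothetical stand-ins; the real schema and query are not shown in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    -- Hypothetical schema for illustration only.
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, project_name TEXT);
    CREATE TABLE scores (submission_id INTEGER, judge_name TEXT, weighted_total REAL);
    INSERT INTO submissions VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO scores VALUES
        (1, 'judge_a', 30.0), (1, 'judge_b', 34.0), (2, 'judge_a', 20.0);
    """
)

# The CTE aggregates scores for all submissions once; the LEFT JOIN keeps
# unscored submissions in the result instead of dropping them.
LIST_SQL = """
WITH score_agg AS (
    SELECT submission_id,
           AVG(weighted_total)        AS avg_score,
           COUNT(DISTINCT judge_name) AS judge_count
    FROM scores
    GROUP BY submission_id
)
SELECT s.id, s.project_name, a.avg_score,
       COALESCE(a.judge_count, 0) AS judge_count
FROM submissions AS s
LEFT JOIN score_agg AS a ON a.submission_id = s.id
ORDER BY s.id
"""


def list_submissions(conn):
    """Return every submission with its aggregated score in a single query."""
    return [dict(row) for row in conn.execute(LIST_SQL)]


for item in list_submissions(conn):
    print(item)
```

The query count no longer grows with the number of submissions, which is the point of replacing the per-row lookup inside the loop.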

Follow-on CLI Polish (post-milestones)

  • Merged submission <id> + submissions into one smart command — positional ID for detail view, -s/--search for full-text search, -j/--json for pipe-friendly output, -b/--brief to skip scores/research
  • Fixed clanktank serve crash — replaced import-time load of hackathon.backend.app (pulls in uvicorn against project venv) with subprocess.run(["uvicorn", ...])
  • Fixed judge score table alignment — ANSI escape codes counted as column width by f"{judge:<14}"; now pads by visible character length
  • Replaced custom upload route with StaticFiles mount — removed hand-rolled GET /api/uploads/{filename} in favour of app.mount("/api/uploads", StaticFiles(...)); adds ETags, range requests, correct content-type; data/uploads/ auto-created on startup
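
The judge score alignment bug can be reproduced in a few lines. This is a sketch of the general technique (pad by visible length after stripping colour codes), not the code in __main__.py; the helper names `visible_len` and `pad_visible` are hypothetical, and the regex only covers SGR colour sequences.

```python
import re

# Matches SGR colour/style sequences such as "\x1b[33m" and "\x1b[0m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def visible_len(text):
    """Length of the text as it appears on screen, ignoring ANSI colour codes."""
    return len(ANSI_RE.sub("", text))


def pad_visible(text, width):
    """Left-align text in a column of the given width, measured in visible characters."""
    return text + " " * max(0, width - visible_len(text))


colored = "\x1b[33mjudge_a\x1b[0m"

# The format spec counts the escape bytes, so no padding is added at all here.
print(repr(f"{colored:<14}"))
# Padding by visible length yields a true 14-column cell.
print(repr(pad_visible(colored, 14)))
```

This sketch assumes single-width characters; wide glyphs such as emoji would need a display-width calculation on top.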

Stats

| Metric | Value |
| --- | --- |
| app.py reduction | 3,395 → ~300 lines |
| Net code moved | ~3,000 lines → route modules + models |
| Net code archived | ~1,000 lines → tmp/ |
| Dead code removed | ~100 lines |
| New shared modules | config.py, http_client.py, models.py, __main__.py, routes/* |
| Public API contract | Unchanged (OpenAPI diff: empty) |

Test plan

  • ruff check hackathon/ — zero warnings
  • pytest hackathon/tests/ — no regressions vs baseline
  • clanktank submissions — text table, all rows
  • clanktank submissions 10 — detail view, aligned judge scores
  • clanktank submissions -s "gaming" — case-insensitive search
  • clanktank submissions -j — JSON array, no ANSI
  • clanktank submissions 10 -j -b — lean JSON object
  • clanktank serve — no crash at startup
  • GET /api/uploads/<file> — served via StaticFiles

🤖 Generated with Claude Code

madjin and others added 13 commits February 27, 2026 18:51
…d CLI

- Create hackathon/backend/config.py centralizing env vars, get_connection(),
  and calculate_vote_weight() (was duplicated 3x in app.py)
- Create hackathon/backend/http_client.py with retry/timeout session factory
- Replace inline os.getenv() calls across 6 files with config imports
- Replace bare requests.post/get in 3 files with session-based calls
- Delete debug_request_middleware, commented imports, dead vote weight copies
  (~100 lines removed from app.py)
- Remove standalone_generator/ and add_required_constraints.py (dead code)
- Merge research CLI into hackathon_manager (--research flag)
- Create hackathon/__main__.py unified CLI dispatcher (python -m hackathon)
- Update pyproject.toml with [project.scripts] entry
- Update CLAUDE.md with new unified CLI docs

Net: -866 lines, 3 new modules, 1 unified entry point

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
app.py reduced from 3,312 → 263 lines. Extracted by domain:

- hackathon/backend/models.py: all 15 Pydantic models
- hackathon/backend/routes/auth.py: Discord OAuth helpers + 4 auth routes
- hackathon/backend/routes/submissions.py: like/dislike, submission CRUD,
  upload, leaderboard, stats, versioned endpoints, feedback, deprecated stubs
- hackathon/backend/routes/voting.py: BirdeyePriceService, prize pool,
  community scores, webhooks, WebSocket, vote stats
- hackathon/scripts/generate_static_data.py: moved from app.py

Also: add argparse to populate_prize_pool.py, cache_discord_avatars.py,
create_db.py; fix populate_prize_pool.py to use config.get_connection
instead of importing engine from app.

Zero API contract changes. Tests identical to baseline (12p/7f/25e).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Step 1: Fix N+1 query in list_submissions
- Replace per-submission score query in loop with a CTE that pre-aggregates
  avg_score and judge_count for all submissions in one query
- list_submissions now runs 1 query instead of N+1

Step 2: Deduplicate leaderboard/stats queries
- Extract _get_leaderboard_data(conn, version) shared helper
- Extract _get_stats_data(conn, version) shared helper
- get_leaderboard_latest() and get_leaderboard() both delegate to helper
- get_stats_latest() and get_stats() both delegate to helper
- Removes ~70 lines of duplicate SQL

Step 3: Standardize DB connections
- discord_bot.py: add timeout=30 + row_factory (was no timeout — deadlock risk)
- collect_votes.py: add row_factory, use HACKATHON_DB_PATH from config
- upload_youtube.py: add timeout=30 + row_factory to all 3 connections
- generate_episode.py: add timeout=30 + row_factory

Step 4: Remove import fallbacks + clean up
- research.py: remove try/except ModuleNotFoundError schema fallback
- upload_youtube.py: remove try/except ModuleNotFoundError schema fallback
- generate_episode.py: replace sys.path hack with proper module imports
- All hackathon imports now at top of file (no more mid-file noqa: E402)

Gate: ruff clean, tests identical to baseline (12p/7f/25e),
no os.getenv("HACKATHON_DB_PATH") outside config.py, no schema import fallbacks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- pyproject.toml: remove hackathon script alias, keep clanktank; add
  build-system (hatchling) + tool.uv.package=true so uv run clanktank works
- __main__.py: prog="clanktank" so --help shows correct name
- CLAUDE.md: update pipeline examples to use clanktank command

python -m hackathon still works as backward compat (package path unchanged)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- models.py: replace Optional[str] with str | None in create_model()
  calls so UP045 never fires (works fine with pydantic v2)
- pyproject.toml: remove UP045 from ignore list (no longer needed)
- api.ts: use vite proxy /api in dev mode instead of hardcoded
  localhost:8000 — frontend now works without changing URL when
  backend is on a different port or host

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- __main__.py: subcommands now ordered db→serve→research→score→votes→
  synthesize→leaderboard→episode→upload with [step N] labels and
  epilog showing the full pipeline flow
- README.md: full rewrite using clanktank CLI, pipeline-first structure
- models.py: Optional[str] → str | None (Pydantic v2 compatible)
- pyproject.toml: remove UP045 ignore (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
config command:
  clanktank config           — show env var status (set/missing/optional)
  clanktank config --generate — write .env template to repo root
  Standalone (no hackathon deps needed) so it works before setup

Color scheme in help output:
  blue   = infrastructure  (config, db, serve)
  yellow = write/pipeline  (research, score, votes, synthesize, episode)
  green  = read-only       (leaderboard, static-data)
  red    = irreversible    (upload)
  No-ops when stdout is not a TTY (pipes, CI)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
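
The "no-ops when stdout is not a TTY" rule from the colour scheme above can be sketched as follows. The function name `colorize` and the colour table are hypothetical; only the four colour roles and the TTY behaviour come from the commit message.

```python
import sys

# Standard ANSI foreground codes for the four roles in the help output.
COLORS = {"blue": "34", "yellow": "33", "green": "32", "red": "31"}


def colorize(text, color, stream=None):
    """Wrap text in an ANSI colour only when the target stream is a terminal."""
    stream = sys.stdout if stream is None else stream
    if not stream.isatty():
        return text  # pipes and CI logs get plain text
    return f"\x1b[{COLORS[color]}m{text}\x1b[0m"


print(colorize("upload", "red"))
```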
- Remove --generate/--force (bad: could overwrite a working .env)
- Add --setup: interactive wizard that prompts ONLY for missing required
  vars and writes them one at a time via _set_env_key()
- _set_env_key() updates existing KEY= lines in-place or appends new
  ones — never touches other lines, comments, or existing values
- Default `clanktank config` is now purely read-only status display

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
os.getenv() only sees exported shell vars, not .env file contents.
Now parses .env inline (no dotenv import) so status accurately shows
what's configured, whether vars are exported to the shell or not.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
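
A rough sketch of parsing .env contents without a dotenv import, so the status display can tell shell-exported variables apart from file-only ones. The function names and the status labels are hypothetical, and the parser deliberately handles only simple `KEY=value` lines, optional `export` prefixes, and matching outer quotes.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=value lines; skip blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def var_status(name, env_file_values, environ=os.environ):
    """Report where a variable is configured: the shell, the .env file, or nowhere."""
    if environ.get(name):
        return "set (shell)"
    if env_file_values.get(name):
        return "set (.env)"
    return "missing"
```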
Three new read-only commands (stdlib sqlite3 only, no package deps):

  clanktank stats
    - Total count, status breakdown with progress bars, category breakdown
    - Score summary (avg, range, how many scored) if scores exist

  clanktank submissions [--status <status>]
    - Aligned table: ID, project, category, status, score, judge count
    - Status color-coded, sorted by score desc

  clanktank submission <id> [--brief]
    - Header: name, status, category, description excerpt
    - Links: github, demo, discord, twitter, solana, submitted date
    - Per-judge scores: Inn/Tech/Mkt/UX + weighted total + community bonus
    - Round 2 final verdicts (skipped with --brief)
    - Research summary if available (skipped with --brief)

All commands read DB path from env / .env file directly (no imports).
Gracefully handles empty DB.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Consolidate `submission <id>` and `submissions` into a single
  `submissions` command with three modes: list, detail (positional id),
  and search (-s). Adds -j/--json and -b/--brief flags.
- Fix `clanktank serve` crash: call uvicorn via subprocess instead of
  importing hackathon.backend.app (which pulls in uvicorn at module level,
  missing from project venv)
- Fix judge name column alignment in score table: ANSI escape codes were
  counted as width by the format spec; now pad by visible char length
- Replace hand-rolled GET /api/uploads/{filename} route with a StaticFiles
  mount at /api/uploads — handles ETags, range requests, and content-type
  detection; auto-creates data/uploads/ on startup

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@madjin madjin changed the title from "feat: smart submissions command, serve fix, score alignment, StaticFiles uploads" to "feat: full hackathon pipeline refactor (milestones 1-3) + CLI polish" Feb 28, 2026
madjin and others added 2 commits February 28, 2026 00:22
- Fix project_image paths in DB for all 16 submissions (now point to
  /media/projects/ files that already exist in the frontend public dir)
- Add project_image to submissions CLI JSON output (both list and detail modes)
- Fix leaderboard KeyError: 'team_name' → use 'category' from actual query
- Fix episode --validate-only to exit 1 when --episode-file is missing
- Wire pyproject.toml dependencies (was empty []); uv sync now works
- Add data/hackathon.db to git as reference dataset (unignore exception)
- Remove empty data/uploads/ directory
- Add hackathon/tests/test_cli.sh: 89 CLI integration tests, 0 skipped
- Regenerate static JSON (submissions, leaderboard, stats, api mirror)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Moved 5 outdated READMEs to tmp/ (gitignored) rather than deleting:
  hackathon/backend/README.md       → tmp/README-hackathon-backend.md
  hackathon/scripts/README.md       → tmp/README-hackathon-scripts.md
  hackathon/dashboard/README.md     → tmp/README-hackathon-dashboard.md
  hackathon/tests/README.md         → tmp/README-hackathon-tests.md
  hackathon/dashboard/frontend/README.md (320 lines) → trimmed to 60 lines

Root README.md: rewritten around current reality — hackathon edition as
primary deployment, clanktank CLI quick start, uv setup, accurate tech stack.
Old version archived to tmp/README-root-old.md.

hackathon/README.md: add test_cli.sh to dev commands, fix validate-only
example to include required --episode-file flag.

Net: ~2100 lines removed from tracked docs. CLAUDE.md + hackathon/README.md
remain the authoritative developer references.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@madjin
Member Author

madjin commented Feb 28, 2026

Follow-up: quick wins + docs (commits f6daa99, de9207e)

Bugs fixed

  • Leaderboard crash (KeyError: 'team_name') in hackathon_manager.py display code: the query returns category, but the header said "Team"
  • episode --validate-only — exited 0 when --episode-file was missing; now exits 1
  • project_image missing from CLI JSON: submissions -j and submissions <id> -j both omitted the field; fixed in both the list-mode query and the detail-mode output

Images restored

All 16 project images were broken in live API mode (9 had NULL project_image, 7 had /api/uploads/UUID.jpg pointing to an empty data/uploads/). The image files already existed at public/media/projects/ — DB just needed to be synced. Updated all 16 rows and regenerated static JSON.

Removed the now-orphaned empty data/uploads/ directory.

Dependencies wired

pyproject.toml had dependencies = []. Added all runtime deps so uv sync produces a working environment. Also discovered and added slowapi, base58, python-multipart that were missing from hackathon/requirements.txt.

data/hackathon.db now tracked

Added !data/hackathon.db exception to .gitignore — 2MB reference dataset, static JSON on GitHub Pages is already derived from it.

CLI test suite

hackathon/tests/test_cli.sh — 89 integration tests covering every subcommand and meaningful flag combination. Auto-detects missing deps and skips gracefully. Currently: 89/89 pass, 0 skipped.

README consolidation

Moved 5 stale subdirectory READMEs to tmp/ (gitignored). Rewrote root README.md around current reality (hackathon edition as primary, clanktank CLI quick start). Trimmed frontend README from 320 → 60 lines. Net: ~2100 lines removed from tracked docs.

- submissions.py: remove redundant `import json` inside function body
  (json already imported at module level line 3)
- __main__.py: add comment to bare except explaining intentional suppression
  (malformed research JSON → skip research section in CLI output)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@madjin
Member Author

madjin commented Feb 28, 2026

Fixed the two code quality bot flags in 87560b7:

  • submissions.py:406 — removed redundant import json inside function body (already imported at module level)
  • __main__.py:316 — added comment to bare except: pass clarifying the intentional silent suppression (malformed research JSON → skip section)

madjin and others added 4 commits February 28, 2026 00:49
config.py: AI_MODEL_NAME default was "anthropic/claude-3-opus" (outdated,
opinionated). Now defaults to "" — callers must set AI_MODEL_NAME in .env.
Promoted from optional to required in clanktank config output.
Suggested values documented in comment: openrouter/auto, or any specific
model ID (e.g. anthropic/claude-sonnet-4-5).

github_analyzer.py: had its own local default "moonshotai/kimi-k2:free"
that bypassed config.py entirely. Removed — now falls through to env var
same as rest of codebase.

pyproject.toml: tightened stale lower bounds:
  anthropic>=0.20 → >=0.50  (SDK changed significantly at 0.50+)
  openai>=1.0     → >=2.0   (installed: 2.24.0)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
No package versions changed — all were already at latest.
Lockfile now reflects the updated anthropic>=0.50 and openai>=2.0 bounds.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
`sys.exit(1)` at line 476 would crash with NameError when
`--validate-only` is used without `--episode-file`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@madjin
Member Author

madjin commented Feb 28, 2026

Final review: security audit + fix (4938ea9)

Fix applied

  • generate_episode.py: Added the missing import sys. Without it, sys.exit(1) at line 476 would crash with NameError when --validate-only is used without --episode-file

Security audit results

10 areas verified solid:

| Area | Status |
| --- | --- |
| Webhook auth | HMAC compare_digest, production-enforced secret |
| Discord OAuth | Proper code exchange, test token blocked in prod |
| Submission ownership | Edit + upload verify owner_discord_id match |
| SQL injection | Table name whitelist (ALLOWED_TABLES frozenset) |
| Rate limiting | 5/min on POST/PUT via slowapi |
| CORS | Production locked to clanktank.tv |
| Credential masking | CLI config shows (set) for secrets |
| Audit logging | Security events tracked in simple_audit |
| File upload | Full chain: filename, MIME, magic bytes, Pillow verify, dimensions, EXIF strip, UUID filenames |
| Security headers | HSTS, CSP, X-Frame-Options, X-Content-Type-Options |
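
For reference, the webhook auth row relies on constant-time comparison. A minimal sketch of that pattern with the standard library is below; the SHA-256 digest, the hex encoding, and the function names are assumptions, since the PR only states that `compare_digest` is used.

```python
import hashlib
import hmac


def sign_body(secret, body):
    """Compute a hex HMAC-SHA256 signature over the raw request body (bytes)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret, body, received_signature):
    """Return True only if the received signature matches, compared in constant time."""
    expected = sign_body(secret, body)
    # Comparing bytes avoids a TypeError on non-ASCII input and keeps timing
    # independent of where the first mismatching character occurs.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


secret = b"example-secret"  # placeholder; a real secret would come from config
body = b'{"event": "vote"}'
print(verify_webhook(secret, body, sign_body(secret, body)))
```

A plain `==` comparison would also work functionally, but can leak how many leading characters matched through response timing.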

Test results (no regressions)

  • ruff check hackathon/ — all checks passed
  • pytest — 12 passed, 7 failed, 25 errors (matches baseline exactly)

Low-priority items (not blocking)

  • print() in submissions.py could become logging.info()
  • allow_headers=["*"] in CORS could be narrowed
  • Script-level SQL uses f-string table names (mitigated by argparse choices)
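
On the f-string table names: SQL identifiers cannot be bound as query parameters, so the usual mitigation is to check the name against a fixed set before interpolating it. The sketch below shows that pattern; the members of `ALLOWED_TABLES` here are guesses based on table names mentioned in this PR, and `count_rows` is a hypothetical helper.

```python
import sqlite3

# Assumed membership; the real ALLOWED_TABLES contents are not shown in this PR.
ALLOWED_TABLES = frozenset({"hackathon_scores", "simple_audit"})


def count_rows(conn, table):
    """Count rows in a table, refusing any name that is not explicitly allowed."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    # Safe to interpolate: the value is one of a fixed set of known identifiers.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hackathon_scores (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO hackathon_scores DEFAULT VALUES")
print(count_rows(conn, "hackathon_scores"))
```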

🤖 Generated with Claude Code

madjin and others added 3 commits February 28, 2026 15:44
The hackathon.db was emptied during CLI test runs (db create
reinitializes tables). Restored 16 submissions + 260 scores.
Cleared simple_audit table — audit logs shouldn't be tracked in git.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Leaderboard now shows both rounds side by side with R1 and R2 columns.
Default sorts by latest round (R2); --round 1 sorts by first impression.
Removes the combined/Final column — rankings shift meaningfully between
rounds (e.g. Refraktor #2 → #1, Dev Rel #12 → #7).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scores:
- Add UNIQUE(submission_id, judge_name, round) to hackathon_scores so
  INSERT OR REPLACE actually upserts (was inserting 3-4 dupes per judge)
- Dedup existing data: 260 → 128 rows
- Merge R1 + R2 into one table (R2 only has totals, no category breakdown)
- Commentary grouped by judge with R1/R2 together

Output:
- Submission detail is now clean markdown (headers, tables, lists) that
  pipes to LLM processes without ANSI noise
- Box-drawing table in TTY with ANSI-aware padding
- Full column names: Innovation, Technical, Market, UX

Images:
- Fix all 16 project_image paths: /api/uploads/UUID.jpg (404) →
  /media/projects/<name>_project.jpg (actual files)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
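
The duplicate-score bug fixed in the commit above follows from how SQLite's INSERT OR REPLACE works: it only replaces a row when a uniqueness constraint would be violated, so without one it behaves like a plain INSERT. A small sketch, using the `hackathon_scores` table name and the UNIQUE key from the commit message; the other columns and the `save_score` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hackathon_scores (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        submission_id INTEGER NOT NULL,
        judge_name TEXT NOT NULL,
        round INTEGER NOT NULL,
        weighted_total REAL,
        UNIQUE (submission_id, judge_name, round)
    )
    """
)


def save_score(conn, submission_id, judge_name, round_no, total):
    """Upsert one judge's score for one submission and round."""
    conn.execute(
        "INSERT OR REPLACE INTO hackathon_scores "
        "(submission_id, judge_name, round, weighted_total) VALUES (?, ?, ?, ?)",
        (submission_id, judge_name, round_no, total),
    )


save_score(conn, 1, "judge_a", 1, 30.0)
save_score(conn, 1, "judge_a", 1, 35.0)  # same key: replaces the first row
save_score(conn, 1, "judge_a", 2, 33.0)  # different round: a new row

print(conn.execute("SELECT COUNT(*) FROM hackathon_scores").fetchone()[0])
```

Note that REPLACE deletes the conflicting row and inserts a new one, so the autoincrement `id` changes on every re-score.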
- ASCII banner + submission count when run with no args
- Safe `db create` refuses to overwrite existing DB without -f/--force
- Fix static-data color from green (read) to yellow (write)
- Contextual tips after submissions, leaderboard, stats, score output
- Add missing env vars to `clanktank config` (JUDGE_CONFIG, HELIUS_API_KEY, etc.)
- Mark required vars with * in config display
- Add -r/--research flag to `submissions <id>` for full markdown research view
- Add openapi.json URL to help epilog (replaces broken /docs reference)
- Fix CSP blocking Swagger UI: relax script-src for /docs and /redoc paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
madjin and others added 4 commits February 28, 2026 16:44
Scores and rankings updated to reflect deduplicated judge scores
from 704c2a4. No logic changes, just regenerated JSON.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wraps scripts/recorder.js as step 9 in the pipeline, between episode
generation and YouTube upload. Validates node availability, forwards
all recorder flags (--headless, --format, --date, --stop-at, etc.),
and prints the full command for transparency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…re commands

- `clanktank config JUDGE_CONFIG` renders judge personas and scoring weights
- `clanktank config RESEARCH_CONFIG` renders penalty thresholds and prompt template
  with highlighted template variables
- Any JSON env var gets formatted output; non-JSON vars print raw value
- `clanktank research`, `score`, `synthesize` with no args now show help
  instead of dispatching to hackathon_manager and erroring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
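
The "JSON gets formatted, everything else prints raw" rule from the commit above can be sketched like this. The function name `render_env_value` is hypothetical, and treating only objects and arrays as "JSON" (so that a bare number or string passes through untouched) is an assumption about the intended behaviour.

```python
import json


def render_env_value(raw):
    """Pretty-print JSON objects and arrays; return any other value unchanged."""
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return raw
    if isinstance(parsed, (dict, list)):
        return json.dumps(parsed, indent=2)
    return raw  # scalars such as "30" or "true" are shown as written


print(render_env_value('{"judges": ["a", "b"]}'))
print(render_env_value("openrouter/auto"))
```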
…H_CONFIG`

JUDGE_CONFIG now renders:
- Personas as headed sections with full text
- Criteria table with descriptions and max scores
- Weight multipliers table with bold >1.0 / dim <1.0
- Score scale, scoring task, and round2 template as readable text
  with highlighted {template_vars}

RESEARCH_CONFIG now renders:
- Penalty thresholds as key-value table
- All prompt templates (research + github analysis) with highlighted
  {template_vars}, rendered in natural reading order

Generic JSON vars still get pretty-printed as before.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
madjin and others added 3 commits February 28, 2026 18:34
…on/frontend/

The intermediate dashboard/ directory was a leftover from when app.py
also lived there. Now that the backend is at hackathon/backend/, the
frontend sits as a sibling at hackathon/frontend/.

- git mv hackathon/dashboard/frontend → hackathon/frontend
- Fix relative paths in vite.config.ts, package.json, sync_schema_to_frontend.py
- Update all backend/script/CI/doc references to new location
- Fix stale recovery_tool.py backup dir default
- Move dashboard leftovers (venv, requirements.txt, __pycache__) to tmp/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move hackathon/scripts/recovery_tool.py to tmp/deprecated/ (backup dir
  it reads from was never populated; static-data JSON + DB copies are the
  real backup story)
- Remove `clanktank recovery` CLI subcommand and README docs
- Delete stale hackathon/dashboard/requirements.txt
- Add regenerated submissionSchema.ts from sync-schema
- Include pending branch changes: config.py load_json_config helper,
  clanktank config edit/rich-print, prompt tweaks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove hackathon/dashboard/ leftovers, move scripts/legacy/ and
scripts/cli.py to tmp/deprecated/, relocate recorder.js into
hackathon/scripts/ and fix the path reference in __main__.py.
Remove stale Main Platform Pipeline section from CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
madjin and others added 2 commits February 28, 2026 19:15
recorder.js already defaults to ./episodes as output dir.
Move existing recordings into episodes/recordings/ and update
all references in scripts, .gitignore, and .env.example.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove redundant imports, unused variables/functions, and add
explanatory comments to empty except blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
madjin and others added 2 commits February 28, 2026 19:29
- actions/upload-artifact v3 → v4 (v3 deprecated, fails CI)
- actions/setup-python v4 → v5
- Delete unused getSlugFromUrl function in recorder.js (PR #80 review)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@madjin madjin merged commit c209e4b into main Mar 1, 2026
3 checks passed