36 changes: 36 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,36 @@
name: Tests

on:
push:
branches: ['**']
pull_request:
branches: [main]
workflow_dispatch:

jobs:
tests:
name: Run Tests
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.12'
cache: 'pip'

- name: Install dependencies
run: pip install -r requirements-dev.txt

- name: Run tests with coverage
run: pytest --cov=. --cov-report=term-missing --cov-report=xml -v

- name: Upload coverage report
if: always()
uses: actions/upload-artifact@v4
with:
name: coverage-report
path: coverage.xml
retention-days: 7
126 changes: 126 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,126 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Sync2Cal Events API — a Python FastAPI application that converts data from various websites into calendar events. Serves JSON and ICS feeds compatible with Google Calendar, Apple Calendar, Outlook, etc. Backend for sync2cal.com.

## Commands

```bash
# Development
pip install -r requirements.txt # Install dependencies
uvicorn main:app --reload # Dev server (localhost:8000)
uvicorn main:app --host 0.0.0.0 # Production server

# Documentation
# Interactive API docs at http://localhost:8000/docs (auto-generated by FastAPI)
```

## Architecture

### Tech Stack
- **Python 3.11+** with **FastAPI** + **Uvicorn**
- **Requests** for HTTP calls, **BeautifulSoup4** + **lxml** for web scraping
- **gspread** + **google-auth** for Google Sheets integration
- **python-dotenv** for environment variable management

### Key Directories
- `base/` — Core framework: `Event` dataclass, `CalendarBase`, `IntegrationBase`, `mount_integration_routes()`
- `integrations/` — Individual source integrations (11 total, listed below)
- `docs/` — API endpoint documentation
- `.cursor/rules/` — Cursor IDE rules (project conventions)

### Plugin Architecture
Every integration follows the same pattern:
1. A `CalendarBase` subclass with `fetch_events(*args, **kwargs) -> List[Event]`
2. An `IntegrationBase` subclass registered in `main.py`
3. `mount_integration_routes()` auto-creates `GET /<id>/events` endpoint from `fetch_events` signature
4. The `ics` query param (default `true`) toggles ICS vs JSON response
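The pattern above can be sketched with stand-in classes (the real `base/` signatures are not reproduced here, so field defaults and the method shape are assumptions for illustration):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

# Stand-in for base/models.py's Event dataclass (same fields as the
# Data Model section; defaults here are assumptions).
@dataclass
class Event:
    uid: str
    title: str
    start: datetime
    end: datetime
    all_day: bool = False
    description: str = ""
    location: str = ""
    extra: Dict = field(default_factory=dict)

class ExampleCalendar:
    """A CalendarBase-style class: one fetch_events() returning Events."""
    def fetch_events(self, limit: int = 5) -> List[Event]:
        # A real integration would call an API or scrape a site here.
        start = datetime(2026, 2, 1, 20, 0)
        return [
            Event(
                uid=f"example-{i}",
                title=f"Example event {i}",
                start=start + timedelta(days=i),
                end=start + timedelta(days=i, hours=1),
            )
            for i in range(limit)
        ]
```

Because `fetch_events` takes only primitives with defaults, `mount_integration_routes()` can expose `limit` directly as a query parameter.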

### Integrations
| ID | Source | Method | Credentials Required |
|----|--------|--------|---------------------|
| twitch | Twitch | API | Yes (TWITCH_CLIENT_ID/SECRET) |
| google-sheets | Google Sheets | API | Yes (service account JSON) |
| thetvdb | TheTVDB | API | Yes (API key + bearer token) |
| sportsdb | TheSportsDB | API | Yes (SPORTSDB_API_KEY) |
| daily-weather-forecast | OpenWeatherMap | API | Yes (OPENWEATHERMAP_API_KEY) |
| investing | Investing.com | Scraping | No |
| imdb | IMDb | Scraping | No |
| moviedb | TheMovieDB | Scraping | No |
| wwe | WWE | Scraping | No |
| shows | TVInsider | Scraping | No |
| releases | Releases.com | Scraping | No |

### Data Model
```python
@dataclass
class Event:
uid: str # Unique identifier
title: str # Event name
start: datetime # Start datetime
end: datetime # End datetime
all_day: bool # All-day event flag
description: str # Event description
location: str # Event location
extra: Dict # Provider-specific metadata
```

### API Pattern
All endpoints follow: `GET /<integration-id>/events?<params>&ics=true|false`
- Integration ID uses hyphens (e.g., `google_sheets` becomes `/google-sheets/events`)
- `fetch_events` parameters become query params automatically
- `ics=true` (default): returns `text/plain` ICS content
- `ics=false`: returns JSON list of Event objects

## Patterns & Conventions

### Adding New Integrations
1. Create `integrations/<name>.py` with `<Name>Calendar(CalendarBase)` and `<Name>Integration(IntegrationBase)`
2. Implement `fetch_events()` — all params must be JSON-serializable primitives with defaults
3. Register in `main.py`: create integration instance, mount routes via loop
4. Add any required env vars to `env.template`

### HTTP Requests
- Always set explicit timeouts (10-20s) on `requests.get/post`
- Set `User-Agent` header when scraping websites
- Use `response.raise_for_status()` for error detection
- Wrap in try/except, raise `HTTPException` with appropriate status codes (400, 401, 429, 500, 502)
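A minimal helper following these conventions might look like this (a sketch, not the project's actual code; the `HTTPException` shim and the injectable `getter` exist only so the snippet is self-contained and testable without network access):

```python
try:
    from fastapi import HTTPException
except ImportError:  # shim so the sketch runs without FastAPI installed
    class HTTPException(Exception):
        def __init__(self, status_code: int, detail: str = ""):
            self.status_code, self.detail = status_code, detail

def fetch_json(url: str, timeout: float = 15, getter=None) -> dict:
    """Explicit timeout, User-Agent, raise_for_status, HTTPException on error."""
    if getter is None:
        import requests  # lazy import; real code would import at module top
        getter = requests.get
    headers = {"User-Agent": "Mozilla/5.0 (compatible; Sync2Cal/1.0)"}
    try:
        resp = getter(url, headers=headers, timeout=timeout)
        resp.raise_for_status()
        return resp.json()
    except HTTPException:
        raise
    except Exception as exc:
        # Upstream failure surfaces as 502 Bad Gateway to our own clients.
        raise HTTPException(status_code=502, detail=f"Upstream error: {exc}")
```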

### Scraping
- Use BeautifulSoup with `lxml` parser
- Skip individual items that fail to parse (don't crash the whole request)
- Construct deterministic UIDs (e.g., `tmdb-{title}-{date}`)

### All-Day Events
- Set `all_day=True` and `end = start + timedelta(days=1)`
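Concretely (illustrative values; the next-day end matches RFC 5545's exclusive `DTEND` for all-day events):

```python
from datetime import datetime, timedelta

start = datetime(2026, 3, 14)      # midnight on the event's date
end = start + timedelta(days=1)    # exclusive end: start of the next day
```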

### ICS Generation
- `utils.generate_ics()` handles RFC 5545 compliance: line folding, text escaping, VTIMEZONE
- `utils.make_slug()` for URL-friendly text conversion

## Critical Files
- `main.py` — App setup, CORS config, integration registration loop
- `base/routes.py` — `mount_integration_routes()` — the glue that turns `fetch_events` into API endpoints
- `base/models.py` — `Event` dataclass
- `base/calendar.py` — `CalendarBase` abstract class
- `base/integration.py` — `IntegrationBase` abstract class
- `utils.py` — `generate_ics()` and `make_slug()` utilities
- `env.template` — Required environment variables reference

## Environment Variables
See `env.template` for the full list. Key ones:
- `TWITCH_CLIENT_ID` / `TWITCH_CLIENT_SECRET` — Twitch API
- `GOOGLE_SHEETS_SERVICE_ACCOUNT_FILE` — Path to service account JSON
- `THE_TVDB_API_KEY` / `THE_TVDB_BEARER_TOKEN` — TheTVDB API
- `SPORTSDB_API_KEY` — TheSportsDB API
- `OPENWEATHERMAP_API_KEY` — Weather integration
- `CORS_ORIGINS` — Comma-separated allowed origins (defaults to sync2cal.com)

## Gotchas
- **No deployment config**: No Dockerfile, Procfile, or railway.json exists yet.
- **CORS with credentials**: `allow_credentials=True` means `allow_origins=["*"]` is not allowed. Must specify exact origins via `CORS_ORIGINS` env var.
- **Scraping fragility**: IMDb, TMDB, and other scraped sources may break if the site changes its HTML structure.
- **`multi_calendar`**: Only Twitch uses `multi_calendar=True`. The `master_csv()` method on IntegrationBase is a TODO stub.
145 changes: 145 additions & 0 deletions docs/plans/2026-02-01-scraper-consolidation-design.md
@@ -0,0 +1,145 @@
# Scraper Consolidation Design

**Goal:** Retire `sync2cal-custom-scraper` and consolidate all ICS feed generation into `S2C-events-api`, eliminating a redundant Railway service.

**Date:** 2026-02-01

---

## Background

Two services produce ICS calendar feeds from external sources:

| Service | Repo | Domain | Status |
|---------|------|--------|--------|
| custom-scraper | sync2cal-custom-scraper | `sync2cal-scraper.up.railway.app` | Private, no tests, no CI |
| events-api | S2C-events-api | `api.sync2cal.com` | Public, 215 tests, 92% coverage, CI |

All 10 shared integrations are **identical code** (copy-pasted). Events-api additionally has a weather integration, a better architecture (loop-based routing, CORS), and a contributor guide.

**7,959 categories** in the database have SOURCE URLs pointing to custom-scraper. Zero point to events-api.

## Problem: URL Patterns Don't Match

A simple domain swap won't work. The two services use different URL structures:

| Integration | Count | custom-scraper URL | events-api URL |
|---|---|---|---|
| TheTVDB | 7,849 | `/thetvdb/series/{id}/episodes.ics` | `/thetvdb/events?series_id={id}` |
| TV Shows | 41 | `/tv/platform/{slug}.ics` or `/tv/genre/{slug}.ics` | `/shows/events?mode=platform&slug={slug}` or `?mode=genre&slug={slug}` |
| SportsDB | 20 | `/sportsdb/league/{id}.ics` or `/sportsdb/team/{id}.ics` | `/sportsdb/events?mode=league&id={id}` or `?mode=team&id={id}` |
| Google Sheets | 15 | `/sheets/events.ics?sheet_url=...` | `/google-sheets/events?sheet_url=...` |
| Investing | 10 | `/investing/earnings.ics` or `/investing/ipo.ics` | `/investing/events?kind=earnings` or `?kind=ipo` |
| Yahoo Finance | 8 | `/yahoo/generate_earnings_ics?k=100&ticker=NVDA` | **Does not exist** (broken anyway — expired cookies) |
| Releases | 7 | `/releases/generate_game_ics` | `/releases/events?kind=games` |
| Twitch | 5 | `/twitch/{name}/schedule.ics` | `/twitch/events?streamer_name={name}` |
| IMDb | 4 | `/imdb/movies.ics?genre=...&actor=...&country=...` | `/imdb/events?genre=...&actor=...&country=...` |

## Approach: Database Migration Script

Write a Python migration script that:

1. Connects to the production PostgreSQL database
2. Reads all categories with SOURCE URLs containing `sync2cal-scraper.up.railway.app`
3. Rewrites each URL to the equivalent `api.sync2cal.com` endpoint
4. Updates the database in a single transaction (atomic — all or nothing)
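The four steps above can be sketched against any DB-API connection. The table and column names (`categories`, `source`) are assumptions, since the real schema is not shown; placeholders use `?` as in `sqlite3`, whereas `psycopg2` would use `%s`:

```python
import re

OLD_HOST = "sync2cal-scraper.up.railway.app"

def migrate(conn, rules) -> int:
    """Rewrite matching SOURCE URLs in a single transaction; return row count."""
    cur = conn.cursor()
    cur.execute(
        "SELECT id, source FROM categories WHERE source LIKE ?",
        (f"%{OLD_HOST}%",),
    )
    updated = 0
    try:
        for row_id, url in cur.fetchall():
            new_url = url
            for pattern, replacement in rules:
                new_url = re.sub(pattern, replacement, new_url)
            if new_url != url:
                print(f"{url} -> {new_url}")  # log every change for rollback
                cur.execute(
                    "UPDATE categories SET source = ? WHERE id = ?",
                    (new_url, row_id),
                )
                updated += 1
        conn.commit()  # single commit: all rows or none
    except Exception:
        conn.rollback()
        raise
    return updated
```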

### URL Rewrite Rules

```python
REWRITE_RULES = [
# TheTVDB: /thetvdb/series/{id}/episodes.ics -> /thetvdb/events?series_id={id}
(r'sync2cal-scraper\.up\.railway\.app/thetvdb/series/(\d+)/episodes\.ics',
r'api.sync2cal.com/thetvdb/events?series_id=\1'),

# TV Shows platform: /tv/platform/{slug}.ics -> /shows/events?mode=platform&slug={slug}
(r'sync2cal-scraper\.up\.railway\.app/tv/platform/([^/.]+)\.ics',
r'api.sync2cal.com/shows/events?mode=platform&slug=\1'),

# TV Shows genre: /tv/genre/{slug}.ics -> /shows/events?mode=genre&slug={slug}
(r'sync2cal-scraper\.up\.railway\.app/tv/genre/([^/.]+)\.ics',
r'api.sync2cal.com/shows/events?mode=genre&slug=\1'),

# SportsDB league: /sportsdb/league/{id}.ics -> /sportsdb/events?mode=league&id={id}
(r'sync2cal-scraper\.up\.railway\.app/sportsdb/league/(\d+)\.ics',
r'api.sync2cal.com/sportsdb/events?mode=league&id=\1'),

# SportsDB team: /sportsdb/team/{id}.ics -> /sportsdb/events?mode=team&id={id}
(r'sync2cal-scraper\.up\.railway\.app/sportsdb/team/(\d+)\.ics',
r'api.sync2cal.com/sportsdb/events?mode=team&id=\1'),

# Google Sheets: /sheets/events.ics?sheet_url=... -> /google-sheets/events?sheet_url=...
(r'sync2cal-scraper\.up\.railway\.app/sheets/events\.ics\?',
r'api.sync2cal.com/google-sheets/events?'),

# Investing: /investing/{kind}.ics -> /investing/events?kind={kind}
(r'sync2cal-scraper\.up\.railway\.app/investing/earnings\.ics',
r'api.sync2cal.com/investing/events?kind=earnings'),
(r'sync2cal-scraper\.up\.railway\.app/investing/ipo\.ics',
r'api.sync2cal.com/investing/events?kind=ipo'),

# Releases: /releases/generate_game_ics -> /releases/events?kind=games
(r'sync2cal-scraper\.up\.railway\.app/releases/generate_game_ics',
r'api.sync2cal.com/releases/events?kind=games'),

# Twitch: /twitch/{name}/schedule.ics -> /twitch/events?streamer_name={name}
(r'sync2cal-scraper\.up\.railway\.app/twitch/([^/]+)/schedule\.ics',
r'api.sync2cal.com/twitch/events?streamer_name=\1'),

# IMDb: /imdb/movies.ics?... -> /imdb/events?...
(r'sync2cal-scraper\.up\.railway\.app/imdb/movies\.ics\?',
r'api.sync2cal.com/imdb/events?'),
]
```
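As a sanity check, applying the first rule with `re.sub` to a sample URL (the series ID is made up):

```python
import re

pattern = r'sync2cal-scraper\.up\.railway\.app/thetvdb/series/(\d+)/episodes\.ics'
replacement = r'api.sync2cal.com/thetvdb/events?series_id=\1'

old = "https://sync2cal-scraper.up.railway.app/thetvdb/series/81189/episodes.ics"
new = re.sub(pattern, replacement, old)
# new == "https://api.sync2cal.com/thetvdb/events?series_id=81189"
```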

### Yahoo Finance (8 categories)

The Yahoo Finance integration is broken (hardcoded expired cookies). Options:
1. **Remove the SOURCE** from these 8 categories so the scheduled job skips them
2. **Replace with Investing.com earnings** if that integration supports per-ticker queries
3. **Leave broken** — the scheduled job already handles download failures gracefully

Recommendation: Remove the SOURCE field. These can be re-enabled if/when a working earnings integration is built.

## Migration Steps

### Pre-migration (verify)

1. Confirm events-api endpoints work for each integration type by testing one URL per integration against `api.sync2cal.com`
2. Compare ICS output between old and new endpoints to verify identical calendar data

### Execute migration

3. Run the migration script against production database (during the 14-hour gap between scheduled job runs)
4. Verify row count matches: 7,959 rows updated

### Post-migration (validate)

5. Wait for next scheduled job run
6. Check Discord report for success/failure counts — should match pre-migration baseline
7. Spot-check a few categories (TheTVDB, SportsDB, Sheets) to confirm events populated

### Retire custom-scraper

8. Keep custom-scraper running for 1 week as a safety net (in case a rollback is needed)
9. After 1 week with no issues, remove custom-scraper from Railway
10. Archive the sync2cal-custom-scraper repo on GitHub

## Rollback Plan

If the migration fails or the scheduled job reports increased failures:

```sql
-- Reverse the migration (swap api.sync2cal.com back to sync2cal-scraper.up.railway.app)
-- Keep the reverse rewrite rules in the migration script
```

The migration script should log all changes (old URL -> new URL) to enable reversal.

## References to Update

After migration, update these files that reference `sync2cal-scraper.up.railway.app`:
- `sync2cal-ics-version/CLAUDE.md` — Railway architecture table
- `new-baklava/app/admin/scraper/page.tsx` — embedded iframe to scraper docs (change to events-api docs)
- `S2C-events-api/BACKEND_STAGING_SETUP_PLAN.md` — production URL reference
- `S2C-frontend/server/meta/categoryMap.json` — regenerated automatically by prebuild script
10 changes: 8 additions & 2 deletions integrations/google_sheets.py
```diff
@@ -1,4 +1,5 @@
 from fastapi import HTTPException
+import json
 import os
 from base import CalendarBase, Event, IntegrationBase
 from typing import List
@@ -33,8 +34,13 @@ def fetch_events(
         """
         try:
             try:
-                sa_path = os.getenv("GOOGLE_SHEETS_SERVICE_ACCOUNT_FILE", "service_account.json")
-                gc = gspread.service_account(filename=sa_path)
+                sa_json = os.getenv("GOOGLE_SHEETS_SERVICE_ACCOUNT_JSON")
+                if sa_json:
+                    creds = json.loads(sa_json)
+                    gc = gspread.service_account_from_dict(creds)
+                else:
+                    sa_path = os.getenv("GOOGLE_SHEETS_SERVICE_ACCOUNT_FILE", "service_account.json")
+                    gc = gspread.service_account(filename=sa_path)
             except Exception as auth_error:
                 raise HTTPException(
                     status_code=500,
```