Eioku - Video Intelligence Platform
Eioku transforms your video library into a searchable database. You film it, we index it, you find it.
Audience: content creator who produces video essays
Problem: I have 100 GB / 80+ hours of video from recording a video game whose lore I cover
Solution: Eioku lets me find the exact moments I've storyboarded for my YouTube video (via on-screen text and audio transcripts) instead of opening each video one by one to hunt for the relevant moments
Audience: A content creator, corporation, or enterprise with a trove of content built up over the years
Problem: Which topics have I covered (or merely mentioned), and how well did I deliver them? I can use this to prepare future material, marketing, or training content (video or otherwise)
Solution: Eioku in its current form lets you find exactly when you spoke about a given topic. In the future, semantic search will make querying more natural, an LLM integration could review your transcripts and suggest improvements to delivery style or script, and trends/topics could be surfaced
Audience: Film editors who receive hundreds of GB of footage from the field production team (sometimes daily!) under tight delivery deadlines
Problem: I need to find A-roll and B-roll of my travels for my next vlog. Perhaps the video references multiple places across various periods of time, and I'm comparing experiences (boat rides, hiking excursions, city life)
Solution: Eioku lets you search videos by location, by objects, and, in the future, by recognized faces
Audience: Family and friends who record memorable moments
Problem: Remember that one time when person X did that thing Y? Can we compile a video focused on person Z over the past year?
Solution: Eioku's features let you quickly find and relive those moments, and make compiling them into that special sentimental video much less daunting
Automatic ML Analysis: Drop videos in a folder and Eioku automatically runs the following analyses (a sketch of the resulting artifact record follows the list):
- Object detection (YOLO) - find every dog, car, person
- Speech transcription (Whisper) - searchable transcripts
- OCR (EasyOCR) - text visible in frames
- Place recognition (Places365) - indoor/outdoor scene classification
- Scene detection - shot boundary detection
- Metadata extraction - EXIF timestamps, GPS, duration
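Every pass writes its results as artifact rows (JSONB in PostgreSQL; see the architecture diagram below). As a rough illustration, with field names that are assumptions rather than the actual Eioku schema, a single object-detection hit might look like this:

```python
# Illustrative only: field names and structure are assumptions,
# not the real Eioku artifact schema.
object_detection_artifact = {
    "video_id": "vid_0001",
    "task": "object_detection",      # which ML pass produced this artifact
    "model": "yolo",                 # producing model family
    "timestamp_s": 42.5,             # offset into the video, in seconds
    "payload": {                     # task-specific JSONB payload
        "label": "dog",
        "confidence": 0.91,
        "bbox": [120, 80, 340, 260], # x1, y1, x2, y2 in pixels
    },
}

# Transcription or OCR artifacts could reuse the same envelope with a different
# payload, e.g. {"text": "...", "start_s": 40.0, "end_s": 44.2}.
```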
Cross-Video Search: Search across your entire library chronologically
- "Find every dog scene" - jumps between videos automatically
- Ordered by when you actually filmed (EXIF date), not upload date
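The ordering itself is simple: sort matches by the EXIF capture timestamp of their source video, then by offset within the video. A minimal sketch of that idea, with field names that are assumptions rather than Eioku's actual schema:

```python
from datetime import datetime

# Hypothetical search hits from several videos; recorded_at comes from EXIF
# metadata, offset_s is the position of the match within its video.
hits = [
    {"video": "paris.mov", "recorded_at": datetime(2024, 6, 2, 9, 30), "offset_s": 12.0},
    {"video": "tokyo.mp4", "recorded_at": datetime(2023, 11, 5, 18, 0), "offset_s": 301.5},
    {"video": "tokyo.mp4", "recorded_at": datetime(2023, 11, 5, 18, 0), "offset_s": 45.2},
]

# Order by when the footage was actually filmed, then by position in the clip.
timeline = sorted(hits, key=lambda h: (h["recorded_at"], h["offset_s"]))
for h in timeline:
    print(h["video"], h["offset_s"])
```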
Global Jump Navigation: Cmd+F for your video archive
- Next/Previous buttons navigate across all videos
- Full-text search on transcripts and OCR text
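Global jump is backed by the API service (see the OpenAPI docs below). The sketch here drives it programmatically; the route and query parameters are assumptions for illustration, not the real contract, so check /api/docs for the actual endpoints:

```python
# Sketch of calling a global-jump endpoint. The route and parameters are
# assumptions; consult http://localhost:9080/api/docs for the real API.
import requests

resp = requests.get(
    "http://localhost:9080/api/jump/next",
    params={"q": "dog", "type": "object", "after": "vid_0001:42.5"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. which video and timestamp to seek to next
```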
Hackathon judges: Use the Google Drive videos from the submission, or bring your own!
- Add videos to the `test-videos/` directory (variety: dogs, people, speech, text, GPS)
- Start the environment using the Quick Start instructions below
- Open http://localhost:9080
- Wait for ML tasks to complete (check status in video player view)
- Try features:
- Video Player: Explore detected objects, transcripts, OCR, places
- Clip Export: Set timestamps and download a clip
- Global Jump: Search + use Previous/Next to jump across videos
- Artifact Gallery: See all matches with thumbnails
- Some ML models download per boot: Each time the ml-service restarts, it may re-download some YOLO models, which can delay processing by up to a minute or so (I missed this during the build process >:|)
- Video discovery: Videos are only discovered when the backend starts - no hot-reload for new files
- Face search: Face detection runs but face search/clustering is not implemented yet
- No combined filters: Global jump searches one artifact type at a time (can't search "dog AND tokyo")
- Single language OCR: OCR is configured for English only
- No semantic search: Text search is exact match only - no embeddings or similarity search yet
- Transcription language: Whisper auto-detects language but works best with English
- GPS reverse geocoding: Requires internet connection for location names
- Large video files: Very long videos (>1hr) may timeout during processing
- No search suggestions: Available labels/terms are not aggregated, so users must guess what to search for. You can browse individual videos to get an idea of what exists, but it's far from ideal (e.g., there is no "show me all detected objects or spoken words")
Important: the ml-service image is very large (~10 GB) because it bundles the ML models. It may take a while to pull depending on your network.
Important: the images are only built for amd64, so expect slower performance on Apple Silicon under Rosetta 2.
Two deployment options are available depending on your hardware:
For systems with NVIDIA GPUs and the NVIDIA Container Toolkit installed:
```bash
# Start with GPU acceleration (~10x faster ML processing)
docker compose -f docker/docker-compose.cuda.yml up

# Access the app
open http://localhost:9080
```
For Mac (Apple Silicon), Windows without NVIDIA GPU, or Linux without CUDA:
```bash
# Start with CPU-only processing (slower but universal)
docker compose -f docker/docker-compose.cpu.yml up

# Access the app
open http://localhost:9080
```
Note for Apple Silicon users: Use the CPU environment. The CUDA environment requires NVIDIA GPUs, which are not available on Apple hardware. MPS is not supported in Docker containers, and I did not have time to create a non-Docker environment for the demo.
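If you are unsure whether the CUDA environment will actually see your GPU, one quick sanity check is to ask PyTorch from inside the running ml-service container. This assumes the image ships PyTorch (which YOLO and Whisper rely on) and that the compose service is named ml-service; both are assumptions, so adjust to match the compose file:

```python
# Run inside the ml-service container, e.g.:
#   docker compose -f docker/docker-compose.cuda.yml exec ml-service python
# Assumes PyTorch is installed in the image; the service name is an assumption.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```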
```bash
# Stop the environment
docker compose -f docker/docker-compose.cuda.yml down
docker compose -f docker/docker-compose.cpu.yml down

# Stop and remove all data (fresh start)
docker compose -f docker/docker-compose.cuda.yml down -v
docker compose -f docker/docker-compose.cpu.yml down -v
```
Benchmark on 13 test videos (557 MB total, 2.5-275 MB each):
| Environment | Hardware | Tasks | Duration | Rate |
|---|---|---|---|---|
| Docker CUDA | RTX 3070 Ti (8GB) | 130 | 9 min 9 sec | ~14.2 tasks/min |
| Docker CPU | Ryzen 9 5900X | 80 | 1 hr 7 min | ~1.2 tasks/min |
GPU is roughly 10-12x faster than CPU for ML processing.
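The per-minute rates in the table are just tasks divided by wall-clock minutes; a quick check of the arithmetic:

```python
# Reproduce the throughput numbers from the benchmark table.
cuda_rate = 130 / (9 + 9 / 60)   # 130 tasks in 9 min 9 s
cpu_rate = 80 / (60 + 7)         # 80 tasks in 1 h 7 min
print(f"CUDA: {cuda_rate:.1f} tasks/min")                # ~14.2
print(f"CPU:  {cpu_rate:.1f} tasks/min")                 # ~1.2
print(f"Per-task speedup: {cuda_rate / cpu_rate:.0f}x")  # ~12x
```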
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Frontend   │────▶│ API Service │────▶│ PostgreSQL  │
│   (React)   │     │  (FastAPI)  │     │   (JSONB)   │
└─────────────┘     └──────┬──────┘     └──────▲──────┘
                           │                   │
                    ┌──────▼──────┐            │
                    │    Redis    │            │
                    │  (Valkey)   │            │
                    └──────┬──────┘            │
                           │                   │
                    ┌──────▼──────┐            │
                    │  ML Service │────────────┘
                    │  (GPU/arq)  │  writes artifacts
                    └─────────────┘
```
See changes.md for full C4 diagrams and architecture evolution.
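The queue between the API service and the ML service is arq on top of Redis/Valkey. A minimal sketch of that hand-off is below; the task name, Redis host, and the artifact-writing step are assumptions for illustration, not the project's actual task definitions:

```python
# Sketch of the API -> Redis (Valkey) -> ML service hand-off using arq.
# Task and host names are illustrative assumptions.
from arq import create_pool
from arq.connections import RedisSettings

REDIS = RedisSettings(host="redis", port=6379)

# API service side: enqueue an analysis job when a new video is discovered.
async def enqueue_analysis(video_id: str) -> None:
    pool = await create_pool(REDIS)
    await pool.enqueue_job("run_object_detection", video_id)

# ML service side: worker function that arq pulls off the queue.
async def run_object_detection(ctx, video_id: str) -> None:
    # ...run YOLO over sampled frames, then write artifact rows to PostgreSQL
    print(f"analyzing {video_id}")

class WorkerSettings:
    functions = [run_object_detection]
    redis_settings = REDIS
```

The ML service would run the worker side via arq's CLI pointed at a WorkerSettings class like the one above.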
When running locally, interactive API documentation is available at:
Note: this applies when using the compose files under the docker folder. For development setup, see: Contributing.
- Swagger UI: http://localhost:9080/api/docs
- ReDoc: http://localhost:9080/api/redoc
- OpenAPI JSON: http://localhost:9080/api/openapi.json
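Because these docs are generated by FastAPI, the OpenAPI JSON can also be pulled programmatically, for example to list the available routes (nothing here is assumed beyond the URL already listed above):

```python
# List the API routes exposed by the running stack from its OpenAPI spec.
import requests

spec = requests.get("http://localhost:9080/api/openapi.json", timeout=10).json()
for path, ops in sorted(spec["paths"].items()):
    methods = [m.upper() for m in ops if m in ("get", "post", "put", "patch", "delete")]
    print(" ".join(methods), path)
```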
This project extensively uses Kiro's spec-driven development workflow and automation features.
```mermaid
flowchart TD
    subgraph Spec["Spec-Driven Design (Kiro)"]
        REQ[Define Requirements] --> DES[Create Design]
        DES --> TASKS[Generate Tasks]
    end

    subgraph Dev["Development Loop (per task)"]
        CODE[Write Code] --> REVIEW[Review with Agent Prompt]
        REVIEW --> APPLY[Apply Suggestions]
        APPLY --> LINT[Lint]
        LINT --> TEST[Test]
        TEST --> BUILD[Build]
    end

    subgraph PR["Pull Request"]
        COMMIT[Commit] --> CREATE[Create PR]
        CREATE --> DESC[Describe PR]
        DESC --> MERGE[Merge to Main]
    end

    TASKS --> CODE
    BUILD --> COMMIT
    MERGE --> REQ

    %% Kiro Hooks
    H1([🪝 principal-code-review]):::hook -.-> REVIEW
    H2([🪝 lint-backend-code]):::hook -.-> LINT
    H3([🪝 lint-ml-service]):::hook -.-> LINT
    H4([🪝 lint-frontend]):::hook -.-> LINT
    H5([🪝 run-backend-tests]):::hook -.-> TEST
    H6([🪝 run-ml-service-tests]):::hook -.-> TEST
    H7([🪝 pre-commit-checks]):::hook -.-> COMMIT

    classDef hook fill:#2563eb,stroke:#1d4ed8,stroke-width:1px,color:#fff
```
Note: the architecture for this project evolved a lot as I made progress. Review those changes here: Architecture Evolution.
Each feature was designed using Kiro's requirements → design → tasks workflow. I made extensive use of the principal-software-engineer.agent for software architecture decisions and code reviews.
| Spec | Description |
|---|---|
| `artifact-envelope-architecture` | Unified artifact storage model with schema registry, projections, and selection policies |
| `global-jump-navigation` | Cross-video search and navigation API using global timeline ordering |
| `global-jump-navigation-gui` | Frontend UI for global jump with search controls |
| `artifact-thumbnails-gallery` | Thumbnail extraction task and gallery search API/UI |
| `worker-ml-service-separation` | Split monolith into API, Worker, and ML services with Redis job queue |
| `video-metadata-extraction` | EXIF metadata extraction and GPS location handling |
| `semantic-video-search` | Core video search and artifact query functionality |
Always-on guidance for consistent development:
- `development-principles.md` - Incremental commits, approval workflow, conventional commits, quality gates
- `fastapi-dev-environment.md` - FastAPI best practices, async-first, Pydantic everywhere, container-first dev
- `trunk-based-development.md` - Short-lived branches, small PRs, CI/CD integration
User-triggered commands for common workflows:
| Hook | Action |
|---|---|
| `start-dev-env` | Start Docker dev environment |
| `stop-dev-env` | Stop Docker dev environment |
| `reset-database` | Reset PostgreSQL and restart services |
| `lint-backend-code` | Run Ruff on backend |
| `lint-ml-service` | Run Ruff on ml-service |
| `lint-frontend` | Run ESLint on frontend |
| `run-backend-tests` | Run pytest on backend |
| `run-ml-service-tests` | Run pytest on ml-service |
| `pre-commit-checks` | Full lint + test + commit workflow |
| `principal-code-review` | AI code review using principal engineer agent |
- `principal-software-engineer.agent.md` - Expert-level engineering guidance for code reviews, focusing on design patterns, SOLID principles, testing strategy, and technical debt management with GitHub issue creation
- Architecture Evolution - C4 diagrams, trade-offs, phases
- Development Log - Timeline, decisions, challenges
- Attribution - Third-party libraries
See: CONTRIBUTING.