A comprehensive data analytics platform that collects, processes, and analyzes Pokémon data from PokéAPI, featuring real-time visualizations and automated data pipelines.
- 🔄 Automated ETL pipeline with Apache Airflow
- 📊 Interactive Streamlit dashboard with:
  - Top Pokémon analysis by stats
  - Type distribution visualization
  - Type effectiveness matrix
  - Individual Pokémon analysis
- 🔍 FastAPI backend for data access
- 🐘 PostgreSQL database with efficient schema
- 🐋 Docker-based deployment
- ⚡ Rate-limited API fetching with caching
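
The rate-limited, cached fetching listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the class name `RateLimitedCachedFetcher`, the `min_interval` parameter, and the injected `fetch`, `clock`, and `sleep` callables are all hypothetical, and the cache is a plain in-memory dictionary with no expiry.

```python
import time
from typing import Any, Callable, Dict, Optional


class RateLimitedCachedFetcher:
    """Wrap a fetch function with an in-memory cache and a minimum delay between calls."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        min_interval: float = 0.5,  # assumed spacing between external calls, in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Any] = {}
        self._last_call: Optional[float] = None

    def get(self, url: str) -> Any:
        # Cache hit: no external call and no waiting.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: wait until min_interval has passed since the previous call.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._fetch(url)
        self._cache[url] = result
        return result


# Demo with a stub instead of a real HTTP call (swap in an HTTP client in practice).
fetcher = RateLimitedCachedFetcher(lambda url: {"url": url}, min_interval=0.0)
first = fetcher.get("https://pokeapi.co/api/v2/pokemon/1")
second = fetcher.get("https://pokeapi.co/api/v2/pokemon/1")  # served from the cache
```

Passing the fetch function, clock, and sleep in as arguments keeps the sketch testable without network access or real delays.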
- Docker and Docker Compose
- Python 3.10+
- Clone the repository
- Start all services:
  docker-compose -f docker/docker-compose.dev.yml up -d
- Access the services:
  - Dashboard: http://localhost:8000
  - Airflow UI: http://localhost:8080 (login: admin/admin)
pokemon-data-platform/
├── src/
│ ├── analytics/ # Dashboard and data analysis
│ ├── ingestion/ # PokéAPI data fetching
│ ├── transformation/ # Data transformation
│ ├── loading/ # Database operations
│ ├── models/ # SQLAlchemy models
│ └── dags/ # Airflow DAG definitions
├── database/ # DB migrations and schema
├── docker/ # Docker configuration
└── requirements/ # Python dependencies
The ETL pipeline runs daily and includes:
- Extract: Fetches data from PokéAPI
  - Pokémon details
  - Type information
  - Abilities
- Transform: Processes raw data
  - Normalizes data structures
  - Calculates derived statistics
  - Generates type effectiveness matrices
- Load: Updates PostgreSQL database
  - Maintains data consistency
  - Handles incremental updates
  - Preserves historical data
- Analyze: Provides insights through:
  - Base stat distributions
  - Type effectiveness analysis
  - Move coverage metrics
  - Generation-based comparisons
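
As an illustration of the Transform step above, the sketch below flattens a PokéAPI-style Pokémon payload into a single row, adds a derived base stat total, and builds a type effectiveness matrix from the `damage_relations` field of PokéAPI type payloads. The function names, the output column names, and the trimmed sample payloads are assumptions made for this example; the project's real transformation code may be organized differently.

```python
from typing import Any, Dict, List


def normalize_pokemon(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a PokéAPI-style Pokémon payload into one row with a derived total."""
    stats = {s["stat"]["name"].replace("-", "_"): s["base_stat"] for s in raw["stats"]}
    types = [t["type"]["name"] for t in sorted(raw["types"], key=lambda t: t["slot"])]
    return {
        "id": raw["id"],
        "name": raw["name"],
        "primary_type": types[0],
        "secondary_type": types[1] if len(types) > 1 else None,
        **stats,
        "base_stat_total": sum(stats.values()),  # derived statistic
    }


def effectiveness_matrix(type_payloads: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Return matrix[attacker][defender] as a damage multiplier (default 1.0)."""
    names = [t["name"] for t in type_payloads]
    matrix = {a: {d: 1.0 for d in names} for a in names}
    multipliers = {"double_damage_to": 2.0, "half_damage_to": 0.5, "no_damage_to": 0.0}
    for payload in type_payloads:
        for key, mult in multipliers.items():
            for target in payload["damage_relations"].get(key, []):
                # Ignore defenders that are not part of the supplied type list.
                if target["name"] in matrix[payload["name"]]:
                    matrix[payload["name"]][target["name"]] = mult
    return matrix


def _stat(name: str, value: int) -> Dict[str, Any]:
    return {"base_stat": value, "stat": {"name": name}}


# Sample payloads trimmed to the fields used here (real responses carry many more).
bulbasaur_raw = {
    "id": 1,
    "name": "bulbasaur",
    "stats": [_stat("hp", 45), _stat("attack", 49), _stat("defense", 49),
              _stat("special-attack", 65), _stat("special-defense", 65), _stat("speed", 45)],
    "types": [{"slot": 1, "type": {"name": "grass"}}, {"slot": 2, "type": {"name": "poison"}}],
}
# Damage relations restricted to three types to keep the example short.
sample_types = [
    {"name": "fire", "damage_relations": {
        "double_damage_to": [{"name": "grass"}],
        "half_damage_to": [{"name": "fire"}, {"name": "water"}]}},
    {"name": "water", "damage_relations": {
        "double_damage_to": [{"name": "fire"}],
        "half_damage_to": [{"name": "water"}, {"name": "grass"}]}},
    {"name": "grass", "damage_relations": {
        "double_damage_to": [{"name": "water"}],
        "half_damage_to": [{"name": "fire"}, {"name": "grass"}]}},
]

row = normalize_pokemon(bulbasaur_raw)
matrix = effectiveness_matrix(sample_types)
```

A nested dictionary keeps the matrix easy to inspect; a dashboard heatmap would likely convert it to a table or array first.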
Key configuration options in docker-compose.dev.yml:
- Database credentials
- API rate limits
- Cache settings
- Resource limits
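
One plausible way for the Python services to consume these options is to read them from environment variables set in the compose file. The sketch below assumes that approach; the variable names (`DATABASE_URL`, `API_REQUESTS_PER_SECOND`, `CACHE_TTL_SECONDS`) and the default values are hypothetical and should be checked against the actual compose file. Resource limits are enforced by Docker itself, so they do not appear here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    api_requests_per_second: float
    cache_ttl_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to assumed defaults."""
    return Settings(
        # Hypothetical variable names and defaults, for illustration only.
        database_url=env.get(
            "DATABASE_URL", "postgresql://postgres:postgres@localhost:5432/pokemon"
        ),
        api_requests_per_second=float(env.get("API_REQUESTS_PER_SECOND", "2")),
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "86400")),
    )


# Passing a plain dict makes the loader easy to exercise without touching os.environ.
settings = load_settings({"API_REQUESTS_PER_SECOND": "5"})
```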
- Install dev dependencies:
  pip install -r requirements/dev.txt
- Run tests:
  python -m pytest tests/
- API requests are cached to minimize external calls
- Database queries are optimized with proper indexing
- Docker containers have defined resource limits
- Airflow tasks use efficient parallel execution
MIT