A modern data analytics platform combining MLB statistics with fantasy baseball insights.
Getting Started • Architecture • Documentation • Contributing
- Docker and Docker Compose
- Python 3.9+
- Make (optional, but recommended)
- Cloudflare R2 bucket populated with MLB stats data (see MLB Stats Pipeline below)
Before running this project, you need to populate your Cloudflare R2 bucket with MLB statistics data using the MLB Stats Dagster Pipeline:
- Clone and set up the MLB Stats Pipeline:

  ```bash
  git clone https://github.com/waaronmorris/mlb_stats_dagster.git
  cd mlb_stats_dagster
  ```
- Configure the pipeline:

  ```bash
  cp .env.example .env
  # Edit .env with your Cloudflare R2 credentials and MLB season configuration
  ```
- Start the pipeline using Docker Compose:

  ```bash
  docker-compose up -d
  ```
- Access the Dagster UI at http://localhost:3000 and run the pipeline to populate your R2 bucket with MLB statistics data.

- Once the pipeline has completed and your R2 bucket contains the MLB stats data, return here to continue setup. A quick way to verify the bucket contents is sketched below.
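If you want to confirm the bucket is populated before continuing, any S3-compatible client can list it. A minimal sketch with the AWS CLI (the bucket name and account ID are placeholders for your own values):

```bash
# List the Parquet objects the Dagster pipeline wrote to R2.
# Assumes your R2 access key and secret are configured as an AWS CLI profile.
aws s3 ls "s3://<your-r2-bucket>/" --recursive \
  --endpoint-url "https://<account-id>.r2.cloudflarestorage.com"
```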
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/mlb-stats.git
  cd mlb-stats
  ```
- Set up environment (an example env/.env follows this list):

  ```bash
  cp env/.env.default env/.env
  # Edit env/.env with your configuration, including:
  # - CLOUDFLARE_R2_ACCESS_KEY
  # - CLOUDFLARE_R2_SECRET_KEY
  # - CLOUDFLARE_R2_BUCKET_NAME
  ln -s env/.env .env
  ```
- Start services:

  ```bash
  docker-compose up -d
  ```
- Access Superset at http://localhost:8088.
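For reference, a minimal env/.env sketch using only the variables named above (all values are placeholders):

```bash
# env/.env: replace the placeholder values with your own Cloudflare R2 credentials.
CLOUDFLARE_R2_ACCESS_KEY=your-r2-access-key-id
CLOUDFLARE_R2_SECRET_KEY=your-r2-secret-access-key
CLOUDFLARE_R2_BUCKET_NAME=your-r2-bucket
```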
High-level architecture:

```mermaid
graph LR
    R2[(Cloudflare R2)]
    DDB[(DuckDB)]
    DBT[dbt Transformations]
    SUP[Superset Dashboards]

    R2 -->|Raw Data| DDB
    DDB -->|Source Tables| DBT
    DBT -->|Transformed Models| DDB
    DDB -->|Analytics Tables| SUP
```
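The R2 to DuckDB edge above relies on DuckDB reading Parquet directly from S3-compatible object storage via its httpfs extension. A minimal sketch of a one-off spot check; the endpoint account ID, object prefix, and file name are placeholders, and the environment variables are the ones from env/.env:

```bash
# Hypothetical spot check: read one raw Parquet file straight from R2.
duckdb <<SQL
INSTALL httpfs; LOAD httpfs;
SET s3_endpoint='<account-id>.r2.cloudflarestorage.com';
SET s3_url_style='path';
SET s3_access_key_id='${CLOUDFLARE_R2_ACCESS_KEY}';
SET s3_secret_access_key='${CLOUDFLARE_R2_SECRET_KEY}';
SELECT count(*) FROM read_parquet('s3://${CLOUDFLARE_R2_BUCKET_NAME}/<prefix>/players.parquet');
SQL
```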
End-to-end data flow:

```mermaid
flowchart TD
    A[MLB Stats Dagster Pipeline] -->|Parquet Files| B[(Cloudflare R2)]
    B -->|Source Data| C[dbt Processing]
    C -->|Transformed Data| D[(DuckDB Analytics)]
    D -->|Metrics| E[Superset Dashboards]
```
The ETL pipeline consists of several key stages, with initial data loading occurring during container build:
- MLB Stats Dagster Pipeline
  - External pipeline that provides structured Parquet files
  - Data is uploaded to Cloudflare R2
  - Initial data is loaded during container build

- Data Processing (dbt)
  - Sources: Direct mappings to R2 Parquet files
  - Staging: Cleaned and standardized data models
  - Intermediate: Core business logic and relationships
  - Marts: Analytics-ready aggregated tables (run order sketched after this list)

- Data Loading
  - Transformed data loaded into DuckDB analytics tables
  - Optimized for query performance with appropriate indexes
  - Partitioned by season and update frequency

- Data Consumption
  - Superset dashboards for visualization
  - Interactive analytics queries
  - Performance-optimized views
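The staging, intermediate, and marts layers map onto dbt's node selection. A minimal sketch, assuming the models under dbt/ are organized into directories named after each layer and that you run it from the dbt project directory:

```bash
# Build each layer in dependency order, then run tests.
# Directory-based selectors are an assumption about this project's dbt layout.
dbt run --select staging
dbt run --select intermediate
dbt run --select marts
dbt test
```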
Key Features:
- MLB Stats Dagster pipeline for data ingestion
- Centralized R2 storage
- Data quality checks at ingestion
- Full data lineage tracking
- Automated recovery procedures
- Performance optimization through partitioning
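Partitioning pays off when queries filter on the partition column. A hedged sketch, assuming a Hive-style season=YYYY layout in R2 (the actual object layout may differ; S3 settings as in the earlier DuckDB sketch):

```bash
# Hypothetical partition-pruned read: only the season=2024 files are scanned.
duckdb <<SQL
INSTALL httpfs; LOAD httpfs;
SET s3_endpoint='<account-id>.r2.cloudflarestorage.com';
SET s3_access_key_id='${CLOUDFLARE_R2_ACCESS_KEY}';
SET s3_secret_access_key='${CLOUDFLARE_R2_SECRET_KEY}';
SELECT season, player_id, sum(home_runs) AS hr
FROM read_parquet('s3://${CLOUDFLARE_R2_BUCKET_NAME}/game_stats/season=*/*.parquet',
                  hive_partitioning = true)
WHERE season = '2024'
GROUP BY season, player_id;
SQL
```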
Core data model:

```mermaid
erDiagram
    PLAYERS ||--o{ GAME_STATS : has
    PLAYERS {
        int player_id
        string name
        string team
        string position
    }
    GAME_STATS {
        int game_id
        int player_id
        date game_date
        float batting_avg
        int home_runs
        int rbis
    }
    GAME_STATS ||--o{ FANTASY_POINTS : generates
    FANTASY_POINTS {
        int game_id
        int player_id
        float points
        string category
    }
```
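The model above supports straightforward fantasy aggregations. A hedged example query; the DuckDB file name and exact table casing are assumptions, while the columns come from the diagram:

```bash
# Top fantasy scorers across all categories, joined through player_id.
duckdb analytics.duckdb <<'SQL'
SELECT p.name, p.team, SUM(fp.points) AS total_points
FROM players AS p
JOIN fantasy_points AS fp USING (player_id)
GROUP BY p.name, p.team
ORDER BY total_points DESC
LIMIT 10;
SQL
```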
- Data Storage
  - Cloudflare R2: Object storage for raw data
  - DuckDB: High-performance analytics database
  - PostgreSQL: Metadata and user management

- Processing & Analytics
  - dbt: Data transformation
  - Apache Superset: Visualization and dashboards
  - Redis: Caching and queue management

- Infrastructure
  - Docker: Containerization
  - Make: Development automation
```
.
├── env/                   # Environment configuration
│   ├── .env.default       # Default environment variables
│   ├── .env.schema        # Environment variables documentation
│   └── README.md          # Environment setup guide
├── docker/                # Docker configuration files
├── superset/              # Superset configuration and dashboards
├── dbt/                   # Data transformation models
├── scripts/               # Utility scripts
├── docker-compose.yml     # Service definitions
├── Makefile               # Development commands
└── README.md              # This file
```
Common development tasks are automated through the Makefile:
```bash
make up     # Start all services
make down   # Stop all services
make logs   # View logs
make test   # Run tests
make clean  # Clean up containers and volumes
```
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- MLB Stats API for providing baseball statistics
- Apache Superset community
- DuckDB team
For support, please open an issue or contact the maintainers.