Databricks Showcase: B3 Options Data Pipeline

A comprehensive data engineering showcase project demonstrating end-to-end ETL processing of Brazilian stock exchange (B3) options data using Databricks and the medallion architecture pattern.

📋 Overview

This project implements a complete data pipeline for processing and analyzing B3 options market data, featuring:

  • Data Ingestion: Automated download of consolidated trades and open positions data from B3 APIs
  • Medallion Architecture: Bronze → Silver → Gold data transformation layers
  • Real-time Dashboard: Interactive visualization of options positions (Calls vs Puts)
  • Databricks Integration: Full Databricks Asset Bundles (DAB) deployment
  • ETL Jobs: Scheduled data processing workflows

🏗️ Architecture

The project follows Databricks' medallion architecture:

Raw Data (Bronze)
    ↓
Transformed Data (Silver)
    ↓
Business-Ready Data (Gold)
    ↓
Interactive Dashboard

Data Sources

  • Open Positions: Daily options positions data from B3
  • Instruments: Options instruments reference data
  • Consolidated Trades: Daily trading activity data

Data Layers

  • Bronze: Raw ingested data with minimal transformations
  • Silver: Cleaned, standardized, and enriched data
  • Gold: Aggregated business metrics and analytics-ready datasets
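The split of responsibilities between the three layers can be illustrated with a minimal sketch in plain Python, using dicts as a stand-in for the actual Delta tables (the field names `ticker`, `option_type`, and `qty` are hypothetical examples, not the project's real schema):

```python
# Bronze: raw rows as ingested -- everything still a string, duplicates kept.
bronze = [
    {"ticker": "PETR4", "option_type": "C", "qty": "100"},
    {"ticker": "PETR4", "option_type": "C", "qty": "100"},  # duplicate row
    {"ticker": "VALE3", "option_type": "P", "qty": "250"},
]

# Silver: typed, deduplicated, standardized rows.
seen = set()
silver = []
for row in bronze:
    key = (row["ticker"], row["option_type"], row["qty"])
    if key not in seen:
        seen.add(key)
        silver.append({**row, "qty": int(row["qty"])})

# Gold: a business aggregation (total quantity per option type).
gold = {}
for row in silver:
    gold[row["option_type"]] = gold.get(row["option_type"], 0) + row["qty"]

print(gold)  # {'C': 100, 'P': 250}
```

In the actual pipeline these steps run as Spark transformations over Delta tables, but the layer contract is the same: bronze preserves the raw feed, silver enforces types and uniqueness, gold serves aggregates.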

🚀 Quick Start

Prerequisites

  • Databricks workspace with Unity Catalog enabled
  • Python 3.12+
  • Databricks CLI configured

Installation

  1. Clone the repository

    git clone <repository-url>
    cd databricks_showcase_b3_options
  2. Install dependencies

    pip install -e .
  3. Configure Databricks environment

    databricks configure
  4. Deploy to Databricks

    databricks bundle deploy --target dev
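For orientation, a Databricks Asset Bundle's databricks.yml generally follows the shape below. This is a hedged sketch, not this repository's exact file; the workspace host and target names are placeholders:

```yaml
bundle:
  name: databricks_showcase_b3_options

include:
  - resources/*.yml

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
```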

Running the Pipeline

  1. Execute ETL jobs

    databricks bundle run --target dev download_consolidated_trades_equities_file
    databricks bundle run --target dev bronze_ingestion
  2. Access the dashboard

    • Navigate to your Databricks workspace
    • Open the "Open Positions Dashboard" in the dashboard section
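The job names passed to `databricks bundle run` are defined in resources/jobs.job.yml. A bundle job resource generally follows the shape below (an illustrative sketch with an assumed schedule and task path, not the repo's exact configuration):

```yaml
resources:
  jobs:
    bronze_ingestion:
      name: bronze_ingestion
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"   # daily at 06:00
        timezone_id: "America/Sao_Paulo"
      tasks:
        - task_key: ingest
          spark_python_task:
            python_file: ../src/databricks_showcase_b3_options/transformations/bronze_ingestion.py
```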

📁 Project Structure

├── databricks.yml                 # Databricks Asset Bundle configuration
├── pyproject.toml                 # Python project configuration
├── resources/                     # Databricks resources
│   ├── dashboard.dashboard.yml    # Dashboard configuration
│   ├── etl.pipeline.yml           # ETL pipeline definition
│   └── jobs.job.yml               # Job scheduling configuration
├── src/databricks_showcase_b3_options/
│   ├── dashboard/                 # Dashboard definitions
│   ├── jobs/                      # ETL job scripts
│   │   ├── download_consolidated_trades_equities_file.py
│   │   ├── download_file.py
│   │   ├── get_last_business_date.py
│   │   └── remove_first_n_rows_from_file.py
│   └── transformations/           # Data transformation logic
│       ├── bronze_ingestion.py    # Raw data ingestion
│       ├── dim_derivatives.py     # Dimension tables
│       ├── fact_derivatives.py    # Fact tables
│       ├── int_derivatives_open_positions.py
│       ├── int_derivatives.py
│       └── utils.py               # Utility functions
└── tests/                         # Unit tests

🔧 Key Components

Data Ingestion Jobs

  • Consolidated Trades Download: Fetches daily trading data from B3 APIs
  • File Processing: Handles ZIP extraction and CSV processing
  • Date Utilities: Business date calculations for B3 market calendar
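The business-date utility can be approximated as follows. This is a simplified sketch: it only skips weekends, whereas the real get_last_business_date.py would also need to account for B3 exchange holidays:

```python
from datetime import date, timedelta

def last_business_date(today: date) -> date:
    """Return the most recent weekday strictly before `today`.

    Simplified sketch: skips Saturdays/Sundays only; a production
    version would also consult the B3 holiday calendar.
    """
    d = today - timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

print(last_business_date(date(2024, 7, 8)))  # Monday -> 2024-07-05 (Friday)
```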

Transformations

  • Bronze Layer: Raw data ingestion with column standardization
  • Silver Layer: Data cleansing, deduplication, and enrichment
  • Gold Layer: Business aggregations and analytics-ready datasets
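The column standardization done in the bronze layer can be sketched with a small helper. This is a hypothetical illustration, not the project's actual utils.py: B3 files carry Portuguese headers (e.g. "Preço de Exercício"), so a typical normalization strips accents and converts to snake_case:

```python
import re
import unicodedata

def standardize_column(name: str) -> str:
    """Normalize a raw B3 column header to snake_case ASCII.

    Illustrative only: strips accents, replaces runs of
    non-alphanumerics with underscores, and lowercases.
    """
    # Decompose accented characters and drop the combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse any run of non-alphanumerics into a single underscore.
    snake = re.sub(r"[^0-9a-zA-Z]+", "_", ascii_name).strip("_")
    return snake.lower()

print(standardize_column("Preço de Exercício"))  # preco_de_exercicio
```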

Dashboard

Interactive visualization showing:

  • Call vs Put options positions
  • Strike price analysis
  • Position volumes and distributions

🛠️ Development

Local Development

# Install in development mode
pip install -e .

# Run tests
pytest tests/

# Format code
black src/
isort src/

Databricks Development

# Validate bundle
databricks bundle validate

# Deploy to development
databricks bundle deploy --target dev

# Run specific job
databricks bundle run --target dev <job-name>

📊 Data Model

Key Tables

  • fact_derivatives: Options trading facts
  • dim_derivatives: Options instruments dimensions
  • int_derivatives_open_positions: Open positions intermediate table
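The star-schema lookup between the fact and dimension tables can be sketched in plain Python (the instrument IDs and column names below are hypothetical; in the project this join runs as a Spark query over Unity Catalog tables):

```python
# Hypothetical dimension rows, keyed by instrument ID.
dim_derivatives = {
    "PETRA100": {"underlying": "PETR4", "option_type": "call", "strike": 30.0},
    "VALEM250": {"underlying": "VALE3", "option_type": "put", "strike": 60.0},
}

# Hypothetical fact rows referencing the dimension by ID.
fact_derivatives = [
    {"instrument_id": "PETRA100", "open_qty": 1200},
    {"instrument_id": "VALEM250", "open_qty": 800},
]

# Enrich each fact with its dimension attributes -- the join a gold query performs.
enriched = [
    {**fact, **dim_derivatives[fact["instrument_id"]]}
    for fact in fact_derivatives
]

print(enriched[0]["option_type"], enriched[0]["open_qty"])  # call 1200
```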

Metrics

  • Covered/uncovered positions
  • Blocked positions totals
  • Strike price distributions
  • Call/Put ratios
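The call/put ratio, for instance, reduces to a sum over positions by option type. A sketch over hypothetical (option_type, quantity) pairs:

```python
def call_put_ratio(positions):
    """Ratio of total call open interest to total put open interest.

    `positions` is an iterable of (option_type, quantity) pairs;
    returns None when there are no put positions to divide by.
    """
    calls = sum(q for t, q in positions if t == "call")
    puts = sum(q for t, q in positions if t == "put")
    return calls / puts if puts else None

positions = [("call", 1200), ("put", 800), ("call", 400)]
print(call_put_ratio(positions))  # 2.0
```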

🔒 Security & Compliance

  • Data stored in Unity Catalog with proper governance
  • Secure API authentication for B3 data sources
  • Row-level security on sensitive financial data

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request
