A comprehensive data engineering showcase project demonstrating end-to-end ETL processing of Brazilian stock exchange (B3) options data using Databricks and the medallion architecture pattern.
This project implements a complete data pipeline for processing and analyzing B3 options market data, featuring:
- Data Ingestion: Automated download of consolidated trades and open positions data from B3 APIs
- Medallion Architecture: Bronze → Silver → Gold data transformation layers
- Real-time Dashboard: Interactive visualization of options positions (Calls vs Puts)
- Databricks Integration: Full Databricks Asset Bundles (DAB) deployment
- ETL Jobs: Scheduled data processing workflows
The project follows Databricks' medallion architecture:
Raw Data (Bronze)
↓
Transformed Data (Silver)
↓
Business-Ready Data (Gold)
↓
Interactive Dashboard
- Open Positions: Daily options positions data from B3
- Instruments: Options instruments reference data
- Consolidated Trades: Daily trading activity data
- Bronze: Raw ingested data with minimal transformations
- Silver: Cleaned, standardized, and enriched data
- Gold: Aggregated business metrics and analytics-ready datasets
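As a rough illustration of what the bronze step's "column standardization" means, the sketch below maps raw B3-style header names into snake_case keys. The header names and the `standardize_columns` helper are illustrative assumptions, not the project's actual schema or code:

```python
import re

def standardize_columns(raw_row: dict) -> dict:
    """Rename raw headers (e.g. 'TckrSymb', 'Asst Desc') to snake_case keys."""
    def to_snake(name: str) -> str:
        name = re.sub(r"[^0-9a-zA-Z]+", "_", name)           # spaces/punctuation -> _
        name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camelCase boundaries
        return name.strip("_").lower()
    return {to_snake(k): v for k, v in raw_row.items()}

# Hypothetical raw row as it might arrive from a B3 CSV
raw = {"TckrSymb": "PETRH255", "OpnIntrst": "1200", "Asst Desc": "PETR4 Call"}
print(standardize_columns(raw))
# keys become: tckr_symb, opn_intrst, asst_desc
```

In the actual pipeline this kind of renaming would run on Spark DataFrames during bronze ingestion; the pure-Python version above only conveys the idea.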
- Databricks workspace with Unity Catalog enabled
- Python 3.12+
- Databricks CLI configured
- Clone the repository: `git clone <repository-url>` then `cd databricks_showcase_b3_options`
- Install dependencies: `pip install -e .`
- Configure the Databricks CLI: `databricks configure`
- Deploy to Databricks: `databricks bundle deploy --target dev`
- Run the ETL jobs: `databricks bundle run --target dev download_consolidated_trades_equities_file`, then `databricks bundle run --target dev bronze_ingestion`
- Access the dashboard: navigate to your Databricks workspace and open the "Open Positions Dashboard" in the dashboards section
├── databricks.yml # Databricks Asset Bundle configuration
├── pyproject.toml # Python project configuration
├── resources/ # Databricks resources
│ ├── dashboard.dashboard.yml # Dashboard configuration
│ ├── etl.pipeline.yml # ETL pipeline definition
│ └── jobs.job.yml # Job scheduling configuration
├── src/databricks_showcase_b3_options/
│ ├── dashboard/ # Dashboard definitions
│ ├── jobs/ # ETL job scripts
│ │ ├── download_consolidated_trades_equities_file.py
│ │ ├── download_file.py
│ │ ├── get_last_business_date.py
│ │ └── remove_first_n_rows_from_file.py
│ └── transformations/ # Data transformation logic
│ ├── bronze_ingestion.py # Raw data ingestion
│ ├── dim_derivatives.py # Dimension tables
│ ├── fact_derivatives.py # Fact tables
│ ├── int_derivatives_open_positions.py
│ ├── int_derivatives.py
│ └── utils.py # Utility functions
└── tests/ # Unit tests
- Consolidated Trades Download: Fetches daily trading data from B3 APIs
- File Processing: Handles ZIP extraction and CSV processing
- Date Utilities: Business date calculations for B3 market calendar
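The last-business-date logic can be approximated as below. This is a simplified sketch that only skips weekends; the project's `get_last_business_date.py` would additionally need to handle B3 exchange holidays, which are not modeled here:

```python
from datetime import date, timedelta

def last_business_date(today: date) -> date:
    """Return the most recent weekday strictly before `today`.

    Simplification: only weekends are skipped; B3 holidays are NOT handled.
    """
    d = today - timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

print(last_business_date(date(2024, 7, 8)))  # Monday -> previous Friday, 2024-07-05
```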
- Bronze Layer: Raw data ingestion with column standardization
- Silver Layer: Data cleansing, deduplication, and enrichment
- Gold Layer: Business aggregations and analytics-ready datasets
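A minimal, framework-free sketch of the silver-layer deduplication idea: keep only the latest record per instrument. The field names (`ticker`, `trade_date`, `open_interest`) are hypothetical stand-ins, and the real pipeline would do this on Spark rather than Python dicts:

```python
def deduplicate_latest(rows: list[dict], key: str = "ticker", order: str = "trade_date") -> list[dict]:
    """Keep only the most recent row per key, as a silver-layer cleansing step."""
    latest: dict = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order] > latest[k][order]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"ticker": "PETRH255", "trade_date": "2024-07-04", "open_interest": 1000},
    {"ticker": "PETRH255", "trade_date": "2024-07-05", "open_interest": 1200},
    {"ticker": "VALEH255", "trade_date": "2024-07-05", "open_interest": 800},
]
print(deduplicate_latest(rows))  # one row per ticker; the latest trade_date wins
```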
Interactive visualization showing:
- Call vs Put options positions
- Strike price analysis
- Position volumes and distributions
# Install in development mode
pip install -e .
# Run tests
pytest tests/
# Format code
black src/
isort src/

# Validate bundle
databricks bundle validate

# Deploy to development
databricks bundle deploy --target dev

# Run specific job
databricks bundle run --target dev <job-name>

- fact_derivatives: Options trading facts
- dim_derivatives: Options instruments dimensions
- int_derivatives_open_positions: Open positions intermediate table
- Covered/uncovered positions
- Blocked positions totals
- Strike price distributions
- Call/Put ratios
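Metrics like the call/put ratio and the strike distribution could be computed along these lines. This is a hedged sketch over hypothetical gold-layer rows (`option_type`, `strike`, `open_interest` are assumed field names), not the dashboard's actual queries:

```python
from collections import Counter

def call_put_ratio(positions: list[dict]) -> float:
    """Ratio of total call open interest to total put open interest."""
    calls = sum(p["open_interest"] for p in positions if p["option_type"] == "call")
    puts = sum(p["open_interest"] for p in positions if p["option_type"] == "put")
    return calls / puts if puts else float("inf")

def strike_distribution(positions: list[dict]) -> Counter:
    """Total open interest bucketed by strike price."""
    dist: Counter = Counter()
    for p in positions:
        dist[p["strike"]] += p["open_interest"]
    return dist

positions = [
    {"option_type": "call", "strike": 25.0, "open_interest": 300},
    {"option_type": "call", "strike": 27.5, "open_interest": 100},
    {"option_type": "put", "strike": 25.0, "open_interest": 200},
]
print(call_put_ratio(positions))       # 400 / 200 = 2.0
print(strike_distribution(positions))  # open interest grouped by strike
```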
- Data stored in Unity Catalog with proper governance
- Secure API authentication for B3 data sources
- Row-level security on sensitive financial data
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request