A comprehensive data engineering showcase project demonstrating end-to-end ETL processing of Brazilian stock exchange (B3) options data using Databricks and the medallion architecture pattern.
This project implements a complete data pipeline for processing and analyzing B3 options market data, featuring:
- Data Ingestion: Automated download of consolidated trades and open positions data from B3 APIs
- Medallion Architecture: Bronze → Silver → Gold data transformation layers
- Real-time Dashboard: Interactive visualization of options positions (Calls vs Puts)
- Databricks Integration: Full Databricks Asset Bundles (DAB) deployment
- ETL Jobs: Scheduled data processing workflows
The project follows Databricks' medallion architecture:
Raw Data (Bronze)
↓
Transformed Data (Silver)
↓
Business-Ready Data (Gold)
↓
Interactive Dashboard
- Open Positions: Daily options positions data from B3
- Instruments: Options instruments reference data
- Consolidated Trades: Daily trading activity data
- Bronze: Raw ingested data with minimal transformations
- Silver: Cleaned, standardized, and enriched data
- Gold: Aggregated business metrics and analytics-ready datasets
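As a rough illustration of what the bronze step's "column standardization" means, the sketch below maps raw B3-style header names into snake_case keys. The header names and the `standardize_columns` helper are illustrative assumptions, not the project's actual schema or code:

```python
import re

def standardize_columns(raw_row: dict) -> dict:
    """Rename raw headers (e.g. 'TckrSymb', 'Asst Desc') to snake_case keys."""
    def to_snake(name: str) -> str:
        name = re.sub(r"[^0-9a-zA-Z]+", "_", name)           # spaces/punctuation -> _
        name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camelCase boundaries
        return name.strip("_").lower()
    return {to_snake(k): v for k, v in raw_row.items()}

# Hypothetical raw row as it might arrive from a B3 CSV
raw = {"TckrSymb": "PETRH255", "OpnIntrst": "1200", "Asst Desc": "PETR4 Call"}
print(standardize_columns(raw))
# keys become: tckr_symb, opn_intrst, asst_desc
```

In the actual pipeline this kind of renaming would run on Spark DataFrames during bronze ingestion; the pure-Python version above only conveys the idea.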
- Databricks workspace with Unity Catalog enabled
- Python 3.12+
- Databricks CLI configured
- Clone the repository: `git clone <repository-url>` then `cd databricks_showcase_b3_options`
- Install dependencies: `pip install -e .`
- Configure the Databricks CLI: `databricks configure`
- Deploy to Databricks: `databricks bundle deploy --target dev`
- Run the ETL jobs: `databricks bundle run --target dev download_consolidated_trades_equities_file`, then `databricks bundle run --target dev bronze_ingestion`
- Access the dashboard: navigate to your Databricks workspace and open the "Open Positions Dashboard" in the dashboards section
├── databricks.yml # Databricks Asset Bundle configuration
├── pyproject.toml # Python project configuration
├── resources/ # Databricks resources
│ ├── dashboard.dashboard.yml # Dashboard configuration
│ ├── etl.pipeline.yml # ETL pipeline definition
│ └── jobs.job.yml # Job scheduling configuration
├── src/databricks_showcase_b3_options/
│ ├── dashboard/ # Dashboard definitions
│ ├── jobs/ # ETL job scripts
│ │ ├── download_consolidated_trades_equities_file.py
│ │ ├── download_file.py
│ │ ├── get_last_business_date.py
│ │ └── remove_first_n_rows_from_file.py
│ └── transformations/ # Data transformation logic
│ ├── bronze_ingestion.py # Raw data ingestion
│ ├── dim_derivatives.py # Dimension tables
│ ├── fact_derivatives.py # Fact tables
│ ├── int_derivatives_open_positions.py
│ ├── int_derivatives.py
│ └── utils.py # Utility functions
└── tests/ # Unit tests
- Consolidated Trades Download: Fetches daily trading data from B3 APIs
- File Processing: Handles ZIP extraction and CSV processing
- Date Utilities: Business date calculations for B3 market calendar
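The last-business-date logic can be approximated as below. This is a simplified sketch that only skips weekends; the project's `get_last_business_date.py` would additionally need to handle B3 exchange holidays, which are not modeled here:

```python
from datetime import date, timedelta

def last_business_date(today: date) -> date:
    """Return the most recent weekday strictly before `today`.

    Simplification: only weekends are skipped; B3 holidays are NOT handled.
    """
    d = today - timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

print(last_business_date(date(2024, 7, 8)))  # Monday -> previous Friday, 2024-07-05
```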
- Bronze Layer: Raw data ingestion with column standardization
- Silver Layer: Data cleansing, deduplication, and enrichment
- Gold Layer: Business aggregations and analytics-ready datasets
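A minimal, framework-free sketch of the silver-layer deduplication idea: keep only the latest record per instrument. The field names (`ticker`, `trade_date`, `open_interest`) are hypothetical stand-ins, and the real pipeline would do this on Spark rather than Python dicts:

```python
def deduplicate_latest(rows: list[dict], key: str = "ticker", order: str = "trade_date") -> list[dict]:
    """Keep only the most recent row per key, as a silver-layer cleansing step."""
    latest: dict = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order] > latest[k][order]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"ticker": "PETRH255", "trade_date": "2024-07-04", "open_interest": 1000},
    {"ticker": "PETRH255", "trade_date": "2024-07-05", "open_interest": 1200},
    {"ticker": "VALEH255", "trade_date": "2024-07-05", "open_interest": 800},
]
print(deduplicate_latest(rows))  # one row per ticker; the latest trade_date wins
```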
Interactive visualization showing:
- Call vs Put options positions
- Strike price analysis
- Position volumes and distributions
# Install in development mode
pip install -e .
# Run tests
pytest tests/
# Format code
black src/
isort src/

# Validate bundle
databricks bundle validate

# Deploy to development
databricks bundle deploy --target dev

# Run specific job
databricks bundle run --target dev <job-name>

- fact_derivatives: Options trading facts
- dim_derivatives: Options instruments dimensions
- int_derivatives_open_positions: Open positions intermediate table
- Covered/uncovered positions
- Blocked positions totals
- Strike price distributions
- Call/Put ratios
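Metrics like the call/put ratio and the strike distribution could be computed along these lines. This is a hedged sketch over hypothetical gold-layer rows (`option_type`, `strike`, `open_interest` are assumed field names), not the dashboard's actual queries:

```python
from collections import Counter

def call_put_ratio(positions: list[dict]) -> float:
    """Ratio of total call open interest to total put open interest."""
    calls = sum(p["open_interest"] for p in positions if p["option_type"] == "call")
    puts = sum(p["open_interest"] for p in positions if p["option_type"] == "put")
    return calls / puts if puts else float("inf")

def strike_distribution(positions: list[dict]) -> Counter:
    """Total open interest bucketed by strike price."""
    dist: Counter = Counter()
    for p in positions:
        dist[p["strike"]] += p["open_interest"]
    return dist

positions = [
    {"option_type": "call", "strike": 25.0, "open_interest": 300},
    {"option_type": "call", "strike": 27.5, "open_interest": 100},
    {"option_type": "put", "strike": 25.0, "open_interest": 200},
]
print(call_put_ratio(positions))       # 400 / 200 = 2.0
print(strike_distribution(positions))  # open interest grouped by strike
```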
- Data stored in Unity Catalog with proper governance
- Secure API authentication for B3 data sources
- Row-level security on sensitive financial data
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request