A Databricks Asset Bundle for multi-environment deployment with user, stage, and prod environments.
This project was generated from the Databricks Asset Bundles template:
https://github.com/vmariiechko/databricks-bundle-template
For template updates, fixes, and release notes, check the template repository.
New to this project? Follow the QUICKSTART.md for step-by-step deployment instructions.
Before deploying, ensure you have:
- Databricks CLI v0.274.0 or later installed (reference docs)

  ```bash
  # Install the Databricks CLI and confirm the version.
  # Versions >= 0.205 ship as a standalone binary; the legacy
  # `pip install databricks-cli` package cannot provide v0.274.0.
  curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  databricks --version
  ```
- Unity Catalog with these pre-existing catalogs (created by your platform/infra team):
  - `dev_analytics` (development; shared by `user` targets)
  - `stage_analytics` (pre-production)
  - `prod_analytics` (production)
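If the catalogs don't exist yet, a metastore admin can create them. A minimal sketch using this template's default names:

```sql
-- Run as a metastore admin; catalog names match this template's defaults
CREATE CATALOG IF NOT EXISTS dev_analytics;
CREATE CATALOG IF NOT EXISTS stage_analytics;
CREATE CATALOG IF NOT EXISTS prod_analytics;
```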
Before you start: Update `workspace.host` in `databricks.yml`: replace `your-workspace.cloud.databricks.com` with your actual Databricks workspace URL (find it in your browser when logged in).
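For reference, the block to edit in `databricks.yml` looks roughly like this (the host value shown is the placeholder you replace):

```yaml
# databricks.yml (excerpt)
workspace:
  host: https://your-workspace.cloud.databricks.com
```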
```bash
# Validate configuration
databricks bundle validate -t user

# Deploy to your personal environment
databricks bundle deploy -t user

# Run the sample job
databricks bundle run my_data_project_ingestion -t user
```

```
databricks-bundle-template-example/
├── databricks.yml                               # Bundle configuration
├── variables.yml                                # Shared variables (catalogs, SPs)
├── resources/
│   ├── my_data_project_ingestion.job.yml        # ETL ingestion job
│   ├── my_data_project_pipeline.pipeline.yml    # LDP pipeline
│   ├── my_data_project_pipeline_trigger.job.yml # Pipeline trigger job
│   └── schemas.yml                              # Unity Catalog schemas
├── src/
│   ├── jobs/                                    # Job Python scripts
│   └── pipelines/                               # LDP notebook code
├── tests/                                       # Unit tests (run by CI pipeline)
├── templates/                                   # Cluster config examples
├── .github/workflows/                           # GitHub Actions workflows
├── bundle_init_config.json                      # Template config used during generation
└── docs/                                        # Setup guides
```
| Target | Purpose | Catalog | Schema Isolation |
|---|---|---|---|
| `user` | Personal development | `dev_analytics` | `<username>_bronze/silver/gold` |
| `stage` | Pre-production testing | `stage_analytics` | `bronze/silver/gold` |
| `prod` | Production | `prod_analytics` | `bronze/silver/gold` |
```bash
databricks bundle validate -t <target>   # Validate
databricks bundle deploy -t <target>     # Deploy
databricks bundle run <job> -t <target>  # Run job
databricks bundle destroy -t <target>    # Cleanup
```

| File | Purpose |
|---|---|
| `databricks.yml` | Targets, permissions, resource includes |
| `variables.yml` | Catalogs, service principals |
| `resources/*.yml` | Job and pipeline definitions |
This bundle uses classic clusters. Cluster configurations live in the resource files.
See `templates/cluster_configs.yml` for configuration examples.
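As a rough sketch, a job cluster entry in a resource file typically looks like this; the Spark version and node type below are placeholders, so copy real values from `templates/cluster_configs.yml`:

```yaml
# Illustrative job cluster config; values are placeholders, not this repo's defaults
job_clusters:
  - job_cluster_key: job_cluster
    new_cluster:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_DS3_v2"   # Azure example; use your cloud's node types
      num_workers: 2
```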
By default, all environments deploy to the same workspace, using Unity Catalog for isolation.
For a separate production workspace, update `databricks.yml`:

```yaml
targets:
  prod:
    workspace:
      host: https://your-prod-workspace.azuredatabricks.net
```

Service principals are only required for CI/CD deployments (`stage` and `prod` targets).
The `user` target runs as your personal identity and works immediately without SP configuration.
To configure SPs for CI/CD:

- Create service principals in your Databricks workspace
- Search for `SP_PLACEHOLDER` in `variables.yml` and replace with your service principal application IDs
- Grant Unity Catalog permissions (`USE CATALOG`, `CREATE SCHEMA`)
Important: Service principals need Unity Catalog permissions before CI/CD can deploy. See docs/CI_CD_SETUP.md for detailed setup instructions.
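A minimal grant sketch for the `stage_analytics` catalog; the application ID below is a placeholder for your SP's actual ID, and the same grants apply to `prod_analytics`:

```sql
-- Application ID is a placeholder; repeat for prod_analytics as needed
GRANT USE CATALOG ON CATALOG stage_analytics TO `00000000-0000-0000-0000-000000000000`;
GRANT CREATE SCHEMA ON CATALOG stage_analytics TO `00000000-0000-0000-0000-000000000000`;
```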
The `src/` directory contains sample code demonstrating the medallion architecture pattern. Replace with your business logic:

- `src/jobs/ingest_to_raw.py` - Data ingestion
- `src/jobs/transform_to_silver.py` - Transformations
- `src/pipelines/bronze.py` - LDP bronze layer
- `src/pipelines/silver.py` - LDP silver layer
Create new job files in `resources/`:

```yaml
# resources/my_job.job.yml
resources:
  jobs:
    my_job:
      name: "${bundle.target} My Job"
      tasks:
        - task_key: main
          job_cluster_key: job_cluster
          spark_python_task:
            python_file: ../src/jobs/my_script.py
```

Catalogs must be pre-existing (created by a metastore admin or platform team).
Verify that the required catalogs exist and you have `USE CATALOG` permission:

```sql
SHOW CATALOGS;
-- Required: dev_analytics, stage_analytics, prod_analytics
```

If deploying to `stage` or `prod`:
- Ensure `SP_PLACEHOLDER` values in `variables.yml` are replaced
- Verify the SP exists in your workspace
- The `user` target does not require SP configuration
This project includes pre-configured CI/CD pipelines for GitHub Actions.
| Pipeline Stage | Trigger | Action |
|---|---|---|
| Bundle CI | Pull request to `main` | Runs unit tests and validates bundle |
| Staging CD | Merge to `main` | Deploys to staging |
| Production CD | Merge to `release` | Deploys to production |
Setup required: See docs/CI_CD_SETUP.md for configuration instructions.
Run unit tests locally:
```bash
# Install development dependencies
pip install -r requirements_dev.txt

# Run tests (verbose)
pytest tests/ -v
```

Unit tests are located in the `tests/` directory and run automatically in the CI pipeline.
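Tests in this layout are plain pytest modules. A minimal sketch, where the helper function is hypothetical and not part of this repo's `src/`:

```python
# tests/test_naming.py -- illustrative only; the function and its logic are assumptions
def qualify_table(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified Unity Catalog table name."""
    return f"{catalog}.{schema}.{table}"

def test_qualify_table():
    # pytest discovers functions prefixed with test_ automatically
    assert qualify_table("dev_analytics", "bronze", "orders") == "dev_analytics.bronze.orders"
```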
For data quality validation, use SDP/LDP expectations in your pipeline code.
See the pipeline notebooks in src/pipelines/ for examples.
This project is a pre-generated example from databricks-bundle-template. It is periodically regenerated when the template is updated.
- To report issues with the bundle structure or configuration: open an issue in the template repo
- This repository is not intended for direct contribution — fix the template, then regenerate