Local PySpark Testing with Databricks Runtime

A demonstration project for the Manchester Databricks User Group showing how to run local unit tests using the official Databricks Runtime Docker image.

Why?

Testing PySpark code locally with the exact same runtime as production ensures:

No surprises from version mismatches
Fast feedback loops during development
CI pipelines that catch issues before deployment

Quick Start

Prerequisites

Docker
VS Code with Dev Containers extension

Local Development

Clone the repo
Open in VS Code
Click "Reopen in Container" when prompted
Run tests:
```
pytest -v
```

That's it. You're running tests against Databricks Runtime 17.3 LTS (Spark 4.0.0) locally.

Project Structure

```
├── src/local_pyspark_testing/
│   ├── environment.py      # Spark session factory (local vs Databricks)
│   ├── transforms.py       # UDFs using pycountry
│   └── jobs/               # Pipeline entry points
│       ├── bronze_to_silver.py
│       └── silver_to_gold.py
├── tests/
│   └── test_transforms.py  # DataFrame-based unit tests
├── resources/              # Databricks Asset Bundle configs
│   ├── clusters.yml
│   └── jobs.yml
├── Dockerfile              # Dual-mode: CI wheel install or dev editable
└── .devcontainer/          # VS Code devcontainer config
```

How It Works

Dual-Mode Dockerfile

The same Dockerfile serves both CI and local development:

```dockerfile
ARG INSTALL_MODE=wheel

# CI: installs pre-built wheel
# Dev: skips, postCreateCommand handles editable install
RUN if [ "$INSTALL_MODE" = "wheel" ]; then \
      uv pip install --system --break-system-packages *.whl pytest; \
    fi
```

Spark Session Factory

environment.py chooses which Spark session to create based on the LOCAL_SPARK environment variable:

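```python
import os

from pyspark.sql import SparkSession


def get_spark() -> SparkSession:
    # _create_local_spark and _create_databricks_spark are private helpers in environment.py (not shown)
    if os.environ.get("LOCAL_SPARK") == "true":
        return _create_local_spark()  # Optimized for testing
    return _create_databricks_spark()  # Uses existing cluster session
```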
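The two helpers called by get_spark are not shown in this README. The sketch below is one way they could look, assuming test-friendly settings: a single local core, one shuffle partition, the Spark UI disabled and a fixed UTC session time zone. These settings, and the names `LOCAL_TEST_CONF` and `is_local_spark`, are illustrative assumptions rather than the contents of the project's environment.py.

```python
from __future__ import annotations

import os
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; PySpark itself is imported lazily below
    from pyspark.sql import SparkSession

# Hypothetical settings for small, deterministic local test runs.
LOCAL_TEST_CONF = {
    "spark.sql.shuffle.partitions": "1",  # tiny test DataFrames don't need the default 200
    "spark.ui.enabled": "false",  # don't start the web UI for every test session
    "spark.sql.session.timeZone": "UTC",  # timestamp results independent of the host time zone
}


def is_local_spark() -> bool:
    """Return True only when LOCAL_SPARK is exactly "true" (hypothetical helper)."""
    return os.environ.get("LOCAL_SPARK") == "true"


def _create_local_spark() -> SparkSession:
    # Lazy import keeps this module (and is_local_spark) importable without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[1]").appName("local-pyspark-tests")
    for key, value in LOCAL_TEST_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def _create_databricks_spark() -> SparkSession:
    from pyspark.sql import SparkSession

    # On a Databricks cluster a session already exists; getOrCreate() returns it.
    return SparkSession.builder.getOrCreate()


def get_spark() -> SparkSession:
    if is_local_spark():
        return _create_local_spark()
    return _create_databricks_spark()
```

A single shuffle partition avoids scheduling many near-empty tasks for the handful of rows a unit test creates, and `local[1]` keeps task execution on one thread.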

CI/CD Pipeline

Build - uv build --wheel creates the package
Test - Docker runs pytest against the wheel
Push - Image pushed to GitHub Container Registry
Deploy - Databricks Asset Bundles update the clusters and jobs
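The four stages above can be sketched as a sequence of commands. This is a hypothetical dry-run script, not the project's GitHub Actions workflow in .github/workflows: the image name, passing `LOCAL_SPARK=true` to `docker run`, and the assumption that the image contains the tests directory are all illustrative.

```python
from __future__ import annotations

import shlex
import subprocess


def ci_steps(image: str) -> list[list[str]]:
    """Return the pipeline stages as argv lists, in execution order (hypothetical)."""
    return [
        # Build: create the wheel (uv writes it to dist/ by default)
        ["uv", "build", "--wheel"],
        # Test, part 1: bake the wheel into the image (wheel is the Dockerfile's default mode)
        ["docker", "build", "--build-arg", "INSTALL_MODE=wheel", "-t", image, "."],
        # Test, part 2: run pytest inside the image, assuming tests/ was copied in
        ["docker", "run", "--rm", "-e", "LOCAL_SPARK=true", image, "pytest", "-v"],
        # Push: requires a prior `docker login ghcr.io`
        ["docker", "push", image],
        # Deploy: the Databricks CLI reads DATABRICKS_HOST / DATABRICKS_TOKEN from the environment
        ["databricks", "bundle", "deploy"],
    ]


def run(image: str, dry_run: bool = True) -> None:
    """Print every step; also execute it when dry_run is False, stopping at the first failure."""
    for argv in ci_steps(image):
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run("ghcr.io/example/local-pyspark-testing:dev")  # hypothetical image name
```

Using `check=True` mirrors the ordering in the list: a failing test run raises before anything is pushed or deployed.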

Testing

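Tests use assertDataFrameEqual (from pyspark.testing, available since PySpark 3.5) for DataFrame comparisons:

```python
from pyspark.testing import assertDataFrameEqual

from local_pyspark_testing.transforms import country_code_to_name


# A method of a pytest test class; spark_session is a pytest fixture
def test_converts_valid_codes(self, spark_session):
    df = spark_session.createDataFrame([("GB",), ("DE",)], ["country_code"])
    result = df.withColumn("country_name", country_code_to_name("country_code"))

    expected = spark_session.createDataFrame(
        [("GB", "United Kingdom"), ("DE", "Germany")],
        ["country_code", "country_name"],
    )
    assertDataFrameEqual(result, expected)
```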
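The test exercises `country_code_to_name` from transforms.py, which is not shown here. A minimal sketch of such a pycountry-backed UDF follows, assuming a recent pycountry release in which `countries.get()` returns `None` for unknown codes. The helper name `country_name` and the choice to return NULL for missing or unknown codes are assumptions, not the project's documented behavior.

```python
from typing import Optional


def country_name(code: Optional[str]) -> Optional[str]:
    """Map an ISO 3166-1 alpha-2 code (e.g. "GB") to its English name, or None if unknown."""
    if code is None:
        return None  # propagate SQL NULLs unchanged
    # Imported lazily; pycountry must still be installed wherever the UDF executes.
    import pycountry

    country = pycountry.countries.get(alpha_2=code)
    return country.name if country is not None else None


def country_code_to_name(col):
    """Hypothetical UDF wrapper: takes a column name or Column, returns a string Column."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    return F.udf(country_name, StringType())(col)
```

With this sketch, `country_name("GB")` gives "United Kingdom" and `country_name("DE")` gives "Germany", matching the expected DataFrame in the test above. Keeping the lookup in a plain Python function also lets it be unit-tested without a Spark session.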

Configuration

GitHub Secrets Required

DATABRICKS_HOST - Workspace URL
DATABRICKS_TOKEN - Personal access token

Databricks Workspace

Enable Databricks Container Services in the workspace Admin Settings, or via the Databricks CLI:

```shell
databricks workspace-conf set-status --json '{"enableDcs": "true"}'
```

License

MIT
