
Integrate Apache Airflow Deployment into Calypr / Gen3 Helm Chart #84

@bwalsh

Issue: Integrate Apache Airflow Deployment into Calypr / Gen3 Helm Chart

Related: bmeg/tractor#13


Summary

Implement a reusable Apache Airflow Helm sub-chart within the gen3-helm repository to support Calypr (and other Gen3-derived environments) in orchestrating data submission, transfer, and background jobs.

This builds on lessons learned from bmeg/tractor#13, where Airflow was upgraded and stabilized for production use (now using apache/airflow:3.1.0).


Motivation

Many Calypr and Gen3-based environments require workflow automation for:

  • Background file preparation and validation jobs
  • Dataset publication pre-checks (e.g., TIFF offset calculation, QC workflows)
  • Automated orchestration across TES, DRS, and Indexd
  • Long-running metadata enrichment or monitoring tasks

Integrating Airflow as a Helm-managed component provides:

  • A standardized, reproducible orchestration environment across deployments
  • Seamless integration with Fence-issued JWTs and Calypr service accounts
  • RBAC alignment with existing Gen3 authorization models
  • CI/CD reproducibility through declarative Helm values

Scope

Create a new Helm subchart (charts/airflow/) providing a modular Airflow deployment for Gen3-based clusters.

Components

  1. Airflow Core

    • Base image: apache/airflow:3.1.0 (latest stable at the time of writing).
    • Support for both LocalExecutor and CeleryExecutor.
    • Includes API server (the Airflow 3 replacement for the 2.x webserver), scheduler, DAG processor, worker (CeleryExecutor only), and optional triggerer pods.
  2. Persistent Storage

    • PostgreSQL metadata DB (shared or standalone).
    • Persistent volumes for DAGs and logs.
  3. Ingress

    • Optional ingress for the Airflow web UI.
    • Compatible with Gen3 TLS and revproxy patterns.
  4. Auth Integration

    • OIDC integration using Fence-issued JWTs.
    • Optional API token injection via Kubernetes secrets.
  5. Example DAGs

    • Calypr DAGs for:
      • TIFF offset calculator
      • QC validation pipeline
      • Metadata publication pre-check
  6. Configuration

    • values.yaml controls:
      airflow:
        enabled: true
        executor: LocalExecutor
        ingress:
          enabled: false
    • Mirrors Gen3’s optional-service pattern, where each service is switched on or off through its own enabled flag.
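
As a rough illustration of how the defaults above would combine with a per-environment override, here is a minimal Python sketch that approximates Helm's recursive merge of values (maps merge key by key, and the override wins on conflicts). The DEFAULTS dictionary is copied from the values.yaml fragment in this issue; the production override is a hypothetical example and not a proposed configuration.

```python
from copy import deepcopy

# Chart defaults, copied from the values.yaml fragment in this issue.
DEFAULTS = {
    "airflow": {
        "enabled": True,
        "executor": "LocalExecutor",
        "ingress": {"enabled": False},
    }
}


def merge_values(defaults, overrides):
    """Recursively merge overrides into a copy of defaults; overrides win."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default wholesale.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical production override: switch executor and expose the UI.
production = merge_values(
    DEFAULTS,
    {"airflow": {"executor": "CeleryExecutor", "ingress": {"enabled": True}}},
)
```

Keys the override does not mention, such as airflow.enabled, keep their default, which is what lets an environment turn on only the pieces it needs.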

Deliverables

  • Helm subchart under charts/airflow/
  • Default values.yaml configuration for local and production clusters
  • Example dags/ folder mount with sample pipelines
  • CI smoke test (helm template + minimal pod startup)
  • Documentation (README.md)
    • Architecture diagram
    • Auth integration steps
    • Environment variable references
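
One possible shape for the helm template half of the CI smoke test is sketched below in Python. It renders the chart and checks that a set of expected Kubernetes resource kinds appears in the output. The required kinds, the release name, and the helper names are assumptions made for illustration; the real list depends on which templates the chart ends up shipping. The parsing is deliberately simple (a regular expression over top-level kind: lines) so the sketch needs no YAML library.

```python
import re
import subprocess

# Assumption: the chart renders at least these kinds; adjust to the real templates.
REQUIRED_KINDS = {"Deployment", "Service"}


def render_chart(chart_dir="charts/airflow", release="airflow"):
    """Run `helm template` and return the rendered manifests as text."""
    result = subprocess.run(
        ["helm", "template", release, chart_dir],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def rendered_kinds(manifest_text):
    """Return the set of top-level `kind:` values in multi-document YAML."""
    kinds = set()
    # Documents in `helm template` output are separated by lines containing `---`.
    for doc in re.split(r"^---[ \t]*$", manifest_text, flags=re.MULTILINE):
        # Only match `kind:` at column zero so nested kinds are ignored.
        match = re.search(r"^kind:[ \t]*(\S+)[ \t]*$", doc, flags=re.MULTILINE)
        if match:
            kinds.add(match.group(1))
    return kinds


def missing_kinds(manifest_text, required=REQUIRED_KINDS):
    """Return the required kinds that are absent from the rendered output."""
    return sorted(set(required) - rendered_kinds(manifest_text))
```

In CI this could be called as missing_kinds(render_chart()) and the job failed when the returned list is non-empty. The pod startup half of the smoke test still needs a real cluster and is not covered here.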

Acceptance Criteria

  • Airflow deploys via Helm using apache/airflow:3.1.0
  • Metadata DB initializes (airflow db migrate)
  • API server (webserver in Airflow 2.x) and scheduler pods start successfully
  • Sample DAG runs to completion
  • Optional OIDC authentication via Fence tokens works
  • helm lint and CI tests pass

Implementation Plan

  1. Chart Initialization

    • Scaffold charts/airflow/Chart.yaml, templates, and values.
    • Include optional dependencies (PostgreSQL for the metadata DB; Redis as the Celery broker when CeleryExecutor is selected).
  2. Configuration Defaults

    • Define image, executor, and resource settings.
    • Mount dags/ volume and log directories.
  3. Ingress + Auth

    • Support toggling ingress and external authentication.
    • Map JWT claims to Airflow RBAC roles.
  4. Testing

    • Validate helm template output.
    • Deploy in a local Kind or Minikube cluster.
    • Run example DAG.
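
The claim-to-role mapping in step 3 could look roughly like the Python sketch below. It assumes the JWT has already been verified and decoded elsewhere, and that group membership arrives in a claim named groups; both the claim name and the group values are hypothetical and would need to match what Fence actually issues. The role names on the right are the default roles of Airflow's FAB auth manager (Admin, Op, User, Viewer).

```python
# Hypothetical mapping from a group-style claim value to an Airflow role.
# The group names on the left are invented for illustration only.
CLAIM_TO_ROLE = {
    "calypr-admins": "Admin",
    "calypr-operators": "Op",
    "calypr-submitters": "User",
}

# Fallback role for authenticated users with no recognized group.
DEFAULT_ROLE = "Viewer"


def roles_for_claims(claims, claim_key="groups"):
    """Return sorted Airflow role names for already-verified JWT claims."""
    groups = claims.get(claim_key) or []
    if isinstance(groups, str):
        # Tolerate a single group delivered as a bare string.
        groups = [groups]
    roles = {CLAIM_TO_ROLE[g] for g in groups if g in CLAIM_TO_ROLE}
    return sorted(roles) or [DEFAULT_ROLE]
```

Falling back to a read-only role instead of rejecting the login is a design choice that should be reviewed against the Gen3 authorization model; a deployment may prefer to deny access when no group matches.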

References


Future Enhancements

  • Add GA4GH TES and Nextflow executor integration.
  • Add Prometheus/Grafana dashboards for workflow monitoring.
  • Register DAG runs in Gen3 Indexd or Metadata service for provenance tracking.
  • Support dynamic DAG generation from DRS or submission metadata.

Proposed by:
OHSU / ACED-IDP Calypr Team
Date: October 2025
