Issue: Integrate Apache Airflow Deployment into Calypr / Gen3 Helm Chart
Related: bmeg/tractor#13
Summary
Implement a reusable Apache Airflow Helm sub-chart within the gen3-helm repository to support Calypr (and other Gen3-derived environments) in orchestrating data submission, transfer, and background jobs.
This builds on lessons learned from bmeg/tractor#13, where Airflow was upgraded and stabilized for production use (now using apache/airflow:3.1.0).
Motivation
Many Calypr and Gen3-based environments require workflow automation for:
- background file preparation and validation jobs
- dataset publication pre-checks (e.g., TIFF offset calculation, QC workflows)
- automatic orchestration of TES, DRS, and Indexd integration
- long-running metadata enrichment or monitoring tasks
Integrating Airflow as a Helm-managed component provides:
- A standardized, reproducible orchestration environment across deployments
- Seamless integration with Fence-issued JWTs and Calypr service accounts
- RBAC alignment with existing Gen3 authorization models
- CI/CD reproducibility through declarative Helm values
Scope
Create a new Helm subchart (charts/airflow/) providing a modular Airflow deployment for Gen3-based clusters.
Components
- Airflow Core
  - Base image: `apache/airflow:3.1.0` (latest stable).
  - Support for both `LocalExecutor` and `CeleryExecutor`.
  - Includes `webserver`, `scheduler`, `worker`, and optional `triggerer` pods.
- Persistent Storage
  - PostgreSQL metadata DB (shared or standalone).
  - Persistent volumes for DAGs and logs.
- Ingress
  - Optional ingress for the Airflow web UI.
  - Compatible with Gen3 TLS and revproxy patterns.
- Auth Integration
  - OIDC integration using Fence-issued JWTs.
  - Optional API token injection via Kubernetes secrets.
- Example DAGs
  - Calypr DAGs for:
    - TIFF offset calculator
    - QC validation pipeline
    - Metadata publication pre-check
- Configuration
  - `values.yaml` controls:

        airflow:
          enabled: true
          executor: LocalExecutor
          ingress:
            enabled: false

  - Mirrors Gen3’s optional-service pattern.
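To make the relationship between the `values.yaml` toggles and the deployed components concrete, here is a minimal Python sketch of the selection logic the chart templates would need to encode. It is illustrative only: the component names, the `triggerer.enabled` key, and the function name are assumptions, not part of an existing chart. It relies on the fact that `CeleryExecutor` needs a message broker and separate workers, while `LocalExecutor` runs tasks alongside the scheduler.

```python
SUPPORTED_EXECUTORS = ("LocalExecutor", "CeleryExecutor")


def required_components(values: dict) -> list[str]:
    """Return the components a chart would render for the given values.

    `values` mirrors the proposed `airflow:` block of values.yaml;
    all key names beyond enabled/executor/ingress are hypothetical.
    """
    airflow = values.get("airflow", {})
    if not airflow.get("enabled", False):
        # Optional-service pattern: a disabled service deploys nothing.
        return []

    executor = airflow.get("executor", "LocalExecutor")
    if executor not in SUPPORTED_EXECUTORS:
        raise ValueError(f"unsupported executor: {executor!r}")

    components = ["metadata-db", "webserver", "scheduler"]
    if executor == "CeleryExecutor":
        # Celery needs a broker (Redis here) and dedicated worker pods.
        components += ["redis", "worker"]
    if airflow.get("triggerer", {}).get("enabled", False):
        components.append("triggerer")
    if airflow.get("ingress", {}).get("enabled", False):
        components.append("ingress")
    return components
```

With the default values shown above (`enabled: true`, `LocalExecutor`, ingress disabled), this sketch selects only the metadata DB, webserver, and scheduler.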
Deliverables
- Helm subchart under `charts/airflow/`
- Default `values.yaml` configuration for local and production clusters
- Example `dags/` folder mount with sample pipelines
- CI smoke test (`helm template` + minimal pod startup)
- Documentation (`README.md`)
  - Architecture diagram
  - Auth integration steps
  - Environment variable references
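The render-only half of the CI smoke test could be driven by a small script along these lines. This is a sketch under assumptions: the release name `airflow-smoke` and the `airflow.enabled=true` override path are hypothetical and depend on how the subchart's values end up being structured. Only the standard `helm lint` and `helm template` subcommands are used.

```python
import subprocess


def smoke_test_commands(chart_dir: str = "charts/airflow",
                        release: str = "airflow-smoke") -> list[list[str]]:
    """Build the Helm commands for a render-only smoke test."""
    return [
        ["helm", "lint", chart_dir],
        # Render all manifests locally; no cluster access is needed.
        ["helm", "template", release, chart_dir, "--set", "airflow.enabled=true"],
    ]


def run_smoke_test(commands: list[list[str]]) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A CI job would call `run_smoke_test(smoke_test_commands())` on a runner that has `helm` installed; the pod-startup half of the smoke test still needs a real cluster.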
Acceptance Criteria
- Airflow deploys via Helm using `apache/airflow:3.1.0`
- Metadata DB initializes (`airflow db migrate`)
- Webserver and scheduler pods start successfully
- Sample DAG runs to completion
- Optional OIDC authentication via Fence tokens works
- `helm lint` and CI tests pass
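The "pods start successfully" criterion can be checked mechanically from the output of `kubectl get pods -o json`. The sketch below assumes each pod carries a `component` label (for example `scheduler`); that label convention is an assumption about the future chart, not something it guarantees today. The `status.phase` and `status.containerStatuses[].ready` fields are standard Kubernetes pod status fields.

```python
import json


def unready_components(pod_list: dict,
                       required=("webserver", "scheduler")) -> list[str]:
    """Return required components lacking a Running pod with all containers ready."""
    ready = set()
    for pod in pod_list.get("items", []):
        # Assumed label convention: metadata.labels.component names the role.
        component = pod.get("metadata", {}).get("labels", {}).get("component")
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        if (status.get("phase") == "Running" and containers
                and all(c.get("ready") for c in containers)):
            ready.add(component)
    return [name for name in required if name not in ready]


# Trimmed-down example of `kubectl get pods -o json` output.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"labels": {"component": "scheduler"}},
   "status": {"phase": "Running", "containerStatuses": [{"ready": true}]}},
  {"metadata": {"labels": {"component": "webserver"}},
   "status": {"phase": "Pending"}}
]}
""")
```

For `SAMPLE`, `unready_components(SAMPLE)` reports the webserver as not ready, so a CI job using this check would fail until that pod reaches `Running`.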
Implementation Plan
- Chart Initialization
  - Scaffold `charts/airflow/Chart.yaml`, templates, and values.
  - Include optional dependencies (PostgreSQL, Redis).
- Configuration Defaults
  - Define image, executor, and resource settings.
  - Mount `dags/` volume and log directories.
- Ingress + Auth
  - Support toggling ingress and external authentication.
  - Map JWT claims to Airflow RBAC roles.
- Testing
  - Validate `helm template` output.
  - Deploy in a local Kind or Minikube cluster.
  - Run example DAG.
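For the "Map JWT claims to Airflow RBAC roles" step, the core logic might look like the following sketch. The claim key `groups` and the `calypr-*` group names are hypothetical placeholders; the actual claim layout of Fence-issued tokens would need to be confirmed before implementation. The role names are the defaults used by Airflow's FAB-based auth manager. Token signature and expiry verification are assumed to happen before this function is called.

```python
# Hypothetical mapping from a group-like claim value to an Airflow role name.
CLAIM_TO_ROLE = {
    "calypr-admins": "Admin",
    "calypr-operators": "Op",
    "calypr-submitters": "User",
}

# Roles ordered from most to least privileged.
ROLE_PRIORITY = ["Admin", "Op", "User", "Viewer"]


def airflow_role_for(claims: dict, claim_key: str = "groups",
                     default_role: str = "Viewer") -> str:
    """Pick the most privileged role granted by already-verified token claims."""
    granted = {
        CLAIM_TO_ROLE[group]
        for group in claims.get(claim_key, [])
        if group in CLAIM_TO_ROLE
    }
    for role in ROLE_PRIORITY:
        if role in granted:
            return role
    # Unknown or missing groups fall back to the least privileged default.
    return default_role
```

Falling back to a read-only role for unmapped users keeps the default conservative; a deployment could instead reject such users outright.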
References
- Upstream Airflow upgrade: bmeg/tractor#13
- Airflow 3.1.0 release notes: https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html
- Official Docker image tags: https://hub.docker.com/r/apache/airflow/tags
- Calypr automation use cases: internal proposal (Calypr workflow automation document)
Future Enhancements
- Add GA4GH TES and Nextflow executor integration.
- Add Prometheus/Grafana dashboards for workflow monitoring.
- Register DAG runs in Gen3 Indexd or Metadata service for provenance tracking.
- Support dynamic DAG generation from DRS or submission metadata.
Proposed by: OHSU / ACED-IDP Calypr Team
Date: October 2025