Helm chart for Argo + GitHub Actions + Nextflow runner #85

@bwalsh

Goals

  • K8s-native orchestration via Argo Workflows running Nextflow (k8s executor).
  • Multi-tenant isolation by namespace, RBAC, quotas, and network policies.
  • Triggers from internal Git servers and/or GitHub Actions → Argo Events.
  • Results and cache on Amazon S3 (or S3-compatible on-prem), surfaced in Argo UI and CI artifacts.

High-Level Architecture (Multi-Tenant)

flowchart TB
  subgraph Git[Tenant Git Servers & GitHub Actions]
    GH[GitHub Actions per-tenant workflows]
    GS[Internal Git servers GHES/GitLab/Gitea]
  end

  GH -->|Webhook / API| AE[Argo Events Gateway per-tenant or shared]
  GS -->|Webhook / Polling| AE

  subgraph ARGO[Cluster: Argo Control Plane namespace: argo]
    AE --> SENS[Argo Sensors]
    SENS -->|submit| WFCTRL[Argo Workflows Controller]
    UI[Argo Server SSO + RBAC]
  end

  subgraph TENANT_A[Namespace: wf-teamA]
    WFT_A[WorkflowTemplate: nextflow-runner]
    SA_A[(ServiceAccount: nextflow-launcher)]
    CFG_A[ConfigMap: nextflow.config k8s profile]
    SEC_A[(Secret: S3 creds / WI)]
    NF_A[Nextflow Executor Pods]
  end

  subgraph TENANT_B[Namespace: wf-teamB]
    WFT_B[WorkflowTemplate: nextflow-runner]
    SA_B[(ServiceAccount: nextflow-launcher)]
    CFG_B[ConfigMap: nextflow.config]
    SEC_B[(Secret: S3 creds / WI)]
    NF_B[Nextflow Executor Pods]
  end

  WFCTRL -->|launch| WFT_A
  WFCTRL -->|launch| WFT_B
  WFT_A --> NF_A
  WFT_B --> NF_B

  subgraph S3[Amazon S3 Multi-tenant buckets/prefixes]
    CACHE[Nextflow workDir cache]
    ART[Reports & results: report.html, timeline.html, trace.txt]
  end

  NF_A -->|read/write| S3
  NF_B -->|read/write| S3
  UI -->|view| WFCTRL

Key multi-tenant controls

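  • One namespace per team: wf-<team>, each with its own ServiceAccount, RBAC, quotas, and NetworkPolicies. Kubernetes namespace names must be lowercase DNS labels, so wf-teamA in the examples stands in for a name such as wf-team-a.
  • Argo Server SSO + RBAC: users see only their own namespace.
  • S3 buckets or strict prefixes per tenant, plus lifecycle rules. Bucket names must likewise be lowercase, so teamA-nf-workdir stands in for a name such as team-a-nf-workdir.
  • GitHub Actions can trigger Argo through a webhook (→ Argo Events) or by calling the Argo API directly with a short-lived token.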

Milestones

M1 — Control Plane & Guardrails

  • Install Argo Workflows + Argo Events into argo.
  • Configure Argo Server SSO (OIDC) and Argo RBAC (map IdP groups → namespaces).
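  • Enforce the baseline Pod Security level and apply a default-deny NetworkPolicy in every tenant namespace (NetworkPolicy is namespaced, so "cluster-wide" means one policy per namespace).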
  • Prepare S3: per-tenant bucket (or prefix), IAM policies, lifecycle (cache expiry).
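
The "map IdP groups → namespaces" step can be sketched with Argo Server's SSO RBAC annotations on a ServiceAccount. This is a minimal sketch under stated assumptions: the ServiceAccount name team-a-ui and the IdP group team-a are hypothetical, and it assumes SSO RBAC is enabled in the workflow controller ConfigMap (sso.rbac.enabled: true). Depending on the Argo version, ServiceAccounts outside the argo namespace may only be considered when namespace delegation for SSO RBAC is turned on for the Argo Server.

```yaml
# Hypothetical per-tenant UI identity: users whose OIDC "groups" claim
# contains "team-a" act as this ServiceAccount when using Argo Server.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: team-a-ui              # hypothetical name
  namespace: wf-teamA          # placeholder from this issue; real names must be lowercase
  annotations:
    # Expression evaluated against the OIDC claims of the logged-in user.
    workflows.argoproj.io/rbac-rule: "'team-a' in groups"
    # When several rules match, the highest precedence wins.
    workflows.argoproj.io/rbac-rule-precedence: "1"
```

What the user can actually do is then decided by ordinary Roles and RoleBindings attached to this ServiceAccount in the tenant namespace.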

M2 — Tenant Blueprint (repeatable)

Create per tenant:

  • Namespace wf-<team>.
  • ServiceAccount nextflow-launcher + namespace-scoped Role/RoleBinding (pods/jobs + argo resources).
  • ResourceQuota/LimitRange, NetworkPolicies (egress only to Git, registry, S3, DNS).
  • ConfigMap nextflow-config (k8s profile).
  • Secret for S3 (or workload identity).

RBAC (namespace-scoped)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nextflow-launcher
  namespace: wf-teamA
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nextflow-executor
  namespace: wf-teamA
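rules:
  # Pods created by the Nextflow k8s executor (core API group).
  - apiGroups: [""]
    resources: ["pods","pods/log","pods/status"]
    verbs: ["create","get","list","watch","delete","patch"]
  # Jobs live in the batch API group.
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create","get","list","watch","delete","patch"]
  - apiGroups: ["argoproj.io"]
    resources: ["workflows","workflowtemplates","cronworkflows"]
    verbs: ["create","get","list","watch","delete","patch"]
  # Argo Workflows 3.4 and later: the executor reports step results via this resource.
  - apiGroups: ["argoproj.io"]
    resources: ["workflowtaskresults"]
    verbs: ["create","patch"]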
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nextflow-executor-binding
  namespace: wf-teamA
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nextflow-executor
subjects:
  - kind: ServiceAccount
    name: nextflow-launcher
    namespace: wf-teamA

Nextflow k8s profile (uses S3)

apiVersion: v1
kind: ConfigMap
metadata:
  name: nextflow-config
  namespace: wf-teamA
data:
  nextflow.config: |
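    profiles {
      k8s {
        process.executor = 'k8s'
        // Per-task defaults; the k8s executor turns these into pod resource requests.
        process.cpus = 1
        process.memory = '2 GB'
        // Upper bound for any task's request (Nextflow 24.04 or later).
        process.resourceLimits = [ cpus: 4, memory: 8.GB ]
        workDir = System.getenv('NF_WORKDIR') ?: 's3://teamA-nf-workdir'
        k8s {
          namespace = 'wf-teamA'
          serviceAccount = 'nextflow-launcher'
          pod = [ [imagePullPolicy: 'IfNotPresent'] ]
        }
        aws.region = System.getenv('AWS_REGION') ?: 'us-west-2'
        // For S3-compatible endpoints, also set aws.client.endpoint and aws.client.s3PathStyleAccess
        // An S3 workDir with the k8s executor generally requires Fusion (fusion.enabled / wave.enabled);
        // without it, use a shared ReadWriteMany PVC as the work directory.
      }
    }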
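
The tenant blueprint lists ResourceQuota/LimitRange and NetworkPolicies without showing them. The manifests below are a minimal sketch of what those could look like. All object names and every number are illustrative assumptions, not values from this issue, and the egress rule is deliberately coarse (DNS plus HTTPS), so it would need tightening to the actual Git, registry, and S3 endpoints.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota            # hypothetical name
  namespace: wf-teamA           # placeholder; real names must be lowercase
spec:
  hard:                         # illustrative caps only
    requests.cpu: "64"
    requests.memory: 256Gi
    limits.cpu: "128"
    limits.memory: 512Gi
    pods: "200"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults         # hypothetical name
  namespace: wf-teamA
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: 500m, memory: 1Gi }   # applied when a pod sets no request
      default: { cpu: "1", memory: 2Gi }           # applied when a pod sets no limit
---
# Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: wf-teamA
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
# Re-open only DNS and HTTPS egress (Git, registry, S3, and usually the API server).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-and-https-egress   # hypothetical name
  namespace: wf-teamA
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    - ports:
        - { protocol: TCP, port: 443 }
```

The Nextflow launcher pod talks to the Kubernetes API to create task pods, so if the API server listens on a port other than 443 (6443 is common), that port needs an egress rule as well.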

M3 — Artifact Surfacing (Argo ↔ S3)

  • Configure artifactRepositoryRef so the Argo UI links the Nextflow reports while Nextflow itself uses S3 as its workDir.
  • Option A: one global default in argo namespace (simple).
  • Option B: per-tenant artifact repo (strict separation).
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  namespace: argo
data:
  default-v1: |
    s3:
      bucket: teamA-argo-artifacts
      endpoint: s3.amazonaws.com
      region: us-west-2
      accessKeySecret: { name: teamA-s3, key: accessKey }
      secretKeySecret: { name: teamA-s3, key: secretKey }
    archiveLogs: true
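
For Option B, a per-tenant repository can be selected from the workflow spec. The fragment below is a sketch that assumes a ConfigMap named artifact-repositories with a default-v1 key also exists in the tenant namespace (the same shape as the one above, but pointing at the tenant's own bucket and Secret). Only the artifactRepositoryRef stanza is new; the rest repeats the WorkflowTemplate header from this issue.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: nextflow-runner
  namespace: wf-teamA           # placeholder; real names must be lowercase
spec:
  entrypoint: run
  # Resolve output artifacts against the tenant's own repository
  # instead of the default in the argo namespace.
  artifactRepositoryRef:
    configMap: artifact-repositories   # assumed to exist in wf-teamA
    key: default-v1
```

The Secret named in accessKeySecret/secretKeySecret has to live in the namespace where the workflow pods run, which is another reason to keep the repository definition per tenant.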

M4 — WorkflowTemplate (Nextflow runner, reusable)

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: nextflow-runner
  namespace: wf-teamA
spec:
  entrypoint: run
  arguments:
    parameters:
      - { name: pipeline, value: "./" }
      - { name: params,   value: "" }
  templates:
    - name: run
      serviceAccountName: nextflow-launcher
      volumes:
        - name: cfg
          configMap: { name: nextflow-config }
        - name: workspace
          emptyDir: {}
      container:
        image: ghcr.io/yourorg/nf-runtime:jdk17
        # Reports are written here and collected by the outputs section below.
        workingDir: /workspace
        command: ["/bin/bash","-lc"]
        args:
          - |
            set -euo pipefail
            export NXF_HOME="/workspace/.nxf"
            cp /config/nextflow.config ./nextflow.config || true
            # Pass the pipeline (local path or repository) straight to `nextflow run`;
            # `-r` selects a revision and cannot replace the pipeline argument.
            nextflow run "{{workflow.parameters.pipeline}}" -profile k8s {{workflow.parameters.params}} \
              -with-report report.html -with-timeline timeline.html -with-trace trace.txt
        # env and volumeMounts are container fields, so they sit under `container`.
        env:
          - name: AWS_REGION
            valueFrom: { secretKeyRef: { name: s3-credentials, key: AWS_REGION } }
          - name: AWS_ACCESS_KEY_ID
            valueFrom: { secretKeyRef: { name: s3-credentials, key: AWS_ACCESS_KEY_ID } }
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom: { secretKeyRef: { name: s3-credentials, key: AWS_SECRET_ACCESS_KEY } }
          - name: NF_WORKDIR
            valueFrom: { secretKeyRef: { name: s3-credentials, key: NF_WORKDIR } }
          # For S3-compatible endpoints, add S3_ENDPOINT + path-style access in config
        volumeMounts:
          - { name: cfg,       mountPath: /config }
          - { name: workspace, mountPath: /workspace }
      outputs:
        artifacts:
          - { name: report,   path: /workspace/report.html }
          - { name: timeline, path: /workspace/timeline.html }
          - { name: trace,    path: /workspace/trace.txt }

M5 — Triggers

A) GitHub Actions → Argo (Nextflow-based Actions emphasized)

  • Preferred: GHA workflow sends a webhook (signed secret) → Argo Events.
  • Alternative: GHA calls Argo Server API (OIDC token or robot user) to submit a Workflow from WorkflowTemplate.

GHA example (dispatch → Argo API)

name: trigger-argo-nextflow
on:
  workflow_dispatch:
    inputs:
      pipeline: { description: "Path or repo", default: "./", required: true }
      params:   { description: "Nextflow params", required: false, default: "" }

jobs:
  submit:
    runs-on: ubuntu-latest
    steps:
      - name: Submit to Argo (tenant A)
        env:
          ARGO_ADDR: https://argo.example.org
          ARGO_TOKEN: ${{ secrets.ARGO_TEAM_A_BEARER }}
          # Pass user input through env vars instead of splicing it into the script.
          PIPELINE: ${{ github.event.inputs.pipeline }}
          PARAMS: ${{ github.event.inputs.params }}
        run: |
          # The submit endpoint takes a reference to an existing WorkflowTemplate
          # plus submit options; jq JSON-escapes the input values.
          jq -n --arg pipeline "$PIPELINE" --arg params "$PARAMS" '{
            namespace: "wf-teamA",
            resourceKind: "WorkflowTemplate",
            resourceName: "nextflow-runner",
            submitOptions: {
              generateName: "nf-",
              parameters: ["pipeline=\($pipeline)", "params=\($params)"]
            }
          }' > wf.json
          # --fail makes the step fail on HTTP errors instead of passing silently.
          curl -sS --fail -X POST "$ARGO_ADDR/api/v1/workflows/wf-teamA/submit" \
            -H "Authorization: Bearer $ARGO_TOKEN" \
            -H "Content-Type: application/json" \
            --data-binary @wf.json

B) Argo Events (webhook) from Git servers

  • EventSource exposes /webhook/<repo>; Sensor parameterizes pipeline & params and submits from WorkflowTemplate.
  • Polling fallback: a CronJob that runs git ls-remote and fires a local event when refs change.
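
The webhook path described above could look roughly like the following EventSource and Sensor pair. This is a sketch under several assumptions: the object name git-webhook, the event name pipelines, port 12000, and the pipeline field in the webhook JSON body are all hypothetical, and an EventBus named default is assumed to exist in the namespace. The Sensor reuses the nextflow-launcher ServiceAccount because its Role already allows creating workflows.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: git-webhook             # hypothetical name
  namespace: wf-teamA           # placeholder; real names must be lowercase
spec:
  service:
    ports:
      - port: 12000
        targetPort: 12000
  webhook:
    pipelines:                  # event name, referenced by the Sensor below
      port: "12000"
      endpoint: /webhook/pipelines
      method: POST
---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: git-webhook
  namespace: wf-teamA
spec:
  template:
    serviceAccountName: nextflow-launcher
  dependencies:
    - name: push
      eventSourceName: git-webhook
      eventName: pipelines
  triggers:
    - template:
        name: submit-nextflow
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: nf-
              spec:
                workflowTemplateRef:
                  name: nextflow-runner
                arguments:
                  parameters:
                    - name: pipeline
                      value: "./"        # overwritten from the event below
                    - name: params
                      value: ""
          parameters:
            # Copy body.pipeline from the webhook payload into the first parameter.
            - src:
                dependencyName: push
                dataKey: body.pipeline   # hypothetical payload field
              dest: spec.arguments.parameters.0.value
```

A second entry under parameters with dest spec.arguments.parameters.1.value would fill params the same way.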

M6 — Ad-Hoc & Scheduled

  • Ad-hoc: Argo UI “Submit” (RBAC isolates per namespace), or argo submit --from workflowtemplate/… -p ….
  • Scheduled: CronWorkflow per tenant:
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-nf
  namespace: wf-teamA
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: "Forbid"
  workflowSpec:
    workflowTemplateRef: { name: nextflow-runner }
    arguments:
      parameters:
        - { name: pipeline, value: "./pipelines/rnaseq" }
        - { name: params,   value: "--reads 's3://teamA/raw/*.fastq.gz'" }

M7 — Results & Observability

  • S3 is source of truth: s3://teamA-nf-workdir/<pipeline>/<run-id>/…
  • Argo UI links the Nextflow outputs via output artifacts: the HTML reports (report.html, timeline.html) and the trace file (trace.txt).
  • Centralize logs to Loki/ELK with labels: tenant, pipeline, run_id, commit, trigger=gha|webhook.
  • S3 lifecycle: expire cache/work dirs after N days; retain final outputs per policy.
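
The lifecycle bullet can be made concrete with a bucket definition. The sketch below uses CloudFormation purely as one way to express it; the resource name, bucket name, work/ prefix, and the 14 and 7 day windows are all assumptions, since the issue leaves N and the retention policy open. It assumes cache objects sit under a dedicated prefix so that final outputs elsewhere in the bucket are not touched.

```yaml
Resources:
  NfWorkdirBucket:                       # hypothetical logical name
    Type: AWS::S3::Bucket
    Properties:
      BucketName: team-a-nf-workdir      # hypothetical; bucket names must be lowercase
      LifecycleConfiguration:
        Rules:
          # Expire Nextflow work/cache objects after N days (N = 14 here, an assumption).
          - Id: expire-nextflow-work
            Status: Enabled
            Prefix: work/                # assumed cache prefix
            ExpirationInDays: 14
          # Clean up multipart uploads left behind by interrupted tasks.
          - Id: abort-incomplete-uploads
            Status: Enabled
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 7
```

Scoping the expiry rule by prefix is what lets the cache be reclaimed without deleting retained outputs.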

Multi-Tenant Checklist

  • Namespaces: one per team; no shared runtime.
  • RBAC: namespace-scoped Roles; Argo RBAC maps IdP groups → only their namespace.
  • NetworkPolicies: default deny; egress allowlist (Git, registry, S3, DNS).
  • Quotas: per namespace CPU/MEM/POD caps; optional PriorityClasses.
  • S3 segmentation: buckets per tenant or prefixes with IAM boundaries.
  • Supply chain: signed/pinned images; OPA/Gatekeeper allow-list.
  • Secrets: External Secrets / workload identity; never mount cluster-wide creds.

Acceptance

  • Ad-hoc run (UI + CLI) and GitHub Actions-triggered run both succeed.
  • Nextflow executor pods stay within tenant namespace; no cross-tenant access.
  • Artifacts visible in Argo UI; S3 paths follow tenant/run-ID convention.
  • Quota breach is contained to the tenant; other tenants unaffected.
  • Lifecycle rules reclaim cache without deleting retained outputs.
