
feature/Dedicated Kubernetes-Aware Logging Facility #90

@bwalsh

Feature Request: Integrate a Dedicated Kubernetes-Aware Logging Facility

Summary

Argo Workflows’ built-in archiveLogs feature is explicitly described as “not recommended” and “naive”, and the docs recommend integrating a dedicated, Kubernetes-aware logging facility instead. ([Argo Workflows][1])

This feature request proposes:

  • Adding first-class integration with an open-source, Kubernetes-native logging stack that:
    • Can run on premises
    • Supports multi-tenant isolation
    • Exposes query and visualization APIs consumable from a GitApp UI
  • Providing opinionated defaults (e.g., Loki / OpenSearch) while keeping the logging backend pluggable
  • Offering a consistent UX in the GitApp UI for viewing, filtering, and correlating workflow logs.

Motivation

From the Argo Workflows documentation:

“We do not recommend relying on Argo to archive logs as it is naive and not purpose-built for indexing, searching, and storing logs… we recommend you integrate a dedicated, Kubernetes-aware logging facility.” ([Argo Workflows][1])

Current archiveLogs behavior:

  • Stores logs as artifacts in the configured artifact repository.
  • Works as a convenience for viewing logs of garbage-collected Pods.
  • Does not provide:
    • Rich querying (labels, time ranges, regex)
    • Long-term retention and lifecycle policies
    • Efficient indexing across clusters/namespaces
    • Multi-tenant log boundaries
    • Security and audit features expected in a production logging stack.

For multi-tenant, GitOps-driven environments, we need:

  • A standard logging backend (or set of supported backends).
  • A clear library and API for pushing and querying logs.
  • Direct integration into the GitApp UI, so users do not have to context-switch between multiple dashboards.

Goals

  1. Provide a Kubernetes-aware, open-source logging integration that can run fully on premises.
  2. Support at least one reference implementation, e.g.:
    • Loki + Promtail + Grafana, or
    • OpenSearch + Fluent Bit / Fluentd + dashboards.
  3. Expose logs to the GitApp UI with:
    • Per-tenant scoping (based on namespace / labels)
    • Workflow-level and step-level log views
    • Time-range and label filters
  4. Keep the logging backend pluggable via configuration so sites can bring their own stack.
  5. Document migration guidance away from relying on archiveLogs as the primary source of truth.

Non-Goals

  • Re-implementing a full log indexer inside Argo or GitApp.
  • Mandating a single logging vendor or SaaS solution.
  • Solving organization-wide logging beyond the workflow / tenant scope.

Proposal

1. Pluggable Logging Backend Abstraction

Introduce a logging backend abstraction in configuration, e.g.:

```yaml
logging:
  enabled: true

  backend: loki     # one of: loki, opensearch, elasticsearch, custom

  loki:
    url: https://loki.example.org
    queryRangePath: /loki/api/v1/query_range
    writeEndpoint: https://loki.example.org/loki/api/v1/push

  opensearch:
    url: https://opensearch.example.org
    indexPrefix: workflows-
    authSecretRef: opensearch-credentials

  # optional custom backend:
  custom:
    queryUrl: https://logs.example.org/api/logs
```
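To illustrate how a server might consume this section, here is a minimal Python sketch (Python is used only for illustration) that validates the parsed `logging` block and selects the active backend. It assumes the YAML has already been parsed into a plain dict; `select_backend` and the `REQUIRED_KEYS` schema are hypothetical and not part of Argo or GitApp.

```python
from typing import Any, Dict

# Hypothetical schema: keys each backend section must define (mirrors the example above).
REQUIRED_KEYS = {
    "loki": ("url", "queryRangePath"),
    "opensearch": ("url", "indexPrefix"),
    "elasticsearch": ("url",),
    "custom": ("queryUrl",),
}


def select_backend(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the active backend name and settings, or raise ValueError."""
    logging_cfg = config.get("logging", {})
    if not logging_cfg.get("enabled", False):
        return {"backend": None, "settings": {}}  # logging integration disabled

    name = logging_cfg.get("backend")
    if name not in REQUIRED_KEYS:
        raise ValueError(f"unsupported logging backend: {name!r}")

    settings = logging_cfg.get(name)
    if not isinstance(settings, dict):
        raise ValueError(f"backend {name!r} selected but its section is missing")

    missing = [key for key in REQUIRED_KEYS[name] if key not in settings]
    if missing:
        raise ValueError(f"backend {name!r} is missing required keys: {missing}")
    return {"backend": name, "settings": settings}


# Usage: the dict below stands in for the parsed YAML.
parsed = {
    "logging": {
        "enabled": True,
        "backend": "loki",
        "loki": {"url": "https://loki.example.org",
                 "queryRangePath": "/loki/api/v1/query_range"},
    }
}
print(select_backend(parsed)["settings"]["url"])  # https://loki.example.org
```

Failing fast at startup on an unknown backend or a missing section keeps misconfiguration out of the request path.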

Characteristics:

  • Configurable at cluster level (and optionally overridable per tenant / namespace).

  • Backends are required to expose:

    • A write API (for log ingestion).
    • A read API (for querying logs with label & time filters).
  • Support for TLS, authentication headers, and token-based access via Secrets.
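The write/read contract above could be expressed as a small abstract interface. The following Python sketch is hypothetical: `LogBackend`, `LogEntry`, and the in-memory test double are illustrative names, not an existing GitApp or Argo API. Real implementations would call Loki's or OpenSearch's HTTP APIs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogEntry:
    timestamp_ns: int        # Unix epoch, nanoseconds
    labels: Dict[str, str]   # e.g. {"tenant": "team-a", "workflow": "build-42"}
    line: str


class LogBackend(ABC):
    """Hypothetical contract every pluggable backend must satisfy."""

    @abstractmethod
    def push(self, entries: List[LogEntry]) -> None:
        """Write API: ingest log entries."""

    @abstractmethod
    def query(self, match: Dict[str, str], start_ns: int, end_ns: int,
              limit: int = 100) -> List[LogEntry]:
        """Read API: entries whose labels contain `match`, with start_ns <= ts < end_ns."""


class InMemoryBackend(LogBackend):
    """Test double; a real backend would call Loki or OpenSearch over HTTPS with auth."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def push(self, entries: List[LogEntry]) -> None:
        self._entries.extend(entries)

    def query(self, match: Dict[str, str], start_ns: int, end_ns: int,
              limit: int = 100) -> List[LogEntry]:
        hits = [e for e in self._entries
                if start_ns <= e.timestamp_ns < end_ns
                and all(e.labels.get(k) == v for k, v in match.items())]
        hits.sort(key=lambda e: e.timestamp_ns)  # oldest first
        return hits[:limit]
```

Callers depend only on the two methods, so swapping Loki for OpenSearch would mean providing another subclass chosen by the `backend` config key.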


2. Kubernetes-Aware Log Collection

Leverage standard agents (Promtail, Fluent Bit, etc.) running as DaemonSets:

  • Agents discover Pods using Kubernetes metadata:

    • namespace
    • pod
    • container
    • workflow-specific labels (e.g., workflows.argoproj.io/workflow, workflows.argoproj.io/step).
  • Agents push logs to the configured backend with labels:

    • tenant (derived from namespace or RepoRegistration)
    • repo / owner
    • workflow, nodeId, phase.

This follows the recommendation in the docs to use a Kubernetes-aware logging facility instead of artifact-based log storage. ([Argo Workflows][1])
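In a real deployment the agent's relabeling rules (Promtail or Fluent Bit) perform this mapping, but the data shape is easy to sketch. The hypothetical Python below converts Pod metadata into label names Loki accepts (letters, digits, and underscores, so `workflows.argoproj.io/workflow` becomes `workflow`) and builds a JSON body in the `streams`/`values` format accepted by Loki's `/loki/api/v1/push` endpoint. Falling back to the namespace as the tenant is an assumption; a RepoRegistration-based lookup would populate `tenant_by_namespace`.

```python
import json
from typing import Dict, List, Optional, Tuple

# Label Argo sets on workflow Pods; it is renamed because Loki label names
# may only contain letters, digits, and underscores.
ARGO_WORKFLOW_LABEL = "workflows.argoproj.io/workflow"


def derive_stream_labels(namespace: str, pod: str, container: str,
                         pod_labels: Dict[str, str],
                         tenant_by_namespace: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map Kubernetes Pod metadata to Loki-safe stream labels."""
    # Assumption: fall back to namespace == tenant when no explicit mapping exists.
    tenant = (tenant_by_namespace or {}).get(namespace, namespace)
    labels = {"tenant": tenant, "namespace": namespace, "pod": pod, "container": container}
    if ARGO_WORKFLOW_LABEL in pod_labels:
        labels["workflow"] = pod_labels[ARGO_WORKFLOW_LABEL]
    return labels


def loki_push_body(labels: Dict[str, str], lines: List[Tuple[int, str]]) -> str:
    """JSON body for POST /loki/api/v1/push: one stream of [ns-timestamp-string, line] pairs."""
    return json.dumps({
        "streams": [{
            "stream": labels,
            "values": [[str(ts_ns), line] for ts_ns, line in lines],
        }]
    })


labels = derive_stream_labels("team-a", "build-42-main-123", "main",
                              {ARGO_WORKFLOW_LABEL: "build-42"})
print(loki_push_body(labels, [(1704067200000000000, "step started")]))
```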


3. GitApp UI Integration

Add a dedicated Logs tab to the GitApp UI, with:

  • Tenant view

    • Filter by repository, branch, workflow name, label set, and time range.
  • Workflow detail view

    • Tabs: Overview | Graph | Logs.

    • Per-step log viewer with streaming and paging:

      • “All logs”
      • “By step” (each container/step selectable)
  • Search

    • Time range picker

    • Text/regex search in logs (implemented via backend query)

    • Filters for:

      • namespace / tenant
      • workflow name
      • label keys/values
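Assuming the Loki reference stack, the search filters could be translated into a LogQL expression: label matchers for the tenant and selected filters, plus an optional `|~` regex line filter. The Python sketch below is hypothetical (`build_search_query` is not an existing API), assumes label keys such as `workflow` match what the agents attach, and assumes LogQL double-quoted strings accept JSON-style escapes (Go string-literal rules). The server-side tenant value overrides any `tenant` key sent by the client.

```python
import json
import re
from typing import Dict, Optional

LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")  # valid Loki/Prometheus label names


def logql_string(value: str) -> str:
    # JSON escapes (\" \\ \n \uXXXX) are also valid Go-style string escapes;
    # ensure_ascii=False avoids emitting surrogate-pair escapes for non-BMP text.
    return json.dumps(value, ensure_ascii=False)


def build_search_query(tenant: str, filters: Dict[str, str],
                       regex: Optional[str] = None) -> str:
    """Build a LogQL query from UI filters plus an optional regex line filter."""
    # The server-side tenant overrides any "tenant" key supplied by the client.
    matchers = {**filters, "tenant": tenant}
    for key in matchers:
        if not LABEL_NAME.fullmatch(key):
            raise ValueError(f"invalid label name: {key!r}")
    selector = ", ".join(f"{k}={logql_string(v)}" for k, v in sorted(matchers.items()))
    query = "{" + selector + "}"
    if regex:
        query += " |~ " + logql_string(regex)  # keep only lines matching the regex
    return query


print(build_search_query("team-a", {"workflow": "build-42"}, "error|timeout"))
# {tenant="team-a", workflow="build-42"} |~ "error|timeout"
```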

Implementation:

  • The GitApp UI calls a Logs API on the GitApp server, which proxies requests to the configured logging backend.

  • The GitApp server enforces multi-tenant access control:

    • A user can only query logs for tenants they have access to (reusing the writeUsers/readUsers mappings from RepoRegistration or similar).
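A minimal sketch of the check the Logs API could run before proxying a query. `TenantAccess` is a hypothetical stand-in for the readUsers/writeUsers mappings in RepoRegistration, and treating write access as implying read access is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TenantAccess:
    """Hypothetical stand-in for a RepoRegistration's user mappings."""
    read_users: Set[str] = field(default_factory=set)
    write_users: Set[str] = field(default_factory=set)


def readable_tenants(user: str, registry: Dict[str, TenantAccess]) -> Set[str]:
    """Tenants whose logs `user` may read (assumption: write access implies read)."""
    return {tenant for tenant, acl in registry.items()
            if user in acl.read_users or user in acl.write_users}


def authorize_log_query(user: str, tenant: str, registry: Dict[str, TenantAccess]) -> None:
    """Raise PermissionError unless `user` may read logs for `tenant`."""
    if tenant not in readable_tenants(user, registry):
        raise PermissionError(f"user {user!r} may not read logs for tenant {tenant!r}")


registry = {
    "team-a": TenantAccess(read_users={"alice"}),
    "team-b": TenantAccess(write_users={"bob"}),
}
authorize_log_query("alice", "team-a", registry)   # allowed: returns None
print(sorted(readable_tenants("bob", registry)))   # ['team-b']
```

After authorization, the server would build the tenant label matcher itself rather than forwarding a client-supplied query string, so the check cannot be bypassed by editing the query.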

4. Workflow Metadata Linking

To align with existing Argo concepts:

  • Ensure each workflow’s Pods carry stable, structured labels:

    • workflows.argoproj.io/workflow
    • workflows.argoproj.io/template
    • workflows.argoproj.io/node-id
  • Logging backend stores these labels in the log streams / indices.

  • GitApp queries logs using:

    • Workflow name or UID
    • Node ID / step ID
    • Time window derived from workflow start/end timestamps

This enables direct click-through from a workflow node to associated logs.
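For the Loki reference implementation, the click-through could become a `query_range` request built from the workflow name, an optional node ID, and the workflow's start/finish timestamps. The Python sketch below assumes the `tenant`, `workflow`, and `nodeId` stream labels from the collection section; `workflow_log_params`, `query_range_url`, and the one-minute padding (to catch lines flushed slightly before or after the recorded timestamps) are hypothetical choices. `query`, `start`, `end`, `limit`, and `direction` are standard `query_range` parameters, with times given here as Unix nanoseconds.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(ts: datetime) -> int:
    """Timezone-aware datetime -> Unix nanoseconds using integer math (no float rounding)."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def workflow_log_params(tenant: str, workflow: str, started_at: datetime,
                        finished_at: Optional[datetime] = None,
                        node_id: Optional[str] = None,
                        padding: timedelta = timedelta(minutes=1),
                        limit: int = 1000) -> Dict[str, object]:
    """query_range parameters for one workflow, optionally narrowed to one node."""
    matchers = {"tenant": tenant, "workflow": workflow}
    if node_id:
        matchers["nodeId"] = node_id
    selector = ", ".join(f"{k}={json.dumps(v, ensure_ascii=False)}" for k, v in matchers.items())
    end = finished_at or datetime.now(timezone.utc)  # still running: query up to now
    return {
        "query": "{" + selector + "}",
        "start": to_unix_ns(started_at - padding),
        "end": to_unix_ns(end + padding),
        "limit": limit,
        "direction": "forward",  # oldest lines first
    }


def query_range_url(base_url: str, params: Dict[str, object],
                    path: str = "/loki/api/v1/query_range") -> str:
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


params = workflow_log_params("team-a", "build-42",
                             datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
                             datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc))
print(query_range_url("https://loki.example.org", params))
```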


5. Migration and Co-existence with archiveLogs

Because archiveLogs is not recommended as a full logging solution but is still convenient for quick, artifact-style access, we propose:

  • Keep archiveLogs available for short-term convenience, but:

    • Update docs and UI to mark it as “not recommended as primary logging” (matching upstream docs). ([Argo Workflows][1])
  • Encourage operators to:

    • Enable the dedicated logging facility.
    • Use the GitApp Logs UI for most log inspection.
  • Provide a migration checklist:

    • Enable agents and backend.
    • Verify labels and queries.
    • Audit tenant RBAC.
    • Optionally reduce or disable log archiving to artifacts.

Open-Source, On-Prem Requirements

The default reference stack must be:

  • Fully open source (e.g., Loki or OpenSearch ecosystem).

  • Self-hostable on-prem or within a private cloud.

  • Packaged as:

    • Helm chart values (for the logging backend itself or integration with an existing cluster-wide deployment).
    • Example Kubernetes manifests.

Operators should be able to choose:

  • Managed / external logging endpoints if desired (by pointing config at an existing Loki/OpenSearch/Elastic deployment), or
  • A fully in-cluster on-prem logging stack.

Risks and Considerations

  • Additional operational complexity (managing & scaling the logging backend).
  • Storage cost and retention policies for logs, especially in large clusters.
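  • Log access security in multi-tenant environments: a misconfigured Logs API or query proxy could expose one tenant's logs to another.
  • UI performance when querying large log volumes; the Logs API should apply reasonable defaults (e.g., bounded time ranges and line limits) and paginate results.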
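Pagination could follow a cursor pattern over the time range. The Python sketch below is backend-agnostic and hypothetical: `fetch` is an assumed callback returning up to `limit` entries in ascending time order, as a Loki `query_range` call with `direction=forward` would.

```python
from typing import Callable, Iterator, List, Tuple

Entry = Tuple[int, str]                          # (timestamp in ns, log line)
Fetch = Callable[[int, int, int], List[Entry]]   # (start_ns, end_ns, limit) -> ascending entries


def paginate(fetch: Fetch, start_ns: int, end_ns: int, page_size: int = 500) -> Iterator[Entry]:
    """Yield entries in [start_ns, end_ns) one bounded page at a time.

    Caveat: each new page starts 1 ns after the last timestamp seen, so if a full
    page ends partway through entries sharing one timestamp, the remainder is skipped.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    cursor = start_ns
    while cursor < end_ns:
        page = fetch(cursor, end_ns, page_size)
        yield from page
        if len(page) < page_size:
            return  # short (or empty) page: nothing left in the range
        cursor = page[-1][0] + 1


# Usage with a fake fetch over in-memory data:
data = [(ts, f"line {ts}") for ts in range(10)]


def fake_fetch(start: int, end: int, limit: int) -> List[Entry]:
    return [e for e in data if start <= e[0] < end][:limit]


print(len(list(paginate(fake_fetch, 0, 100, page_size=3))))  # 10
```

Because each request is bounded by `page_size`, the UI can render the first page immediately and request the next one only when the user scrolls.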

Definition of Done

  • Configuration schema for logging backend introduced and documented.

  • Reference implementation for at least one backend (e.g., Loki) with:

    • Log agents (Promtail/Fluent Bit) configured
    • Kubernetes metadata and workflow labels attached.
  • GitApp backend exposes a Logs API with tenant-aware access control.

  • GitApp UI has:

    • Tenant-level Logs view
    • Workflow-level Logs tab (with per-step filtering and time range).
  • Documentation updated to:

    • Explain the recommended logging integration.
    • Clarify limitations of archiveLogs and position it as secondary.
    • Provide examples and diagrams for the logging architecture.
  • At least one end-to-end scenario tested:

    • Run workflow → generate logs → view in GitApp UI → filter by step/time → verify RBAC boundaries between tenants.

[1]: https://argo-workflows.readthedocs.io/en/latest/configure-archive-logs/ "Configuring Archive Logs - Argo Workflows - The workflow engine for Kubernetes"

Labels: enhancement (New feature or request)
