Feature Request: Integrate a Dedicated Kubernetes-Aware Logging Facility
Summary
Argo Workflows’ built-in `archiveLogs` feature is explicitly described as “naive” and “not recommended”, and the docs recommend integrating a dedicated, Kubernetes-aware logging facility instead. ([Argo Workflows][1])
This feature request proposes:
- Adding first-class integration with an open-source, Kubernetes-native logging stack that:
- Can run on premises
- Supports multi-tenant isolation
- Exposes query and visualization APIs consumable from a GitApp UI
- Providing opinionated defaults (e.g., Loki / OpenSearch) while keeping the logging backend pluggable
- Offering a consistent UX in the GitApp UI for viewing, filtering, and correlating workflow logs.
Motivation
From the Argo Workflows documentation:
“We do not recommend relying on Argo to archive logs as it is naive and not purpose-built for indexing, searching, and storing logs… we recommend you integrate a dedicated, Kubernetes-aware logging facility.” ([Argo Workflows][1])
Current archiveLogs behavior:
- Stores logs as artifacts in the configured artifact repository.
- Works as a convenience for viewing logs of garbage-collected Pods.
- Does not provide:
  - Rich querying (labels, time ranges, regex)
  - Long-term retention and lifecycle policies
  - Efficient indexing across clusters/namespaces
  - Multi-tenant log boundaries
  - Security and audit features expected in a production logging stack
For multi-tenant, GitOps-driven environments, we need:
- A standard logging backend (or set of supported backends).
- A clear library and API for pushing and querying logs.
- Direct integration into the GitApp UI, so users do not have to context-switch between multiple dashboards.
Goals
- Provide a Kubernetes-aware, open-source logging integration that can run fully on premises.
- Support at least one reference implementation, e.g.:
  - Loki + Promtail + Grafana, or
  - OpenSearch + Fluent Bit / Fluentd + dashboards.
- Expose logs to the GitApp UI with:
  - Per-tenant scoping (based on namespace / labels)
  - Workflow-level and step-level log views
  - Time-range and label filters
- Keep the logging backend pluggable via configuration so sites can bring their own stack.
- Document migration guidance away from relying on `archiveLogs` as the primary source of truth.
Non-Goals
- Re-implementing a full log indexer inside Argo or GitApp.
- Mandating a single logging vendor or SaaS solution.
- Solving organization-wide logging beyond the workflow / tenant scope.
Proposal
1. Pluggable Logging Backend Abstraction
Introduce a logging backend abstraction in configuration, e.g.:
```yaml
logging:
  enabled: true
  backend: loki   # one of: loki, opensearch, elasticsearch, custom
  loki:
    url: https://loki.example.org
    queryRangePath: /loki/api/v1/query_range
    writeEndpoint: https://loki.example.org/loki/api/v1/push
  opensearch:
    url: https://opensearch.example.org
    indexPrefix: workflows-
    authSecretRef: opensearch-credentials
  # optionally custom:
  custom:
    queryUrl: https://logs.example.org/api/logs
```

Characteristics:
- Configurable at cluster level (and optionally overridable per tenant / namespace).
- Backends are required to expose:
  - A write API (for log ingestion).
  - A read API (for querying logs with label and time filters).
- Support TLS, authentication headers, and token-based access via Secrets.
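As a rough sketch of the read side of this abstraction (class and method names here are illustrative, not an existing GitApp API), a backend could reduce to “build me a query URL for a label selector and a time range”:

```python
from dataclasses import dataclass
from typing import Protocol
from urllib.parse import urlencode


class LogBackend(Protocol):
    """Minimal read-side contract every pluggable backend must satisfy."""

    def query_url(self, selector: str, start_ns: int, end_ns: int) -> str:
        """Build a backend-specific query URL for a selector and time range."""
        ...


@dataclass
class LokiBackend:
    """Maps the `logging.loki` config block onto Loki's query_range API."""

    url: str
    query_range_path: str = "/loki/api/v1/query_range"

    def query_url(self, selector: str, start_ns: int, end_ns: int) -> str:
        params = urlencode({"query": selector, "start": start_ns, "end": end_ns})
        return f"{self.url}{self.query_range_path}?{params}"


# Example: query all logs for one workflow in a tenant namespace.
backend = LokiBackend(url="https://loki.example.org")
url = backend.query_url('{namespace="tenant-a", workflow="build-42"}', 0, 1)
```

An OpenSearch or custom backend would implement the same protocol against its own query API, so the GitApp server and UI stay backend-agnostic.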
2. Kubernetes-Aware Log Collection
Leverage standard agents (Promtail, Fluent Bit, etc.) running as DaemonSets:
- Agents discover Pods using Kubernetes metadata: `namespace`, `pod`, `container`, and workflow-specific labels (e.g., `workflows.argoproj.io/workflow`, `workflows.argoproj.io/step`).
- Agents push logs to the configured backend with labels: `tenant` (derived from namespace or RepoRegistration), `repo`/`owner`, `workflow`, `nodeId`, `phase`.
This follows the recommendation in the docs to use a Kubernetes-aware logging facility instead of artifact-based log storage. ([Argo Workflows][1])
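For the Loki reference stack, a minimal Promtail scrape config along these lines could attach the required labels (the `tenant` derivation shown here is a simplifying assumption; a real deployment would map it from RepoRegistration):

```yaml
scrape_configs:
  - job_name: argo-workflow-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only Pods created by Argo Workflows.
      - source_labels: [__meta_kubernetes_pod_label_workflows_argoproj_io_workflow]
        regex: .+
        action: keep
      # Promote Kubernetes metadata to log stream labels.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_label_workflows_argoproj_io_workflow]
        target_label: workflow
      # Assumption: tenant == namespace; adjust per RepoRegistration mapping.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: tenant
```

Fluent Bit's `kubernetes` filter can attach equivalent metadata for the OpenSearch path.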
3. GitApp UI Integration
Add a dedicated Logs tab to the GitApp UI, with:
- Tenant view
  - Filter by repository, branch, workflow name, label set, and time range.
- Workflow detail view
  - Tabs: `Overview` | `Graph` | `Logs`.
  - Per-step log viewer with streaming and paging:
    - “All logs”
    - “By step” (each container/step selectable)
- Search
  - Time range picker
  - Text/regex search in logs (implemented via backend query)
  - Filters for:
    - namespace / tenant
    - workflow name
    - label keys/values
Implementation:
- GitApp UI calls a Logs API in the backend (GitApp server), which proxies to the configured logging backend.
- The backend enforces multi-tenant access control:
  - A user can only query logs for tenants they have access to (reusing `writeUsers` / `readUsers` mappings from RepoRegistration or similar).
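The core of that enforcement can be sketched as follows; the in-memory tenant-to-readers mapping and function names are illustrative stand-ins for the RepoRegistration-backed lookup:

```python
# Illustrative tenant -> authorized readers mapping (would come from
# RepoRegistration readUsers/writeUsers in a real deployment).
TENANT_READERS = {
    "tenant-a": {"alice", "bob"},
    "tenant-b": {"carol"},
}


def authorize_log_query(user: str, tenant: str) -> bool:
    """A user may only query logs for tenants they can read."""
    return user in TENANT_READERS.get(tenant, set())


def proxy_query(user: str, tenant: str, selector: str) -> str:
    """Authorize, then scope the selector before proxying to the backend."""
    if not authorize_log_query(user, tenant):
        raise PermissionError(f"{user} cannot read logs for tenant {tenant}")
    # Force the tenant label into the selector server-side so a caller
    # cannot widen the query beyond their tenant.
    return f'{{tenant="{tenant}", {selector}}}'
```

Injecting the tenant label on the server side (rather than trusting the UI's selector) is what keeps the multi-tenant boundary enforceable.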
4. Workflow Metadata Linking
To align with existing Argo concepts:
- Ensure each workflow’s Pods carry stable, structured labels:
  - `workflows.argoproj.io/workflow`
  - `workflows.argoproj.io/template`
  - `workflows.argoproj.io/node-id`
- The logging backend stores these labels in the log streams / indices.
- GitApp queries logs using:
  - Workflow name or UID
  - Node ID / step ID
  - Time window derived from workflow start/end timestamps
This enables direct click-through from a workflow node to associated logs.
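A hypothetical sketch of that click-through query construction (the underscore label names assume the agents sanitize the Argo label keys when promoting them to stream labels; the padding default is an assumption):

```python
def node_log_query(workflow: str, node_id: str,
                   start_s: int, end_s: int, pad_s: int = 60) -> dict:
    """Build a backend query for one workflow node from its labels
    and the workflow's start/end timestamps, padded on both sides."""
    selector = (f'{{workflows_argoproj_io_workflow="{workflow}", '
                f'workflows_argoproj_io_node_id="{node_id}"}}')
    return {
        "query": selector,
        "start": (start_s - pad_s) * 1_000_000_000,  # Loki expects nanoseconds
        "end": (end_s + pad_s) * 1_000_000_000,
    }
```

Deriving the time window from workflow timestamps keeps the query cheap even on large clusters, since the backend never scans outside the run's lifetime.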
5. Migration and Co-existence with archiveLogs
Given that `archiveLogs` is not recommended for full logging, but still convenient for quick artifact-style access, we propose:
- Keep `archiveLogs` available for short-term convenience, but:
  - Update docs and UI to mark it as “not recommended as primary logging” (matching upstream docs). ([Argo Workflows][1])
- Encourage operators to:
  - Enable the dedicated logging facility.
  - Use the GitApp Logs UI for most log inspection.
- Provide a migration checklist:
  - Enable agents and backend.
  - Verify labels and queries.
  - Audit tenant RBAC.
  - Optionally reduce or disable log archiving to artifacts.
Open-Source, On-Prem Requirements
The default reference stack must be:
- Fully open source (e.g., Loki or OpenSearch ecosystem).
- Self-hostable on-prem or within a private cloud.
- Packaged as:
  - Helm chart values (for the logging backend itself or integration with an existing cluster-wide deployment).
  - Example Kubernetes manifests.
Operators should be able to choose:
- Managed / external logging endpoints if desired (by pointing config at an existing Loki/OpenSearch/Elastic deployment), or
- A fully in-cluster on-prem logging stack.
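For the external-endpoint case, the integration could be wired up with nothing more than chart values reusing the schema above (the in-cluster service address is an illustrative assumption):

```yaml
# Point GitApp at an existing in-cluster Loki deployment.
logging:
  enabled: true
  backend: loki
  loki:
    url: http://loki.logging.svc.cluster.local:3100
    writeEndpoint: http://loki.logging.svc.cluster.local:3100/loki/api/v1/push
```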
Risks and Considerations
- Additional operational complexity (managing & scaling the logging backend).
- Storage cost and retention policies for logs, especially in large clusters.
- Need to be careful about log access security in multi-tenant environments.
- Performance of UI when querying large log volumes; should include reasonable defaults and pagination.
Definition of Done
- Configuration schema for logging backend introduced and documented.
- Reference implementation for at least one backend (e.g., Loki) with:
  - Log agents (Promtail / Fluent Bit) configured.
  - Kubernetes metadata and workflow labels attached.
- GitApp backend exposes a Logs API with tenant-aware access control.
- GitApp UI has:
  - Tenant-level Logs view.
  - Workflow-level Logs tab (with per-step filtering and time range).
- Documentation updated to:
  - Explain the recommended logging integration.
  - Clarify limitations of `archiveLogs` and position it as secondary.
  - Provide examples and diagrams for the logging architecture.
- At least one end-to-end scenario tested:
  - Run workflow → generate logs → view in GitApp UI → filter by step/time → verify RBAC boundaries between tenants.
[1]: https://argo-workflows.readthedocs.io/en/latest/configure-archive-logs/ "Configuring Archive Logs - Argo Workflows - The workflow engine for Kubernetes"