
Centralized operational logging architecture for multi-host CLP Package deployments #1760

@junhaoliao

Description


Centralized operational logging architecture for multi-host CLP deployments

Document structure: This document covers Docker Compose deployments first, followed by a
Kubernetes-specific section that explains how the same approaches
apply to Kubernetes.

Request

Background

Currently, CLP's deployment architecture requires container-to-host volume mounts for all service
logs. Each service writes logs to files within /var/log/<component>/ which are mounted from
${CLP_LOGS_DIR_HOST:-./var/log} on the host. While this approach has been convenient for
single-host deployments (allowing users to easily provide logs by archiving host files), it creates
significant challenges for multi-host deployments:

  1. Multi-host incompatibility: In Kubernetes or multi-node Docker Compose deployments, logs are
    scattered across different hosts, making centralized log access difficult
  2. Storage overhead: Each host requires dedicated storage for log retention
  3. Operational complexity: Admins must access (e.g. SSH into) individual hosts or set up
    additional log aggregation infrastructure
  4. Scaling limitations: Each added node multiplies the operational burden of collecting and
    retaining logs

With planned support for multi-host deployments through Kubernetes, and addition of the log-ingestor
component, there's an opportunity to modernize the operational logging architecture to leverage:

  1. CLP's own compression technology - operational logs can benefit from the same
    high-compression-ratio storage that user logs enjoy
  2. Container-native logging - Docker/Kubernetes native log drivers eliminate the need for host
    mounts
  3. Centralized access - all logs accessible from a single control node
  4. WebUI integration - operational logs viewable alongside user logs in the existing CLP
    interface

Requirements

The new operational logging solution must satisfy the following requirements:

R1: Multi-host Support

  • R1.1: Support both Kubernetes (multi-node) and Docker Compose (single or multi-host)
    deployments
  • R1.2: All logs accessible from a central control node, regardless of where services are
    running
  • R1.3: No per-host log file access required for normal operations

R2: Tiered Access

  • R2.1 (Hot): Real-time access to recent logs (0-X minutes) for debugging active/crashed
    services
    • Maximum acceptable lag: < 30 seconds
    • Must support live tailing
  • R2.2 (Warm): Recent historical logs (X minutes - Y hours) available uncompressed for immediate
    grep/analysis
    • Maximum acceptable lag: < 5 minutes
    • Must support live tailing
  • R2.3 (Cold): Older logs compressed using CLP for long-term storage and efficient retrieval
    • Maximum acceptable lag: < 24 hours
    • Must support full-text search

R3: Admin Access & Export

  • R3.1: Deployment admins can view all logs from all services
  • R3.2: Easy export mechanism for sending logs to support/developers
  • R3.3: Export should include both real-time and historical logs

R4: WebUI Integration

  • R4.1: Dedicated WebUI page for viewing real-time operational logs (files on disk)
  • R4.2: Operational logs searchable through existing Search page once archived
  • R4.3: Support filtering by service name, log level, time range
  • R4.4: (Future) Admin-only access control for operational logs

R5: Lightweight & Efficient

  • R5.1: Minimal additional resource overhead (< 50MB memory, < 0.1 CPU core)
  • R5.2: Use CLP's compression capabilities for long-term storage
  • R5.3: No heavyweight dependencies

R6: Incremental Migration

  • R6.1: Support gradual service-by-service migration from file-based to centralized logging
  • R6.2: Maintain backward compatibility during transition
  • R6.3: Clear deprecation path for CLP_LOGS_DIR environment variables

Possible implementation

Architecture Overview

Current

flowchart TD
    A["All services write to files<br/>(some also write to stdout)"]
    B["Docker local logging driver"]

    subgraph Host1 [Host 1]
        C1["/var/log/&lt;component&gt;/*.log<br/>(via volume mount)"]
        D1["docker logs container-name"]
    end

    subgraph Host2 [Host 2]
        C2["/var/log/&lt;component&gt;/*.log<br/>(via volume mount)"]
        D2["docker logs container-name"]
    end

    A --> B
    B --> C1
    B --> D1
    B --> C2
    B --> D2

    E["Admin must access each host separately<br/>(SSH, copy files, etc.)"]
    C1 -.-> E
    C2 -.-> E

Characteristics:

  • Services write logs to files via volume mounts from ${CLP_LOGS_DIR_HOST}
  • Logs scattered across hosts - no centralized access
  • Admin must SSH to each host to view/export logs

After (CLP-managed Fluent Bit)

flowchart TD
    A[All services write to stdout]
    B["Docker fluentd logging driver"]

    subgraph ControlNode [Control Node]
        C["Fluent Bit<br/>(receives logs from all hosts)"]

        subgraph Output1 [Output 1: File with rotation]
            D["/var/log/&lt;component&gt;/*.log"]
            E["WebUI reads directly<br/>(real-time access)"]
            D --> E
        end

        subgraph Output2 [Output 2: S3 + CLP archives]
            F["S3 (IRv2 compressed logs)"]
            G["Log-ingestor (periodic ingestion)"]
            H["CLP Archives (dataset='_clp')"]
            I["WebUI Search page"]
            F --> G --> H --> I
        end

        C --> D
        C --> F
    end

    subgraph WorkerNode1 [Worker Node 1]
        W1["Container logs"]
    end

    subgraph WorkerNode2 [Worker Node 2]
        W2["Container logs"]
    end

    A --> B
    B --> W1 -->|"fluentd-address"| C
    B --> W2 -->|"fluentd-address"| C

    J["docker logs (via dual logging cache)"]
    B -.-> J

Characteristics:

  • All logs centralized on control node via Fluent Bit
  • Organized path structure (/var/log/<component>/)
  • Automatic S3 upload with IRv2 compression
  • Historical logs searchable in WebUI via _clp dataset
  • docker logs still works via dual logging (Docker 20.10+)

Three-Tier Data Lifecycle

  1. Hot Tier (0-X minutes): Files on disk at /var/log/<component>/*.log

    • Access method: new WebUI endpoint /os/cat (similar to existing /os/ls)
    • Retention: Managed by Fluent Bit file rotation (time-based or size-based)
    • Purpose: Real-time debugging, live tail, recent log access
  2. Warm Tier (X min - Y hours): IRv2 files on S3

    • Access method: (Future optimization) Query directly from IRv2 without full archive
      ingestion
    • Retention: Until log-ingestor processes and archives them
    • Purpose: Transition period; reduces ingestion urgency
  3. Cold Tier (>Y hours): CLP Archives

    • Access method: Existing WebUI Search page with dataset=_clp filter
    • Retention: Configurable archive retention policy
    • Purpose: Long-term searchable storage with high compression

Component Changes

1. Fluent Bit deployment

  • Docker Compose: a single Fluent Bit container on the control node, receiving logs from all
    hosts over the fluentd forward protocol
  • Kubernetes: DaemonSet or single Deployment on control node (to be determined based on performance
    testing)
    • Log collection via file tailing (see the Kubernetes considerations section)

2. Fluent Bit Configuration for Docker Compose

fluent-bit.conf

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info

[INPUT]
    Name          forward
    Listen        0.0.0.0
    Port          24224

# Output 1: File for real-time access
[OUTPUT]
    Name          file
    Match         *
    Path          /var/log
    Format        json
    # Rotation policy (matches CLP plugin flush)
    # TBD: time-based or size-based to align with CLP plugin

# Output 2: CLP plugin for S3 + IRv2 compression
[OUTPUT]
    Name          clp_s3
    Match         *
    s3_region     ${CLP_S3_REGION}
    s3_bucket     ${CLP_S3_BUCKET}
    s3_bucket_prefix    ir/${FLUENT_BIT_TAG}/%Y/%m/%d/
    upload_size_mb      16
    use_disk_buffer     true
    # Uses Zstd compression for IRv2 format

Open Questions:

  • What should be the rotation policy for file output?
    • Time-based (e.g., rotate every 5 minutes)?
    • Size-based (e.g., rotate at 50MB)?
    • Should align with CLP plugin's flush policy to ensure synchronization
  • What is the exact flush/upload behavior of the CLP Fluent Bit plugin (irv2-beta)?
    • Triggered by upload_size_mb threshold?
    • Time-based interval?
    • Both?

3. Service migration (Incremental)

Phase 1: Migrate CLP first-party services

compression-worker:
  # Remove old configuration
  # environment:
  #   CLP_LOGS_DIR: "/var/log/compression_worker"  # DEPRECATED
  #   CLP_WORKER_LOG_PATH: "/var/log/compression_worker/worker.log"  # DEPRECATED
  # volumes:
  #   - *volume_clp_logs  # DEPRECATED

  # Add new logging driver
  logging:
    driver: "fluentd"
    options:
      fluentd-address: "fluent-bit:24224"
      tag: "clp.compression-worker"
      labels: "service,component"
      fluentd-async: "true"

Phase 2: Migrate third-party services (database, queue, redis, results-cache)

  • Similar logging driver configuration
  • Verify compatibility with each service's log format

Backward Compatibility:

  • Keep ${CLP_LOGS_DIR_HOST} volume mounts during Phase 1-2 transition
    • Services log to both files (old) and Fluent Bit (new) temporarily
      After validation, remove old mounts and deprecate CLP_LOGS_DIR env vars

4. S3 configuration (clp-config.yaml)

New bundled services:

bundled: ["database", "queue", "redis", "results_cache", "fluentbit", "minio"]

S3 path structure:

s3://<clp-bucket-name>/
├── ir/
│   ├── clp.compression-worker/
│   │   ├── 2025/01/15/
│   │   │   ├── clp.compression-worker_0_2025-01-15T10:30:00Z_<uuid>.zst
│   │   │   └── clp.compression-worker_1_2025-01-15T11:00:00Z_<uuid>.zst
│   │   └── 2025/01/16/...
│   ├── clp.query-worker/...
│   └── ...
└── archive/
    └── _clp/
        ├── 2025/01/
        └── ...
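The date-partitioned prefix above can be derived from the Fluent Bit tag and upload time. A small sketch of that mapping (the helper name is hypothetical, not part of the proposal):

```python
from datetime import datetime, timezone


def build_s3_prefix(tag: str, when: datetime) -> str:
    """Build the per-day IR prefix, mirroring ir/${FLUENT_BIT_TAG}/%Y/%m/%d/."""
    return f"ir/{tag}/{when:%Y/%m/%d}/"


# Example: an upload for clp.compression-worker on 2025-01-15 (UTC)
prefix = build_s3_prefix(
    "clp.compression-worker",
    datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc),
)
# → "ir/clp.compression-worker/2025/01/15/"
```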

5. Log-ingestor configuration

Ingest into dataset "_clp"

  • "_clp" is reserved for operational logs; the underscore prefix differentiates it from user
    datasets
  • Shows alongside other datasets in the dataset selector dropdown

Open Questions:

  • What should be the scan interval for operational logs?
    • 5 minutes (matching buffer_timeout)?
    • Faster for quicker transition to searchable archives?

6. WebUI enhancements

6.1 New /os/cat API Endpoint

Volume mount (add to webui service in docker-compose-all.yaml):
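The draft does not spell out the mount itself; a sketch, assuming the control node's Fluent Bit file output lands in the existing `${CLP_LOGS_DIR_HOST}` directory:

```yaml
# docker-compose-all.yaml (webui service) — read-only is sufficient for /os/ls and /os/cat
webui:
  volumes:
    - "${CLP_LOGS_DIR_HOST:-./var/log}:/var/log:ro"
```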

6.2 New "Operational Logs" Page

For viewing real-time logs.

Location: /components/webui/client/src/pages/OperationalLogsPage/

Features:

  • File browser using /os/ls API
  • Log viewer using /os/cat API
  • Download/export button
  • For historical logs, redirect to the search page with the dataset filter set to _clp (see 6.3)

Open Questions:

  • Should we also add /os/tail to support live tailing with server-sent events?
    • Pro: True live tail without client polling
    • Con: More complex implementation

6.3 Search Page Enhancement

Once the log-ingestor for CLP operational logs is ready, we can verify that the logs are searchable
in the search page.

Dataset filter:

  • Add URL parameter support: /search?dataset=_clp

Migration Timeline

Phase 1: Infrastructure Setup

  • Add Fluent Bit service to docker-compose-all.yaml
  • Create Fluent Bit configuration with dual outputs (file + CLP plugin)
  • Add fluentbit and minio (optional) to bundled services in config schema
  • Update clp-config.yaml templates with S3 path structure

Phase 2: WebUI Development

  • Implement /os/cat API endpoint
  • Create Operational Logs page UI
  • Add dataset URL parameter support to Search page
  • Mount /var/log volume to webui service

Phase 3: Service Migration for Third-Party Services

Migrate bundled services (no changes to our code are required)

  • database (MariaDB)
  • queue (RabbitMQ)
  • redis
  • results-cache (MongoDB)

Challenges:

  • Each service has different log format
  • May require custom Fluent Bit parsers
  • Verify no log loss during migration

Phase 4: Service Migration for First-Party Services (First Wave)

Migrate Python-based services (easier log format standardization):

  • compression-scheduler
  • compression-worker
  • query-scheduler
  • query-worker
  • garbage-collector
  • reducer

Per-service checklist:

  • Update logging driver to fluentd
  • Keep CLP_LOGS_DIR env var but mark as deprecated
  • Test real-time log access via WebUI
  • Test log ingestion to archives
  • Validate search functionality

Phase 5: Service Migration for First-Party Services (Second Wave)

Migrate remaining services:

  • webui
  • mcp-server
  • api-server
  • log-ingestor (tricky: logging about logging)
  • spider-scheduler
  • spider-compression-worker

Phase 6: Cleanup & Optimization

  • Remove CLP_LOGS_DIR environment variables
  • Remove *volume_clp_logs mounts from services (keep only in Fluent Bit and webui)
  • Remove ${CLP_LOGS_DIR_HOST} host mounts
  • Documentation updates
  • Performance tuning

Future Optimizations

  • Direct IRv2 querying (Warm tier optimization):

    • Query worker currently supports archives only
    • Extend to support IRv2 stream files on S3
    • Would enable searching logs before full archive ingestion
  • WebUI live tail (Server-sent events):

    • Current proposal: Client-side polling of /os/cat
    • Optimization: Server-sent events for true push-based tail
  • Authentication & Authorization:

    • Current: No access control on operational logs
    • Future: Admin-only access to _clp dataset
    • Requires: an authentication system in the CLP Package (TBA)
  • Structured logging standardization:

    • Ensure all CLP services output JSON logs
    • Consistent field names (timestamp, level, message, component, etc.)
    • Easier filtering and parsing in WebUI
  • Multi-cluster support:

    • Current design: Single S3 bucket per deployment
    • Future: Multiple clusters writing to the same bucket with cluster ID prefix
    • Use case: Multi-region deployments for legal compliance
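The structured-logging item above can be sketched with Python's stdlib logging. The field names follow the proposal (timestamp, level, message, component); the formatter class itself is hypothetical:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, using the proposed field names."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "component": record.name,
        })


# Usage: attach to a stdout handler so the container logging driver picks it up.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
```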

Alternative approaches: Native Docker logging drivers

This section evaluates lighter-weight alternatives to the CLP-managed Fluent Bit approach, using
Docker's native logging drivers. These alternatives may appeal to users who:

  • Prefer a simpler CLP Package without log aggregation infrastructure
  • Already have their own log aggregation systems (Fluentd, Vector, OpenTelemetry, Loki, etc.)
  • Deploy on single-host environments only

Candidate logging drivers

1. json-file driver

The default Docker logging driver. Writes JSON-formatted logs to local files.

flowchart TD
    A[All services write to stdout]
    B["Docker json-file logging driver<br/>(with rotation: max-size, max-file)"]

    subgraph Host1 [Host 1]
        C1["/var/lib/docker/containers/&lt;id&gt;/*.log"]
        D1["docker logs container-name"]
    end

    subgraph Host2 [Host 2]
        C2["/var/lib/docker/containers/&lt;id&gt;/*.log"]
        D2["docker logs container-name"]
    end

    A --> B
    B --> C1 --> D1
    B --> C2 --> D2

    E["External log aggregation (optional)<br/>Fluentd, Vector, OpenTelemetry, Loki, etc."]
    C1 -.-> E
    C2 -.-> E

Configuration example:

x-service-defaults: &service_defaults
  logging:
    driver: "json-file"
    options:
      max-size: "50m"
      max-file: "5"
      compress: "true"

Key options (Docker Docs: JSON File logging driver):

| Option   | Default        | Description                                         |
|----------|----------------|-----------------------------------------------------|
| max-size | -1 (unlimited) | Maximum size before rotation (e.g., 10m, 1g)        |
| max-file | 1              | Maximum number of rotated files to keep             |
| compress | false          | Gzip compression for rotated files                  |
| labels   | -              | Comma-separated labels to include in log metadata   |
| env      | -              | Comma-separated env vars to include in log metadata |

2. syslog driver

Routes container logs to a syslog server (local or remote).

flowchart TD
    A[All services write to stdout]
    B["Docker syslog logging driver"]

    subgraph ControlNode [Control Node]
        C["rsyslog container<br/>(receives logs from all hosts)"]
        D["/var/log/&lt;component&gt;/*.log"]
        E["WebUI reads directly<br/>(real-time access)"]
        C --> D --> E
    end

    subgraph WorkerNode1 [Worker Node 1]
        F1["Container logs"]
    end

    subgraph WorkerNode2 [Worker Node 2]
        F2["Container logs"]
    end

    A --> B
    B --> F1 -->|"tcp://rsyslog:514"| C
    B --> F2 -->|"tcp://rsyslog:514"| C

    G["docker logs (via dual logging cache)"]
    B -.-> G

Configuration example:

x-service-defaults: &service_defaults
  logging:
    driver: "syslog"
    options:
      syslog-address: "tcp://rsyslog:514"
      syslog-facility: "daemon"
      syslog-format: "rfc5424"
      tag: "{{.Name}}"

Key options (Docker Docs: Syslog logging driver):

| Option          | Description                                                                                      |
|-----------------|--------------------------------------------------------------------------------------------------|
| syslog-address  | Address of syslog server: udp://host:port, tcp://host:port, tcp+tls://host:port, or unix:///path |
| syslog-facility | Syslog facility (e.g., daemon, local0-local7)                                                    |
| syslog-format   | Message format: rfc3164, rfc5424, rfc5424micro                                                   |
| syslog-tls-*    | TLS options for tcp+tls connections                                                              |
| tag             | Custom tag; supports Go templates (e.g., {{.Name}}, {{.ID}})                                     |

docker logs command availability

A critical consideration is whether docker logs remains functional with each driver.

| Driver    | docker logs works?      | Notes                                                     |
|-----------|-------------------------|-----------------------------------------------------------|
| json-file | Yes                     | Native support                                            |
| local     | Yes                     | Native support                                            |
| journald  | Yes                     | Native support                                            |
| syslog    | Yes (with dual logging) | Requires Docker Engine 20.10+ (Docker Docs: Dual logging) |
| fluentd   | Yes (with dual logging) | Requires Docker Engine 20.10+                             |

Dual logging (Docker Docs: Dual logging): Starting with Docker Engine 20.10, Docker automatically
caches logs locally when using remote logging drivers (like syslog or fluentd), enabling docker
logs to work. No configuration is required to enable this feature.

Cache configuration: The cache options below can be configured either:

  • Per-container via --log-opt flags (e.g., --log-opt cache-max-size=50m)
  • Globally in /etc/docker/daemon.json (applies to all new containers)

Note: The Docker documentation does not provide explicit docker-compose examples for cache-*
options. While the docs state these "can be specified per container", only daemon.json examples
are shown. In docker-compose, you would use:

logging:
  driver: "syslog"
  options:
    syslog-address: "tcp://rsyslog:514"
    cache-max-size: "50m"  # Unverified - not explicitly documented

| Option         | Default | Description                  |
|----------------|---------|------------------------------|
| cache-disabled | false   | Disable local caching        |
| cache-max-size | 20m     | Max cache file size          |
| cache-max-file | 5       | Max number of cache files    |
| cache-compress | true    | Compress rotated cache files |
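For the daemon-wide route mentioned above, the Docker docs show a daemon.json form like the following (values here are illustrative; changes apply only to containers created after a daemon restart):

```json
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp://rsyslog:514",
    "cache-max-size": "50m",
    "cache-max-file": "3"
  }
}
```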

Component changes impact analysis

1. Fluent Bit deployment

| Aspect                      | CLP-managed Fluent Bit     | json-file | syslog                                          |
|-----------------------------|----------------------------|-----------|-------------------------------------------------|
| Additional service required | Yes (Fluent Bit container) | No        | Optional (rsyslog container for centralization) |
| Memory overhead             | ~50MB                      | 0         | ~10-20MB (rsyslog)                              |
| CPU overhead                | ~0.1 core                  | 0         | Minimal                                         |

Verdict:

  • json-file: Simplest, zero overhead
  • syslog: Lightweight if rsyslog already deployed; can centralize to control node

2. Fluent Bit configuration (dual output)

| Aspect                                            | CLP-managed Fluent Bit  | json-file                                  | syslog                              |
|---------------------------------------------------|-------------------------|--------------------------------------------|-------------------------------------|
| File output for real-time access                  | Yes (configurable path) | Yes (Docker-managed path)                  | Via rsyslog file output             |
| S3/IRv2 output                                    | Yes (CLP plugin)        | No (would need separate collector to scan) | No (would need separate "shipper")  |
| Log rotation                                      | Fluent Bit managed      | Docker managed                             | rsyslog managed                     |
| Organized path structure (/var/log/<component>/)  | Yes                     | No (/var/lib/docker/containers/<id>/)      | Yes (rsyslog templates)             |

Verdict:

  • json-file: Loses organized path structure and S3 pipeline
  • syslog: Can achieve organized paths via rsyslog templates; still loses S3 pipeline

3. Service migration

| Aspect                          | CLP-managed Fluent Bit | json-file | syslog     |
|---------------------------------|------------------------|-----------|------------|
| Services write to stdout        | Yes                    | Yes       | Yes        |
| Remove CLP_LOGS_DIR env vars    | Yes                    | Yes       | Yes        |
| Remove *volume_clp_logs mounts  | Yes                    | Yes       | Yes        |
| Migration complexity            | Medium                 | Low       | Low-Medium |

Verdict: All approaches support the same service migration pattern (stdout-based logging).

4. S3 configuration

| Aspect                    | CLP-managed Fluent Bit       | json-file | syslog |
|---------------------------|------------------------------|-----------|--------|
| Automatic S3 upload       | Yes                          | No        | No     |
| IRv2 compression on upload | Yes                         | No        | No     |
| Path structure on S3      | Yes (ir/<component>/<date>/) | N/A       | N/A    |

Verdict:

  • json-file / syslog: Lose automatic S3 ingestion. Users must implement their own log
    scanning / shipping if needed.

5. Log-ingestor configuration

| Aspect                           | CLP-managed Fluent Bit | json-file | syslog |
|----------------------------------|------------------------|-----------|--------|
| Automatic _clp dataset ingestion | Yes                    | No        | No     |
| Scan interval configurable       | Yes                    | N/A       | N/A    |
| Historical log searchability     | Yes                    | No        | No     |

Verdict:

  • json-file / syslog: No automated path to CLP archives. Historical operational logs not
    searchable in WebUI.

6. WebUI enhancements

6.1 /os/cat API Endpoint

| Aspect                    | CLP-managed Fluent Bit | json-file                        | syslog (centralized)            |
|---------------------------|------------------------|----------------------------------|---------------------------------|
| Log file location         | /var/log/<component>/  | /var/lib/docker/containers/<id>/ | /var/log/<component>/ (rsyslog) |
| WebUI mount required      | Yes (/var/log)         | Docker socket or different path  | Yes (/var/log)                  |
| Implementation complexity | Low                    | Medium-High                      | Low                             |

Verdict:

  • json-file: WebUI would need to mount Docker's container directory or use Docker API
  • syslog: Can achieve same organized structure as CLP-managed Fluent Bit via rsyslog templates
6.2 "Operational Logs" Page

| Aspect                            | CLP-managed Fluent Bit | json-file          | syslog             |
|-----------------------------------|------------------------|--------------------|--------------------|
| File browser works                | Yes                    | Needs adaptation   | Yes                |
| Download/export                   | Yes                    | Yes                | Yes                |
| Redirect to Search for historical | Yes                    | No (no historical) | No (no historical) |

6.3 Search Page Enhancement (?dataset=_clp)

| Aspect                            | CLP-managed Fluent Bit | json-file | syslog |
|-----------------------------------|------------------------|-----------|--------|
| Historical operational log search | Yes                    | No        | No     |
| Dataset filter functional         | Yes                    | No        | No     |

Verdict:

  • json-file / syslog: Historical search feature not available.

Requirements impact matrix

| Requirement                               | CLP-managed Fluent Bit | json-file                     | syslog (centralized)    |
|-------------------------------------------|------------------------|-------------------------------|-------------------------|
| R1.1: Multi-host K8s support              | Yes                    | No                            | Yes (with rsyslog)      |
| R1.2: Central control node access         | Yes                    | No                            | Yes                     |
| R1.3: No per-host access required         | Yes                    | No                            | Yes                     |
| R2.1 (Hot): Real-time access (<30s lag)   | Yes                    | Yes                           | Yes                     |
| R2.2 (Warm): Recent historical (<5min lag)| Yes                    | Partial                       | Partial                 |
| R2.3 (Cold): CLP compressed archives      | Yes                    | No                            | No                      |
| R3.1: Admin view all logs                 | Yes                    | Per-host only                 | Yes                     |
| R3.2: Easy export                         | Yes                    | docker logs > file            | Yes                     |
| R3.3: Real-time + historical export       | Yes                    | Real-time only                | Real-time only          |
| R4.1: WebUI real-time page                | Yes                    | Needs adaptation              | Yes                     |
| R4.2: Search page for archived            | Yes                    | No                            | No                      |
| R4.3: Filter by service/level/time        | Yes                    | Limited                       | Yes (rsyslog parsing)   |
| R5.1: Lightweight (<50MB)                 | Marginal               | Yes                           | Yes                     |
| R5.2: CLP compression for storage         | Yes                    | No                            | No                      |
| R5.3: No heavyweight deps                 | Fluent Bit required    | Yes                           | rsyslog (lightweight)   |
| docker logs command                       | Yes (dual logging)     | Yes (native)                  | Yes (dual logging)      |
| Organized path structure                  | Yes                    | No (Docker internal paths)    | Yes (rsyslog templates) |
| Compatible with external log aggregation  | May conflict           | Yes (users plug in their own) | Yes (standard syslog)   |
| TLS encryption for log transport          | Yes (if configured)    | N/A (local only)              | Yes (tcp+tls://)        |

Comparing approaches across deployment scenarios

| Scenario                   | CLP-managed Fluent Bit               | json-file      | syslog                           |
|----------------------------|--------------------------------------|----------------|----------------------------------|
| Single-host Docker Compose | Works well                           | Works well     | Works well                       |
| Multi-host Docker Compose  | Centralized via Fluent Bit           | Logs scattered | Centralized via rsyslog          |
| Kubernetes                 | Centralized via Fluent Bit DaemonSet | Logs per node  | Centralized via rsyslog DaemonSet |

To set up multi-host deployments with syslog:

  • Deploy rsyslog as a container on the control
    node (GitHub: puzzle/kubernetes-rsyslog-logging)
  • Configure rsyslog to write to /var/log/<component>/ using templates
  • All containers forward to the central rsyslog via syslog-address
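A sketch of the rsyslog side of the steps above, using RainerScript templates; the template and file layout are assumptions matching the bullets, not a tested configuration (the Docker syslog driver's tag, e.g. {{.Name}}, is what rsyslog sees as programname):

```conf
# Hypothetical /etc/rsyslog.d/clp.conf on the control node

# One log file per component, derived from the syslog tag
template(name="ClpPerComponentFile" type="string"
         string="/var/log/%programname%/%programname%.log")

# Listen for TCP syslog from the Docker syslog driver
module(load="imtcp")
input(type="imtcp" port="514")

# Route every message into its per-component file
action(type="omfile" dynaFile="ClpPerComponentFile")
```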

Recommendation: Configurable modes

Consider offering multiple operational logging modes via configuration:

# clp-config.yaml
operational_logging:
  mode: "simple"  # Options: "simple", "centralized", "full"

| Mode        | Logging Driver         | S3 Pipeline | WebUI Integration | Use Case                                |
|-------------|------------------------|-------------|-------------------|-----------------------------------------|
| simple      | json-file              | No          | Limited           | Single-host, users have own aggregation |
| centralized | syslog + rsyslog       | No          | Real-time only    | Multi-host, no CLP archive needed       |
| full        | CLP-managed Fluent Bit | Yes         | Full              | Multi-host, full CLP integration        |

This allows users to choose based on their deployment complexity and existing infrastructure.


Kubernetes considerations

In Kubernetes, there is no per-container logging driver configuration as there is in Docker
Compose. Instead, the container runtime (containerd, CRI-O) handles log collection differently.

How Kubernetes logging works

  1. Container runtime writes logs to files on the node:

    • Location: /var/log/containers/<pod>_<namespace>_<container>-<id>.log
    • These are symlinks to /var/log/pods/<namespace>_<pod>_<uid>/<container>/0.log
  2. Log format: JSON by default (similar to Docker's json-file driver)

  3. kubectl logs: Reads from these node files (always works, no driver dependency)

  4. Log rotation: Configured via kubelet, not per-container
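The filename layout in step 1 encodes the pod, namespace, and container, which a log collector can recover. A small sketch (the function is hypothetical and assumes the trailing container ID contains no '-' itself):

```python
def parse_container_log_name(filename: str) -> dict:
    """Split '<pod>_<namespace>_<container>-<id>.log' into its parts."""
    stem = filename.removesuffix(".log")
    pod, namespace, rest = stem.split("_", 2)
    # Container names may contain '-', so split the ID off from the right.
    container, container_id = rest.rsplit("-", 1)
    return {
        "pod": pod,
        "namespace": namespace,
        "container": container,
        "id": container_id,
    }
```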

Log rotation configuration (Helm)

Since CLP plans to use Helm for Kubernetes deployments, log rotation is configured in the kubelet
settings rather than per-container:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "50Mi"
containerLogMaxFiles: 5

This applies to all containers on the node. Unlike Docker Compose, you cannot configure rotation
per-service.

Default behavior (equivalent to json-file)

With no additional configuration, Kubernetes behaves like Docker's json-file driver:

  • Logs stored per-node in /var/log/containers/
  • No centralized access - must access each node separately
  • kubectl logs <pod> works natively

syslog equivalent

There is no direct syslog logging driver in Kubernetes. Achieving centralized logging requires a
DaemonSet-based log forwarder—which is essentially the Fluent Bit approach described below.

CLP-managed Fluent Bit (recommended for Kubernetes)

This is the standard pattern for Kubernetes log aggregation:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: clp
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/log/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/log/containers

Key difference from Docker Compose:

| Aspect                 | Docker Compose                                 | Kubernetes                             |
|------------------------|------------------------------------------------|----------------------------------------|
| Log delivery mechanism | Push (container → fluentd driver → Fluent Bit) | Pull (Fluent Bit tails files on node)  |
| Configuration location | Per-service logging: block                     | DaemonSet + ConfigMap                  |
| Reliability            | Depends on fluentd-async setting               | File-based, survives pod restarts      |

How it works:

  1. Fluent Bit runs as a DaemonSet (one pod per node)
  2. Mounts /var/log/containers/ from the host (read-only)
  3. Tails the JSON log files written by the container runtime
  4. Forwards to:
    • Central Fluent Bit aggregator on control node, or
    • Directly to S3 with CLP plugin

This approach achieves the same result as Docker's fluentd logging driver, but through file tailing
rather than network push.
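A minimal sketch of the DaemonSet's Fluent Bit configuration under this pattern. The aggregator hostname is an assumption, and the parser choice depends on the runtime (containerd/CRI-O emit CRI-format lines and need a CRI parser rather than the docker one):

```conf
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker    # use a CRI parser for containerd/CRI-O
    Tag               kube.*
    Refresh_Interval  5

[OUTPUT]
    Name          forward
    Match         *
    Host          fluent-bit-aggregator   # central aggregator on the control node (assumed name)
    Port          24224
```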

