feat(helm): Add log-ingestor deployment. #1856
Conversation
Walkthrough

Adds a configurable log-ingestor component to the Helm chart: chart version bump, new deployment, service, PV/PVC templates, ConfigMap entries, values, and test adjustments for ports and log directories.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CD as Helm/Chart
    participant Kube as Kubernetes API
    participant CM as ConfigMap
    participant PV as PersistentVolume / PVC
    participant PodInit as InitContainer
    participant Job as db-table-creator Job
    participant PodMain as Log-Ingestor Container
    participant Svc as Service (3002)
    CD->>Kube: Render templates (Deployment, Service, PV/PVC, ConfigMap)
    Kube->>CM: Create/Update clp-config (includes log_ingestor)
    Kube->>PV: Provision PV/PVC for logs
    Kube->>PodInit: Start pod initContainer
    PodInit->>Job: Wait for job=db-table-creator
    Job-->>PodInit: Signal ready
    PodInit-->>PodMain: Start main container
    PodMain->>CM: Read configuration
    Svc->>PodMain: Route traffic / health probes on port 3002
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @tools/deployment/package-helm/templates/log-ingestor-deployment.yaml:
- Around line 34-78: The deployment for the "log-ingestor" container lacks
resource requests/limits; add a resources block under the container spec for the
container named "log-ingestor" that defines appropriate requests and limits (cpu
and memory) using Values or sensible defaults (e.g.,
.Values.image.clpPackage.resources or .Values.clpConfig.log_ingestor.resources)
so the manifest includes a resources: { requests: { cpu, memory }, limits: {
cpu, memory } } entry; ensure you update the template references around the
container definition (the block with image: "{{ include "clp.image.ref" . }}"
and command: [ "/opt/clp/bin/log-ingestor", ... ]) and prefer exposing them via
Helm values for easy tuning in values.yaml.
In @tools/deployment/package-helm/templates/log-ingestor-logs-pv.yaml:
- Around line 1-11: The PV created by the clp.createLocalPv call uses nodeRole
"control-plane" but the log-ingestor Deployment lacks scheduling constraints;
update the log-ingestor Deployment manifest (spec.template.spec) to include a
nodeSelector like node-role.kubernetes.io/control-plane: "" or add a
nodeAffinity block using
requiredDuringSchedulingIgnoredDuringExecution to target control-plane nodes so
pods land on nodes where the PV (created via clp.createLocalPv) can bind.
In @tools/deployment/package-helm/templates/log-ingestor-logs-pvc.yaml:
- Around line 1-9: The template hardcodes the PVC capacity as "5Gi" in the
clp.createPvc invocation for the log-ingestor (see the include "clp.createPvc"
call with "component_category" "log-ingestor" and "name" "logs"); make capacity
configurable by reading a new values key (e.g.
.Values.clpConfig.log_ingestor.logs_storage_capacity) and fall back to "5Gi" if
not set, update the include to pass that variable instead of the literal, add
the new logs_storage_capacity entry (with default "5Gi") to values.yaml under
log_ingestor, and apply the same change to log-ingestor-logs-pv.yaml so both PVC
and PV use the configurable value.
In @tools/deployment/package-helm/values.yaml:
- Around line 93-105: The values.yaml example makes log_ingestor.port look
optional but the service template (log-ingestor-service.yaml) requires
.Values.clpConfig.log_ingestor.port; fix this by either (a) providing a default
port in values.yaml (uncomment and set port: 30302) and documenting it, and also
update log-ingestor-service.yaml to use a default fallback for
.Values.clpConfig.log_ingestor.port if missing, or (b) enforce validation by
adding a required check in log-ingestor-service.yaml that errors when
.Values.clpConfig.log_ingestor is set but .port is nil; reference the values key
log_ingestor and the template reference .Values.clpConfig.log_ingestor.port when
applying the change.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
- tools/deployment/package-helm/Chart.yaml
- tools/deployment/package-helm/templates/configmap.yaml
- tools/deployment/package-helm/templates/log-ingestor-deployment.yaml
- tools/deployment/package-helm/templates/log-ingestor-logs-pv.yaml
- tools/deployment/package-helm/templates/log-ingestor-logs-pvc.yaml
- tools/deployment/package-helm/templates/log-ingestor-service.yaml
- tools/deployment/package-helm/test.sh
- tools/deployment/package-helm/values.yaml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-06T10:07:04.654Z
Learnt from: kirkrodrigues
Repo: y-scope/clp PR: 881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/test.sh
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/log-ingestor-logs-pv.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 10-10: too many spaces inside braces
(braces)
[error] 11-11: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/log-ingestor-service.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 5-5: too many spaces inside braces
(braces)
[error] 5-5: too many spaces inside braces
(braces)
[error] 7-7: too many spaces inside braces
(braces)
[error] 12-12: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
[error] 17-17: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/log-ingestor-logs-pvc.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 8-8: too many spaces inside braces
(braces)
[error] 9-9: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/log-ingestor-deployment.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 5-5: too many spaces inside braces
(braces)
[error] 5-5: too many spaces inside braces
(braces)
[error] 7-7: too many spaces inside braces
(braces)
[error] 13-13: too many spaces inside braces
(braces)
[error] 18-18: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 21-21: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 24-24: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 25-25: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 26-26: too many spaces inside braces
(braces)
[error] 32-32: too many spaces inside braces
(braces)
[error] 41-41: too many spaces inside braces
(braces)
[error] 41-41: too many spaces inside braces
(braces)
[error] 46-46: too many spaces inside braces
(braces)
[error] 46-46: too many spaces inside braces
(braces)
[error] 51-51: too many spaces inside braces
(braces)
[error] 51-51: too many spaces inside braces
(braces)
[error] 56-56: too many spaces inside braces
(braces)
[error] 59-59: too many spaces inside braces
(braces)
[error] 72-72: too many spaces inside braces
(braces)
[error] 77-77: too many spaces inside braces
(braces)
[error] 84-84: too many spaces inside braces
(braces)
[error] 87-87: too many spaces inside braces
(braces)
[error] 87-87: too many spaces inside braces
(braces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: package-image
- GitHub Check: check-generated
🔇 Additional comments (12)
tools/deployment/package-helm/Chart.yaml (1)
3-3: LGTM! Appropriate version bump for new feature. The version bump aligns with the addition of the log-ingestor component support.
tools/deployment/package-helm/templates/log-ingestor-service.yaml (1)
1-18: Named port configuration is correct. The log-ingestor deployment correctly defines a containerPort with `name: "log-ingestor"` and `containerPort: 3002`, which matches the service's targetPort reference.

tools/deployment/package-helm/templates/configmap.yaml (1)

118-126: LGTM! Configuration follows established patterns. The log_ingestor configuration block is correctly gated by the conditional and follows the same pattern as other components (api_server, webui), with appropriate hardcoded host/port values and templated configuration parameters.
tools/deployment/package-helm/test.sh (2)
62-62: LGTM! Necessary log directory for the new component. The addition of `log_ingestor` to the log directories correctly supports the new log-ingestor deployment.

88-102: LGTM! Port mappings reorganized to support log-ingestor. The port mapping changes correctly expose port 30302 for the new log-ingestor service while maintaining all previously exposed ports.
tools/deployment/package-helm/templates/log-ingestor-deployment.yaml (7)
27-32: LGTM! Appropriate dependency wait. The initContainer correctly ensures database tables are created before the log-ingestor starts.

37-51: LGTM! Environment variables properly configured. Database credentials are correctly sourced from Secrets, and logging configuration appropriately uses the RUST_LOG environment variable.

52-64: LGTM! Ports and volume mounts correctly configured. The port configuration matches the ConfigMap settings, and volume mounts follow best practices with readOnly for config.

65-70: LGTM! Command configuration is explicit and correct. The command arguments are properly configured with explicit host (0.0.0.0) and port (3002) values that match the container port and ConfigMap settings.

71-78: LGTM! Health probes properly configured. Readiness and liveness probes are correctly configured using the /health endpoint with consistent timing parameters via helper templates.

79-88: LGTM! Volumes properly configured. The volumes are correctly defined using helper templates, with appropriate PVC for log persistence and ConfigMap for configuration.

21-21: No action required. The `job-watcher` serviceAccount is appropriately shared across the entire CLP system (including log-ingestor, api-server, reducer, query-scheduler, and others) with minimal, read-only permissions limited to watching jobs and pods. This is the intended architectural pattern rather than a configuration error.
```yaml
        - name: "log-ingestor"
          image: "{{ include "clp.image.ref" . }}"
          imagePullPolicy: "{{ .Values.image.clpPackage.pullPolicy }}"
          env:
            - name: "CLP_DB_PASS"
              valueFrom:
                secretKeyRef:
                  name: {{ include "clp.fullname" . }}-database
                  key: "password"
            - name: "CLP_DB_USER"
              valueFrom:
                secretKeyRef:
                  name: {{ include "clp.fullname" . }}-database
                  key: "username"
            - name: "CLP_LOGS_DIR"
              value: "/var/log/log_ingestor"
            - name: "RUST_LOG"
              value: {{ .Values.clpConfig.log_ingestor.logging_level | quote }}
          ports:
            - name: "log-ingestor"
              containerPort: 3002
          volumeMounts:
            - name: {{ include "clp.volumeName" (dict
                "component_category" "log-ingestor"
                "name" "logs"
              ) | quote }}
              mountPath: "/var/log/log_ingestor"
            - name: "config"
              mountPath: "/etc/clp-config.yaml"
              subPath: "clp-config.yaml"
              readOnly: true
          command: [
            "/opt/clp/bin/log-ingestor",
            "--config", "/etc/clp-config.yaml",
            "--host", "0.0.0.0",
            "--port", "3002"
          ]
          readinessProbe:
            {{- include "clp.readinessProbeTimings" . | nindent 12 }}
            httpGet: &log-ingestor-health-check
              path: "/health"
              port: "log-ingestor"
          livenessProbe:
            {{- include "clp.livenessProbeTimings" . | nindent 12 }}
            httpGet: *log-ingestor-health-check
```
🧹 Nitpick | 🔵 Trivial
Consider adding resource requests and limits.
The container specification doesn't define resource requests or limits. While this works for initial deployment, consider adding them to prevent resource contention and ensure predictable performance in production environments.
📊 Example resource configuration

Add a resources section to the container specification:

```diff
   livenessProbe:
     {{- include "clp.livenessProbeTimings" . | nindent 12 }}
     httpGet: *log-ingestor-health-check
+  resources:
+    requests:
+      memory: "256Mi"
+      cpu: "100m"
+    limits:
+      memory: "512Mi"
+      cpu: "500m"
   volumes:
```

Adjust values based on actual resource requirements and testing.
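If the team prefers exposing these knobs through values.yaml rather than hardcoding them, a common Helm idiom is to render the block only when a resources key is present. This is a sketch only: the `.Values.clpConfig.log_ingestor.resources` key is hypothetical and not part of the chart's current schema.

```yaml
          # Sketch: render resources only when a hypothetical
          # .Values.clpConfig.log_ingestor.resources key is set in values.yaml.
          {{- with .Values.clpConfig.log_ingestor.resources }}
          resources:
            {{- toYaml . | nindent 12 }}
          {{- end }}
```

With this in place, operators could tune requests/limits per environment without editing templates.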
```yaml
{{- if .Values.clpConfig.log_ingestor }}
{{- include "clp.createLocalPv" (dict
  "root" .
  "component_category" "log-ingestor"
  "name" "logs"
  "nodeRole" "control-plane"
  "capacity" "5Gi"
  "accessModes" (list "ReadWriteOnce")
  "hostPath" (printf "%s/log_ingestor" .Values.clpConfig.logs_directory)
) }}
{{- end }}
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check log-ingestor deployment for node affinity configuration
rg -A 10 'nodeSelector|affinity' tools/deployment/package-helm/templates/log-ingestor-deployment.yaml
```

Repository: y-scope/clp
Length of output: 37

🏁 Script executed:

```shell
#!/bin/bash
# Check if log-ingestor deployment file exists and explore its structure
fd "log-ingestor.*deployment" tools/deployment/package-helm/templates/
echo "---"
# If found, check the content
if [ -f "tools/deployment/package-helm/templates/log-ingestor-deployment.yaml" ]; then
    cat -n tools/deployment/package-helm/templates/log-ingestor-deployment.yaml
else
    echo "Checking templates directory..."
    ls -la tools/deployment/package-helm/templates/ | grep log-ingestor
fi
```

Repository: y-scope/clp
Length of output: 3847
Add nodeSelector or affinity to log-ingestor deployment to match PV nodeRole constraint.
The PV restricts scheduling to control-plane nodes (nodeRole: "control-plane"), but the log-ingestor deployment has no nodeSelector or affinity constraints. This mismatch means the deployment can be scheduled on any node, and if it lands on a worker node, the PVC will fail to bind since the PV is unavailable there.
Add a nodeSelector to the deployment spec to ensure the pod runs on control-plane nodes:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
```

Alternatively, use affinity with requiredDuringSchedulingIgnoredDuringExecution for stronger enforcement if needed.
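A sketch of that affinity alternative, assuming the standard `node-role.kubernetes.io/control-plane` label (set by kubeadm-style clusters; the exact label can differ across distributions):

```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # Matches nodes carrying the control-plane role label,
                  # regardless of the label's (typically empty) value.
                  - key: "node-role.kubernetes.io/control-plane"
                    operator: "Exists"
```

Unlike a plain nodeSelector, `operator: "Exists"` does not require the label's value to be an exact match, which is slightly more robust when the label value varies.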
```yaml
{{- if .Values.clpConfig.log_ingestor }}
{{- include "clp.createPvc" (dict
  "root" .
  "component_category" "log-ingestor"
  "name" "logs"
  "capacity" "5Gi"
  "accessModes" (list "ReadWriteOnce")
) }}
{{- end }}
```
🧹 Nitpick | 🔵 Trivial
Consider making log storage capacity configurable.
The log storage capacity is hardcoded at 5Gi (line 6). For production deployments with varying log volumes, consider making this configurable through values.yaml to allow users to adjust storage capacity based on their needs.
♻️ Suggested enhancement
Add to values.yaml under the log_ingestor block:
log_ingestor: null
# port: 30302
# logs_storage_capacity: "5Gi" # Storage capacity for log-ingestor logs
# ...Then update the template:
{{- include "clp.createPvc" (dict
"root" .
"component_category" "log-ingestor"
"name" "logs"
- "capacity" "5Gi"
+ "capacity" (default "5Gi" .Values.clpConfig.log_ingestor.logs_storage_capacity)
"accessModes" (list "ReadWriteOnce")
) }}Apply the same change to log-ingestor-logs-pv.yaml.
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
{{- if .Values.clpConfig.log_ingestor }}
{{- include "clp.createPvc" (dict
  "root" .
  "component_category" "log-ingestor"
  "name" "logs"
  "capacity" (default "5Gi" .Values.clpConfig.log_ingestor.logs_storage_capacity)
  "accessModes" (list "ReadWriteOnce")
) }}
{{- end }}
```
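For the companion edit the review asks for in log-ingestor-logs-pv.yaml, the change would presumably mirror the PVC one. A sketch against the PV template shown earlier, reusing the same hypothetical `logs_storage_capacity` values key:

```yaml
{{- if .Values.clpConfig.log_ingestor }}
{{- /* capacity falls back to 5Gi when the hypothetical values key is unset */}}
{{- include "clp.createLocalPv" (dict
  "root" .
  "component_category" "log-ingestor"
  "name" "logs"
  "nodeRole" "control-plane"
  "capacity" (default "5Gi" .Values.clpConfig.log_ingestor.logs_storage_capacity)
  "accessModes" (list "ReadWriteOnce")
  "hostPath" (printf "%s/log_ingestor" .Values.clpConfig.logs_directory)
) }}
{{- end }}
```

Keeping the PV and PVC capacities sourced from a single key avoids the two values drifting apart.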
```yaml
# log-ingestor config. Currently, the config is applicable only if `logs_input.type` is "s3".
log_ingestor: null
#   port: 30302
#   # The timeout (in seconds) after which the log buffer is flushed for compression if no new
#   # input arrives.
#   buffer_flush_timeout: 300
#   # The log buffer size (in bytes) that triggers a flush for compression.
#   buffer_flush_threshold: 268435456  # 256 MiB
#   # The capacity of the internal channel used for communication between an ingestion job and the
#   # log buffer.
#   channel_capacity: 10
#   logging_level: "INFO"
```
Clarify port requirement and provide validation.
The service template (log-ingestor-service.yaml:17) requires .Values.clpConfig.log_ingestor.port to be set, but the commented example in values.yaml suggests it's optional. When users enable log_ingestor without specifying the port field, the deployment will fail.
Consider one of the following approaches:
- Make the port field mandatory in documentation and add a validation in the service template
- Provide a default port value if not specified
- Update the example to show port as a required field (uncomment it in the example)
🛠️ Suggested validation in service template

Add to the beginning of log-ingestor-service.yaml:

```diff
 {{- if .Values.clpConfig.log_ingestor }}
+{{- if not .Values.clpConfig.log_ingestor.port }}
+{{- fail "clpConfig.log_ingestor.port is required when log_ingestor is enabled" }}
+{{- end }}
 apiVersion: "v1"
```

Or provide a default value in the service template:

```diff
-  nodePort: {{ .Values.clpConfig.log_ingestor.port }}
+  nodePort: {{ .Values.clpConfig.log_ingestor.port | default 30302 }}
```
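A third option with the same effect as the explicit fail check is Helm's built-in `required` function, which aborts rendering with the given message when the value is nil — sketched here against the same nodePort line:

```yaml
  # Rendering fails with this message if log_ingestor is enabled without a port.
  nodePort: {{ required "clpConfig.log_ingestor.port is required when log_ingestor is enabled" .Values.clpConfig.log_ingestor.port }}
```

This keeps the validation inline with the value that needs it, at the cost of a slightly longer template line.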
+ nodePort: {{ .Values.clpConfig.log_ingestor.port | default 30302 }}Committable suggestion skipped: line range outside the PR's diff.
LinZhihao-723
left a comment
In general I don't have any questions on changes. Will run validations on my end with necessary set up before approval.
Cross-compared with api-server-deployment.yaml. The changes lgtm as:
- log-ingestor only waits for DB creation; no need to wait for result cache to start.
- log-ingestor doesn't need storage mount at this stage.
- log-ingestor doesn't need stream storage.
Cross-compared with api-server-logs-pv.yaml. The changes lgtm.
My only concern is that the 5Gi capacity might be too small. The current log ingestor generates a log event whenever a new object is ingested, with the full S3 key and metadata tracked. Assuming each log event is 150 bytes long on average, this log capacity would only allow 35k files ingested. What's the behavior if the log size exceeds the capacity?
What's the behavior if the log size exceeds the capacity?
if i'm not wrong, logs are dropped silently.
Let's raise it to 10Gi to match the compression scheduler log volume. In the long run, we should implement #1760 instead of requesting host volumes specifically for logs.
Cross-compared with api-server-logs-pvc.yaml and the changes lgtm.
Cross-compared with api-server-service.yaml and the changes lgtm.
Cross-compared with the current package config and this lgtm.
LinZhihao-723
left a comment
For the PR title, how about:
feat(helm): Add `log-ingestor` deployment.
This should match how we refer to the log ingestor service in previous PR titles.
```yaml
#   logging_level: "INFO"

# log-ingestor config. Currently, the config is applicable only if `logs_input.type` is "s3".
log_ingestor: null
```
We should explicitly set `log_ingestor: null` in the generated config when it is so in values.yaml; otherwise, the Pydantic model parser falls back to default values, which leaves the log-ingestor configured. When clpConfig.package.storage is configured as "clp" (CLP-Text, which doesn't support the log-ingestor), the config is deemed invalid and most Helm chart deployments won't start.
@coderabbitai create an issue to track
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Create a GitHub issue to track the log_ingestor configuration problem
gh issue create \
  --title "Fix log_ingestor configuration fallback to defaults when set to null in Helm values.yaml" \
  --body "## Problem
When \`log_ingestor: null\` is set in \`values.yaml\`, the Pydantic model parser falls back to default values, which leaves the log-ingestor configured even though it should be disabled.

## Impact
When \`clpConfig.package.storage\` is configured as \`\"clp\"\` (CLP-Text, which doesn't support the log-ingestor), the configuration is deemed invalid and most Helm chart deployments won't start.

## Solution
The configuration parser should explicitly handle \`null\` values for \`log_ingestor\` to prevent fallback to default values, ensuring the feature remains disabled when set to null.

## References
- PR: https://github.com/y-scope/clp/pull/1856
- Comment: https://github.com/y-scope/clp/pull/1856#discussion_r2703116768
- Reported by: @junhaoliao

## Files Affected
- \`tools/deployment/package-helm/values.yaml\`
- Configuration parser/validator (likely Python Pydantic models)" \
  --assignee junhaoliao
```

Length of output: 42
@junhaoliao, I've created issue #1881 to track the log_ingestor configuration problem: #1881
The issue describes how the Pydantic parser falls back to default values when log_ingestor: null, causing deployment failures with CLP-Text storage engine. It's been assigned to you for tracking.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR `#1152` discussion.
#1882 sets log_ingestor: null in the clp_config.yaml ConfigMap to address #1881. However, the Rust utils cannot handle null values for this key and will fail with a parsing error.
As a result, if the API server is enabled while log_ingestor is disabled by setting log_ingestor: null, the API server will fail to start.
More importantly, this configuration ("enable the API server but disable log_ingestor by setting log_ingestor: null") is currently the default behavior in our Helm values.yaml. This means the v0.8.0 chart deploys an API server that continuously crashes and restarts.
I think we need to address two issues:
- Update the Rust utils to correctly handle a nullable log_ingestor.
- Enable log_ingestor by default in Helm values.yaml, so it aligns with the default clp_config.yaml (and clp_config.template.json.yaml).
@coderabbitai create those new two issues as GH issues
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Create first issue: Update Rust utils to handle nullable log_ingestor
gh issue create \
  --title "Update Rust utils to handle nullable log_ingestor configuration" \
  --body "## Problem
The Rust utils cannot handle null values for the \`log_ingestor\` key in \`clp_config.yaml\` and fail with a parsing error.

## Impact
When the API server is enabled while \`log_ingestor\` is disabled by setting \`log_ingestor: null\`, the API server fails to start. This is particularly problematic because the current default configuration in Helm \`values.yaml\` has \`log_ingestor: null\`, causing the v0.8.0 chart to deploy an API server that continuously crashes and restarts.

## Solution
Update the Rust configuration parser to correctly handle nullable \`log_ingestor\` values, treating \`null\` as an indication that the log-ingestor feature should be disabled.

## References
- PR: https://github.com/y-scope/clp/pull/1856
- PR: https://github.com/y-scope/clp/pull/1882
- Related issue: #1881
- Comment: https://github.com/y-scope/clp/pull/1856#discussion_r2703116768
- Reported by: @junhaoliao

## Files Affected
- Rust configuration parser (likely in components utilizing clp_config.yaml)" \
  --assignee junhaoliao

# Create second issue: Enable log_ingestor by default in Helm values.yaml
gh issue create \
  --title "Enable log_ingestor by default in Helm values.yaml" \
  --body "## Problem
The current default configuration in Helm \`values.yaml\` has \`log_ingestor: null\`, which disables the log-ingestor. This is inconsistent with the default \`clp_config.yaml\` and \`clp_config.template.json.yaml\` configurations, and causes the API server to crash and restart continuously in v0.8.0 due to parsing issues.

## Impact
The v0.8.0 Helm chart deploys an API server with the default configuration that continuously crashes and restarts, resulting in a broken deployment out of the box.

## Solution
Enable \`log_ingestor\` by default in Helm \`values.yaml\` with appropriate default values to align with the default \`clp_config.yaml\` and \`clp_config.template.json.yaml\` configurations.

## References
- PR: https://github.com/y-scope/clp/pull/1856
- PR: https://github.com/y-scope/clp/pull/1882
- Related issue: #1881
- Comment: https://github.com/y-scope/clp/pull/1856#discussion_r2703116768
- Reported by: @junhaoliao

## Files Affected
- \`tools/deployment/package-helm/values.yaml\`" \
  --assignee junhaoliao

echo "Both issues created successfully"
```

Length of output: 118
@junhaoliao, I've created the two GitHub issues to track these problems:

- Issue #1911: Update Rust utils to handle nullable log_ingestor configuration
- Issue #1912: Enable log_ingestor by default in Helm values.yaml

Both issues have been assigned to you and include references to PR #1856, PR #1882, and issue #1881.
🐰✍️
✏️ Learnings added
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1856
File: tools/deployment/package-helm/values.yaml:94-94
Timestamp: 2026-01-26T22:19:03.543Z
Learning: In the Helm chart's `values.yaml`, the default configuration has `log_ingestor: null`, which causes the v0.8.0 chart to deploy an API server that continuously crashes and restarts due to Rust parsing issues.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR `#1152` discussion.
Description
Note
This PR is part of the ongoing work for #1309. More PRs will be submitted until the Helm chart is complete and fully functional.
Add Helm chart support for deploying the log-ingestor component. The log-ingestor enables ingestion of logs from S3 buckets via periodic scanning or SQS queue notifications.
This PR adds:
The log-ingestor is conditionally deployed only when `clpConfig.log_ingestor` is configured in values.yaml.

Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Chores