

Contributor

Copilot AI commented Dec 20, 2025

Health Endpoint Improvement - Complete ✅

Summary

Extended the /health endpoint to check whether each loaded plugin process is still running, so that a plugin that exits (e.g., killed by the OOM killer) is detected, as requested in the issue.

Changes Made

1. Plugin Client Tracking (cmd/plugins.go)

  • Track plugin.Client instances alongside dispensed plugin interfaces
  • Store clients with keys like "processor-files", "executor-shell"
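The tracking described above can be sketched as a registry that stores each dispensed plugin next to the client that owns its process, under a "<kind>-<name>" key. This is a minimal sketch, not the PR's code: the registry type, its field names, and the exitChecker interface are hypothetical. The interface mirrors the Exited() bool method that hashicorp/go-plugin's *plugin.Client provides, so the example runs without that dependency.

```go
package main

import "fmt"

// exitChecker is the only method the health check needs from a plugin
// process handle. *plugin.Client from hashicorp/go-plugin has
// Exited() bool; the interface itself is a hypothetical abstraction.
type exitChecker interface {
	Exited() bool
}

// fakeClient stands in for *plugin.Client in this sketch.
type fakeClient struct{ exited bool }

func (f *fakeClient) Exited() bool { return f.exited }

// pluginRegistry (hypothetical) keeps dispensed plugin interfaces and
// the clients that own their processes side by side.
type pluginRegistry struct {
	processors map[string]any
	executors  map[string]any
	clients    map[string]exitChecker // keys like "processor-files"
}

func newPluginRegistry() *pluginRegistry {
	return &pluginRegistry{
		processors: map[string]any{},
		executors:  map[string]any{},
		clients:    map[string]exitChecker{},
	}
}

// register stores a dispensed plugin under its name and its client
// under "<kind>-<name>", e.g. "executor-shell".
func (r *pluginRegistry) register(kind, name string, impl any, client exitChecker) error {
	switch kind {
	case "processor":
		r.processors[name] = impl
	case "executor":
		r.executors[name] = impl
	default:
		return fmt.Errorf("unknown plugin kind %q", kind)
	}
	r.clients[kind+"-"+name] = client
	return nil
}
```

Keeping the client, not only the dispensed interface, matters because the interface is an RPC stub and gives no direct signal that the backing process has died.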

2. Agent Integration (dkron/agent.go, dkron/options.go, cmd/agent.go)

  • Pass plugin clients from cmd layer to Agent
  • Store plugin clients in Agent for health checking
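Passing the clients from the cmd layer into the Agent fits the functional-options pattern. The sketch below assumes that pattern; the option name WithPluginClients, the pluginClients field, and the pared-down Agent struct are hypothetical illustrations, not dkron's actual definitions.

```go
package main

// exitChecker mirrors the Exited() bool method of go-plugin's *plugin.Client.
type exitChecker interface {
	Exited() bool
}

// stubClient is a stand-in plugin client for this sketch.
type stubClient struct{ exited bool }

func (s stubClient) Exited() bool { return s.exited }

// Agent is a stand-in; only the field relevant to plugin health is shown.
type Agent struct {
	pluginClients map[string]exitChecker
}

// AgentOption configures an Agent at construction time.
type AgentOption func(*Agent)

// WithPluginClients (hypothetical name) hands the clients created in the
// cmd layer to the Agent so the /health handler can inspect them later.
func WithPluginClients(clients map[string]exitChecker) AgentOption {
	return func(a *Agent) {
		a.pluginClients = clients
	}
}

// NewAgent starts with an empty client map and applies options in order.
func NewAgent(opts ...AgentOption) *Agent {
	a := &Agent{pluginClients: map[string]exitChecker{}}
	for _, opt := range opts {
		opt(a)
	}
	return a
}
```

An option keeps the Agent constructor signature unchanged, which is consistent with the "no breaking changes" note below.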

3. Health Endpoint Enhancement (dkron/api.go)

  • Check if any plugin has exited using client.Exited()
  • Return HTTP 503 when unhealthy (non-200 as requested)
  • Include detailed issues array with specific plugin names
  • Include cluster leader status for server nodes
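The check itself can be written as a pure function that walks the stored clients, collects one issue per exited plugin, and picks the status code. This is a sketch under assumptions: the function and type names are hypothetical, issues are sorted for deterministic output, and leader status is reported for server nodes without by itself marking the node unhealthy (a follower is a normal state).

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
)

// exitChecker mirrors the Exited() bool method of go-plugin's *plugin.Client.
type exitChecker interface{ Exited() bool }

// stubClient is a stand-in plugin client for this sketch.
type stubClient struct{ exited bool }

func (s stubClient) Exited() bool { return s.exited }

// healthResponse follows the JSON shape described in this PR.
type healthResponse struct {
	Status string   `json:"status"`
	Issues []string `json:"issues,omitempty"`
	Leader *bool    `json:"leader,omitempty"` // set only for server nodes
}

// checkHealth returns 200 when every plugin process is alive and 503
// with one issue per exited plugin otherwise.
func checkHealth(clients map[string]exitChecker, isServer, isLeader bool) (int, healthResponse) {
	names := make([]string, 0, len(clients))
	for name := range clients {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; keep issues stable

	var issues []string
	for _, name := range names {
		if clients[name].Exited() {
			issues = append(issues, fmt.Sprintf("plugin %s has exited", name))
		}
	}

	resp := healthResponse{Status: "healthy"}
	if isServer {
		leader := isLeader
		resp.Leader = &leader
	}
	if len(issues) > 0 {
		resp.Status = "unhealthy"
		resp.Issues = issues
		return http.StatusServiceUnavailable, resp
	}
	return http.StatusOK, resp
}
```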

4. Testing (dkron/api_test.go)

  • Added unit test for health endpoint
  • Manually verified healthy and unhealthy states

API Behavior

Healthy: HTTP 200 OK - {"status":"healthy","leader":true}

Unhealthy: HTTP 503 Service Unavailable - {"status":"unhealthy","issues":["plugin X has exited"],"leader":true}
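The two bodies above can be produced by a struct whose issues field uses omitempty. The sketch below uses plain net/http (dkron's actual router may differ) and a hypothetical report callback in place of the Agent lookup. The probe helper shows how a unit test can exercise the endpoint with net/http/httptest.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
)

// healthResponse reproduces the documented bodies. Leader is a pointer
// so non-server nodes can omit it (an assumption).
type healthResponse struct {
	Status string   `json:"status"`
	Issues []string `json:"issues,omitempty"`
	Leader *bool    `json:"leader,omitempty"`
}

// healthHandler (hypothetical) asks report for current issues and
// leader status, then writes 200 or 503 with a JSON body.
func healthHandler(report func() (issues []string, leader *bool)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		issues, leader := report()
		resp := healthResponse{Status: "healthy", Leader: leader}
		code := http.StatusOK
		if len(issues) > 0 {
			resp.Status = "unhealthy"
			resp.Issues = issues
			code = http.StatusServiceUnavailable
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(resp) // Encode appends a newline
	}
}

// probe sends GET /health to h and returns the status code and the
// body with the trailing newline trimmed.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}
```

Because the status code alone signals health, a load balancer or supervisor can probe the endpoint without parsing the body.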

Recent Updates

  • ✅ Merged main branch (includes dependency updates and test fixes)
  • ✅ Build verification passed

Testing & Validation

Build: Successful compilation after merge
Manual Test: Verified both healthy and unhealthy states
Unit Test: Added TestHealthEndpoint
Code Review: Addressed all review comments
Security Scan: 0 vulnerabilities (CodeQL)

Impact

  • 6 files changed (health endpoint implementation), 88 insertions(+), 17 deletions(-)
  • No breaking changes
  • Enables proper monitoring of plugin health
  • Addresses the OOM plugin crash detection issue

Addresses Issue Requirements

✅ Health endpoint checks all loaded plugins are running
✅ Returns non-200 status code when unhealthy
✅ Provides actionable health information
✅ Includes cluster health (leader status)

Original prompt

This section details the original issue you should resolve

<issue_title>Health endpoint improvement</issue_title>
<issue_description>Is your feature request related to a problem? Please describe.
We had a problem with a killed plugin process (due to OOM) which resulted in non-executing jobs.

Describe the solution you'd like
The health endpoint might be used to check whether all plugin processes are up and running.
In general, more health checks would be helpful, e.g. cluster health?

The endpoint could return a non-200 status code when the instance is not healthy!

</issue_description>

<agent_instructions>Health endpoint should check that all loaded plugins are running</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@vcastellm: Already on the roadmap, will work on this.

@vcastellm: @davidgengenbach Not really the improvement you mention, but I think it's better to fail fast in case of a missing plugin. When running as a service, the OS supervisor will take care of restarting it. This is the case with processor plugins.


@coderabbitai

coderabbitai bot commented Dec 20, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.


Copilot AI and others added 3 commits December 20, 2025 10:58
Co-authored-by: vcastellm <47026+vcastellm@users.noreply.github.com> (on each of the 3 commits)
Copilot AI changed the title from "[WIP] Enhance health endpoint to monitor plugin processes" to "Add plugin health monitoring to /health endpoint" on Dec 20, 2025
Copilot AI requested a review from vcastellm December 20, 2025 11:07
@vcastellm
Member

@copilot merge main

vcastellm marked this pull request as ready for review on January 21, 2026 18:46
Contributor Author

Copilot AI commented Jan 21, 2026

@copilot merge main

Merged main branch successfully in commit 60646bd. The merge included dependency updates (JWT and gRPC) and test fixes. Build verification passed.


Development

Successfully merging this pull request may close these issues.

Health endpoint improvement
