Create a shared clientset for pod and service #60

JacksonMei · 2026-01-29T07:40:20Z

No description provided.

## Changes - Reduce QPS from 1000 to 5, Burst from 1000 to 10 - Implement lazy REST mapper to avoid expensive CRD discovery - Use shared clientset across all handlers - Optimize pod cache with async initialization - Add namespace scoping to manager ## Enhanced Logging - Added 🔧 emoji marker for rate limiting config confirmation - Added 🚀 emoji marker for lazy REST mapper creation - Added ✅ emoji marker for successful initialization - Added 🔗 emoji marker for shared clientset creation - Added 🎯 emoji marker for optimized ListWatcher usage These logs make it easy to verify the fix is deployed and active. ## Root Cause In large clusters with 300+ CRDs, aggressive QPS (1000) caused 'too many requests' errors from K8s API server, breaking 'aenv service list' and other operations. ## Verification Look for these log markers on startup: - 🔧 API Rate Limiting configured: QPS=5, Burst=10 - 🚀 Creating lazy REST mapper - 🔗 Creating shared Kubernetes clientset - 🎯 Using optimized ListWatcher Fixes: aenv service list 500 error Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

## Problem With QPS=5 and Burst=10, the shared rate limiter was too restrictive: - Pod reflector continuously retried list operations - Service list requests competed for the same QPS quota - Both operations failed with "too many requests" ## Solution Increase to QPS=20, Burst=40 - a more balanced approach that: - Allows background cache sync to proceed - Leaves headroom for user-initiated requests - Still conservative enough for large clusters ## Testing The eu126-sqa cluster has very high API server load. Previous QPS=5 was too low for even basic operations to succeed. Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

## Problem API server may apply stricter rate limits to custom UserAgent strings. The "aenv-controller" UserAgent might be treated as a batch client. ## Solution Change UserAgent from "aenv-controller" to kubectl-compatible format: "kubectl/v1.26.0 (aenv-controller) kubernetes/compatible" This makes the controller appear as a standard kubectl client while maintaining identifiability via the parenthetical annotation. ## Hypothesis K8s API server may have per-UserAgent rate limiting policies where: - Standard kubectl clients get more lenient limits - Custom clients get stricter limits to prevent abuse ## Verification Look for updated UserAgent in logs: 🔧 API Rate Limiting configured: ... UserAgent=kubectl/v1.26.0... Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

Revert UserAgent changes for analysis purposes. UserAgent change was proven to bypass APF rate limiting, but keeping original value to investigate CLI issues. Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

## Bug When API returns empty service list: {"success": true, "code": 0, "data": []} The condition 'api_response.success and api_response.data' evaluates to False because empty list [] is falsy in Python. This causes EnvironmentError with "Unknown error" message. ## Fix Change condition from: if api_response.success and api_response.data: To: if api_response.success: Now empty list is treated as valid successful response. ## Impact - aenv service list now works correctly when no services exist - Returns "No running services found" instead of "Unknown error" Fixes: CLI returning "Unknown error" for empty service list Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

- UserAgent rate limiting analysis - CLI empty list bug analysis and fix - Complete troubleshooting guides Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

gemini-code-assist · 2026-01-29T07:40:50Z

Summary of Changes

Hello @JacksonMei, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and efficiency of the aenv controller and its associated CLI. It introduces critical optimizations to how the controller interacts with the Kubernetes API, including the adoption of shared clientsets, refined API request rates, and a more efficient pod caching mechanism. Concurrently, a client-side bug that incorrectly processed empty API responses has been resolved. The changes are complemented by extensive documentation, providing a new frontend design plan and detailed analyses of the technical challenges addressed.

Highlights

Kubernetes API Client Optimization: Implemented a shared Kubernetes clientset, adjusted default QPS/Burst rates, and introduced a lazy REST mapper to significantly reduce API server load and avoid rate limiting, especially in high-load clusters.
Efficient Pod Caching: Refactored the Pod cache mechanism to use a more efficient ListWatchFromClient instead of SharedInformerFactory, reducing unnecessary resource watching and improving informer efficiency.
CLI Bug Resolution: Fixed a Python client bug that caused an "Unknown error" when the API returned an empty list of services, improving the robustness and user experience of the aenv CLI.
Comprehensive Documentation: Added detailed design and troubleshooting documents covering a new frontend design, an in-depth analysis of Kubernetes API rate limiting, and the specifics of the CLI bug fix.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces significant improvements to the controller's interaction with the Kubernetes API server, addressing performance and rate-limiting issues. Key changes include implementing a shared clientset for handlers, introducing a lazy REST mapper to optimize startup, and refining the pod cache mechanism. These changes should make the controller more resilient and performant, especially in large clusters. The PR also includes a crucial bug fix in the Python client for handling empty API responses and adds valuable troubleshooting documentation. The overall changes are excellent. I have a few minor suggestions for improvement.

gemini-code-assist · 2026-01-29T07:42:45Z

controller/pkg/aenvhub_http_server/aenv_pod_cache.go

+	// Start async sync watcher
+	go func() {
+		klog.Infof("Waiting for pod cache sync (namespace: %s)...", namespace)
+		if !cache.WaitForCacheSync(stopCh, informer.HasSynced) {
+			klog.Errorf("failed to wait for pod cache sync in namespace %s", namespace)
+			return
+		}
+		klog.Infof("Pod cache sync completed (namespace: %s), number of pods: %d", namespace, len(podCache.cache.ListKeys()))
+	}()


Changing the cache synchronization to be asynchronous is a great improvement for startup performance and resilience. However, this introduces a time window where the cache is not yet synced, and list operations might return incomplete data. It would be beneficial to expose the sync status, for example by adding an IsSynced() bool method to AEnvPodCache, so that callers like listPod can handle this state gracefully (e.g., by returning a '503 Service Unavailable' if the cache is not ready).

gemini-code-assist · 2026-01-29T07:42:45Z

docs/plans/2026-01-28-envhub-frontend-design.md

+  // GET /env-instance/:id/list (id can be * for all)
+  list: (envName?: string) =>
+    apiClient.get<EnvInstance[]>(`/env-instance/${envName || '*'}/list`),


The comment for the list method in instanceApi is a bit misleading. It says // GET /env-instance/:id/list, but the implementation uses envName as the path parameter, not id. To improve clarity and prevent confusion during implementation, it would be best to update the comment to reflect the use of envName.

Suggested change

// GET /env-instance/:id/list (id can be * for all)

list: (envName?: string) =>

apiClient.get<EnvInstance[]>(`/env-instance/${envName || '*'}/list`),

// GET /env-instance/:envName/list (envName can be * for all)

list: (envName?: string) =>

apiClient.get<EnvInstance[]>(`/env-instance/${envName || '*'}/list`),

gemini-code-assist · 2026-01-29T07:42:45Z

docs/plans/2026-01-28-envhub-frontend-design.md

+  // GET /env-service/:id/list (id can be * for all)
+  list: (envName?: string) =>
+    apiClient.get<EnvService[]>(`/env-service/${envName || '*'}/list`),


Similar to the instanceApi, the comment for the list method in serviceApi is inconsistent with the implementation. The comment mentions :id while the code uses :envName. Updating the comment will ensure the design document is accurate and clear for developers.

Suggested change

// GET /env-service/:id/list (id can be * for all)

list: (envName?: string) =>

apiClient.get<EnvService[]>(`/env-service/${envName || '*'}/list`),

// GET /env-service/:envName/list (envName can be * for all)

list: (envName?: string) =>

apiClient.get<EnvService[]>(`/env-service/${envName || '*'}/list`),

lanmaoxinqing · 2026-01-29T08:06:31Z

LGTM

JacksonMei and others added 7 commits January 29, 2026 12:41

fix kubeconfig issue

c714edf

revert: restore original UserAgent to 'aenv-controller'

0be8920

Revert UserAgent changes for analysis purposes. UserAgent change was proven to bypass APF rate limiting, but keeping original value to investigate CLI issues. Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

docs: add comprehensive bug analysis documentation

c22a222

- UserAgent rate limiting analysis - CLI empty list bug analysis and fix - Complete troubleshooting guides Co-Authored-By: Claude (claude-sonnet-4-5) <noreply@anthropic.com>

gemini-code-assist bot reviewed Jan 29, 2026

View reviewed changes

rm useless docs

26f7eec

lanmaoxinqing self-assigned this Jan 29, 2026

lanmaoxinqing self-requested a review January 29, 2026 08:06

lanmaoxinqing removed their assignment Jan 29, 2026

lanmaoxinqing approved these changes Jan 29, 2026

View reviewed changes

lanmaoxinqing merged commit ae5839d into main Jan 29, 2026
1 check passed

lanmaoxinqing deleted the fix/controller branch January 29, 2026 08:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a shared clientset for pod and service #60

Create a shared clientset for pod and service #60

Uh oh!

JacksonMei commented Jan 29, 2026

Uh oh!

gemini-code-assist bot commented Jan 29, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 29, 2026

Uh oh!

gemini-code-assist bot Jan 29, 2026

Uh oh!

gemini-code-assist bot Jan 29, 2026

Uh oh!

lanmaoxinqing commented Jan 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Create a shared clientset for pod and service #60

Create a shared clientset for pod and service #60

Uh oh!

Conversation

JacksonMei commented Jan 29, 2026

Uh oh!

gemini-code-assist bot commented Jan 29, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

lanmaoxinqing commented Jan 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants