Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 39 additions & 7 deletions lib/queue_manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,15 +3,21 @@ package lib
import (
"context"
"errors"
lru "github.com/hashicorp/golang-lru"
"github.com/hashicorp/memberlist"
"github.com/sirupsen/logrus"
"net/http"
"sort"
"strconv"
"strings"
"sync"
"time"

lru "github.com/hashicorp/golang-lru"
"github.com/hashicorp/memberlist"
"github.com/sirupsen/logrus"
)

const (
nirnRetryCountHeader = "X-Nirn-Retry-Count"
maxRetries = 2
)

type QueueType int64
Expand Down Expand Up @@ -169,14 +175,26 @@ func (m *QueueManager) routeRequest(addr string, req *http.Request) (*http.Respo
nodeReq, err := http.NewRequestWithContext(req.Context(), req.Method, "http://"+addr+req.URL.Path+"?"+req.URL.RawQuery, req.Body)
nodeReq.Header = req.Header.Clone()
nodeReq.Header.Set("nirn-routed-to", addr)

// Increment retry count header when routing
retryCount := 0
if s := req.Header.Get(nirnRetryCountHeader); s != "" {
if v, err := strconv.Atoi(s); err == nil {
retryCount = v
}
}
retryCount++
nodeReq.Header.Set(nirnRetryCountHeader, strconv.Itoa(retryCount))
Comment on lines +179 to +187
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new retry count logic introduced in this PR (tracking requests via the X-Nirn-Retry-Count header and enforcing maxRetries) lacks test coverage. Consider adding tests that verify: 1) the header is properly incremented when routing requests, 2) requests are rejected when exceeding maxRetries, and 3) the early rejection at the DiscordRequestHandler entry point works correctly.

Copilot uses AI. Check for mistakes.

if err != nil {
return nil, err
}
Comment on lines 189 to 191
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error check on line 189 happens after the error variable has already been used to set the retry count header and create the request on lines 175-187. If the error is non-nil (from NewRequestWithContext), the code continues to set headers on a potentially nil nodeReq before checking the error. This should be checked immediately after line 175 before using nodeReq.

Copilot uses AI. Check for mistakes.

logger.WithFields(logrus.Fields{
"to": addr,
"path": req.URL.Path,
"method": req.Method,
"to": addr,
"path": req.URL.Path,
"method": req.Method,
"retryCount": retryCount,
}).Trace("Routing request to node in cluster")
resp, err := client.Do(nodeReq)
logger.WithFields(logrus.Fields{
Expand Down Expand Up @@ -261,11 +279,25 @@ func (m *QueueManager) getOrCreateBearerQueue(token string) (*RequestQueue, erro
}

func (m *QueueManager) DiscordRequestHandler(resp http.ResponseWriter, req *http.Request) {
// Check retry count early and fail fast to prevent infinite loops
if s := req.Header.Get(nirnRetryCountHeader); s != "" {
if v, err := strconv.Atoi(s); err == nil && v > maxRetries {
logger.WithFields(logrus.Fields{
"retryCount": v,
"maxRetries": maxRetries,
"path": req.URL.Path,
"method": req.Method,
}).Warn("Request exceeded max retry count, rejecting")
Generate429(&resp)
return
}
}

reqStart := time.Now()
metricsPath := GetMetricsPath(req.URL.Path)

token := req.Header.Get("Authorization")
clientId := GetBotId(token)
clientId := GetBotId(token)

ConnectionsOpen.With(map[string]string{"route": metricsPath, "method": req.Method, "clientId": clientId}).Inc()
defer ConnectionsOpen.With(map[string]string{"route": metricsPath, "method": req.Method, "clientId": clientId}).Dec()
Comment on lines 302 to 303
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the clientId label to ConnectionsOpen introduces a new dimension that could lead to high cardinality in Prometheus metrics, especially in multi-tenant environments with many unique bot clients. Each unique clientId creates a new time series. While this provides granular visibility, consider whether the operational benefit outweighs the potential metrics storage and query performance impact. If high cardinality becomes an issue, consider aggregating by clientId in application logs rather than metrics, or using exemplars instead.

Copilot uses AI. Check for mistakes.
Expand Down
Loading