The backend was experiencing:
- SSE connection failures - Streams dying after exactly 60 seconds with "context deadline exceeded"
- Connection reset errors - Intermittent "connection reset by peer" when calling Restate
- Heartbeat failures - SSE heartbeats failing due to timeout
The global timeout middleware in main.go was aggressively killing all requests after 60 seconds:
r.Use(middleware.Timeout(60 * time.Second)) // ❌ Kills SSE streams!
SSE connections are meant to be long-lived, but the middleware cancelled each stream's request context at the 60-second deadline. That ended the streams and caused the cascading failures listed above.
Changed: Removed global timeout and applied it only to API routes
// ✅ NO global timeout
r.Use(middleware.Logger)
r.Use(middleware.Recoverer)
r.Use(middleware.RealIP)
// ✅ Timeout ONLY for API routes
r.Route("/api", func(r chi.Router) {
    r.Use(middleware.Timeout(60 * time.Second))
    // ... API routes
})
// ✅ SSE routes have NO timeout
r.Get("/stream/notifications", handlers.StreamNotifications)
r.Get("/stream/workflow/{orderID}", handlers.StreamWorkflowStatus)
Added: /health/restate endpoint to monitor Restate connectivity
Features:
- 2-second timeout for health check requests
- Distinguishes between connection errors and business logic errors
- Returns detailed status information
Usage:
curl http://localhost:8081/health/restate
# Response: {"status":"healthy","url":"http://localhost:9089","note":"Restate SDK is reachable"}
Added defensive error handling to cart operations:
- ✅ Per-request 5-second timeout contexts
- ✅ Detailed logging at each step
- ✅ Timeout detection with specific error messages
- ✅ Success logging for debugging
Example - GetCart with timeout handling:
// Create timeout context for Restate call
ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
defer cancel()
basket, err := restateingress.Object[restate.Void, []models.CartItem](
h.client, "UserSession", userID, "GetBasket",
).Request(ctx, restate.Void{})
if err != nil {
    if ctx.Err() == context.DeadlineExceeded {
        log.Printf("Timeout fetching cart for user %s: %v", userID, err)
        http.Error(w, "Request timeout - Restate may be unavailable", http.StatusGatewayTimeout)
        return
    }
    // ... other error handling
}
Enhanced: Main /health endpoint with structured response
{
  "status": "healthy",
  "services": {
    "http": "ok",
    "database": "ok",
    "restate": "check /health/restate"
  }
}
# This should stay connected indefinitely (beyond 60s)
curl http://localhost:8081/stream/notifications
Expected: No "context deadline exceeded" errors after 60 seconds
# Add item to cart
curl -X POST http://localhost:8081/api/cart/add \
-H "Content-Type: application/json" \
-d '{"product_id": 1, "quantity": 2}'
# Get cart
curl http://localhost:8081/api/cart
Expected:
- Detailed logs in backend console
- No connection reset errors
- Successful responses
# General health
curl http://localhost:8081/health
# Restate connectivity
curl http://localhost:8081/health/restate
Files changed:
- main.go - Selective timeout middleware
- ingress.go - Health checks and error handling
Next steps:
- Restart the backend with the new build:
cd /home/chaschel/Documents/ibm/go/apps/zeroapp/prototype/backend
./bin/zeroapp
- Monitor logs for the improved logging output
- Test SSE stability by keeping a browser tab open for > 60 seconds
- Verify cart operations no longer experience connection resets