-
Notifications
You must be signed in to change notification settings - Fork 659
Open
Labels
bugSomething isn't workingSomething isn't working
Description
AutoMQ Version
1.6.3
Operating System
Strimzi Operator in EKS
Installation Method
other
Hardware Configuration
No response
Other Relevant Software
Environment
- AutoMQ Version: 1.6.3-strimzi
- Cluster: 3 broker nodes spread across availability zones
What Went Wrong?
When an AutoMQ broker is in startup the broker fails to start due to S3 rate limiting (HTTP 503 "SlowDown") during the recovery phase. The broker treats this transient error as fatal and exits, rather than retrying with exponential backoff.
When a broker pod is recreated, it must recover its state from S3. This recovery process generates a burst of S3 API requests. If S3 returns a 503 "SlowDown" response (rate limiting), the broker fails to start and exits with a fatal exception after 3 failed attempts, causing Kubernetes to restart the pod.
What Should Have Happened Instead?
The broker should:
- Retry S3 operations during recovery with longer/configurable backoff intervals
- Not treat transient S3 503 errors as fatal during startup
- Eventually succeed once S3 rate limiting subsides
Steps to Reproduce
Trigger a lot of requests to S3 while the Broker is in recovery
Additional Information
kafka 2026-01-27 08:42:10,405 INFO [BrokerLifecycleManager id=1002] closed event queue. (org.apache.kafka.queue.KafkaEventQueue) [main]
kafka 2026-01-27 08:42:10,406 INFO [client-metrics-reaper[]: Shutting down (org.apache.kafka.server.util.timer.SystemTimerReaper$Reaper) [main]
kafka 2026-01-27 08:42:10,406 INFO [client-metrics-reaper[]: Stopped (org.apache.kafka.server.util.timer.SystemTimerReaper$Reaper) [client-metrics-reaper]
kafka 2026-01-27 08:42:10,406 INFO [client-metrics-reaper[]: Shutdown completed (org.apache.kafka.server.util.timer.SystemTimerReaper$Reaper) [main]
kafka 2026-01-27 08:42:10,407 INFO [SharedServer id=1002] Stopping SharedServer (kafka.server.SharedServer) [main]
kafka 2026-01-27 08:42:10,407 INFO [MetadataLoader id=1002] beginShutdown: shutting down event queue. (org.apache.kafka.queue.KafkaEventQueue) [main]
kafka 2026-01-27 08:42:10,407 INFO [SnapshotGenerator id=1002] close: shutting down event queue. (org.apache.kafka.queue.KafkaEventQueue) [kafka-1002-metadata-loader-event-handler]
kafka 2026-01-27 08:42:10,408 INFO [SnapshotGenerator id=1002] closed event queue. (org.apache.kafka.queue.KafkaEventQueue) [kafka-1002-metadata-loader-event-handler]
kafka 2026-01-27 08:42:10,408 INFO [MetadataLoader id=1002] closed event queue. (org.apache.kafka.queue.KafkaEventQueue) [main]
kafka 2026-01-27 08:42:10,409 INFO [SnapshotGenerator id=1002] closed event queue. (org.apache.kafka.queue.KafkaEventQueue) [main]
kafka 2026-01-27 08:42:10,409 INFO [raft-expiration-reaper[]: Shutting down (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper) [main]
kafka 2026-01-27 08:42:10,593 INFO [raft-expiration-reaper[]: Stopped (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper) [raft-expiration-reaper]
kafka 2026-01-27 08:42:10,594 INFO [raft-expiration-reaper[]: Shutdown completed (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper) [main]
kafka 2026-01-27 08:42:10,594 INFO [kafka-1002-raft-io-thread[]: Shutting down (org.apache.kafka.raft.KafkaRaftClientDriver) [main]
kafka 2026-01-27 08:42:10,594 INFO [RaftManager id=1002] Beginning graceful shutdown (org.apache.kafka.raft.KafkaRaftClient) [main]
kafka 2026-01-27 08:42:10,595 INFO [RaftManager id=1002] Graceful shutdown completed (org.apache.kafka.raft.KafkaRaftClient) [kafka-1002-raft-io-thread]
kafka 2026-01-27 08:42:10,595 INFO [kafka-1002-raft-io-thread[]: Stopped (org.apache.kafka.raft.KafkaRaftClientDriver) [kafka-1002-raft-io-thread]
kafka 2026-01-27 08:42:10,595 INFO [RaftManager id=1002] Completed graceful shutdown of RaftClient (org.apache.kafka.raft.KafkaRaftClientDriver) [main]
kafka 2026-01-27 08:42:10,595 INFO [kafka-1002-raft-io-thread[]: Shutdown completed (org.apache.kafka.raft.KafkaRaftClientDriver) [main]
kafka 2026-01-27 08:42:10,603 INFO [kafka-1002-raft-outbound-request-thread[]: Shutting down (org.apache.kafka.raft.KafkaNetworkChannel$SendThread) [main]
kafka 2026-01-27 08:42:10,603 INFO [kafka-1002-raft-outbound-request-thread[]: Stopped (org.apache.kafka.raft.KafkaNetworkChannel$SendThread) [kafka-1002-raft-outbound-request-thread]
kafka 2026-01-27 08:42:10,603 INFO [kafka-1002-raft-outbound-request-thread[]: Shutdown completed (org.apache.kafka.raft.KafkaNetworkChannel$SendThread) [main]
kafka 2026-01-27 08:42:10,616 INFO Metrics scheduler closed (org.apache.kafka.common.metrics.Metrics) [main]
kafka 2026-01-27 08:42:10,616 INFO Closing reporter org.apache.kafka.common.metrics.JmxReporter (org.apache.kafka.common.metrics.Metrics) [main]
kafka 2026-01-27 08:42:10,616 INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics) [main]
kafka 2026-01-27 08:42:10,698 INFO App info kafka.server for 1002 unregistered (org.apache.kafka.common.utils.AppInfoParser) [main]
kafka 2026-01-27 08:42:10,698 INFO [BrokerServer id=1002] shut down completed (kafka.server.BrokerServer) [main]
kafka 2026-01-27 08:42:10,699 INFO [BrokerServer id=1002] Transition from SHUTTING_DOWN to SHUTDOWN (kafka.server.BrokerServer) [main]
kafka 2026-01-27 08:42:10,699 ERROR Exiting Kafka due to fatal exception during startup. (kafka.Kafka$) [main]
kafka java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.automq.stream.s3.exceptions.AutoMQException: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: 787QJPMB56E35YRV, Extended Request ID: JHYGgsPnY/BJvbB+5xputPsenkaRfKWmObhpqi/mx45DbAE8Unl7oiuIJ9Lp8d5C1qDmR2R9SJPk33+f+Yxb7N1vXJ65Bqb6eUHVhTp/dr8=)
kafka at com.automq.stream.s3.S3Storage.startup(S3Storage.java:450)
kafka at kafka.log.stream.s3.DefaultS3Client.start(DefaultS3Client.java:183)
kafka at kafka.log.streamaspect.ClientWrapper.start(ClientWrapper.java:115)
kafka at kafka.log.streamaspect.ElasticLogManager.startup(ElasticLogManager.scala:116)
kafka at kafka.log.streamaspect.ElasticLogManager$.$anonfun$init$1(ElasticLogManager.scala:149)
kafka at kafka.log.streamaspect.ElasticLogManager$.$anonfun$init$1$adapted(ElasticLogManager.scala:149)
kafka at scala.Option.foreach(Option.scala:437)
kafka at kafka.log.streamaspect.ElasticLogManager$.init(ElasticLogManager.scala:149)
kafka at kafka.server.BrokerServer.startup(BrokerServer.scala:574)
kafka at kafka.server.KafkaRaftServer.$anonfun$startup$2(KafkaRaftServer.scala:88)
kafka at kafka.server.KafkaRaftServer.$anonfun$startup$2$adapted(KafkaRaftServer.scala:88)
kafka at scala.Option.foreach(Option.scala:437)
kafka at kafka.server.KafkaRaftServer.startup(KafkaRaftServer.scala:88)
kafka at kafka.Kafka$.main(Kafka.scala:160)
kafka at kafka.Kafka.main(Kafka.scala)
kafka Caused by: java.util.concurrent.ExecutionException: com.automq.stream.s3.exceptions.AutoMQException: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: 787QJPMB56E35YRV, Extended Request ID: JHYGgsPnY/BJvbB+5xputPsenkaRfKWmObhpqi/mx45DbAE8Unl7oiuIJ9Lp8d5C1qDmR2R9SJPk33+f+Yxb7N1vXJ65Bqb6eUHVhTp/dr8=)
kafka at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
kafka at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
kafka at com.automq.stream.s3.S3Storage.recover0(S3Storage.java:492)
kafka at com.automq.stream.s3.S3Storage.recover(S3Storage.java:459)
kafka at com.automq.stream.s3.S3Storage.startup(S3Storage.java:446)
kafka ... 14 more
kafka Caused by: com.automq.stream.s3.exceptions.AutoMQException: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: 787QJPMB56E35YRV, Extended Request ID: JHYGgsPnY/BJvbB+5xputPsenkaRfKWmObhpqi/mx45DbAE8Unl7oiuIJ9Lp8d5C1qDmR2R9SJPk33+f+Yxb7N1vXJ65Bqb6eUHVhTp/dr8=)
kafka at kafka.log.stream.s3.wal.ConfirmWal.lambda$reset$0(ConfirmWal.java:179)
kafka at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
kafka at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
kafka at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
kafka at java.base/java.lang.Thread.run(Thread.java:840)
kafka Caused by: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: 787QJPMB56E35YRV, Extended Request ID: JHYGgsPnY/BJvbB+5xputPsenkaRfKWmObhpqi/mx45DbAE8Unl7oiuIJ9Lp8d5C1qDmR2R9SJPk33+f+Yxb7N1vXJ65Bqb6eUHVhTp/dr8=)
kafka at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
kafka at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
kafka at kafka.log.stream.s3.wal.ConfirmWal.lambda$reset$0(ConfirmWal.java:161)
kafka ... 4 more
kafka Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: 787QJPMB56E35YRV, Extended Request ID: JHYGgsPnY/BJvbB+5xputPsenkaRfKWmObhpqi/mx45DbAE8Unl7oiuIJ9Lp8d5C1qDmR2R9SJPk33+f+Yxb7N1vXJ65Bqb6eUHVhTp/dr8=)
kafka at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
kafka at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
kafka at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
kafka at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
kafka at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:279)
kafka at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.lambda$prepare$0(AsyncResponseHandler.java:92)
kafka at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
kafka at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
kafka at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
kafka at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onComplete(AsyncResponseHandler.java:135)
kafka at software.amazon.awssdk.core.internal.metrics.BytesReadTrackingPublisher$BytesReadTracker.onComplete(BytesReadTrackingPublisher.java:74)
kafka at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$DataCountingPublisher$1.onComplete(ResponseHandler.java:519)
kafka at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:254)
kafka at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$600(ResponseHandler.java:77)
kafka at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:375)
kafka at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.publishMessage(HandlerPublisher.java:402)
kafka at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.flushBuffer(HandlerPublisher.java:338)
kafka at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.receivedDemand(HandlerPublisher.java:291)
kafka at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.access$200(HandlerPublisher.java:61)
kafka at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:495)
kafka at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
kafka at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
kafka at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
kafka at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
kafka at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
kafka at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
kafka ... 1 more
kafka Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: JH72WF2MEV91VHXS, Extended Request ID: 1sXiK951e9cWp0IbF1jzXwpqMjKSe53EyfuASEgQ2PXg7Ldzbj/vNd3UY9M+FjaRLSrKmT4t1DVoohNUGKjKHxwzGQXfrOv1dkMkOT09Phk=)
kafka Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: WZXDCHM4TGB3HKRE, Extended Request ID: LdXgfgIo89D4mf2+nsUgPyKjMW3jvFUXTWGSn7WohxIrPUio3Vp4aHIzBea9dGxJC4+EaWeSNwNT8vU321Ewb5zzhvBfNznohqyxXPY8ZCU=)
kafka Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: GY0EX4GXA14BJ8G6, Extended Request ID: 99xgEbYamujLTE3SXRXLz4lLZKbul5EZsf8ByGLQkruGwiHRbe+QY7D7PDrg/RO4Sa+yQGwvvq0Y1EX11Z3x0vndLv8kH5B43bBAwi2rVWc=)
kafka 2026-01-27 08:42:10,704 INFO App info kafka.server for 1002 unregistered (org.apache.kafka.common.utils.AppInfoParser) [kafka-shutdown-hook]
kafka 2026-01-27 08:42:10,708 INFO Stopped ServerConnector@62727399{SSL, (ssl, http/1.1)}{0.0.0.0:8443} (org.eclipse.jetty.server.AbstractConnector) [JettyShutdownThread]
kafka 2026-01-27 08:42:10,709 INFO Stopped ServerConnector@5d9b7a8a{HTTP/1.1, (http/1.1)}{localhost:8080} (org.eclipse.jetty.server.AbstractConnector) [JettyShutdownThread]
kafka 2026-01-27 08:42:10,710 INFO Stopped o.e.j.s.h.ContextHandler@4372b9b6{/v1/ready,null,STOPPED} (org.eclipse.jetty.server.handler.ContextHandler) [JettyShutdownThread]
kafka 2026-01-27 08:42:10,710 INFO Stopped o.e.j.s.h.ContextHandler@232a7d73{/v1/broker-state,null,STOPPED} (org.eclipse.jetty.server.handler.ContextHandler) [JettyShutdownThread]
kafka 2026-01-27 08:42:11,105 INFO S3RollingFileAppender log uploader closed successfully. (com.automq.log.S3RollingFileAppender) [kafka-shutdown-hook]
stream closed: EOF for monitoring-metrics/mimir-automq-broker-1002 (kafka)
'''
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working