PinotS3 : Connection Pool Shutdown after one hour on EKS on release-1.0.0 #11761

Closed
piby180 opened this issue Oct 9, 2023 · 22 comments

@piby180
Contributor

piby180 commented Oct 9, 2023

We are currently facing the following exception in the controller pod on EKS. The SegmentGenerationAndPush minion job for offline tables works for one hour after the controller pod starts; after that it fails with the error below. Restarting the pod fixes the problem for another hour, after which the error reappears.

We use IRSA (IAM roles for service accounts) to provide S3 access to the Pinot pods.

The problem started after we upgraded the Helm chart from release-0.12.1 to release-1.0.0. It appears to coincide with the upgrade of the AWS SDK to 2.20.94 in Pinot release-1.0.0; it did not happen with AWS SDK 2.14.28 in Pinot release-0.12.1.

We also tried the latest tag for pinot image but we got the same error.

It looks like the AWS session token cannot be refreshed once it expires after one hour.

Similar issues have been raised in the AWS SDK project:
aws/aws-sdk-java-v2#4386
aws/aws-sdk-java-v2#4221

This issue is currently blocking all batch ingestion minion jobs for our offline tables, forcing us to downgrade to release-0.12.1.

Unable to list files under URI: s3://bucket/path
java.io.IOException: java.lang.IllegalStateException: Connection pool shut down
        at org.apache.pinot.plugin.filesystem.S3PinotFS.visitFiles(S3PinotFS.java:556) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.pinot.plugin.filesystem.S3PinotFS.listFiles(S3PinotFS.java:490) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskGenerator.getInputFilesFromDirectory(SegmentGenerationAndPushTaskGenerator.java:343) [pinot-all-1.1.0-SNAPSHOT-jar-with-dependencies.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskGenerator.generateTasks(SegmentGenerationAndPushTaskGenerator.java:161) [pinot-all-1.1.0-SNAPSHOT-jar-with-dependencies.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.pinot.controller.helix.core.minion.PinotTaskManager.scheduleTask(PinotTaskManager.java:547) [pinot-all-1.1.0-SNAPSHOT-jar-with-dependencies.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.pinot.controller.helix.core.minion.PinotTaskManager.scheduleTask(PinotTaskManager.java:642) [pinot-all-1.1.0-SNAPSHOT-jar-with-dependencies.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.pinot.controller.helix.core.minion.CronJobScheduleJob.execute(CronJobScheduleJob.java:68) [pinot-all-1.1.0-SNAPSHOT-jar-with-dependencies.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [pinot-all-1.1.0-SNAPSHOT-jar-with-dependencies.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [pinot-all-1.1.0-SNAPSHOT-jar-with-dependencies.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
Caused by: java.lang.IllegalStateException: Connection pool shut down
        at org.apache.http.util.Asserts.check(Asserts.java:34) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:63) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:52) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:37) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]     
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]     
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.services.sts.DefaultStsClient.assumeRoleWithWebIdentity(DefaultStsClient.java:757) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.services.sts.auth.StsAssumeRoleWithWebIdentityCredentialsProvider.getUpdatedCredentials(StsAssumeRoleWithWebIdentityCredentialsProvider.java:74) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.services.sts.auth.StsCredentialsProvider.updateSessionCredentials(StsCredentialsProvider.java:88) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.utils.cache.CachedSupplier.lambda$jitteredPrefetchValueSupplier$3(CachedSupplier.java:284) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.utils.cache.CachedSupplier$PrefetchStrategy.fetch(CachedSupplier.java:420) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:199) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:128) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.services.sts.auth.StsCredentialsProvider.resolveCredentials(StsCredentialsProvider.java:101) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.services.sts.internal.StsWebIdentityCredentialsProviderFactory$StsWebIdentityCredentialsProvider.resolveCredentials(StsWebIdentityCredentialsProviderFactory.java:96) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider.resolveCredentials(WebIdentityTokenFileCredentialsProvider.java:121) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:90) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.auth.credentials.internal.LazyAwsCredentialsProvider.resolveCredentials(LazyAwsCredentialsProvider.java:45) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.resolveCredentials(DefaultCredentialsProvider.java:128) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.util.MetricUtils.measureDuration(MetricUtils.java:50) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.awscore.internal.authcontext.AwsCredentialsAuthorizationStrategy.resolveCredentials(AwsCredentialsAuthorizationStrategy.java:100) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]       
        at software.amazon.awssdk.awscore.internal.authcontext.AwsCredentialsAuthorizationStrategy.addCredentialsToExecutionAttributes(AwsCredentialsAuthorizationStrategy.java:77) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:123) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.invokeInterceptorsAndCreateExecutionContext(AwsSyncClientHandler.java:69) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:78) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at software.amazon.awssdk.services.s3.DefaultS3Client.listObjectsV2(DefaultS3Client.java:6447) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        at org.apache.pinot.plugin.filesystem.S3PinotFS.visitFiles(S3PinotFS.java:548) ~[pinot-s3-1.1.0-SNAPSHOT-shaded.jar:1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2]
        ... 8 more

Our plugin versions are

{
  "pinot-protobuf": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-kafka-2.0": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-avro": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-distribution": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-csv": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-s3": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-yammer": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-segment-uploader-default": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-batch-ingestion-standalone": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-confluent-avro": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-thrift": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-orc": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-clp-log": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-pulsar": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-azure": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-gcs": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-dropwizard": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-hdfs": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-adls": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-kinesis": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-json": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-minion-builtin-tasks": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-parquet": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2",
  "pinot-segment-writer-file-based": "1.1.0-SNAPSHOT-13c2901978c7a6a7bab46afbce2b09c13a0b05e2"
}
@Jackie-Jiang
Contributor

Thanks for reporting the issue.
@snleee @swaminathanmanish Can you help take a look?

@swaminathanmanish
Contributor

Thanks @Jackie-Jiang . Will take a look.

@piby180
Contributor Author

piby180 commented Oct 10, 2023

@swaminathanmanish The issue seems to be somewhat related to the comment here

WebIdentityTokenFileCredentialsProvider became AutoCloseable in SDK version 2.17.283 via aws/aws-sdk-java-v2#3440.

I am not a Java developer, but let me know if I can help in any way to fast-track this. Thanks!

@davizucon
Contributor

davizucon commented Oct 10, 2023

What do you think about downgrading the AWS-SDK version? (only in the environment with the problem, not in the official release of course)

@swaminathanmanish
Contributor

swaminathanmanish commented Oct 10, 2023

Just trying to understand the scope of impact. It looks like this affects only DefaultCredentialsProvider, not the case where you provide the credentials explicitly (from S3PinotFS):

      try {
        if (StringUtils.isNotEmpty(s3Config.getAccessKey()) && StringUtils.isNotEmpty(s3Config.getSecretKey())) {
          AwsBasicCredentials awsBasicCredentials =
              AwsBasicCredentials.create(s3Config.getAccessKey(), s3Config.getSecretKey());
          awsCredentialsProvider = StaticCredentialsProvider.create(awsBasicCredentials);
        } else {
          awsCredentialsProvider = DefaultCredentialsProvider.create();  // <----
        }
        ...

@davizucon
Contributor

Thank you for your help!
In the tests I ran, the credentials come from here: awsCredentialsProvider = DefaultCredentialsProvider.create(); I am not sure whether setting AccessKey / SecretKey would avoid the error.

@piby180
Contributor Author

piby180 commented Oct 11, 2023

We can't use access and secret keys on AWS because it is against our IT security policy; we have to use role-based access. Also, doesn't this affect other parts of the code that access AWS resources besides minion tasks (such as the deep store)?

@swaminathanmanish
Contributor

I'm tracking down when this update was made (from 2.14 to 2.20). I'm not sure a downgrade is an option, since the SDK was upgraded to address a memory leak: #10898
cc @klsince

@piby180
Contributor Author

piby180 commented Oct 19, 2023

Is there any ETA on this? We are currently stuck on our offline tables due to this error.
I am not a Java developer, but to me it looks like awsCredentialsProvider is now AutoCloseable and gets closed when the else block exits. Could the solution be to widen the scope of the awsCredentialsProvider variable so that it is not closed when the else block exits?

@klsince
Contributor

klsince commented Oct 19, 2023

Not sure about the root cause.

But awsCredentialsProvider is not closed when the else block is exited. In Java, an AutoCloseable is closed automatically only if it is declared in a try-with-resources statement, i.e. try(awsCredentialsProvider = ...) {...}.
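The point about try-with-resources can be checked with a small standalone sketch. TrackedResource and AutoCloseDemo are hypothetical names used only for illustration; they stand in for any AutoCloseable and are not AWS SDK or Pinot classes.

```java
// Hypothetical stand-in for a closeable object; not an AWS SDK class.
class TrackedResource implements AutoCloseable {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class AutoCloseDemo {
    // Mirrors the if/else assignment pattern: leaving the block never calls close().
    static TrackedResource assignInElseBlock(boolean firstBranch) {
        TrackedResource resource;
        if (firstBranch) {
            resource = new TrackedResource();
        } else {
            resource = new TrackedResource();
        }
        return resource;
    }

    // try-with-resources calls close() automatically when the block ends.
    static TrackedResource useInTryWithResources() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource scoped = resource) {
            // work with 'scoped' here
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println("after else block: closed=" + assignInElseBlock(false).isClosed());
        System.out.println("after try-with-resources: closed=" + useInTryWithResources().isClosed());
    }
}
```

Implementing AutoCloseable by itself changes nothing about scoping; something has to call close(), either explicitly or through try-with-resources.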

You said you were using role-based access; in that case it is not DefaultCredentialsProvider but StsAssumeRoleCredentialsProvider that is used, as shown here:

      if (StringUtils.isNotEmpty(s3Config.getAccessKey()) && StringUtils.isNotEmpty(s3Config.getSecretKey())) {
        AwsBasicCredentials awsBasicCredentials =
            AwsBasicCredentials.create(s3Config.getAccessKey(), s3Config.getSecretKey());
        awsCredentialsProvider = StaticCredentialsProvider.create(awsBasicCredentials);
      } else {
        awsCredentialsProvider = DefaultCredentialsProvider.create();  // <----
      }
      ...
      // IAM Role based access
      if (s3Config.isIamRoleBasedAccess()) {
        ...
        StsClient stsClient =
            StsClient.builder().region(Region.of(s3Config.getRegion()))
                .credentialsProvider(awsCredentialsProvider).build();  // <----
        awsCredentialsProvider =
            StsAssumeRoleCredentialsProvider.builder().stsClient(stsClient).refreshRequest(assumeRoleRequest)
                .asyncCredentialUpdateEnabled(s3Config.isAsyncSessionUpdateEnabled()).build();
      }

However, to create the StsAssumeRoleCredentialsProvider, an StsClient object is created first, and it takes an awsCredentialsProvider of type DefaultCredentialsProvider.

The issue you found, aws/aws-sdk-java-v2#4221, mentions some potential causes such as OutOfMemory errors. Do you see such errors or any other suspicious exceptions in the logs?

@piby180
Contributor Author

piby180 commented Oct 19, 2023

I didn't find any issue with memory.

I tried increasing the pod memory limits and also modified the jvmOptions to exit on OutOfMemoryError, as discussed on Slack here

The error happens exactly one hour after the pod starts. Since the AWS session token generated by the IAM role expires after one hour, I suspect Pinot is unable to refresh it.

So anything related to memory is out of the picture here.

We are currently restarting the pods to run minion tasks. The problem goes away for exactly one hour, during which our tasks run. After one hour the problem comes back and we have to restart the pods again.
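The "works for an hour, then fails" pattern is consistent with a cached token whose refresh path is already broken at startup. The following is a minimal model of that idea, assuming a 3600-second token lifetime. CachedToken and DelayedFailureDemo are hypothetical names, not AWS SDK or Pinot code, and the real SDK may refresh somewhat before expiry, so exact timing could differ.

```java
import java.util.function.LongSupplier;

// Hypothetical model of a cached session token; not AWS SDK code.
class CachedToken {
    private final LongSupplier clock;      // current time in seconds
    private final long lifetimeSeconds;
    private boolean refreshClientClosed = false;
    private long expiresAt;
    private int refreshCount = 0;

    CachedToken(LongSupplier clock, long lifetimeSeconds) {
        this.clock = clock;
        this.lifetimeSeconds = lifetimeSeconds;
        refresh();  // initial fetch at startup succeeds
    }

    // Models the HTTP client used for refresh being shut down.
    void closeRefreshClient() {
        refreshClientClosed = true;
    }

    String get() {
        if (clock.getAsLong() >= expiresAt) {
            refresh();  // only now is the broken client touched
        }
        return "token-" + refreshCount;
    }

    private void refresh() {
        if (refreshClientClosed) {
            throw new IllegalStateException("Connection pool shut down");
        }
        refreshCount++;
        expiresAt = clock.getAsLong() + lifetimeSeconds;
    }
}

class DelayedFailureDemo {
    public static void main(String[] args) {
        long[] now = {0};
        CachedToken token = new CachedToken(() -> now[0], 3600);
        token.closeRefreshClient();      // the damage happens early and silently

        now[0] = 3599;
        System.out.println(token.get()); // still served from the cache

        now[0] = 3600;
        try {
            token.get();
        } catch (IllegalStateException e) {
            System.out.println("refresh failed: " + e.getMessage());
        }
    }
}
```

In this model a pod restart rebuilds the client and the cache, which would explain why restarting buys exactly one more token lifetime.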

@q-nathangrand

I came across this whilst googling around the same issue and just wanted to post my findings in case it helps you.

For me, the cause of the issue was that DefaultCredentialsProvider.create() doesn't create a new 'CredentialsProvider' each time you call it, but points to a single 'global' instance. If you use this 'CredentialsProvider' in any client that gets closed, or close the provider itself, then this single 'global' instance is closed. This goes unnoticed until an attempt is made to refresh a token after an hour.

I made a new 'CredentialsProvider' each time I created a client using DefaultCredentialsProvider.builder().build() to solve this.
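The behaviour described above can be modelled without the SDK. In this sketch ModelProvider.shared() plays the role the comment attributes to DefaultCredentialsProvider.create(), and ModelProvider.fresh() the role of DefaultCredentialsProvider.builder().build(). All class and method names here are hypothetical, and the assumption that a client closes the provider it was given is taken from the comment, not verified against SDK internals.

```java
// Hypothetical model classes; none of these are AWS SDK types.
class ModelProvider implements AutoCloseable {
    private static final ModelProvider SHARED = new ModelProvider();
    private boolean closed = false;

    // Models a factory that hands out one process-wide instance.
    static ModelProvider shared() {
        return SHARED;
    }

    // Models a factory that builds an independent instance per call.
    static ModelProvider fresh() {
        return new ModelProvider();
    }

    String resolve() {
        if (closed) {
            throw new IllegalStateException("Connection pool shut down");
        }
        return "credentials";
    }

    @Override
    public void close() {
        closed = true;
    }
}

class ModelClient implements AutoCloseable {
    private final ModelProvider provider;

    ModelClient(ModelProvider provider) {
        this.provider = provider;
    }

    String call() {
        return provider.resolve();
    }

    // Assumption from the comment above: closing a client closes its provider.
    @Override
    public void close() {
        provider.close();
    }
}

class SharedProviderDemo {
    // Returns true if the second client still works after the first is closed.
    static boolean survivesPeerClose(ModelProvider first, ModelProvider second) {
        ModelClient a = new ModelClient(first);
        ModelClient b = new ModelClient(second);
        a.close();
        try {
            b.call();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("shared instance survives: "
            + survivesPeerClose(ModelProvider.shared(), ModelProvider.shared()));
        System.out.println("fresh instances survive: "
            + survivesPeerClose(ModelProvider.fresh(), ModelProvider.fresh()));
    }
}
```

With independent instances, closing one client cannot affect another, at the cost of each client resolving and caching its own credentials.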

@steveloughran

You might want to look at org.apache.hadoop.fs.s3a.AWSCredentialProviderList, as we ref-count our closing; plus we don't use that default chain. We did hit HADOOP-18945 though: the IAM profile timing out in resolve() when under load. Do make sure you use async credential refresh.
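Reference-counted closing, as mentioned above, can be sketched roughly as follows. This is a generic illustration with hypothetical names (RefCountedResource, retain, isReleased); it is not Hadoop's AWSCredentialProviderList implementation and makes no claim about how that class works internally.

```java
// Hypothetical sketch of reference-counted closing; not Hadoop or AWS SDK code.
class RefCountedResource implements AutoCloseable {
    private int refCount = 1;  // the creator holds the first reference

    // Each additional user takes a reference before using the resource.
    synchronized RefCountedResource retain() {
        if (refCount == 0) {
            throw new IllegalStateException("already released");
        }
        refCount++;
        return this;
    }

    // Each user calls close() once; the resource is released on the last call.
    @Override
    public synchronized void close() {
        if (refCount == 0) {
            return;  // calls after release are ignored
        }
        refCount--;
    }

    synchronized boolean isReleased() {
        return refCount == 0;
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCountedResource owner = new RefCountedResource();
        RefCountedResource borrower = owner.retain();

        owner.close();
        System.out.println("released after first close: " + owner.isReleased());

        borrower.close();
        System.out.println("released after last close: " + owner.isReleased());
    }
}
```

The sketch relies on every holder calling close() exactly once; a holder that closes twice before the others finish would still release the resource early.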

@klsince
Contributor

klsince commented Nov 29, 2023

Thanks @q-nathangrand for the pointer. I made a quick fix here: #12063

@piby180 I wonder if you could take this fix for a spin in your environment and see how it goes. Thank you!

@abhijeetkushe
Contributor

Any updates on when this might be fixed? We are seeing a similar error while reading from AWS Kinesis in a realtime table.

@Jackie-Jiang
Contributor

@abhijeetkushe Can you try out #12063 and see if it works? We are waiting for a confirmation that it works before merging it

@abhijeetkushe
Contributor

@Jackie-Jiang Ok will let you know

@abhijeetkushe
Contributor

@klsince
Contributor

klsince commented Jan 4, 2024

Thanks for testing it out, and good catch on the other places.

As you have an environment to test those in, please open a new PR that includes all the fixes for DefaultCredentialsProvider, including the one-line change in my previous PR. Thank you :)

@abhijeetkushe
Contributor

@klsince Thanks, I will add that fix to all the plugins where I can find the same pattern and test it out.

@abhijeetkushe
Contributor

@klsince @Jackie-Jiang I have opened this PR #12221

@klsince
Contributor

klsince commented Jan 4, 2024

Thanks for the fix and tests, @abhijeetkushe. Closing the issue for now.

@klsince klsince closed this as completed Jan 4, 2024

8 participants