
Trying to use AWS env variables but defaults to GCS #78

@alberttwong

Description


Environment:
Docker Compose with OpenJDK 11, MinIO, XTable, Spark 3.4, Hive 2.3.10, Hadoop 2.10.2

root@spark:/opt/LakeView# java -jar LakeView-release-v0.10.0-all.jar -p '/opt/LakeView/delta.yaml' 
17:01:44.376 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
17:01:44.496 [main] INFO  com.onehouse.RuntimeModule - Spinning up 70 threads
17:01:44.657 [main] INFO  com.onehouse.metrics.MetricsServer - Starting metrics server
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Running metadata-extractor one time
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryService - Starting table discover service, excluding []
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/people
17:01:44.864 [metadata-extractor-1] ERROR c.o.m.TableDiscoveryService - Failed to discover tables in path: s3://warehouse/people
17:01:44.865 [metadata-extractor-1] ERROR c.o.m.TableDiscoveryService - com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
java.util.concurrent.CompletionException: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
        at com.google.cloud.storage.StorageException.translate(StorageException.java:118)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:287)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:430)
        at com.google.cloud.storage.StorageImpl.lambda$listBlobs$11(StorageImpl.java:397)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
        at com.google.cloud.storage.Retrying.run(Retrying.java:54)
        at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:394)
        at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:365)
        at com.onehouse.storage.GCSAsyncStorageClient.lambda$fetchObjectsByPage$1(GCSAsyncStorageClient.java:61)
        ... 7 common frames omitted
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 401 Unauthorized
GET https://storage.googleapis.com/storage/v1/b/warehouse/o?delimiter=/&prefix=people/&projection=full
{
  "code" : 401,
  "errors" : [ {
    "domain" : "global",
    "location" : "Authorization",
    "locationType" : "header",
    "message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).",
    "reason" : "required"
  } ],
  "message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)."
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:439)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:525)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:466)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:576)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:420)
        ... 15 common frames omitted
17:01:44.866 [metadata-extractor-1] INFO  c.o.m.TableMetadataUploaderService - Uploading metadata of following tables: []
17:01:44.867 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Run Completed
17:01:44.867 [main] INFO  com.onehouse.metrics.MetricsServer - Shutting down metrics server
root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
#        region: us-west-2
#        accessKey: admin
#        accessSecret: password

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed
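One thing that stands out in the delta.yaml above: every key under s3Config is commented out, so the YAML parser sees s3Config as empty, which would explain the extractor falling back to GCS despite the s3:// base path. A sketch of a populated s3Config, reusing the values from the commented-out lines and the exported env vars (whether LakeView's s3Config supports a MinIO endpoint key at all is unknown, so only the keys already shown are used):

```yaml
fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1        # matches the exported AWS_REGION
        accessKey: admin
        accessSecret: password
```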
The following AWS environment variables were exported before running the extractor:

export AWS_SECRET_ACCESS_KEY=password
export AWS_ACCESS_KEY_ID=admin
export ENDPOINT=http://minio:9000
export AWS_REGION=us-east-1
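A plausible explanation for the GCS fallback (a guess at the selection logic, not LakeView's actual code): a YAML mapping whose children are all commented out loads as null, so the parsed fileSystemConfiguration carries no usable s3Config, and a simple truthiness check on it would route to the GCS client. A minimal Python sketch of that failure mode, where `pick_client` and the dict layout are purely illustrative:

```python
# Hypothetical storage-client selection mirroring the suspected behavior;
# pick_client and the dict shapes below are illustrative, not LakeView's API.
def pick_client(fs_config: dict) -> str:
    """Return which storage client a naive truthiness check would choose."""
    if fs_config.get("s3Config"):  # None and {} are both falsy, so skipped
        return "s3"
    return "gcs"

# An s3Config block with every child commented out loads from YAML as None,
# so the check above silently falls through to GCS despite the s3:// path.
broken = {"s3Config": None}
fixed = {"s3Config": {"region": "us-east-1",
                      "accessKey": "admin",
                      "accessSecret": "password"}}

print(pick_client(broken))  # gcs
print(pick_client(fixed))   # s3
```

If this is the actual behavior, a clearer error ("s3Config present but empty") would have surfaced the misconfiguration immediately.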

Labels: bug (Something isn't working)