GCS support #36

Open
animer3009 opened this issue Jul 8, 2023 · 6 comments

Comments

@animer3009

Hi guys,
Are you going to add GCS support?
Any ETA?

@CrawX

CrawX commented Jul 10, 2023

It's actually very simple to do, even for yourself if you need it right now: just add implementation "org.apache.iceberg:iceberg-gcp:${icebergVersion}" to build.gradle and build the image yourself.

@animer3009
Author

Hi @CrawX,
Thanks for your reply.
What about environment variables?

For s3 we have:

environment:
  - AWS_ACCESS_KEY_ID=admin
  - AWS_SECRET_ACCESS_KEY=password
  - AWS_REGION=us-east-1
  - CATALOG_WAREHOUSE=s3://warehouse/
  - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
  - CATALOG_S3_ENDPOINT=http://minio:9000

P.S. What exact command do I need to use to build?
Just gradle build?

@animer3009
Author

Hi @CrawX,
It looks like you missed my reply.
Can you help, please? :)

@CrawX

CrawX commented Jul 21, 2023

I just added the mentioned dependency to build.gradle and then rebuilt the image using docker build. You can check the Dockerfile to see how this project is built if you want to do that outside of Docker.

I'm using it locally with fake-gcs-server; this is the env I'm setting:

- CATALOG_WAREHOUSE=gs://warehouse/
- CATALOG_IO__IMPL=org.apache.iceberg.gcp.gcs.GCSFileIO
- CATALOG_GCS_SERVICE_HOST=http://gcs:4443

If you're actually using GCS, it will probably be different (auth etc.). I suggest taking a look at GCPProperties.java.
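
Editor's note: the variable names in this thread (CATALOG_IO__IMPL for the Iceberg property io-impl, CATALOG_S3_ENDPOINT for s3.endpoint) suggest a naming convention: strip the CATALOG_ prefix, turn double underscores into dashes, turn single underscores into dots, and lowercase the result. The Python sketch below only mimics that assumed convention to show which property names the variables above would become. It is not the image's actual code, and the function name env_to_catalog_properties is hypothetical.

```python
def env_to_catalog_properties(env: dict[str, str], prefix: str = "CATALOG_") -> dict[str, str]:
    """Translate CATALOG_* environment variables into catalog property names.

    Assumed convention (inferred from the names in this thread, not confirmed):
    "__" becomes "-", "_" becomes ".", and the name is lowercased.
    """
    props = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as AWS_REGION are ignored
        # Replace "__" first so it is not consumed by the single-underscore rule.
        key = name[len(prefix):].replace("__", "-").replace("_", ".").lower()
        props[key] = value
    return props


# The fake-gcs-server settings from the comment above, plus one unrelated variable.
example_env = {
    "CATALOG_WAREHOUSE": "gs://warehouse/",
    "CATALOG_IO__IMPL": "org.apache.iceberg.gcp.gcs.GCSFileIO",
    "CATALOG_GCS_SERVICE_HOST": "http://gcs:4443",
    "AWS_REGION": "us-east-1",
}

props = env_to_catalog_properties(example_env)
for key, value in props.items():
    print(f"{key}={value}")
```

Under that assumption, any other property listed in GCPProperties.java could be set the same way by upper-casing its name and swapping dots for underscores; verify the exact names against the Iceberg version you build with.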

@animer3009
Author

Hi @CrawX,
Thank you for your help!
I did all of that, and it seems to work because I am able to create tables. But I have trouble writing data to the table and reading from it.
I'm getting an error like:

scala> spark.sql("INSERT INTO prod.db.sample VALUES (1, 'John'), (2, 'Jane')")
23/07/26 23:48:48 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path: gs://warehouse-iceberg/prod/db/sample/data/00000-2-759b4512-1ef6-4a0a-be07-235ca0329324-00001.parquet

Here is my spark.conf:

spark.jars.packages=org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.0,org.apache.iceberg:iceberg-gcp:1.3.0
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.defaultCatalog=rest_prod
spark.sql.catalog.rest_prod=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest_prod.type=rest
spark.sql.catalog.rest_prod.uri=http://localhost:8181

It creates metadata in GCS, but the data folders seem to be missing.

Create log of the REST API:

iceberg-rest | 2023-07-26T23:59:07.700 ERROR [org.apache.iceberg.rest.RESTCatalogServlet] - Error processing REST request
iceberg-rest | org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=404, type=NoSuchTableException, message=Table does not exist: prod.db.sample)
iceberg-rest | org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: prod.db.sample
iceberg-rest | at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:53)
iceberg-rest | at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:240)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:336)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:401)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | 2023-07-26T23:59:07.715 ERROR [org.apache.iceberg.rest.RESTCatalogServlet] - Error processing REST request
iceberg-rest | org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=404, type=NoSuchTableException, message=Table does not exist: prod.db)
iceberg-rest | org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: prod.db
iceberg-rest | at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:53)
iceberg-rest | at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:240)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:336)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:401)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100)
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833)
iceberg-rest | 2023-07-26T23:59:08.237 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table properties set at catalog level through catalog properties: {}
iceberg-rest | 2023-07-26T23:59:08.239 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table properties enforced at catalog level through catalog properties: {}
iceberg-rest | 2023-07-26T23:59:08.417 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Successfully committed to table prod.db.sample in 174 ms
iceberg-rest | 2023-07-26T23:59:08.418 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Refreshing table metadata from new version: gs://warehouse-iceberg/prod/db/sample/metadata/00000-3e40b56b-aa8c-4b36-a8fa-f0de6368f487.metadata.json

Insert log of the REST API:

iceberg-rest | 2023-07-26T23:59:56.970 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Refreshing table metadata from new version: gs://warehouse-iceberg/prod/db/sample/metadata/00000-3e40b56b-aa8c-4b36-a8fa-f0de6368f487.metadata.json
iceberg-rest | 2023-07-26T23:59:57.121 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table loaded by catalog: rest_backend.prod.db.sample

How can I solve this?

@nastra
Contributor

nastra commented Jul 27, 2023

@animer3009 the NoSuchTableException, message=Table does not exist: prod.db error does not necessarily indicate that something went wrong; it could come from a Catalog#tableExists() check. You'll see the same stack trace when running through the https://iceberg.apache.org/spark-quickstart/ example when creating the table.
The important part is Successfully committed to table prod.db.sample, meaning that everything looks as it should during table creation.

However, Failed to get file system for path: gs://warehouse-iceberg/prod/db/sample/data/00000-2-759b4512-1ef6-4a0a-be07-235ca0329324-00001.parquet indicates that you're most likely missing GCS-related jars on the Spark side that understand the gs scheme.
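
Editor's note: one possible way to act on this is to tell the Spark-side catalog which FileIO to use for gs:// paths, through the catalog-level io-impl property, in addition to having the GCS client libraries on the Spark classpath. The Python sketch below only assembles the relevant configuration keys for the rest_prod catalog from this thread. The helper name spark_catalog_conf is hypothetical, and this setting is an untested assumption about what resolves the error here. Authentication settings are deliberately left out.

```python
def spark_catalog_conf(catalog: str, uri: str, io_impl: str) -> dict[str, str]:
    """Build Spark properties for an Iceberg REST catalog with an explicit FileIO."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        # Assumption: setting io-impl on the Spark side makes writers use
        # GCSFileIO instead of falling back to a Hadoop file system lookup.
        f"{prefix}.io-impl": io_impl,
    }


conf = spark_catalog_conf(
    catalog="rest_prod",
    uri="http://localhost:8181",
    io_impl="org.apache.iceberg.gcp.gcs.GCSFileIO",
)

# Print in the same key=value form as the spark.conf shown earlier in the thread.
for key, value in conf.items():
    print(f"{key}={value}")
```

The alternative implied by the comment above is to add a Hadoop-compatible GCS connector jar to Spark so that the gs scheme resolves through the Hadoop file system; which route fits depends on the deployment.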
