From cff3d3b07a2192b2c53717549379c680eb9e2242 Mon Sep 17 00:00:00 2001
From: Patrick Duin
Date: Thu, 29 Aug 2024 15:25:44 +0200
Subject: [PATCH 1/2] fix:Update README.md

---
 hive-event-listeners/apiary-gluesync-listener/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hive-event-listeners/apiary-gluesync-listener/README.md b/hive-event-listeners/apiary-gluesync-listener/README.md
index 316cacd..60aef31 100644
--- a/hive-event-listeners/apiary-gluesync-listener/README.md
+++ b/hive-event-listeners/apiary-gluesync-listener/README.md
@@ -15,7 +15,7 @@ The GlueSync listener can be configured by setting the following System Environm
 GLUE_PREFIX|No|Prefix added to Glue databases to handle database name collisions when synchronizing multiple metastores to the Glue catalog.
 ## Table update SkipArchive
-[AWS default](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html#Glue-UpdateTable-request-SkipArchive) is to archive the table on every update. This especially with Iceberg tables can lead to a lot of table version of which you can only have a certain limit. To counter this we override this property and set skipArchive=true so do *not* make an archive of the table when updating.
+[AWS default](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html#Glue-UpdateTable-request-SkipArchive) is to archive the table on every update. With Iceberg tables this can lead to a lot of table versions. In Glue you can only have a certain limit of the version and you'll get exceptions when trying to update a table once you hit the limit. You then need to remove versions manually. To counter this we override this property and set skipArchive=true so do *not* make an archive of the table when updating.
 If an archive is needed, this can be done per table by setting the Hive table property: 'apiary.gluesync.skipArchive=false'.
From 396a93ea7617ba66f0a6add35722adba58cd222e Mon Sep 17 00:00:00 2001
From: Patrick Duin
Date: Thu, 29 Aug 2024 15:27:49 +0200
Subject: [PATCH 2/2] Update README.md

---
 hive-event-listeners/apiary-gluesync-listener/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hive-event-listeners/apiary-gluesync-listener/README.md b/hive-event-listeners/apiary-gluesync-listener/README.md
index 60aef31..d2eb12b 100644
--- a/hive-event-listeners/apiary-gluesync-listener/README.md
+++ b/hive-event-listeners/apiary-gluesync-listener/README.md
@@ -15,7 +15,7 @@ The GlueSync listener can be configured by setting the following System Environm
 GLUE_PREFIX|No|Prefix added to Glue databases to handle database name collisions when synchronizing multiple metastores to the Glue catalog.
 ## Table update SkipArchive
-[AWS default](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html#Glue-UpdateTable-request-SkipArchive) is to archive the table on every update. With Iceberg tables this can lead to a lot of table versions. In Glue you can only have a certain limit of the version and you'll get exceptions when trying to update a table once you hit the limit. You then need to remove versions manually. To counter this we override this property and set skipArchive=true so do *not* make an archive of the table when updating.
+[AWS default](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html#Glue-UpdateTable-request-SkipArchive) is to archive the table on every update. With Iceberg tables this can lead to a lot of table versions. Glue limits the number of versions a table can have, and you'll get exceptions when trying to update a table once you hit that limit. Manual version removal through the AWS API is then needed. To counter this we override this property and set skipArchive=true, so the listener does *not* make an archive of the table when updating.
If an archive is needed, this can be done per table by setting the Hive table property: 'apiary.gluesync.skipArchive=false'.
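
The default-plus-override behaviour described in the README text can be sketched as below. This is only an illustration, not the listener's actual code: the class and method names are hypothetical, and treating the value case-insensitively and trimming whitespace are assumptions. Only the property name `apiary.gluesync.skipArchive` and the default of `skipArchive=true` come from the README.

```java
import java.util.Map;

// Hypothetical helper illustrating the README's rule; not taken from the listener source.
final class SkipArchiveSketch {

    // Property name as given in the README.
    static final String SKIP_ARCHIVE_PROPERTY = "apiary.gluesync.skipArchive";

    // Returns the value to use for Glue's SkipArchive flag on a table update.
    // Default is true (no archive); only an explicit "false" on the table turns archiving back on.
    static boolean resolveSkipArchive(Map<String, String> tableParameters) {
        if (tableParameters == null) {
            return true;
        }
        String value = tableParameters.get(SKIP_ARCHIVE_PROPERTY);
        if (value == null) {
            return true;
        }
        // Assumption: compare case-insensitively and ignore surrounding whitespace.
        return !"false".equalsIgnoreCase(value.trim());
    }
}
```

With this sketch, a table without the property resolves to `true` (no archive is made), while a table whose parameters contain `apiary.gluesync.skipArchive=false` resolves to `false`, so Glue would archive the previous version on update.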