-
Thanks for driving this discussion. For the first part: in my scenarios it works well for most tables, but there are still some problems in abnormal situations.
For the second part: if the total size of the input delete files is controllable, out-of-memory situations can be effectively avoided.
-
I have an idea 💡 for scenarios with a lot of eq-delete files: when the total eq-delete record count is greater than (maybe) 1.5 times the total data record count, we write the data files' primary keys into the StructLikeMap instead. When reading an eq delete, we check whether its key exists in the data; if it does not, we can ignore it directly (currently every eq delete is written to the eq-delete StructLikeMap), which can greatly reduce the overflow (spill) operations for eq deletes. In this way the size of the StructLikeMap is bounded by the data files, so the memory usage is also controllable, depending on the size of the data files. WDYT?
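A minimal sketch of that inversion, assuming Iceberg's StructLikeSet utility (a set of primary keys is enough for the illustration, though the comment mentions StructLikeMap); the class and method names here are hypothetical, not the project's actual code:

```java
import org.apache.iceberg.StructLike;
import org.apache.iceberg.types.Types;
import org.apache.iceberg.util.StructLikeSet;

class EqDeleteFilterSketch {
  // Primary keys that actually appear in the data files; its size is bounded
  // by the data record count, not by the (larger) eq-delete record count.
  private final StructLikeSet dataKeys;

  EqDeleteFilterSketch(Types.StructType keyType) {
    this.dataKeys = StructLikeSet.create(keyType);
  }

  // Called once per data record while scanning the data files.
  void indexDataKey(StructLike key) {
    dataKeys.add(key);
  }

  // An eq-delete key that never appears in the data files cannot delete
  // anything, so it can be dropped instead of being indexed or spilled.
  boolean isRelevant(StructLike eqDeleteKey) {
    return dataKeys.contains(eqDeleteKey);
  }
}
```

With this, only the relevant eq-delete keys would need to be kept (or spilled), and the in-memory footprint is decided by the data files rather than by the delete files.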
-
Iceberg's optimizing task requires indexing the delete data in memory first. If a table has too much delete data, the optimizer may run out of memory, so we introduced rocksdb to solve the problem of too many delete files.
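For context, here is a minimal sketch, assuming the RocksDB Java client (org.rocksdb), of what spilling the delete index to disk instead of the heap can look like; this is an illustration, not the project's actual implementation, and key serialization is left out:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

class RocksDbDeleteIndexSketch implements AutoCloseable {
  static { RocksDB.loadLibrary(); }

  private static final byte[] MARKER = new byte[0];
  private final Options options = new Options().setCreateIfMissing(true);
  private final RocksDB db;

  RocksDbDeleteIndexSketch(String path) throws RocksDBException {
    this.db = RocksDB.open(options, path);
  }

  // Record a deleted key; the index lives on disk, so its size is no longer
  // limited by the optimizer's heap.
  void addDeleteKey(byte[] serializedKey) throws RocksDBException {
    db.put(serializedKey, MARKER);
  }

  // Check whether a data record's (serialized) key has been deleted.
  boolean isDeleted(byte[] serializedKey) throws RocksDBException {
    return db.get(serializedKey) != null;
  }

  @Override
  public void close() {
    db.close();
    options.close();
  }
}
```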
This discussion focuses on two parts:
The first part is to collect everyone's feedback on the effectiveness of using rocksdb to solve the problem of large delete files in Iceberg tables, and on any issues encountered.
The second part is to explore other ways to optimize besides introducing rocksdb. One approach is to iterate optimization from historical snapshots, which prevents reading too much delete data at once and causing an OOM; a sketch follows below.
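As a rough sketch of that second approach, assuming the Iceberg Table API: walk the snapshots in commit order and optimize one snapshot's worth of state at a time, so that each pass only has to index the delete files visible at that snapshot, and earlier passes can compact deletes before later ones run. planAndOptimize() is a hypothetical hook into the optimizing task, not a real API:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

class IncrementalOptimizeSketch {

  void optimize(Table table) {
    // Process historical snapshots one at a time instead of the whole current
    // table state, so the delete data indexed per pass stays bounded.
    for (Snapshot snapshot : table.snapshots()) {
      try (CloseableIterable<FileScanTask> tasks =
               table.newScan().useSnapshot(snapshot.snapshotId()).planFiles()) {
        for (FileScanTask task : tasks) {
          planAndOptimize(task);
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  private void planAndOptimize(FileScanTask task) {
    // Hypothetical: submit this slice of work to the optimizer.
  }
}
```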