Releases: YSU-Data-Lab/OB-Tree

OB-Tree: Accelerating Data Cleaning in Out-of-Core Column-Store Databases

28 May 05:31

Abstract

The column-store database, featuring a column-by-column data layout and fast data retrieval, is representative of next-generation database management systems in the big data era. Optimizing write performance is a well-known challenge in out-of-core (external-memory) column-store databases. Data cleaning removes redundant data and improves the overall performance of the database. Previously proposed data cleaning methods require long execution times and additional computing resources, which makes them inefficient for column-store databases holding large volumes of data.

This work introduces an auxiliary tree index and high-speed data cleaning methods to improve the overall processing speed of columnar data. The proposed index, called OB-tree, comes with a rich set of operations and offers multiple advantages when working with a wide range of column-store databases. We introduce new data cleaning methods that use OB-tree to efficiently identify target records and their locations. Extensive experiments show that the proposed methods enable significant performance improvements for data cleaning on column-store databases.

Keywords

Column-Store Database; Index; B+ Tree; Write Optimization