Lightweight Java library for reading and writing Apache Parquet files without Hadoop dependencies.
A fork of strategicblue/parquet-floor.
- Read and write Parquet files with a simple API
- No Hadoop dependency tree — minimal stubs included
- Write to `File`, `OutputStream`, or any `OutputFile` implementation
- Configurable compression codecs
| Primitive | Logical Type | Java Type | Notes |
|---|---|---|---|
| INT32 | — | int | |
| INT64 | — | long | |
| INT64 | TIMESTAMP(NANOS, UTC) | long | Nanos since epoch, compatible with QuestDB now_ns() |
| FLOAT | — | float | |
| DOUBLE | — | double | |
| BOOLEAN | — | boolean | |
| BINARY | STRING | String | |
| BINARY | JSON | String | |
| BINARY | ENUM | String | |
| FIXED_LEN_BYTE_ARRAY(16) | UUID | java.util.UUID | |
| FIXED_LEN_BYTE_ARRAY | DECIMAL | java.math.BigDecimal | Configurable precision/scale |
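A note on the TIMESTAMP(NANOS, UTC) row: the column holds a plain long of nanoseconds since the Unix epoch. A minimal conversion sketch to and from java.time.Instant (the helper class and method names here are illustrative, not part of the library):

```java
import java.time.Instant;

public final class NanoTimestamps {
    // Epoch-nanos (as stored in the INT64 column) to an Instant.
    static Instant toInstant(long epochNanos) {
        return Instant.ofEpochSecond(epochNanos / 1_000_000_000L,
                                     epochNanos % 1_000_000_000L);
    }

    // An Instant back to epoch-nanos for writing.
    static long toEpochNanos(Instant instant) {
        return instant.getEpochSecond() * 1_000_000_000L + instant.getNano();
    }

    public static void main(String[] args) {
        long nanos = 1_700_000_000_123_456_789L;
        System.out.println(NanoTimestamps.toInstant(nanos)); // 2023-11-14T22:13:20.123456789Z
    }
}
```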
| Codec | Status | Library |
|---|---|---|
| UNCOMPRESSED | Supported | Built-in |
| SNAPPY | Supported | xerial-snappy (no Hadoop) |
| ZSTD | Supported | zstd-jni (no Hadoop) |
| GZIP | Not supported | Requires hadoop-common |
| LZ4 | Not supported | Requires hadoop-common |
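On the DECIMAL row of the type-mapping table: per the Parquet format specification, a DECIMAL backed by FIXED_LEN_BYTE_ARRAY stores the unscaled value as a big-endian two's-complement integer, sign-extended to the array length. A sketch of that encoding (the helper class is illustrative, not the library's internals):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public final class DecimalBytes {
    // Pack a BigDecimal's unscaled value into a fixed-length big-endian
    // two's-complement array, sign-extending on the left.
    static byte[] pack(BigDecimal value, int scale, int length) {
        BigInteger unscaled = value.setScale(scale).unscaledValue();
        byte[] raw = unscaled.toByteArray(); // minimal big-endian two's complement
        if (raw.length > length) {
            throw new IllegalArgumentException("value overflows " + length + " bytes");
        }
        byte pad = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
        byte[] out = new byte[length];
        Arrays.fill(out, 0, length - raw.length, pad); // sign extension
        System.arraycopy(raw, 0, out, length - raw.length, raw.length);
        return out;
    }

    // Rebuild the BigDecimal from the fixed-length bytes and the schema's scale.
    static BigDecimal unpack(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }
}
```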
```java
MessageType schema = new MessageType("ticker",
        Types.required(PrimitiveTypeName.INT64).named("t"),
        Types.required(PrimitiveTypeName.DOUBLE).named("cls"));

Dehydrator<Tick> dehydrator = (tick, writer) -> {
    writer.write("t", tick.timestamp());
    writer.write("cls", tick.close());
};

// Default codec (SNAPPY)
try (ParquetWriter<Tick> writer = ParquetWriter.writeFile(schema, file, dehydrator)) {
    writer.write(tick);
}

// Explicit codec
try (ParquetWriter<Tick> writer = ParquetWriter.writeFile(schema, file, dehydrator,
        CompressionCodecName.ZSTD)) {
    writer.write(tick);
}

// Write to OutputStream
try (ParquetWriter<Tick> writer = ParquetWriter.writeOutputStream(schema, outputStream,
        dehydrator, CompressionCodecName.ZSTD)) {
    writer.write(tick);
}

// Write with custom compression level (e.g. ZSTD max)
Configuration conf = new Configuration(false);
conf.setInt("parquet.compression.codec.zstd.level", 22);
try (ParquetWriter<Tick> writer = ParquetWriter.writeOutputStream(schema, outputStream,
        dehydrator, CompressionCodecName.ZSTD, conf)) {
    writer.write(tick);
}
```

```java
MessageType schema = new MessageType("trades",
        Types.required(INT64)
                .as(LogicalTypeAnnotation.timestampType(true, LogicalTypeAnnotation.TimeUnit.NANOS))
                .named("ts_ns"),
        Types.required(FIXED_LEN_BYTE_ARRAY).length(16)
                .as(LogicalTypeAnnotation.uuidType()).named("trade_id"),
        Types.required(FIXED_LEN_BYTE_ARRAY).length(16)
                .as(LogicalTypeAnnotation.decimalType(2, 18)).named("price"),
        Types.required(BINARY)
                .as(LogicalTypeAnnotation.enumType()).named("exchange"));

Dehydrator<Trade> dehydrator = (trade, writer) -> {
    writer.write("ts_ns", trade.timestampNanos());
    writer.write("trade_id", trade.id());       // UUID
    writer.write("price", trade.price());       // BigDecimal
    writer.write("exchange", trade.exchange()); // String
};
```

```java
Hydrator<Map<String, Object>, Map<String, Object>> hydrator = new Hydrator<>() {
    public Map<String, Object> start() { return new HashMap<>(); }
    public Map<String, Object> add(Map<String, Object> target, String heading, Object value) {
        target.put(heading, value);
        return target;
    }
    public Map<String, Object> finish(Map<String, Object> target) { return target; }
};

try (Stream<Map<String, Object>> rows = ParquetReader.streamContent(file,
        HydratorSupplier.constantly(hydrator))) {
    rows.forEach(row -> System.out.println(row));
}
```

```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependency>
    <groupId>com.github.qtsurfer</groupId>
    <artifactId>parquet-lite</artifactId>
    <version>2.1.0</version>
</dependency>
```

```groovy
repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.qtsurfer:parquet-lite:2.1.0'
}
```

Apache License 2.0 — see LICENSE.
- Original work: Copyright Strategic Blue Ltd — strategicblue/parquet-floor
- Fork maintenance: Copyright Wualabs LTD — wualabs.com
See NOTICE for details.