GH-44007: [GLib][Parquet] Add `gparquet_arrow_file_writer_new_buffered_row_group()` (apache#44100)

### Rationale for this change

Buffered row groups let callers write multiple record batches into a single row group and control when buffered data is flushed to the output stream, which is useful for advanced use cases.

### What changes are included in this PR?

Add `gparquet_arrow_file_writer_new_buffered_row_group()`.
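
For illustration, a minimal sketch of the intended call sequence through the Ruby bindings (it mirrors the updated test below; the red-parquet `require`, the sample data, and the output path are illustrative assumptions, not part of this change):

```ruby
require "parquet"

record_batch = Arrow::RecordBatch.new("enabled" => [true, nil, false, true])

writer = Parquet::ArrowFileWriter.new(record_batch.schema, "/tmp/enabled.parquet")
# Both batches are buffered into the same row group.
writer.write_record_batch(record_batch)
writer.write_record_batch(record_batch)
# Flush the buffered row group and start a new one.
writer.new_buffered_row_group
writer.write_record_batch(record_batch)
# close() flushes the remaining buffered data.
writer.close

reader = Parquet::ArrowFileReader.new("/tmp/enabled.parquet")
p reader.n_row_groups # => 2
```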

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: apache#44007

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kou authored Sep 15, 2024
1 parent 41c481f commit dafc970
Showing 3 changed files with 48 additions and 2 deletions.
38 changes: 38 additions & 0 deletions c_glib/parquet-glib/arrow-file-writer.cpp
@@ -517,6 +517,19 @@ gparquet_arrow_file_writer_get_schema(GParquetArrowFileWriter *writer)
* @record_batch: A record batch to be written.
* @error: (nullable): Return location for a #GError or %NULL.
*
* Write a record batch into the buffered row group.
*
* Multiple record batches can be written into the same row group
* through this function.
*
* gparquet_writer_properties_get_max_row_group_length() is respected
* and a new row group will be created if the current row group
* exceeds the limit.
*
* Record batches get flushed to the output stream once
* gparquet_arrow_file_writer_new_buffered_row_group() or
* gparquet_arrow_file_writer_close() is called.
*
* Returns: %TRUE on success, %FALSE if there was an error.
*
* Since: 18.0.0
@@ -564,6 +577,8 @@ gparquet_arrow_file_writer_write_table(GParquetArrowFileWriter *writer,
* @chunk_size: The max number of rows in a row group.
* @error: (nullable): Return location for a #GError or %NULL.
*
* Start a new row group.
*
* Returns: %TRUE on success, %FALSE if there was an error.
*
* Since: 18.0.0
@@ -579,12 +594,35 @@ gparquet_arrow_file_writer_new_row_group(GParquetArrowFileWriter *writer,
                       "[parquet][arrow][file-writer][new-row-group]");
}

/**
* gparquet_arrow_file_writer_new_buffered_row_group:
* @writer: A #GParquetArrowFileWriter.
* @error: (nullable): Return location for a #GError or %NULL.
*
* Start a new buffered row group.
*
* Returns: %TRUE on success, %FALSE if there was an error.
*
* Since: 18.0.0
*/
gboolean
gparquet_arrow_file_writer_new_buffered_row_group(GParquetArrowFileWriter *writer,
                                                   GError **error)
{
  auto parquet_arrow_file_writer = gparquet_arrow_file_writer_get_raw(writer);
  return garrow::check(error,
                       parquet_arrow_file_writer->NewBufferedRowGroup(),
                       "[parquet][arrow][file-writer][new-buffered-row-group]");
}

/**
* gparquet_arrow_file_writer_write_chunked_array:
* @writer: A #GParquetArrowFileWriter.
* @chunked_array: A #GArrowChunkedArray to be written.
* @error: (nullable): Return location for a #GError or %NULL.
*
* Write a chunked array as a column chunk.
*
* Returns: %TRUE on success, %FALSE if there was an error.
*
* Since: 18.0.0
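The new doc comment above also notes that `gparquet_writer_properties_get_max_row_group_length()` is respected, i.e. buffered writes start a new row group automatically once the current one reaches the limit. A hedged sketch of that behaviour from Ruby, assuming `Parquet::WriterProperties` exposes the `set_max_row_group_length` setter through GObject Introspection and that `Parquet::ArrowFileWriter.new` forwards an optional properties argument to the underlying `gparquet_arrow_file_writer_new_path()`:

```ruby
require "parquet"

# 4-row batch; the values are illustrative.
record_batch = Arrow::RecordBatch.new("enabled" => [true, nil, false, true])

properties = Parquet::WriterProperties.new
# Assumption: generated from gparquet_writer_properties_set_max_row_group_length().
properties.set_max_row_group_length(4)

writer = Parquet::ArrowFileWriter.new(record_batch.schema,
                                      "/tmp/enabled.parquet",
                                      properties)
# No explicit new_buffered_row_group call: once the first batch fills the
# 4-row limit, the writer starts a second row group on its own.
writer.write_record_batch(record_batch)
writer.write_record_batch(record_batch)
writer.close

reader = Parquet::ArrowFileReader.new("/tmp/enabled.parquet")
p reader.n_row_groups # expected: 2
```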
5 changes: 5 additions & 0 deletions c_glib/parquet-glib/arrow-file-writer.h
@@ -139,6 +139,11 @@ gparquet_arrow_file_writer_new_row_group(GParquetArrowFileWriter *writer,
                                          gsize chunk_size,
                                          GError **error);

GPARQUET_AVAILABLE_IN_18_0
gboolean
gparquet_arrow_file_writer_new_buffered_row_group(GParquetArrowFileWriter *writer,
                                                   GError **error);

GPARQUET_AVAILABLE_IN_18_0
gboolean
gparquet_arrow_file_writer_write_chunked_array(GParquetArrowFileWriter *writer,
7 changes: 5 additions & 2 deletions c_glib/test/parquet/test-arrow-file-writer.rb
@@ -40,14 +40,17 @@ def test_write_record_batch
 
     writer = Parquet::ArrowFileWriter.new(record_batch.schema, @file.path)
     writer.write_record_batch(record_batch)
+    writer.new_buffered_row_group
+    writer.write_record_batch(record_batch)
     writer.close
 
     reader = Parquet::ArrowFileReader.new(@file.path)
     begin
       reader.use_threads = true
       assert_equal([
-                     1,
-                     Arrow::Table.new(record_batch.schema, [record_batch]),
+                     2,
+                     Arrow::Table.new(record_batch.schema,
+                                      [record_batch, record_batch]),
                    ],
                    [
                      reader.n_row_groups,
