Fix and test arcticdb reading streaming data #2218

IvoDD · 2025-03-05T14:50:00Z

Reference Issues/PRs

Fixes monday ref: 7855342201

What does this implement or fix?

Fixes:

Column filter in static schema
Column ordering when introducing a new column with an incomplete segment

Tests:

Column filter in static and dynamic schema
Reading incompletes with different schemas
Compatibility test for reading incompletes from an old env

Any other comments?

Checklist

Checklist for code changes...

Have you updated the relevant docstrings, documentation and copyright notice?
Is this contribution tested against all ArcticDB's features?
Do all exceptions introduced raise appropriate error messages?
Are API changes highlighted in the PR description?
Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

poodlewars · 2025-03-06T16:52:15Z

python/tests/compat/arcticdb/test_compatibility.py

+        read_df = curr.lib._nvs.read(sym, date_range=(None, None), incomplete=True).data
+        assert_frame_equal(read_df, df)
+
+        read_df = curr.lib._nvs.read(sym, date_range=(None, None), incomplete=True, columns=["float_col"]).data


Would be good to check that filtering down to several columns (rather than just one) works, and that the date_range filtering works

Added more assertions to the test

poodlewars · 2025-03-06T16:52:33Z

cpp/arcticdb/version/version_core.cpp

@@ -1116,9 +1124,9 @@ bool read_incompletes_to_pipeline(
        pipeline_context->staged_descriptor_ =
            merge_descriptors(seg.descriptor(), incomplete_segments, read_query.columns);
        if (pipeline_context->desc_) {
-            const std::array fields_ptr = {pipeline_context->desc_->fields_ptr()};
+            const std::array staged_fields_ptr = {pipeline_context->staged_descriptor_->fields_ptr()};


Could you explain why we need this change?

So that we call merge_descriptors(descriptor_from_index_key, new_columns_from_staged_data) instead of the other way round.

This way, if we have staged data which is missing columns, it won't completely reorder the columns. Previously, if e.g. we had an index with columns col_1, col_2, col_3 and an incomplete with col_2, col_3, reading them both would result in col_2, col_3, col_1, which seems quite odd. The dynamic schema test covers this.
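The ordering effect described above can be illustrated with a small sketch. It is not ArcticDB's `merge_descriptors`; `merge_columns` is a hypothetical stand-in that assumes a merge keeps the first argument's column order and appends any columns found only in the second.

```python
def merge_columns(base, extra):
    """Hypothetical stand-in for a descriptor merge.

    Keeps the columns of `base` in their original order and appends
    any column names that appear only in `extra`.
    """
    merged = list(base)
    seen = set(base)
    for name in extra:
        if name not in seen:
            merged.append(name)
            seen.add(name)
    return merged


index_columns = ["col_1", "col_2", "col_3"]  # descriptor from the index key
staged_columns = ["col_2", "col_3"]          # incomplete segment missing col_1

# Staged descriptor first: col_1 is pushed to the end.
old_order = merge_columns(staged_columns, index_columns)
# Index descriptor first: the existing column order is preserved.
new_order = merge_columns(index_columns, staged_columns)

print(old_order)  # ['col_2', 'col_3', 'col_1']
print(new_order)  # ['col_1', 'col_2', 'col_3']
```

Under that assumption, the argument order alone decides whether an incomplete that lacks a column reorders the result.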

poodlewars · 2025-03-06T16:58:33Z

python/tests/unit/arcticdb/version_store/test_incompletes.py

@@ -80,3 +81,47 @@ def test_read_incompletes_no_chunking(lmdb_version_store_tiny_segment):

    ref_keys = lib_tool.find_keys_for_symbol(KeyType.APPEND_REF, sym)
    assert len(ref_keys) == 1
+
+@pytest.mark.parametrize("dynamic_schema", [True, False])
+def test_read_incompletes_columns_filter(version_store_factory, dynamic_schema):


A few ideas for extra tests (some may already be covered elsewhere):

Tests where we append incompletes after a compaction (like you do in the test below)

Tests with a chain of more than two incomplete segments (e.g. I could imagine messing up the linked list traversal)

Filtering down to more than one column

Filters on date_range

Added all of the suggestions with new assertions
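A toy model of the suggested cases, for illustration only. It does not use ArcticDB: `StagedSegment` and `read_chain` are hypothetical names, timestamps are plain integers, and the sketch assumes that each staged segment links back to the one staged before it and that `date_range` bounds are inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StagedSegment:
    rows: list                               # (timestamp, {column: value}) pairs
    prev: Optional["StagedSegment"] = None   # segment staged before this one


def read_chain(head, columns=None, date_range=(None, None)):
    """Read every segment reachable from `head`, oldest first."""
    # Walk from the newest segment back to the oldest, then reverse.
    segments = []
    node = head
    while node is not None:
        segments.append(node)
        node = node.prev
    segments.reverse()

    start, end = date_range
    result = []
    for segment in segments:
        for ts, row in segment.rows:
            # Inclusive bounds; None means unbounded on that side.
            if start is not None and ts < start:
                continue
            if end is not None and ts > end:
                continue
            if columns is not None:
                row = {name: row[name] for name in columns}
            result.append((ts, row))
    return result


def make_row(i):
    return {"int_col": i, "float_col": i / 2, "str_col": f"s{i}"}


# A chain of three incomplete segments, two rows each.
first = StagedSegment([(0, make_row(0)), (1, make_row(1))])
second = StagedSegment([(2, make_row(2)), (3, make_row(3))], prev=first)
third = StagedSegment([(4, make_row(4)), (5, make_row(5))], prev=second)

everything = read_chain(third)
filtered = read_chain(third, columns=["int_col", "float_col"], date_range=(1, 4))
```

A chain of three segments exercises the traversal more than two would, and combining a multi-column filter with a date range checks that the two filters compose.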

IvoDD added the patch label (small change, should increase patch version) Mar 5, 2025

IvoDD marked this pull request as ready for review March 5, 2025 14:50

IvoDD requested review from alexowens90, willdealtry and poodlewars as code owners March 5, 2025 14:50

IvoDD force-pushed the fix-adb-reading-incomplete branch from 8614e5a to 07d0ab6 Compare March 6, 2025 08:21

poodlewars reviewed Mar 6, 2025

IvoDD force-pushed the fix-adb-reading-incomplete branch 2 times, most recently from 7da390e to 61e304f Compare March 7, 2025 13:13

phoebusm approved these changes Mar 7, 2025

IvoDD force-pushed the fix-adb-reading-incomplete branch from 61e304f to dd1bfaa Compare March 7, 2025 15:40

poodlewars approved these changes Mar 7, 2025

IvoDD merged commit 2b7d252 into master Mar 10, 2025
153 of 154 checks passed

IvoDD deleted the fix-adb-reading-incomplete branch March 10, 2025 07:54
