
Fix and test arcticdb reading streaming data #2218

Merged (1 commit), Mar 10, 2025


IvoDD (Collaborator) commented Mar 5, 2025

Reference Issues/PRs

Fixes monday ref: 7855342201

What does this implement or fix?

Fixes:

  • Column filter in static schema
  • Column ordering when introducing a new column with an incomplete segment

Tests:

  • Column filter in static and dynamic schema
  • Reading incompletes with different schemas
  • Compatibility test for reading incompletes from an old env

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all of ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

IvoDD added the patch label (Small change, should increase patch version) Mar 5, 2025
IvoDD marked this pull request as ready for review March 5, 2025 14:50
IvoDD force-pushed the fix-adb-reading-incomplete branch from 8614e5a to 07d0ab6 March 6, 2025 08:21
read_df = curr.lib._nvs.read(sym, date_range=(None, None), incomplete=True).data
assert_frame_equal(read_df, df)

read_df = curr.lib._nvs.read(sym, date_range=(None, None), incomplete=True, columns=["float_col"]).data
Collaborator

It would be good to check that filtering down to several columns (rather than just one) works, and that date_range filtering works.
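A minimal sketch of the expected result such assertions could be compared against, written as a plain-Python reference model rather than ArcticDB code. Assumptions: `expected_read` is a hypothetical helper; rows are (timestamp, values) pairs; the date range is treated as inclusive at both ends, with `None` meaning unbounded, in the spirit of the `date_range=(None, None)` calls above; `int_col` and `str_col` are made-up column names (only `float_col` appears in the test).

```python
from datetime import datetime


# Hypothetical reference model, not ArcticDB code.
# Each row is (timestamp, {column_name: value}).
def expected_read(rows, columns=None, date_range=(None, None)):
    start, end = date_range
    out = []
    for ts, values in rows:
        # Assumed inclusive bounds; None leaves that side unbounded.
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        # Keep only the requested columns, in the requested order.
        keep = values if columns is None else {c: values[c] for c in columns}
        out.append((ts, keep))
    return out


rows = [
    (datetime(2025, 1, 1), {"int_col": 1, "float_col": 1.5, "str_col": "a"}),
    (datetime(2025, 1, 2), {"int_col": 2, "float_col": 2.5, "str_col": "b"}),
    (datetime(2025, 1, 3), {"int_col": 3, "float_col": 3.5, "str_col": "c"}),
]

# Several columns plus a lower-bounded date range.
result = expected_read(rows, columns=["float_col", "str_col"],
                       date_range=(datetime(2025, 1, 2), None))
```

Comparing a real read against a model like this for a multi-column list and a bounded range covers both points in the comment at once.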

Collaborator Author

Added more assertions to the test

@@ -1116,9 +1124,9 @@ bool read_incompletes_to_pipeline(
pipeline_context->staged_descriptor_ =
merge_descriptors(seg.descriptor(), incomplete_segments, read_query.columns);
if (pipeline_context->desc_) {
const std::array fields_ptr = {pipeline_context->desc_->fields_ptr()};
const std::array staged_fields_ptr = {pipeline_context->staged_descriptor_->fields_ptr()};
Collaborator

Could you explain why we need this change?

IvoDD (Collaborator Author), Mar 7, 2025

So that we call merge_descriptors(descriptor_from_index_key, new_columns_from_staged_data) instead of the other way round.

This way, if the staged data is missing some columns, the columns are not completely reordered. Previously, if e.g. the index had columns col_1, col_2, col_3 and an incomplete segment had col_2, col_3, reading them together would return col_2, col_3, col_1, which seems quite odd. The dynamic schema test covers this.
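A minimal sketch of why the argument order matters, assuming (for illustration only) that merging keeps the first descriptor's columns in their original order and appends columns that appear only in the second. `merge_column_order` is a hypothetical helper that works on plain column-name lists; it is not ArcticDB's `merge_descriptors`, which also deals with types and the column filter.

```python
# Hypothetical model: a "descriptor" is just an ordered list of column names.
def merge_column_order(primary, secondary):
    """Keep every column of `primary` in order, then append columns that
    only appear in `secondary`, in the order they appear there."""
    seen = set(primary)
    merged = list(primary)
    for col in secondary:
        if col not in seen:
            merged.append(col)
            seen.add(col)
    return merged


index_columns = ["col_1", "col_2", "col_3"]   # columns from the index key
staged_columns = ["col_2", "col_3"]           # incomplete segment missing col_1

# Staged descriptor first (the behaviour described as "previously"):
old_order = merge_column_order(staged_columns, index_columns)
# Index descriptor first (the order described as the fix):
new_order = merge_column_order(index_columns, staged_columns)
# An incomplete segment introducing a new column, col_4:
with_new_column = merge_column_order(index_columns, ["col_2", "col_4"])

print(old_order)        # ['col_2', 'col_3', 'col_1']
print(new_order)        # ['col_1', 'col_2', 'col_3']
print(with_new_column)  # ['col_1', 'col_2', 'col_3', 'col_4']
```

With the index descriptor first, existing columns keep their positions and only genuinely new columns are appended at the end.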

@@ -80,3 +81,47 @@ def test_read_incompletes_no_chunking(lmdb_version_store_tiny_segment):

ref_keys = lib_tool.find_keys_for_symbol(KeyType.APPEND_REF, sym)
assert len(ref_keys) == 1

@pytest.mark.parametrize("dynamic_schema", [True, False])
def test_read_incompletes_columns_filter(version_store_factory, dynamic_schema):
poodlewars (Collaborator), Mar 6, 2025

A few ideas for extra tests (some may already be covered elsewhere):

  • Tests where we append incompletes after a compaction (like you do in the test below)
  • Tests with a chain of more than two incomplete segments (e.g. I could imagine us messing up the linked list traversal)
  • Filtering down to more than one column
  • Filters on date_range
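For the chain-of-incompletes suggestion, a hypothetical in-memory model of the traversal under test, assuming each staged segment keeps a pointer to the previously staged one and the reference key points at the most recent. `StagedSegment`, `append_staged` and `collect_chain` are illustrative names, not ArcticDB types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StagedSegment:
    # Hypothetical stand-in for one incomplete (staged) segment.
    rows: list
    prev: Optional["StagedSegment"] = None


def append_staged(head, rows):
    # The new segment becomes the head and links back to the previous one.
    return StagedSegment(rows=rows, prev=head)


def collect_chain(head):
    """Walk from the most recent segment back to the oldest, then return
    the segments in write order."""
    chain = []
    node = head
    while node is not None:
        chain.append(node)
        node = node.prev
    chain.reverse()
    return chain


head = None
for batch in ([1, 2], [3], [4, 5], [6]):
    head = append_staged(head, batch)

rows_in_order = [r for seg in collect_chain(head) for r in seg.rows]
```

A chain of four segments exercises the middle links as well as both ends, which a two-segment chain does not.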

IvoDD (Collaborator Author), Mar 7, 2025

Added all of the suggestions with new assertions

IvoDD force-pushed the fix-adb-reading-incomplete branch 2 times, most recently from 7da390e to 61e304f March 7, 2025 13:13
IvoDD force-pushed the fix-adb-reading-incomplete branch from 61e304f to dd1bfaa March 7, 2025 15:40
IvoDD merged commit 2b7d252 into master Mar 10, 2025
153 of 154 checks passed
IvoDD deleted the fix-adb-reading-incomplete branch March 10, 2025 07:54
Labels
patch Small change, should increase patch version

3 participants