Fix and test arcticdb reading streaming data #2218
Conversation
Force-pushed 8614e5a to 07d0ab6
read_df = curr.lib._nvs.read(sym, date_range=(None, None), incomplete=True).data
assert_frame_equal(read_df, df)

read_df = curr.lib._nvs.read(sym, date_range=(None, None), incomplete=True, columns=["float_col"]).data
Would be good to check that filtering down to several columns (rather than just one) works, and that the date_range filtering works.
Added more assertions to the test
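The extra checks the reviewer asked for can be illustrated with a minimal pandas sketch. The real test calls the ArcticDB-internal `lib._nvs.read(sym, date_range=..., columns=..., incomplete=True)`; here `read_stub` is a hypothetical stand-in that slices the source frame the same way a correct reader should, so the assertions show the intended behaviour rather than the actual API.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Source frame standing in for the staged (incomplete) data.
df = pd.DataFrame(
    {"float_col": [0.5, 1.5, 2.5], "int_col": [1, 2, 3], "str_col": ["a", "b", "c"]},
    index=pd.date_range("2024-01-01", periods=3),
)

def read_stub(date_range, columns=None):
    """Hypothetical stand-in for the incomplete-aware read: slice by date range,
    then project onto the requested columns."""
    out = df.loc[date_range[0]:date_range[1]]
    return out[columns] if columns is not None else out

# Filtering down to several columns (not just one) preserves the data and order.
assert_frame_equal(
    read_stub((None, None), columns=["float_col", "int_col"]),
    df[["float_col", "int_col"]],
)

# date_range filtering returns only the rows inside the (inclusive) range.
assert_frame_equal(
    read_stub((pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-03"))),
    df.iloc[1:],
)
```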
@@ -1116,9 +1124,9 @@ bool read_incompletes_to_pipeline(
pipeline_context->staged_descriptor_ =
    merge_descriptors(seg.descriptor(), incomplete_segments, read_query.columns);
if (pipeline_context->desc_) {
    const std::array fields_ptr = {pipeline_context->desc_->fields_ptr()};
    const std::array staged_fields_ptr = {pipeline_context->staged_descriptor_->fields_ptr()};
Could you explain why we need this change?
So that we call merge_descriptors(descriptor_from_index_key, new_columns_from_staged_data) instead of the other way round. This way, if we have staged data that is missing columns, the merge won't completely reorder the columns. Previously, if e.g. the index had columns col_1, col_2, col_3 and an incomplete had col_2, col_3, reading them both would result in col_2, col_3, col_1, which seems quite odd. The dynamic schema test covers this.
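The ordering effect described above can be sketched with a tiny Python model. `merge_columns` is a hypothetical stand-in for the field-merging behaviour of `merge_descriptors` (keep the first argument's column order, append any columns only present in the second), so swapping the argument order reproduces both the old and the new behaviour.

```python
def merge_columns(primary, extra):
    """Keep primary's column order, appending any new columns found in extra."""
    return list(primary) + [c for c in extra if c not in set(primary)]

index_cols = ["col_1", "col_2", "col_3"]   # descriptor from the index key
staged_cols = ["col_2", "col_3"]           # incomplete segment missing col_1

# Old call order: staged descriptor first, so the index columns get reordered.
assert merge_columns(staged_cols, index_cols) == ["col_2", "col_3", "col_1"]

# New call order: index descriptor first, so the original order is preserved.
assert merge_columns(index_cols, staged_cols) == ["col_1", "col_2", "col_3"]
```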
@@ -80,3 +81,47 @@ def test_read_incompletes_no_chunking(lmdb_version_store_tiny_segment):

ref_keys = lib_tool.find_keys_for_symbol(KeyType.APPEND_REF, sym)
assert len(ref_keys) == 1

@pytest.mark.parametrize("dynamic_schema", [True, False])
def test_read_incompletes_columns_filter(version_store_factory, dynamic_schema):
A few ideas for extra tests (some may already be covered elsewhere):
- Tests where we append incompletes after a compaction (like you do in the test below)
- Tests with a chain of more than two incomplete segments (e.g. I could imagine messing up the linked list traversal)
- Filtering down to more than one column
- Filters on date_range
Added all of the suggestions with new assertions
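The linked-list concern raised above (a chain of more than two incomplete segments) can be modelled in a few lines. `Segment` and `read_chain` are hypothetical toy structures, not ArcticDB's actual APPEND_REF key chain; the point is that a correct reader must walk the whole chain back from the head and restore time order, rather than stop after the first hop.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

class Segment:
    """Toy staged segment: each one points at the previously staged segment."""
    def __init__(self, data, prev=None):
        self.data = data
        self.prev = prev

head = None
frames = []
for i in range(3):  # stage three incompletes, oldest first
    df = pd.DataFrame(
        {"float_col": [float(i)]},
        index=[pd.Timestamp("2024-01-01") + pd.Timedelta(days=i)],
    )
    frames.append(df)
    head = Segment(df, prev=head)  # the newest segment becomes the head

def read_chain(seg):
    """Walk the chain from newest to oldest, then reverse into time order."""
    out = []
    while seg is not None:
        out.append(seg.data)
        seg = seg.prev
    return pd.concat(reversed(out))

# Reading must traverse all three segments, not just the head.
assert_frame_equal(read_chain(head), pd.concat(frames))
```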
Force-pushed 7da390e to 61e304f
Fixes:
- Column filter in static schema
- Column ordering when introducing a new column with an incomplete segment

Tests:
- Columns filter in static and dynamic schema
- Reading different-schema incompletes
- Compatibility test for reading incompletes from an old env
Force-pushed 61e304f to dd1bfaa
Reference Issues/PRs
Fixes monday ref: 7855342201
What does this implement or fix?
Fixes:
Tests:
Any other comments?