-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(low-code): add DpathFlattenFields #227
feat(low-code): add DpathFlattenFields #227
Conversation
/autofix
|
📝 WalkthroughWalkthroughThis pull request introduces a new transformation called Changes
Sequence DiagramsequenceDiagram
participant Record
participant DpathFlattenFields
participant Config
Record->>DpathFlattenFields: Provide record
DpathFlattenFields->>Config: Evaluate field path
Config-->>DpathFlattenFields: Return path values
DpathFlattenFields->>Record: Flatten fields
alt delete_origin_value is true
DpathFlattenFields->>Record: Remove original field
end
Possibly related PRs
Suggested reviewers
Hey there! 👋 Do you think there are any specific use cases for the ✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
Line range hint
1-465
: The import block needs to be sorted according to the style guide.Similar to the first file, the imports need to be reorganized. Would you like me to help with that? wdyt?
🧹 Nitpick comments (7)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1974-1977
: The transformation description could be more detailed.The current description "A transformation that flatten field values to the to top of the record" could be enhanced to better explain the behavior and use cases. What do you think about expanding it to something like:
"A transformation that moves field values from a specified path to the top level of the record, while preserving other top-level fields. This is useful when you need to flatten nested data structures but want to maintain specific top-level fields intact."?
1988-1991
: Consider simplifying the array items type definition.The
field_path
property defines array items asitems: [{ type: string }]
which is an unusual syntax. Would it be clearer to use the more common form?items: - - type: string + type: stringairbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py (3)
13-19
: Would you like to enhance the docstring for better clarity? 📝The current docstring is a bit brief. What do you think about expanding it to include more details? Here's a suggestion:
""" Flattens fields from a specified path in the record to the top level. Args: config (Config): The connector configuration field_path (List[Union[InterpolatedString, str]]): Path to the field to flatten. Can include wildcards (*) for matching multiple fields. parameters (Mapping[str, Any]): Additional parameters for string interpolation delete_origin_value (bool, optional): Whether to delete the original field after flattening. Defaults to False. Example: For a record like: { "id": 1, "data": { "name": "test", "value": 42 } } Using field_path=["data"] will result in: { "id": 1, "data": {"name": "test", "value": 42}, # Only if delete_origin_value=False "name": "test", "value": 42 } """
26-34
: How about simplifying this initialization? 🤔The current implementation has some duplicate logic for converting paths to
InterpolatedString
. Would this simpler version work for you?def __post_init__(self, parameters: Mapping[str, Any]) -> None: - self._field_path = [ - InterpolatedString.create(path, parameters=parameters) for path in self.field_path - ] - for path_index in range(len(self.field_path)): - if isinstance(self.field_path[path_index], str): - self._field_path[path_index] = InterpolatedString.create( - self.field_path[path_index], parameters=parameters - ) + self._field_path = [ + path if isinstance(path, InterpolatedString) + else InterpolatedString.create(path, parameters=parameters) + for path in self.field_path + ]
36-54
: A few suggestions to make this transform method even more robust! 💪
- Would you consider adding a return type hint? Even though it modifies in-place, it helps with clarity:
def transform( self, record: Dict[str, Any], config: Optional[Config] = None, stream_state: Optional[StreamState] = None, stream_slice: Optional[StreamSlice] = None, -) -> None: +) -> Dict[str, Any]: # Shows it returns the modified record
- What about adding some error handling for dpath operations? They might raise exceptions:
try: if "*" in path: matched = dpath.values(record, path) extracted = matched[0] if matched else None else: extracted = dpath.get(record, path, default=[]) except (TypeError, ValueError) as e: logger.warning(f"Failed to extract value at path {path}: {e}") return record
- Consider documenting the in-place modification behavior in the docstring? It's important for users to know the record is modified directly.
What do you think? 🤔
unit_tests/sources/declarative/transformations/test_dpath_flatten_fields.py (2)
5-7
: Quick suggestion about the test constants! ✨What do you think about making these constants more descriptive and type-hinted?
-_ANY_VALUE = -1 +_DUMMY_VALUE: int = -1 # Used as a placeholder in test records -_DELETE_ORIGIN_VALUE = True -_DO_NOT_DELETE_ORIGIN_VALUE = False +_DELETE_ORIGIN_VALUE: bool = True +_KEEP_ORIGIN_VALUE: bool = False # More descriptive than "DO_NOT_DELETE"
10-94
: Great test coverage! Mind if we add a few more edge cases? 🎯The current test suite looks solid! Would you consider adding these scenarios to make it even more robust?
# Invalid path that might raise exceptions pytest.param( {"field1": {"field2": [1, 2, 3]}}, {}, ["field1", "field2", 0], # Invalid path with array index _KEEP_ORIGIN_VALUE, {"field1": {"field2": [1, 2, 3]}}, # Should remain unchanged id="handle invalid path gracefully", ), # Empty dictionary at path pytest.param( {"field1": {"field2": {}}}, {}, ["field1", "field2"], _KEEP_ORIGIN_VALUE, {"field1": {"field2": {}}}, # Should handle empty dict id="handle empty dictionary at path", ), # Nested arrays pytest.param( {"field1": {"field2": [{"a": 1}, {"b": 2}]}}, {}, ["field1", "field2", "*"], _KEEP_ORIGIN_VALUE, { "field1": {"field2": [{"a": 1}, {"b": 2}]}, "a": 1, "b": 2 }, id="flatten nested arrays with objects", ),What do you think about these additions? They would help ensure the transformation handles edge cases gracefully! 🤔
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
(1 hunks)unit_tests/sources/declarative/transformations/test_dpath_flatten_fields.py
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
[error] 1-11: Import block is un-sorted or un-formatted. The imports need to be organized according to the style guide.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 5-465: Import block is un-sorted or un-formatted. The imports need to be organized according to the style guide.
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
795-812
: Well-structured schema definition!The DpathFlattenFields model is well-documented with clear examples and proper field definitions.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
682-693
: Clean factory implementation!The DpathFlattenFields factory method is well-implemented with proper parameter handling and default values.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1319-1319
: LGTM! The transformation is properly integrated.The
DpathFlattenFields
transformation is correctly added to both thetransformations
andschema_transformations
arrays, maintaining consistency with other transformation types.Also applies to: 1869-1869
airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
Outdated
Show resolved
Hide resolved
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
682-694
: Consider validating field_path contents?The field_path list comprehension assumes all elements are valid. Should we add validation to ensure each element is either a string or InterpolatedString? This would help catch configuration errors early. wdyt?
def create_dpath_flatten_fields( self, model: DpathFlattenFieldsModel, config: Config, **kwargs: Any ) -> DpathFlattenFields: - model_field_path: List[Union[InterpolatedString, str]] = [x for x in model.field_path] + model_field_path: List[Union[InterpolatedString, str]] = [] + for x in model.field_path: + if not isinstance(x, (str, InterpolatedString)): + raise ValueError(f"Invalid field_path element: {x}. Expected string or InterpolatedString.") + model_field_path.append(x) return DpathFlattenFields( config=config, field_path=model_field_path, delete_origin_value=model.delete_origin_value if model.delete_origin_value is not None else False, parameters=model.parameters or {}, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(4 hunks)airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
213-215
: LGTM! Import follows the established pattern.The import statement for
DpathFlattenFields
model is properly grouped with other model imports from the same module.
547-547
: LGTM! Factory mapping is consistent with other entries.The mapping of
DpathFlattenFieldsModel
to its factory method follows the established pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py (2)
13-19
: How about enhancing the docstring to be more descriptive? 🤔The current docstring could better explain the transformation's purpose and behavior. What do you think about something like this:
- """ - Flatten fields only for provided path. - - field_path: List[Union[InterpolatedString, str]] path to the field to flatten. - delete_origin_value: bool = False whether to delete origin field or keep it. Default is False. - - """ + """ + A transformation that flattens nested fields at specified paths by moving them to the record's top level. + + For example, if a record has {'meta': {'field1': 'value1'}}, using path 'meta' would result in {'field1': 'value1'}. + + Args: + field_path: List of paths to the fields that should be flattened. Supports both static strings and interpolated values. + delete_origin_value: If True, removes the original nested field after flattening. Defaults to False. + + Note: + Paths can contain wildcards (*) for matching multiple fields. + """wdyt? This would make it clearer how the transformation works! 🎯
26-35
: Could we simplify the post-initialization logic? 🤔The current implementation has duplicate conversion logic. The list comprehension already converts all paths to InterpolatedString, making the subsequent loop unnecessary. How about simplifying it like this:
def __post_init__(self, parameters: Mapping[str, Any]) -> None: - self._field_path = [ - InterpolatedString.create(path, parameters=parameters) for path in self.field_path - ] - for path_index in range(len(self.field_path)): - if isinstance(self.field_path[path_index], str): - self._field_path[path_index] = InterpolatedString.create( - self.field_path[path_index], parameters=parameters - ) + self._field_path = [InterpolatedString.create(path, parameters=parameters) for path in self.field_path]This achieves the same result more concisely. What are your thoughts? 🎯
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/sources/declarative/transformations/test_dpath_flatten_fields.py (3)
10-102
: Great test coverage! Would you consider adding a few more edge cases? 🤔The test cases are well-structured and cover many scenarios. What do you think about adding tests for:
- Empty objects:
{"field2": {}}
- Empty arrays:
{"field2": []}
- Nested arrays:
{"field2": {"field3": [{"field4": 1}]}}
This would help ensure the transformation handles these edge cases gracefully, wdyt?
103-110
: Would you consider preserving the original input for better debugging? 🤔Currently, we modify the input record directly. How about:
def test_dpath_flatten_lists( input_record, config, field_path, delete_origin_value, expected_record ): + # Create a copy to preserve the original input + record = input_record.copy() flattener = DpathFlattenFields( field_path=field_path, parameters={}, config=config, delete_origin_value=delete_origin_value ) - flattener.transform(input_record) - assert input_record == expected_record + flattener.transform(record) + assert record == expected_recordThis would help with debugging by preserving the original input, wdyt?
103-105
: Would you like to add a docstring to explain the test function? 📝How about adding a docstring to explain what the test function verifies? Something like:
def test_dpath_flatten_lists( input_record, config, field_path, delete_origin_value, expected_record ): """ Test the DpathFlattenFields transformation with various input records and configurations. Args: input_record: The record to transform config: Configuration dictionary for the transformation field_path: Path to the field to flatten delete_origin_value: Whether to delete the original field expected_record: Expected transformed record """This would help other developers understand the purpose of the test, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
(1 hunks)unit_tests/sources/declarative/transformations/test_dpath_flatten_fields.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/transformations/dpath_flatten_fields.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/transformations/test_dpath_flatten_fields.py (1)
1-8
: Nice work on the imports and constants! 👍Clean and well-organized setup with meaningful constant names that improve readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with these changes to avoid slowing down the migration. However, I believe it would be better to add transformation_level
to FlattenFields
to control the depth to which a record should be flattened.
* main: feat(low-code): add DpathFlattenFields (airbytehq#227)
Created a DPath Enhancing Extractor Refactored the record enhancement logic - moved to the extracted class Split the tests of DPathExtractor and DPathEnhancingExtractor Fix the failing tests: FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_custom_components[test_create_custom_component_with_subcomponent_that_uses_parameters] FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_custom_components_do_not_contain_extra_fields FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_parse_custom_component_fields_if_subcomponent FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_page_increment FAILED unit_tests/sources/declarative/parsers/test_model_to_component_factory.py::test_create_offset_increment FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[simple_unstructured_scenario] FAILED unit_tests/sources/file_based/test_file_based_scenarios.py::test_file_based_read[no_file_extension_unstructured_scenario] They faile because of comparing string and int values of the page_size (public) attribute. Imposed an invariant: on construction, page_size can be set to a string or int keep only values of one type in page_size for uniform comparison (convert the values of the other type) _page_size holds the internal / working value ... unless manipulated directly. Merged: feat(low-code concurrent): Allow async job low-code streams that are incremental to be run by the concurrent framework (airbytehq#228) fix(low-code): Fix declarative low-code state migration in SubstreamPartitionRouter (airbytehq#267) feat: combine slash command jobs into single job steps (airbytehq#266) feat(low-code): add items and property mappings to dynamic schemas (airbytehq#256) feat: add help response for unrecognized slash commands (airbytehq#264) ci: post direct links to html connector test reports (airbytehq#252) (airbytehq#263) fix(low-code): Fix legacy state migration in SubstreamPartitionRouter (airbytehq#261) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(low-code): add DpathFlattenFields (airbytehq#227) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
What
source-airtable need record transformations that moves everything inside "fields" field to the top level of records and keeps others top level fields.
We already have FlattenFields transformation, but it transforms also objects inside "fields", records in airtable should not follow this format. Everything inside "fields" should be kept as is.
How.
Added
DpathFlattenFields
, which works with provided path and updates record with object found in the path, doesn't change fields inside the object.Summary by CodeRabbit
New Features
DpathFlattenFields
transformation for flattening nested fields in records.Tests
DpathFlattenFields
transformation.