-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(source-declarative-manifest): add support for custom Python components from dynamic text input #174
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (7)
unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py (1)
21-24
: Consider using snake_case for variable names to follow Python conventionsVariable names
currPage
andtotalPages
are using camelCase, which is less common in Python. Could we rename them tocurrent_page
andtotal_pages
to match PEP 8 naming conventions? WDYT?unit_tests/source_declarative_manifest/conftest.py (1)
13-20
: Should we handle invalidhash_type
inputs gracefully?Currently, if an invalid
hash_type
is provided tohash_text
, it raises aKeyError
. Would it be better to handle this case and raise a more informative exception or provide a default behavior? WDYT?airbyte_cdk/cli/source_declarative_manifest/_run.py (1)
174-178
: LGTM! Consider enhancing the error message with the actual value?The type validation is a great addition! Would you consider adding the actual value to the error message to make debugging easier? Something like:
- f"but got type: {type(config['__injected_declarative_manifest'])}" + f"but got type: {type(config['__injected_declarative_manifest'])} with value: {config['__injected_declarative_manifest']!r}"wdyt?
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (2)
42-60
: Consider adding edge cases to the test?The basic functionality test looks good! Would you consider adding some edge cases like:
- Empty string input
- Invalid Python syntax
- Module with syntax errors
wdyt?
89-134
: Comprehensive integration test, but might need error casesThe happy path testing is thorough! Would you consider adding test cases for:
- Missing or invalid checksums
- Invalid manifest structure
- Error handling during source creation
Also, the start_date truncation comment could be more descriptive about why 2 days is chosen:
- # Truncate the start_date to speed up tests + # Truncate the start_date to 2 days ago to reduce API calls while still testing recent dataunit_tests/source_declarative_manifest/resources/source_the_guardian_api/manifest.yaml (2)
63-156
: Consider reducing duplication in stream definitions?I notice that
base_stream
andcontent_stream
have identical configurations forincremental_sync
,retriever
, and their sub-components. Would it make sense to use YAML anchors and aliases to reduce this duplication? For example:base_stream: &base_stream_config incremental_sync: # ... existing config ... retriever: # ... existing config ... content_stream: <<: *base_stream_config schema_loader: # ... specific schema config ...This would make the configuration more maintainable and less prone to drift. wdyt?
327-327
: Consider tightening the date pattern regex?The current pattern
^([1-9][0-9]{3})\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])$
allows some invalid dates (like 2023-02-31). Would it be helpful to use a more restrictive pattern that validates actual calendar dates? wdyt?Also applies to: 374-374
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
airbyte_cdk/cli/source_declarative_manifest/_run.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(3 hunks)airbyte_cdk/test/utils/manifest_only_fixtures.py
(3 hunks)pyproject.toml
(1 hunks)unit_tests/source_declarative_manifest/conftest.py
(1 hunks)unit_tests/source_declarative_manifest/resources/source_the_guardian_api/.gitignore
(1 hunks)unit_tests/source_declarative_manifest/resources/source_the_guardian_api/README.md
(1 hunks)unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py
(1 hunks)unit_tests/source_declarative_manifest/resources/source_the_guardian_api/manifest.yaml
(1 hunks)unit_tests/source_declarative_manifest/resources/source_the_guardian_api/valid_config.yaml
(1 hunks)unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py
(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- unit_tests/source_declarative_manifest/resources/source_the_guardian_api/.gitignore
- unit_tests/source_declarative_manifest/resources/source_the_guardian_api/valid_config.yaml
- unit_tests/source_declarative_manifest/resources/source_the_guardian_api/README.md
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (9)
unit_tests/source_declarative_manifest/conftest.py (1)
23-23
: Adding return type annotation looks good!The addition of the return type annotation
-> str
improves code readability and type checking.airbyte_cdk/test/utils/manifest_only_fixtures.py (1)
54-68
:⚠️ Potential issueSecurity consideration: Using exec() on user input
The implementation looks clean, but executing arbitrary Python code could be dangerous. Should we add a warning in the docstring about the security implications? Also, consider adding input validation or sandboxing if this will be used with untrusted input.
Let's check if this is used with user input:
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (1)
28-35
: LGTM! Simple and clear test fixture.The sample components are well-structured and demonstrate both function and class implementations.
pyproject.toml (1)
129-129
: LGTM! Useful addition for dependency management.The new lock task provides a clean way to update the lockfile without updating dependencies.
unit_tests/source_declarative_manifest/resources/source_the_guardian_api/manifest.yaml (5)
312-312
: Consider restricting additional properties?
additionalProperties: true
allows any unknown properties in the configuration. Given this is an API connector, would it be safer to set this tofalse
to catch potential typos in property names? wdyt?
297-300
: LGTM! Simple and effective check configuration.The check configuration properly validates the essential "content" stream.
304-304
: LGTM! Well-documented with proper security considerations.The documentation URLs and descriptions are clear, and the API key is properly marked as
airbyte_secret: true
.Also applies to: 317-319
1-1
: Consider using a more specific version format?The version "4.3.2" seems to be using semantic versioning, but it might be helpful to align it with the actual Guardian API version if one exists. wdyt?
Let me check if there's a specific version format used by The Guardian API:
45-45
: Verify the availability of the custom pagination class?The
CustomPageIncrement
class is referenced in multiple places. Let's verify its existence and implementation:Also applies to: 91-92, 139-139, 231-231
✅ Verification successful
The CustomPageIncrement class is properly implemented and available! 👍
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if the custom pagination class exists and examine its implementation rg -l "class CustomPageIncrement" ast-grep --pattern 'class CustomPageIncrement { $$$ }'Length of output: 179
Script:
#!/bin/bash # Examine the implementation of CustomPageIncrement class rg -A 10 "class CustomPageIncrement" unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.pyLength of output: 577
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py
Outdated
Show resolved
Hide resolved
Docstrings generation was requested by @aaronsteers. * #174 (comment) The following files were modified: * `airbyte_cdk/cli/source_declarative_manifest/_run.py` * `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` * `airbyte_cdk/test/utils/manifest_only_fixtures.py` * `unit_tests/source_declarative_manifest/conftest.py` * `unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py` * `unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py`
Note We have generated docstrings for this pull request, at #218 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
990-993
: Consider adding error handling for missing components module.The code assumes the components module will be available. Should we add more descriptive error messages for common failure scenarios? WDYT?
custom_component_class = self._get_class_from_fully_qualified_class_name( full_qualified_class_name=model.class_name, - components_module=self._get_components_module_object(config=config), + components_module=self._get_components_module_object(config=config), + ) if hasattr(model, 'class_name') else None + + if not custom_component_class: + raise ValueError(f"Could not create custom component. Missing class_name in {model}")
1084-1107
: Consider documenting security implications of executing code from configuration.Based on the retrieved learnings, custom Python components are intentionally unrestricted to support unpredictable use cases. Should we add a docstring warning about the security implications? WDYT?
def _get_components_module_object( config: Config, ) -> types.ModuleType: - """Get a components module object based on the provided config. + """Get a components module object based on the provided config and execute the code. If custom python components is provided, this will be loaded. Otherwise, we will attempt to load from the `components` module already imported. + + Warning: + This method executes Python code from configuration without restrictions. + Ensure that the configuration is from a trusted source as it has full access + to Python's capabilities including file operations and network access. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(3 hunks)
🧰 Additional context used
📓 Learnings (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1063-1067
Timestamp: 2025-01-13T23:49:48.658Z
Learning: When implementing code execution from configuration, use a combination of security measures:
1. AST validation to detect dangerous operations
2. RestrictedPython or custom restricted execution environment
3. Limited builtins to prevent access to dangerous functions
4. Import hooks and global restrictions to prevent file/network access
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1063-1067
Timestamp: 2025-01-13T23:38:28.436Z
Learning: When executing Python code from configuration, implement security measures like AST validation, sandboxing, or restricted execution environments to prevent security vulnerabilities.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
10-11
: LGTM! The new imports support the custom components functionality.The addition of
sys
andtypes
modules is appropriate for dynamic module creation and management.
1046-1083
: The module name validation aligns with the documented requirements.Based on the retrieved learnings, the strict module name checks are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte_cdk/cli/source_declarative_manifest/_run.py
(1 hunks)airbyte_cdk/connector_builder/connector_builder_handler.py
(1 hunks)airbyte_cdk/sources/declarative/manifest_declarative_source.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(3 hunks)airbyte_cdk/test/utils/manifest_only_fixtures.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- airbyte_cdk/test/utils/manifest_only_fixtures.py
- airbyte_cdk/cli/source_declarative_manifest/_run.py
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
- airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/connector_builder/connector_builder_handler.py
[error] 55-55: Argument "config" to "ManifestDeclarativeSource" has incompatible type "Mapping[str, Any]"; expected "dict[str, Any] | None"
airbyte_cdk/sources/declarative/manifest_declarative_source.py
[error] 86-86: Argument "config" to "get_registered_components_module" has incompatible type "dict[str, Any] | None"; expected "dict[str, Any]"
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)
71-76
: Great job on the docstring improvements!The docstring follows a clear format with the
Args
section and provides detailed descriptions for each parameter.
airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py
Outdated
Show resolved
Hide resolved
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides there being one very minor place we could use a unit test, this was so well written it brought a tear to my eye.
Well structured, with simple single purpose functions, and great comments when necessary.
This was a joy to read. Thanks @aaronsteers .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes make sense to me, though I did not test it locally.
Thank you for the thorough documentation throughout the code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py (1)
131-136
: 🛠️ Refactor suggestionConsider adding security measures around exec(), wdyt?
The use of
exec()
without restrictions could be dangerous. We should consider:
- Adding a restricted globals dict
- Running the code in a separate namespace
# Create a new module object components_module = ModuleType(name=COMPONENTS_MODULE_NAME) + # Create a restricted globals dict + restricted_globals = { + '__builtins__': { + name: getattr(__builtins__, name) + for name in ['True', 'False', 'None', 'len', 'dict', 'list', 'str', 'int', 'float'] + } + } + # Execute the module text in the module's namespace - exec(components_py_text, components_module.__dict__) + exec(components_py_text, restricted_globals, components_module.__dict__)
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py (2)
12-16
: Consider using a more restrictive type for ChecksumType, wdyt?The type alias could be more specific by using the exact string literals that are used in
CHECKSUM_FUNCTIONS
.-ChecksumType = Literal["md5", "sha256"] +ChecksumType = Literal["md5", "sha256"] # type: ignore CHECKSUM_FUNCTIONS: dict[ChecksumType, Any] = { "md5": hashlib.md5, "sha256": hashlib.sha256, }
25-30
: Consider adding more details to the error message, wdyt?The docstring explains the purpose well, but the error message could include more specific details about what to do when this error occurs.
class AirbyteCodeTamperedError(Exception): """Raised when the connector's components module does not match its checksum. This is a fatal error, as it can be a sign of code tampering. """ + def __init__(self, message: str) -> None: + super().__init__( + f"{message} This could indicate code tampering. Please verify the source " + "of your connector code and ensure it hasn't been modified in transit." + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/manifest_declarative_source.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py
(1 hunks)unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/manifest_declarative_source.py
- unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/custom_code_compiler.py (2)
63-82
: Add tests for checksum validation, wdyt?We should add unit tests to verify that:
- Checksums are correctly validated
- Tampering attempts are detected
- Different hash types work as expected
#!/bin/bash # Check if tests exist for checksum validation rg -l "test.*checksum" "test_custom_code_compiler.py"
63-82
: 🛠️ Refactor suggestionConsider adding additional security validations for the code text, wdyt?
The function validates checksums but could benefit from additional security measures:
- Basic syntax validation before execution
- Size limits for the code text
- Pattern matching for potentially dangerous imports or function calls
def validate_python_code( code_text: str, checksums: dict[str, str] | None, ) -> None: """Validate the provided Python code text against the provided checksums. Currently we fail if no checksums are provided, although this may change in the future. """ + # Add size limit + if len(code_text) > 1_000_000: # 1MB + raise ValueError("Code text is too large.") + + # Basic syntax validation + try: + compile(code_text, '<string>', 'exec') + except SyntaxError as e: + raise ValueError(f"Invalid Python syntax: {e}") + + # Check for potentially dangerous patterns + dangerous_patterns = ['os.system', 'subprocess', 'eval(', 'exec('] + for pattern in dangerous_patterns: + if pattern in code_text: + raise ValueError(f"Potentially dangerous code pattern found: {pattern}") + if not checksums: raise ValueError(f"A checksum is required to validate the code. Received: {checksums}")Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (4)
102-107
: Add error handling for YAML parsing.The YAML loading could fail with malformed input. Should we add try-except blocks for better error messages? Wdyt?
- manifest_dict = yaml.safe_load(manifest_yml_path.read_text()) - assert manifest_dict, "Failed to load the manifest file." + try: + manifest_dict = yaml.safe_load(manifest_yml_path.read_text()) + if not manifest_dict: + raise ValueError("Manifest file is empty") + except yaml.YAMLError as e: + raise ValueError(f"Failed to parse manifest YAML: {e}")
298-301
: Specify the expected exception type.The bare
except
could mask important errors likeKeyboardInterrupt
. Should we specify the expected exception type? Wdyt?- with pytest.raises(Exception): + with pytest.raises((ValueError, RuntimeError)): # specify expected exceptions
267-273
: Ensure proper cleanup of temporary files.The
delete=False
flag means temporary files aren't automatically cleaned up. Should we use a context manager to ensure cleanup? Wdyt?- with NamedTemporaryFile(delete=False, suffix=".json") as temp_config_file: + temp_config_file = NamedTemporaryFile(delete=False, suffix=".json") + try: json_str = json.dumps(py_components_config_dict) Path(temp_config_file.name).write_text(json_str) temp_config_file.flush() source = create_declarative_source( ["check", "--config", temp_config_file.name], ) + finally: + os.unlink(temp_config_file.name)
237-241
: Consider using a mock for credentials.The test is skipped if secrets.yaml is missing. Should we consider using a mock instead of requiring real credentials? This would make the tests more reliable and faster. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
(1 hunks)airbyte_cdk/sources/declarative/manifest_declarative_source.py
(3 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(3 hunks)pyproject.toml
(1 hunks)unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- pyproject.toml
- airbyte_cdk/sources/declarative/manifest_declarative_source.py
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (1)
1-45
: LGTM! Well-organized imports and clear test fixture.The imports are comprehensive and properly typed. The
SAMPLE_COMPONENTS_PY_TEXT
provides a clear and simple test fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components_failing.py (3)
15-17
: Consider inheriting from a more specific exception class?The exception class looks good! Though I was wondering if inheriting from
RuntimeError
might be more appropriate since this is an intentional runtime failure, wdyt? 🤔-class IntentionalException(Exception): +class IntentionalException(RuntimeError): """This exception is raised intentionally in order to test error handling."""
31-39
: Could we make the page_size validation more robust?The page size validation looks good, but I have a few suggestions that might make it even better:
- Should we add a check for negative page sizes?
- The error message could be more descriptive, maybe include the actual value?
What do you think about something like this? 🤔
def __post_init__(self, parameters: Mapping[str, Any]) -> None: if isinstance(self.page_size, int) or (self.page_size is None): + if self.page_size is not None and self.page_size <= 0: + raise ValueError(f"Page size must be positive, got: {self.page_size}") self._page_size = self.page_size else: page_size = InterpolatedString(self.page_size, parameters=parameters).eval(self.config) if not isinstance(page_size, int): - raise Exception(f"{page_size} is of type {type(page_size)}. Expected {int}") + raise ValueError(f"Page size must be an integer, got: {page_size} of type {type(page_size).__name__}") self._page_size = page_size
40-52
: Add context to intentional failures?Both
initial_token
andnext_page_token
raiseIntentionalException
, but would it be helpful to add some context about which method failed? This could make debugging test failures easier, wdyt? 🤔@property def initial_token(self) -> Optional[Any]: - raise IntentionalException() + raise IntentionalException("Intentional failure in initial_token") def next_page_token( self, response: requests.Response, last_page_size: int, last_record: Optional[Record], last_page_token_value: Optional[Any], ) -> Optional[Any]: - raise IntentionalException() + raise IntentionalException("Intentional failure in next_page_token")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
(1 hunks)airbyte_cdk/sources/declarative/manifest_declarative_source.py
(3 hunks)unit_tests/connector_builder/test_connector_builder_handler.py
(3 hunks)unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py
(1 hunks)unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components_failing.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/manifest_declarative_source.py
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py
🧰 Additional context used
📓 Learnings (1)
unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py (1)
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (7)
unit_tests/connector_builder/test_connector_builder_handler.py (3)
347-347
: LGTM! Nice update to use keyword argument.The change from positional to keyword argument
source_config
improves code clarity and maintainability.
508-508
: LGTM! Consistent with the previous change.The update maintains consistency with the earlier change to use the
source_config
keyword argument.
595-595
: LGTM! Completes the consistent updates.The change aligns with the pattern of using the
source_config
keyword argument across all test cases.unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components_failing.py (3)
1-13
: Clean imports and proper copyright notice!The imports are well-organized and properly separated between standard library, third-party, and local modules. Nice work! 👍
19-23
: Docstring seems inconsistent with implementationThe docstring mentions starting from page 1, but
start_from_page
defaults to 0. Should we align these, or am I missing something? 🤔
53-54
: LGTM!The
get_page_size
method is simple and does exactly what it should. 👍unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py (1)
1-61
: Code looks good!The
CustomPageIncrement
class is implemented correctly, and all methods align with the intended functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/sources/declarative/test_manifest_declarative_source.py (2)
19-19
: Would you consider expanding the comment to be more specific about why it's needed?The current comment indicates it's needed for dynamic imports, but it might be helpful to explain which specific dynamic imports rely on this module, wdyt?
-import unit_tests.sources.declarative.external_component # Needed for dynamic imports to work +import unit_tests.sources.declarative.external_component # Required for custom component dynamic imports in test_valid_manifest
268-271
: LGTM! Consider extracting these assertions into a helper method?The assertions are well-structured and verify the module loading hierarchy. To improve maintainability and reusability, would you consider extracting them into a helper method, wdyt?
+def assert_modules_loaded(): + """Verify that all required modules are properly loaded.""" + assert "unit_tests" in sys.modules + assert "unit_tests.sources" in sys.modules + assert "unit_tests.sources.declarative" in sys.modules + assert "unit_tests.sources.declarative.external_component" in sys.modules + def test_valid_manifest(): manifest = { # ... manifest configuration ... } + assert_modules_loaded() - assert "unit_tests" in sys.modules - assert "unit_tests.sources" in sys.modules - assert "unit_tests.sources.declarative" in sys.modules - assert "unit_tests.sources.declarative.external_component" in sys.modules
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/test_manifest_declarative_source.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
unit_tests/sources/declarative/test_manifest_declarative_source.py (1)
Line range hint
272-1207
: No changes in this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (3)
140-148
: Consider using context manager for temporary file cleanup.The temporary files created with
delete=False
aren't automatically cleaned up. Should we ensure cleanup using a try-finally block? Wdyt?- with NamedTemporaryFile(delete=False, suffix=".json") as temp_config_file: + temp_config_file = NamedTemporaryFile(delete=False, suffix=".json") + try: json_str = json.dumps(py_components_config_dict) Path(temp_config_file.name).write_text(json_str) temp_config_file.flush() source = create_declarative_source( ["check", "--config", temp_config_file.name], ) + finally: + os.unlink(temp_config_file.name)Also applies to: 174-182, 267-274
298-301
: Specify exception type inexcept
block.The bare
except
could catch and mask important exceptions likeKeyboardInterrupt
. Should we specify the expected exception type? Wdyt?- with pytest.raises(Exception): + with pytest.raises((ValueError, RuntimeError)): # specify expected exceptions for msg in msg_iterator: assert msg return
237-241
: Consider creating a mock source for testing.The TODO comment suggests creating a test source that doesn't require credentials. This would make the tests more reliable and faster. Would you like me to help create a mock source implementation?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (2)
1-45
: LGTM! Well-organized imports and clear test data.The imports are comprehensive and properly typed, and the sample component code provides a good test case.
102-120
: Consider adding error handling for YAML parsing.The YAML parsing could fail with malformed files. Should we add try-except blocks to provide more informative error messages? Wdyt?
- manifest_dict = yaml.safe_load(manifest_yml_path.read_text()) + try: + manifest_dict = yaml.safe_load(manifest_yml_path.read_text()) + except yaml.YAMLError as e: + raise ValueError(f"Failed to parse manifest file: {e}")
* remotes/airbyte/main: fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor (airbytehq#254) feat(low-code): add profile assertion flow to oauth authenticator component (airbytehq#236) feat(Low-Code Concurrent CDK): Add ConcurrentPerPartitionCursor (airbytehq#111) fix: don't mypy unit_tests (airbytehq#241) fix: handle backoff_strategies in CompositeErrorHandler (airbytehq#225) feat(concurrent cursor): attempt at clamping datetime (airbytehq#234) ci: use `ubuntu-24.04` explicitly (resolves CI warnings) (airbytehq#244) Fix(sdm): module ref issue in python components import (airbytehq#243) feat(source-declarative-manifest): add support for custom Python components from dynamic text input (airbytehq#174) chore(deps): bump avro from 1.11.3 to 1.12.0 (airbytehq#133) docs: comments on what the `Dockerfile` is for (airbytehq#240) chore: move ruff configuration to dedicated ruff.toml file (airbytehq#237)
Summary
Adds ability to parse
components.py
text passed in the config. The feature is disabled-by-default in all of our current runtimes, but it unblocks the next stage of development and testing of the Platform. See below security section for more info.New feature
The
source-declarative-manifest
connector will now accept these config args:__injected_manifest
: (already existing config) this is how we receive the manifest yaml.__injected_components_py
: (new) the Pythoncomponents.py
text passed from the Builder__injected_components_py_checksums
: (new) one or more checksum to ensure code is not tampered with in flight:md5
sha256
Notes on security
AIRBYTE_ALLOW_CUSTOM_CODE
env var is set totrue
.Summary by CodeRabbit
New Features
start_date
configuration option added for relevant time periods.Bug Fixes
Documentation
Chores
.gitignore
to exclude sensitive files.