feat: Adds ZipfileDecoder component #169
Merged
Changes from 36 commits
40 commits
e68f36f initial JsonParser component (pnilan)
a8a7bb3 update parser (pnilan)
254f877 add tests for json parser (pnilan)
8df239a update parser and tests to yield empty dict if unparseable. (pnilan)
8c7d5f8 add zipfile_decoder (pnilan)
92574df chore: format code (pnilan)
6e4b376 Merge branch 'pnilan/declarative/parsers' into pnilan/declarative/zip… (pnilan)
ab3f404 update zipfile_decoder and relevants tests (pnilan)
82a15c9 Merge branch 'main' into pnilan/declarative/parsers (pnilan)
49d0ec8 Merge branch 'pnilan/declarative/parsers' into pnilan/declarative/zip… (pnilan)
96ec874 remove errant comment (pnilan)
0b3b5e1 Merge branch 'main' into pnilan/declarative/parsers (pnilan)
9fd93cb conform tests (pnilan)
1892a03 initial test updates (pnilan)
51118f1 update JsonParser and relevant tests (pnilan)
34a710d chore: format/type-check (pnilan)
060178a remove orjson from composite_raw_decoder file (pnilan)
bf8dd26 Merge branch 'main' into pnilan/declarative/parsers (pnilan)
d9b6df3 chore: format code (pnilan)
f20fffc add additional test (pnilan)
9ce2c28 update to fallback to json library if orjson fails, update test to us… (pnilan)
7e7b2c4 add `JsonParser` to GzipDecoder and CompositeRawDecoder "anyOf" list (pnilan)
23cbfb7 update to simplify orjson/json parsing (pnilan)
1c2a832 chore: type-check (pnilan)
66aaae9 unlock `CompositeRawDecoder` w/ `JsonParser` support for pagination (pnilan)
00cf7b1 update conditional validations for decoders/parsers for pagination (pnilan)
b7aa78f remove errant print (pnilan)
7b41732 chore: coderabbitai suggestions (pnilan)
3f550f2 update parservalidation method (pnilan)
27bf5a7 Merge branch 'main' into pnilan/declarative/parsers (pnilan)
bd724bf Merge branch 'pnilan/declarative/parsers' into pnilan/declarative/zip… (pnilan)
350fcdb remove unnecessary parser (pnilan)
aae3e77 update ZipfileDecoder to hanlde underlying parsers (pnilan)
fe6859f update types (pnilan)
907f628 Merge branch 'main' into pnilan/declarative/zipfiledecoder (pnilan)
8a1ccf0 add `ZipfileDecoder` to `anyOf` validator in declarative component sc… (pnilan)
fcb3184 update `anyOf` validations to include `ZipfileDecoder` in declarative… (pnilan)
1c8cd66 update zipfiledecoder and relevant tests (pnilan)
1777cac adds JsonLineParser and CsvParser to available underlying parsers for… (pnilan)
c0b2130 close zipfile context, add exception logging at decoder level, and ad… (pnilan)
47 changes: 47 additions & 0 deletions
airbyte_cdk/sources/declarative/decoders/zipfile_decoder.py
@@ -0,0 +1,47 @@
#
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
#

import logging
import zipfile
from dataclasses import InitVar, dataclass
from io import BufferedReader, BytesIO
from typing import Any, Generator, Mapping, MutableMapping, Optional

import requests

from airbyte_cdk.models import FailureType
from airbyte_cdk.sources.declarative.decoders import Decoder
from airbyte_cdk.sources.declarative.decoders.composite_raw_decoder import Parser
from airbyte_cdk.utils import AirbyteTracedException

logger = logging.getLogger("airbyte")


@dataclass
class ZipfileDecoder(Decoder):
    parser: Parser

    def is_stream_response(self) -> bool:
        return True

    def decode(
        self, response: requests.Response
    ) -> Generator[MutableMapping[str, Any], None, None]:
        try:
            zip_file = zipfile.ZipFile(BytesIO(response.content))
        except zipfile.BadZipFile as e:
            logger.error(
                f"Received an invalid zip file in response to URL: {response.request.url}. "
                f"The size of the response body is: {len(response.content)}"
            )
            raise AirbyteTracedException(
                message="Received an invalid zip file in response.",
                internal_message=f"Received an invalid zip file in response to URL: {response.request.url}.",
                failure_type=FailureType.system_error,
            ) from e

        for filename in zip_file.namelist():
            with zip_file.open(filename) as file:
                yield from self.parser.parse(BytesIO(file.read()))
        zip_file.close()
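
The decode loop in this file can be sketched without any Airbyte CDK dependency. The block below is only an illustration under stated assumptions: `parse_json` and `decode_zip` are hypothetical names that do not appear in the PR, `parse_json` stands in for whatever `Parser` the decoder is configured with, and a plain `ValueError` stands in for `AirbyteTracedException`.

```python
import io
import json
import zipfile
from typing import Any, Callable, Dict, Iterator


def parse_json(stream: io.BytesIO) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a Parser: yields one dict per JSON record."""
    body = json.loads(stream.read())
    if isinstance(body, list):
        yield from body
    else:
        yield body


def decode_zip(
    content: bytes,
    parse: Callable[[io.BytesIO], Iterator[Dict[str, Any]]] = parse_json,
) -> Iterator[Dict[str, Any]]:
    """Yield the parsed records of every member of an in-memory zip archive."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(content))
    except zipfile.BadZipFile as exc:
        # Assumption: ValueError replaces the CDK's traced exception here.
        raise ValueError(f"invalid zip payload of {len(content)} bytes") from exc

    # The context manager closes the archive when the generator is exhausted
    # and also when the consumer abandons it early and the generator is closed.
    with archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                yield from parse(io.BytesIO(member.read()))
```

One design difference from the diff: the sketch wraps the loop in `with archive:` instead of calling `close()` after the loop, so the archive is released even if the caller stops iterating before the last member. Whether that matters for the decoder in this PR depends on how its callers consume the generator.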
43 changes: 43 additions & 0 deletions
unit_tests/sources/declarative/decoders/test_zipfile_decoder.py
@@ -0,0 +1,43 @@
#
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
#
import gzip
import json
import zipfile
from io import BytesIO
from typing import Union

import pytest
import requests

from airbyte_cdk.sources.declarative.decoders import ZipfileDecoder
from airbyte_cdk.sources.declarative.decoders.parsers import JsonParser


def create_zip_from_dict(data: Union[dict, list]):
    zip_buffer = BytesIO()
    with zipfile.ZipFile(zip_buffer, mode="w") as zip_file:
        zip_file.writestr("data.json", data)
    zip_buffer.seek(0)
    return zip_buffer.getvalue()


@pytest.mark.parametrize(
    "json_data",
    [
        {"test": "test"},
        [{"id": 1}, {"id": 2}],
    ],
)
def test_zipfile_decoder_with_valid_response(requests_mock, json_data):
    zipfile_decoder = ZipfileDecoder(parameters={}, parser=JsonParser)
    compressed_data = gzip.compress(json.dumps(json_data).encode())
    zipped_data = create_zip_from_dict(compressed_data)
    requests_mock.register_uri("GET", "https://airbyte.io/", content=zipped_data)
    response = requests.get("https://airbyte.io/")

    if isinstance(json_data, list):
        for i, actual in enumerate(zipfile_decoder.decode(response=response)):
            assert actual == json_data[i]
    else:
        assert next(zipfile_decoder.decode(response=response)) == json_data
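
The round trip this test exercises (serialize JSON, optionally gzip it, zip it, then read the records back) can be reproduced with the standard library alone. The following is a sketch under assumptions, not the PR's test: `create_zip_from_json` and `read_records` are hypothetical helpers, and the explicit `gzip.decompress` step shows what a reader has to do when the zip member is itself gzip-compressed. The diff does not show how `JsonParser` treats such a payload, so that behaviour is not assumed here.

```python
import gzip
import io
import json
import zipfile
from typing import Any, List, Union


def create_zip_from_json(
    data: Union[dict, list], member: str = "data.json", gzip_member: bool = False
) -> bytes:
    """Serialize `data` to JSON bytes and store it as one member of a zip archive."""
    # ZipFile.writestr accepts str or bytes, so the object is serialized first.
    payload = json.dumps(data).encode("utf-8")
    if gzip_member:
        payload = gzip.compress(payload)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        archive.writestr(member, payload)
    # getvalue() returns the whole buffer regardless of the stream position.
    return buffer.getvalue()


def read_records(zipped: bytes, gzip_member: bool = False) -> List[Any]:
    """Read every member back and flatten the JSON bodies into a list of records."""
    records: List[Any] = []
    with zipfile.ZipFile(io.BytesIO(zipped)) as archive:
        for name in archive.namelist():
            raw = archive.read(name)
            if gzip_member:
                raw = gzip.decompress(raw)
            body = json.loads(raw)
            records.extend(body if isinstance(body, list) else [body])
    return records
```

A quick check of both parametrized cases would be `read_records(create_zip_from_json({"test": "test"}))` and `read_records(create_zip_from_json([{"id": 1}, {"id": 2}], gzip_member=True), gzip_member=True)`.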