Feat: Incorporate SDM in CDK and add publish workflow #58

Merged
merged 29 commits into from
Nov 19, 2024
Changes from 10 commits
Commits
cbb34c9
add manifest connectors to test matrix
aaronsteers Nov 13, 2024
6979c68
ci: add docker build job
aaronsteers Nov 13, 2024
f0566c3
feat: add cli script for sdm
aaronsteers Nov 13, 2024
423d6b3
feat: add sdm cli
aaronsteers Nov 13, 2024
17c5ee6
feat: publish official sdm docker image after pypi publish
aaronsteers Nov 13, 2024
9094390
feat: working Dockerfile for declarative-manifest
ChristoGrab Nov 14, 2024
91a9799
update docker-build action
ChristoGrab Nov 14, 2024
b72056b
modify docker-build action
ChristoGrab Nov 14, 2024
87c06ea
refactor docker-build action
ChristoGrab Nov 14, 2024
b5be82c
Apply suggestions from code review
aaronsteers Nov 15, 2024
089cf02
Auto-fix lint and format issues
Nov 15, 2024
176c901
add this branch for testing docker builds
aaronsteers Nov 15, 2024
84ba231
fix comment syntax
aaronsteers Nov 15, 2024
c8ff536
full test on dev branch
aaronsteers Nov 15, 2024
374e419
fix secrets names
aaronsteers Nov 15, 2024
0137ba2
fix linting/typing issues
aaronsteers Nov 15, 2024
78f770d
fix more mypy issues
aaronsteers Nov 15, 2024
dd01890
add multi-arch build and vulnerability scanning
ChristoGrab Nov 18, 2024
c66b05e
chore: add test branch to publish step
ChristoGrab Nov 18, 2024
a4a52e0
chore: fix double-quotes in build step
ChristoGrab Nov 18, 2024
8e6f1a7
chore: define single architecture for test build
ChristoGrab Nov 18, 2024
12984d2
Apply suggestions from code review
ChristoGrab Nov 18, 2024
9709c3c
address review comments
ChristoGrab Nov 18, 2024
c680b90
chore: resolve merge conflicts
ChristoGrab Nov 18, 2024
8d0ffc9
chore: update lockfile
ChristoGrab Nov 18, 2024
bc5de0d
revert change to yaml_declarative_source
ChristoGrab Nov 18, 2024
3b86602
Update .github/workflows/cdk-publish.yml
ChristoGrab Nov 18, 2024
87acda0
Update .github/workflows/cdk-publish.yml
ChristoGrab Nov 18, 2024
7df3955
Update .github/workflows/cdk-publish.yml
ChristoGrab Nov 18, 2024
6 changes: 6 additions & 0 deletions .github/workflows/connector-tests.yml
@@ -81,6 +81,12 @@ jobs:
            cdk_extra: vector-db-based
          - connector: destination-motherduck
            cdk_extra: sql
          # TODO: These are manifest connectors and won't work as expected until we
          # add `--use-local-cdk` support for manifest connectors.
          - connector: source-the-guardian-api
            cdk_extra: n/a
          - connector: source-pokeapi
            cdk_extra: n/a

    name: "Check: '${{matrix.connector}}' (skip=${{needs.cdk_changes.outputs[matrix.cdk_extra] == 'false'}})"
    steps:
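The job name in this hunk embeds a skip flag computed from `needs.cdk_changes.outputs[matrix.cdk_extra]`. The sketch below is a rough Python model of that lookup, not the workflow's actual evaluation. It assumes (the diff does not confirm this) that the `cdk_changes` job exposes one `'true'`/`'false'` output per extra, and that a missing output compares unequal to `'false'`. Under those assumptions, the new `cdk_extra: n/a` entries would never be skipped.

```python
# Hypothetical model of the skip expression used in the job name.
# `outputs` stands in for needs.cdk_changes.outputs; its keys are assumed.


def is_skipped(outputs: dict[str, str], cdk_extra: str) -> bool:
    # An absent key yields None here, which is not equal to "false",
    # so the connector is treated as not skipped.
    return outputs.get(cdk_extra) == "false"


outputs = {"sql": "false", "vector-db-based": "true"}
print(is_skipped(outputs, "sql"))  # the sql extra did not change
print(is_skipped(outputs, "n/a"))  # no output named "n/a" exists
```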
63 changes: 63 additions & 0 deletions .github/workflows/docker-build.yml
@@ -0,0 +1,63 @@
name: SDM Docker Build

on:
  push:
    branches:
      - main
    paths:
      - 'airbyte_cdk/**'
      - '.github/workflows/docker-build.yml'
      - 'Dockerfile'
  workflow_dispatch:
    inputs:
      version-tag:
        description: "Version tag for the image (optional)"
        required: false
        type: string

jobs:
  docker_build:
    name: Build and Publish SDM Docker Image
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Required for trusted publishing
      contents: write # Required for artifact uploads

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Build Docker image
        run: |
          docker build -t airbyte/source-declarative-manifest:build-test .

      - name: Test image
        run: |
          docker run airbyte/source-declarative-manifest:build-test spec

      - name: Login to Docker Hub
        if: ${{ success() && (github.ref == 'refs/heads/main' || github.event_name == 'workflow_dispatch') }}
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}

      - name: Push to Docker Hub
        if: ${{ success() && (github.ref == 'refs/heads/main' || github.event_name == 'workflow_dispatch') }}
        run: |
          # Always tag with commit SHA
          docker tag airbyte/source-declarative-manifest:build-test airbyte/source-declarative-manifest:${{ github.sha }}
          docker push airbyte/source-declarative-manifest:${{ github.sha }}

          # Tag as latest if on main branch
          if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
            docker tag airbyte/source-declarative-manifest:build-test airbyte/source-declarative-manifest:latest
            docker push airbyte/source-declarative-manifest:latest
          fi

          # Add version tag if provided
          if [[ -n "${{ github.event.inputs.version-tag }}" ]]; then
            docker tag airbyte/source-declarative-manifest:build-test airbyte/source-declarative-manifest:${{ github.event.inputs.version-tag }}
            docker push airbyte/source-declarative-manifest:${{ github.event.inputs.version-tag }}
          fi
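The tagging rules in the push step can be summarized as a small pure function. This is only an illustrative restatement of the shell logic above. The function name `image_tags` and its parameters are made up for the sketch and are not part of the workflow.

```python
# Illustrative restatement of the "Push to Docker Hub" tagging rules.
# `image_tags` is a hypothetical helper, not part of the workflow.

REPO = "airbyte/source-declarative-manifest"


def image_tags(ref: str, sha: str, version_tag: str = "") -> list[str]:
    """Return the image references the push step would publish, in order."""
    tags = [f"{REPO}:{sha}"]  # always tag with the commit SHA
    if ref == "refs/heads/main":
        tags.append(f"{REPO}:latest")  # only builds from main move `latest`
    if version_tag:
        tags.append(f"{REPO}:{version_tag}")  # optional workflow_dispatch input
    return tags


print(image_tags("refs/heads/main", "abc123"))
print(image_tags("refs/heads/feature", "abc123", version_tag="6.5.0"))
```

Note that a manual dispatch from a non-main branch still publishes the SHA tag (and the version tag, if given), because the step condition accepts either `main` or `workflow_dispatch`.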
18 changes: 18 additions & 0 deletions Dockerfile
@@ -0,0 +1,18 @@
FROM docker.io/airbyte/python-connector-base:2.0.0@sha256:c44839ba84406116e8ba68722a0f30e8f6e7056c726f447681bb9e9ece8bd916

WORKDIR /airbyte/integration_code

# Copy project files needed for build
COPY pyproject.toml poetry.lock README.md ./

# Install dependencies - ignore keyring warnings
RUN poetry config virtualenvs.create false \
    && poetry install --only main --no-interaction --no-ansi || true

# Copy source code
COPY airbyte_cdk ./airbyte_cdk

# Build and install the package
RUN poetry build && pip install dist/*.whl

ENTRYPOINT ["poetry", "run", "source-declarative-manifest"]
1 change: 1 addition & 0 deletions airbyte_cdk/cli/__init__.py
@@ -0,0 +1 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
6 changes: 6 additions & 0 deletions airbyte_cdk/cli/source_declarative_manifest/__init__.py
@@ -0,0 +1,6 @@
from airbyte_cdk.cli.source_declarative_manifest._run import run


__all__ = [
    "run",
]
205 changes: 205 additions & 0 deletions airbyte_cdk/cli/source_declarative_manifest/_run.py
@@ -0,0 +1,205 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
"""Defines the `source-declarative-manifest` connector, which installs alongside CDK.

This file was originally imported from the dedicated connector directory, under the
`airbyte` monorepo.

Usage:

```
pipx install airbyte-cdk
source-declarative-manifest --help
source-declarative-manifest spec
...
```
"""


from __future__ import annotations

import json
import pkgutil
import sys
import traceback
from datetime import datetime
from pathlib import Path
from typing import Any, List, Mapping, Optional

from airbyte_cdk.entrypoint import AirbyteEntrypoint, launch
from airbyte_cdk.models import (
    AirbyteErrorTraceMessage,
    AirbyteMessage,
    AirbyteMessageSerializer,
    AirbyteStateMessage,
    AirbyteTraceMessage,
    ConfiguredAirbyteCatalog,
    ConnectorSpecificationSerializer,
    TraceType,
    Type,
)
from airbyte_cdk.sources.declarative.concurrent_declarative_source import (
    ConcurrentDeclarativeSource,
)
from airbyte_cdk.sources.declarative.yaml_declarative_source import YamlDeclarativeSource
from airbyte_cdk.sources.source import TState
import orjson


class SourceLocalYaml(YamlDeclarativeSource):
    """
    Declarative source defined by a yaml file in the local filesystem
    """

    def __init__(
        self,
        catalog: Optional[ConfiguredAirbyteCatalog],
        config: Optional[Mapping[str, Any]],
        state: TState,
        **kwargs: Any,
    ) -> None:
        """
        HACK!
            Problem: YamlDeclarativeSource relies on the calling module name/path to find the yaml file.
            Implication: If you call YamlDeclarativeSource directly it will look for the yaml file in the wrong place. (e.g. the airbyte-cdk package)
            Solution: Subclass YamlDeclarativeSource from the same location as the manifest to load.

            When can we remove this?
                When the airbyte-cdk is updated to not rely on the calling module name/path to find the yaml file.
                When all manifest connectors are updated to use the new airbyte-cdk.
                When all manifest connectors are updated to use the source-declarative-manifest as the base image.
        """
        # Extra keyword arguments are accepted for signature compatibility but not forwarded.
        super().__init__(
            catalog=catalog, config=config, state=state, path_to_yaml="manifest.yaml"
        )


def _is_local_manifest_command(args: List[str]) -> bool:
    # Check for a local manifest.yaml file
    return Path("/airbyte/integration_code/source_declarative_manifest/manifest.yaml").exists()


def handle_command(args: List[str]) -> None:
    if _is_local_manifest_command(args):
        handle_local_manifest_command(args)
    else:
        handle_remote_manifest_command(args)


def _get_local_yaml_source(args: List[str]) -> SourceLocalYaml:
    try:
        config, catalog, state = _parse_inputs_into_config_catalog_state(args)
        return SourceLocalYaml(config=config, catalog=catalog, state=state)
    except Exception as error:
        print(
            orjson.dumps(
                AirbyteMessageSerializer.dump(
                    AirbyteMessage(
                        type=Type.TRACE,
                        trace=AirbyteTraceMessage(
                            type=TraceType.ERROR,
                            emitted_at=int(datetime.now().timestamp() * 1000),
                            error=AirbyteErrorTraceMessage(
                                message=f"Error starting the sync. This could be due to an invalid configuration or catalog. Please contact Support for assistance. Error: {error}",
                                stack_trace=traceback.format_exc(),
                            ),
                        ),
                    )
                )
            ).decode()
        )
        raise error


def handle_local_manifest_command(args: List[str]) -> None:
    source = _get_local_yaml_source(args)
    launch(source, args)


def handle_remote_manifest_command(args: List[str]) -> None:
    """Overrides the spec command to return the generalized spec for the declarative manifest source.

    This differs from a typical low-code source that is built and published separately as a
    ManifestDeclarativeSource, because such a source has a spec method that returns the spec
    for that specific source. Other than spec, the generalized connector behaves the same as
    any other, since the manifest is provided in the config.
    """
    if args[0] == "spec":
        json_spec = pkgutil.get_data(
            "airbyte_cdk.cli.source_declarative_manifest",
            "spec.json",
        )
        if json_spec is None:
            # pkgutil.get_data returns None if the package cannot be located or
            # its loader does not support get_data.
            raise FileNotFoundError(
                "Unable to find spec.json in the airbyte_cdk.cli.source_declarative_manifest package."
            )
        spec_obj = json.loads(json_spec)
        spec = ConnectorSpecificationSerializer.load(spec_obj)

        message = AirbyteMessage(type=Type.SPEC, spec=spec)
        print(AirbyteEntrypoint.airbyte_message_to_string(message))
    else:
        source = create_declarative_source(args)
        launch(source, args)


def create_declarative_source(args: List[str]) -> ConcurrentDeclarativeSource:
    """Creates the source with the injected config.

    This essentially does what other low-code sources do at build time, but at runtime,
    with a user-provided manifest in the config. This better reflects what happens in the
    connector builder.
    """
    try:
        config, catalog, state = _parse_inputs_into_config_catalog_state(args)
        if "__injected_declarative_manifest" not in config:
            raise ValueError(
                f"Invalid config: `__injected_declarative_manifest` should be provided at the root of the config but config only has keys {list(config.keys())}"
            )
        return ConcurrentDeclarativeSource(
            config=config,
            catalog=catalog,
            state=state,
            source_config=config.get("__injected_declarative_manifest"),
        )
    except Exception as error:
        print(
            orjson.dumps(
                AirbyteMessageSerializer.dump(
                    AirbyteMessage(
                        type=Type.TRACE,
                        trace=AirbyteTraceMessage(
                            type=TraceType.ERROR,
                            emitted_at=int(datetime.now().timestamp() * 1000),
                            error=AirbyteErrorTraceMessage(
                                message=f"Error starting the sync. This could be due to an invalid configuration or catalog. Please contact Support for assistance. Error: {error}",
                                stack_trace=traceback.format_exc(),
                            ),
                        ),
                    )
                )
            ).decode()
        )
        raise error
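The same error-reporting block appears in both `_get_local_yaml_source` and `create_declarative_source`. As a rough illustration of what it emits, the sketch below builds an equivalent message from plain dictionaries and the standard `json` module rather than the CDK serializers. The helper name `error_trace_message` is hypothetical, the message text is shortened, and the JSON field names are assumed to mirror the model attribute names used above.

```python
import json
import traceback
from datetime import datetime


def error_trace_message(error: Exception) -> str:
    """Serialize an exception as a TRACE/ERROR message (hypothetical helper)."""
    return json.dumps(
        {
            "type": "TRACE",
            "trace": {
                "type": "ERROR",
                # Milliseconds since the epoch, as in the original code.
                "emitted_at": int(datetime.now().timestamp() * 1000),
                "error": {
                    "message": f"Error starting the sync. Error: {error}",
                    # Must be called while the exception is being handled.
                    "stack_trace": traceback.format_exc(),
                },
            },
        }
    )


try:
    raise ValueError("missing manifest")
except ValueError as exc:
    print(error_trace_message(exc))
```

A shared helper along these lines would let both call sites print the message and re-raise without repeating the nested constructor calls. Whether that refactor is wanted is a question for the PR authors.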


def _parse_inputs_into_config_catalog_state(
    args: List[str],
) -> tuple[
    Optional[Mapping[str, Any]],
    Optional[ConfiguredAirbyteCatalog],
    List[AirbyteStateMessage],
]:
    parsed_args = AirbyteEntrypoint.parse_args(args)
    config = (
        ConcurrentDeclarativeSource.read_config(parsed_args.config)
        if hasattr(parsed_args, "config")
        else None
    )
    catalog = (
        ConcurrentDeclarativeSource.read_catalog(parsed_args.catalog)
        if hasattr(parsed_args, "catalog")
        else None
    )
    state = (
        ConcurrentDeclarativeSource.read_state(parsed_args.state)
        if hasattr(parsed_args, "state")
        else []
    )

    return config, catalog, state


def run() -> None:
    args = sys.argv[1:]
    handle_command(args)
17 changes: 17 additions & 0 deletions airbyte_cdk/cli/source_declarative_manifest/spec.json
@@ -0,0 +1,17 @@
{
  "documentationUrl": "https://docs.airbyte.com/integrations/sources/low-code",
  "connectionSpecification": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Low-code source spec",
    "type": "object",
    "required": ["__injected_declarative_manifest"],
    "additionalProperties": true,
    "properties": {
      "__injected_declarative_manifest": {
        "title": "Low-code manifest",
        "type": "object",
        "description": "The low-code manifest that defines the components of the source."
      }
    }
  }
}
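The spec above requires a single top-level key and permits any additional properties. A minimal sketch of what that means for a config is shown below. It checks only the `required` list and is not a JSON Schema validator. The helper name `missing_required_keys` and the sample configs are invented for illustration.

```python
import json

# Trimmed copy of the spec above, keeping only the fields this sketch reads.
SPEC = json.loads(
    """
    {
      "connectionSpecification": {
        "type": "object",
        "required": ["__injected_declarative_manifest"],
        "additionalProperties": true
      }
    }
    """
)


def missing_required_keys(config: dict, spec: dict = SPEC) -> list[str]:
    """Return the required top-level keys that are absent from `config`."""
    required = spec["connectionSpecification"].get("required", [])
    return [key for key in required if key not in config]


print(missing_required_keys({"api_key": "secret"}))
print(missing_required_keys({"__injected_declarative_manifest": {}, "api_key": "secret"}))
```

The first config is rejected because the manifest key is absent. The second passes, and its extra `api_key` is allowed by `additionalProperties: true`.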
12 changes: 6 additions & 6 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions pyproject.toml
@@ -49,6 +49,7 @@ pyrate-limiter = "~3.1.0"
python-dateutil = "*"
python-ulid = "^3.0.0"
PyYAML = "^6.0.1"
rapidfuzz = "^3.10.1"
requests = "*"
requests_cache = "*"
wcmatch = "10.0"
@@ -104,6 +105,10 @@ sphinx-docs = ["Sphinx", "sphinx-rtd-theme"]
vector-db-based = ["langchain", "openai", "cohere", "tiktoken"]
sql = ["sqlalchemy"]

[tool.poetry.scripts]

source-declarative-manifest = "airbyte_cdk.cli.source_declarative_manifest:run"
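
The `[tool.poetry.scripts]` entry maps a command name to a `module:function` reference. Conceptually, the installed console script imports the module and calls the named attribute. The sketch below is a simplified illustration of that resolution, not the wrapper installers actually generate (which also handles dotted attributes and exit codes). It is demonstrated against a standard-library target so that it runs without `airbyte-cdk` installed.

```python
import importlib
from typing import Any


def resolve_entry_point(reference: str) -> Any:
    """Resolve a `module:attribute` reference to the object it names."""
    module_name, _, attribute = reference.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# With airbyte-cdk installed, the reference from pyproject.toml would be
# "airbyte_cdk.cli.source_declarative_manifest:run".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```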

[tool.isort]
skip = ["__init__.py"] # TODO: Remove after this is fixed: https://github.com/airbytehq/airbyte-python-cdk/issues/12
