Self hosted runners with aproxy #326

Closed

wants to merge 30 commits
Changes from all commits (30 commits)
53e9f2c
Use self-hosted runners for integration tests
carlcsaposs-canonical Aug 7, 2023
4951d30
temp branch
carlcsaposs-canonical Aug 7, 2023
68970e2
snap channel
carlcsaposs-canonical Aug 7, 2023
cf0ee8d
secrets
carlcsaposs-canonical Aug 7, 2023
f993b8c
microk8s.kubectl
carlcsaposs-canonical Aug 9, 2023
24d1873
Use 60s interval for fast forward instead of 5s
carlcsaposs-canonical Aug 23, 2023
a49344b
Use large runner
carlcsaposs-canonical Aug 23, 2023
fa03c83
Move the integration_test_charm to point to add-aproxy-snap in the wo…
phvalguima Oct 18, 2023
c87b874
dummy commit to rerun workflow
phvalguima Oct 18, 2023
149aee1
Move build charm to self-hosted runner
phvalguima Oct 18, 2023
e7ed184
Move lint to add-aproxy-snap
phvalguima Oct 18, 2023
744c7b7
Dummy change to retrigger and download newer workflows
phvalguima Oct 18, 2023
df1fe43
Dummy change to retrigger and download newer workflows (2)
phvalguima Oct 18, 2023
004566f
Dummy change to retrigger and download newer workflows (2)
phvalguima Oct 18, 2023
4262c5f
Dummy change to retrigger and download newer workflows (4)
phvalguima Oct 18, 2023
74ba429
Dummy commit to retrigger workflow update
phvalguima Oct 18, 2023
42f41e6
Dummy commit to refresh workflows
phvalguima Oct 18, 2023
dec6f38
Dummy commit to trigger refresh in workflows (2)
phvalguima Oct 18, 2023
1753072
Remove dependency from lint and unit tests for now
phvalguima Oct 18, 2023
261d93a
Dummy commit to refresh workflow
phvalguima Oct 18, 2023
21fa2a1
Dummy commit to trigger workflow
phvalguima Oct 18, 2023
d1513bb
Simplify the workflow for the integration tests
phvalguima Oct 18, 2023
31e890f
Comment lint for now, add support for unit testing
phvalguima Oct 18, 2023
8fea528
fix build step: remove runs-on
phvalguima Oct 18, 2023
3ea9cde
Comment codecov report for now
phvalguima Oct 18, 2023
6ea47e5
remove last ref to gh-hosted
phvalguima Oct 18, 2023
b9b1819
Add test for NO_PROXY + lightkube fix
phvalguima Oct 18, 2023
bd79b2e
remove wrong ref for trust_env
phvalguima Oct 18, 2023
43af202
Add trust_env
phvalguima Oct 19, 2023
ea6b905
Add missing netloc to urlparse output
phvalguima Oct 19, 2023
150 changes: 114 additions & 36 deletions .github/workflows/ci.yaml
@@ -3,7 +3,7 @@
name: Tests

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
group: ${{ github.workflow }}-${{ github.ref }}-large
cancel-in-progress: true

on:
@@ -14,21 +14,48 @@ on:
workflow_call:

jobs:
lint:
name: Lint
uses: canonical/data-platform-workflows/.github/workflows/lint.yaml@v4.2.3
# lint:
# name: Lint
# uses: canonical/data-platform-workflows/.github/workflows/lint.yaml@v4.2.3
## uses: canonical/data-platform-workflows/.github/workflows/lint.yaml@add-aproxy-snap
# runs-on: [self-hosted, linux, X64, large, jammy]

unit-test:
strategy:
fail-fast: false
matrix:
juju-version: ["2.9", "3.1"]
name: Unit test charm
runs-on: ubuntu-latest
runs-on: [self-hosted, linux, X64, large, jammy]
timeout-minutes: 5
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Install venv and pipx
run: |
sudo apt-get install python3-pip python3-venv -y
python3 -m pip install --user pipx
python3 -m pipx ensurepath
- name: Install tox & poetry
run: |
pipx install tox
pipx install poetry
- name: Select test stability level
id: select-test-stability
shell: python
run: |
import os

if "${{ github.event_name }}" == "schedule":
print("Running unstable and stable tests")
output = "mark_expression="
else:
print("Skipping unstable tests")
output = "mark_expression=not unstable"

with open(os.environ["GITHUB_OUTPUT"], "a") as file:
file.write(output)

- name: Install tox & poetry
run: |
pipx install tox
@@ -39,52 +66,59 @@ jobs:
# This env var is only to indicate Juju version to "simulate" in the unit tests
# No libjuju is being actually used in unit testing
LIBJUJU_VERSION_SPECIFIER: ${{ matrix.juju-version }}
- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v3
# Seems not to work atm for self-hosted, investigate better later
# - name: Upload Coverage to Codecov
# uses: codecov/codecov-action@v3

build:
name: Build charm
uses: canonical/data-platform-workflows/.github/workflows/build_charms_with_cache.yaml@v4.2.3
uses: canonical/data-platform-workflows/.github/workflows/build_charms_with_cache.yaml@add-aproxy-snap
with:
charmcraft-snap-revision: 1349 # version 2.3.0
permissions:
actions: write # Needed to manage GitHub Actions cache

gh-hosted-collect-integration-tests:
name: (GH hosted) Collect integration test groups
needs:
- lint
- unit-test
runs-on: ubuntu-latest
collect-integration-tests:
name: Collect integration test groups
runs-on: [self-hosted, linux, X64, large, jammy]
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Install venv and pipx
run: |
sudo apt-get install python3-pip python3-venv -y
python3 -m pip install --user pipx
python3 -m pipx ensurepath
- name: Install tox & poetry
run: |
pipx install tox
pipx install poetry
- name: Select test stability level
id: select-test-stability
shell: python
run: |
if [[ "${{ github.event_name }}" == "schedule" ]]
then
echo Running unstable and stable tests
echo "mark_expression=" >> "$GITHUB_OUTPUT"
else
echo Skipping unstable tests
echo "mark_expression=not unstable" >> "$GITHUB_OUTPUT"
fi
import os

if "${{ github.event_name }}" == "schedule":
print("Running unstable and stable tests")
output = "mark_expression="
else:
print("Skipping unstable tests")
output = "mark_expression=not unstable"

with open(os.environ["GITHUB_OUTPUT"], "a") as file:
file.write(output)
- name: Collect test groups
id: collect-groups
run: tox run -e integration -- tests/integration -m '${{ steps.select-test-stability.outputs.mark_expression }}' --collect-groups
outputs:
groups: ${{ steps.collect-groups.outputs.groups }}

gh-hosted-integration-test:
integration-test:
strategy:
fail-fast: false
matrix:
groups: ${{ fromJSON(needs.gh-hosted-collect-integration-tests.outputs.groups) }}
groups: ${{ fromJSON(needs.collect-integration-tests.outputs.groups) }}
juju-snap-channel: ["2.9/stable", "3.1/stable"]
include:
- juju-snap-channel: "3.1/stable"
@@ -107,17 +141,47 @@
- juju-snap-channel: "3.1/stable"
groups:
job_name: "high_availability/test_upgrade_from_stable.py | group 1"
name: ${{ matrix.juju-snap-channel }} - (GH hosted) ${{ matrix.groups.job_name }}
name: ${{ matrix.juju-snap-channel }} - (self hosted) ${{ matrix.groups.job_name }}
needs:
- lint
- unit-test
- build
- gh-hosted-collect-integration-tests
runs-on: ubuntu-latest
- collect-integration-tests
runs-on: [self-hosted, linux, X64, xlarge, jammy]
timeout-minutes: 120
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Check environment resources
run: |
sudo df -h
sudo free
sudo cat /proc/cpuinfo
- name: Set up aproxy
run: |
sudo snap install aproxy --edge
sudo snap set aproxy proxy=squid.internal:3128
sudo nft -f - << EOF
define default-ip = $(ip route get $(ip route show 0.0.0.0/0 | grep -oP 'via \K\S+') | grep -oP 'src \K\S+')
define private-ips = { 10.0.0.0/8, 127.0.0.1/8, 172.16.0.0/12, 192.168.0.0/16 }
table ip aproxy
flush table ip aproxy
table ip aproxy {
chain prerouting {
type nat hook prerouting priority dstnat; policy accept;
ip daddr != \$private-ips tcp dport { 80, 443 } counter dnat to \$default-ip:8443
}

chain output {
type nat hook output priority -100; policy accept;
ip daddr != \$private-ips tcp dport { 80, 443 } counter dnat to \$default-ip:8443
}
}
EOF
- name: Install venv and pipx
run: |
sudo apt-get install python3-pip python3-venv -y
python3 -m pip install --user pipx
python3 -m pipx ensurepath

- name: Install tox & poetry
run: |
pipx install tox
@@ -151,10 +215,24 @@
run: tox run -e integration -- "${{ matrix.groups.path_to_test_file }}" --group="${{ matrix.groups.group_number }}" -m '${{ steps.select-test-stability.outputs.mark_expression }}'
env:
LIBJUJU_VERSION_SPECIFIER: ${{ matrix.libjuju-version }}
SECRETS_FROM_GITHUB: |
{
"AWS_ACCESS_KEY": "${{ secrets.AWS_ACCESS_KEY }}",
"AWS_SECRET_KEY": "${{ secrets.AWS_SECRET_KEY }}",
"GCP_ACCESS_KEY": "${{ secrets.GCP_ACCESS_KEY }}",
"GCP_SECRET_KEY": "${{ secrets.GCP_SECRET_KEY }}",
}

# integration-test:
# name: Integration test charm
# needs:
# - build
# uses: canonical/data-platform-workflows/.github/workflows/integration_test_charm.yaml@add-aproxy-snap
# with:
# artifact-name: ${{ needs.build.outputs.artifact-name }}
# cloud: microk8s
# microk8s-snap-channel: latest/stable
# juju-agent-version: 2.9.44
# secrets:
# integration-test: |
# {
# "AWS_ACCESS_KEY": "${{ secrets.AWS_ACCESS_KEY }}",
# "AWS_SECRET_KEY": "${{ secrets.AWS_SECRET_KEY }}",
# "GCP_ACCESS_KEY": "${{ secrets.GCP_ACCESS_KEY }}",
# "GCP_SECRET_KEY": "${{ secrets.GCP_SECRET_KEY }}",
# }
# docker-hub-username: ${{ secrets.DOCKER_HUB_USERNAME }}
# docker-hub-password: ${{ secrets.DOCKER_HUB_TOKEN }}
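The centrepiece of the ci.yaml change is the "Set up aproxy" step: the aproxy snap listens on the runner's own IP at port 8443 and forwards to squid.internal:3128, while the nft prerouting and output chains DNAT any outbound TCP traffic on ports 80/443 that is not destined for a private address to that listener. Because the redirect happens at the netfilter layer, test processes need no HTTP_PROXY/HTTPS_PROXY configuration. A minimal Python sketch of how a job step could verify the interception (not part of the PR; the snap log inspection is an assumption about the aproxy service):

# Hedged sketch, assuming a runner configured by the "Set up aproxy" step.
import subprocess
import urllib.request

# No proxy environment variables are set here; if the nft DNAT rules are
# active, this request is still transparently routed through aproxy:8443.
with urllib.request.urlopen("https://example.com", timeout=30) as response:
    assert response.status == 200

# aproxy logs the connections it forwards; seeing example.com in the snap
# logs confirms the redirect (log format is an assumption, not from the PR).
print(subprocess.run(["sudo", "snap", "logs", "aproxy"],
                     capture_output=True, text=True, check=False).stdout)

The same file also replaces the bash "Select test stability level" step with an inline Python script (shell: python); both variants hand the mark_expression value to later steps by appending it to the file named by $GITHUB_OUTPUT.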
Empty file.
@@ -523,7 +523,7 @@ def isolate_instance_from_cluster(ops_test: OpsTest, unit_name: str) -> None:
env["KUBECONFIG"] = os.path.expanduser("~/.kube/config")

try:
subprocess.check_output(["kubectl", "apply", "-f", temp_file.name], env=env)
subprocess.check_output(["microk8s.kubectl", "apply", "-f", temp_file.name], env=env)
except subprocess.CalledProcessError as e:
logger.error(e.output)
logger.error(e.stderr)
@@ -535,7 +535,7 @@ def remove_instance_isolation(ops_test: OpsTest) -> None:
env = os.environ
env["KUBECONFIG"] = os.path.expanduser("~/.kube/config")
subprocess.check_output(
f"kubectl -n {ops_test.model.info.name} delete networkchaos network-loss-primary",
f"microk8s.kubectl -n {ops_test.model.info.name} delete networkchaos network-loss-primary",
shell=True,
env=env,
)
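The helpers above previously shelled out to a bare kubectl, which is preinstalled on GitHub-hosted runners but absent from the self-hosted machines, so every invocation now goes through the client bundled with the MicroK8s snap. A hedged sketch of a variant that works on both runner types (the helper is hypothetical, not in the PR):

# Hedged sketch: prefer a standalone kubectl when present, otherwise fall
# back to the MicroK8s-bundled client used on the self-hosted runners.
import shutil

def kubectl_cmd() -> str:
    return "kubectl" if shutil.which("kubectl") else "microk8s.kubectl"

Hard-coding microk8s.kubectl, as the PR does, is the simpler choice while the workflow only targets self-hosted MicroK8s runners.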
34 changes: 17 additions & 17 deletions tests/integration/high_availability/scripts/destroy_chaos_mesh.sh
@@ -11,38 +11,38 @@ fi

destroy_chaos_mesh() {
echo "deleting api-resources"
for i in $(kubectl api-resources | awk '/chaos-mesh/ {print $1}'); do
timeout 30 kubectl delete "${i}" --all --all-namespaces || true
for i in $(microk8s.kubectl api-resources | awk '/chaos-mesh/ {print $1}'); do
timeout 30 microk8s.kubectl delete "${i}" --all --all-namespaces || true
done

if kubectl get mutatingwebhookconfiguration | grep -q 'chaos-mesh-mutation'; then
timeout 30 kubectl delete mutatingwebhookconfiguration chaos-mesh-mutation || true
if microk8s.kubectl get mutatingwebhookconfiguration | grep -q 'chaos-mesh-mutation'; then
timeout 30 microk8s.kubectl delete mutatingwebhookconfiguration chaos-mesh-mutation || true
fi

if kubectl get validatingwebhookconfiguration | grep -q 'chaos-mesh-validation-auth'; then
timeout 30 kubectl delete validatingwebhookconfiguration chaos-mesh-validation-auth || true
if microk8s.kubectl get validatingwebhookconfiguration | grep -q 'chaos-mesh-validation-auth'; then
timeout 30 microk8s.kubectl delete validatingwebhookconfiguration chaos-mesh-validation-auth || true
fi

if kubectl get validatingwebhookconfiguration | grep -q 'chaos-mesh-validation'; then
timeout 30 kubectl delete validatingwebhookconfiguration chaos-mesh-validation || true
if microk8s.kubectl get validatingwebhookconfiguration | grep -q 'chaos-mesh-validation'; then
timeout 30 microk8s.kubectl delete validatingwebhookconfiguration chaos-mesh-validation || true
fi

if kubectl get clusterrolebinding | grep -q 'chaos-mesh'; then
if microk8s.kubectl get clusterrolebinding | grep -q 'chaos-mesh'; then
echo "deleting clusterrolebindings"
readarray -t args < <(kubectl get clusterrolebinding | awk '/chaos-mesh/ {print $1}')
timeout 30 kubectl delete clusterrolebinding "${args[@]}" || true
readarray -t args < <(microk8s.kubectl get clusterrolebinding | awk '/chaos-mesh/ {print $1}')
timeout 30 microk8s.kubectl delete clusterrolebinding "${args[@]}" || true
fi

if kubectl get clusterrole | grep -q 'chaos-mesh'; then
if microk8s.kubectl get clusterrole | grep -q 'chaos-mesh'; then
echo "deleting clusterroles"
readarray -t args < <(kubectl get clusterrole | awk '/chaos-mesh/ {print $1}')
timeout 30 kubectl delete clusterrole "${args[@]}" || true
readarray -t args < <(microk8s.kubectl get clusterrole | awk '/chaos-mesh/ {print $1}')
timeout 30 microk8s.kubectl delete clusterrole "${args[@]}" || true
fi

if kubectl get crd | grep -q 'chaos-mesh.org'; then
if microk8s.kubectl get crd | grep -q 'chaos-mesh.org'; then
echo "deleting crds"
readarray -t args < <(kubectl get crd | awk '/chaos-mesh.org/ {print $1}')
timeout 30 kubectl delete crd "${args[@]}" || true
readarray -t args < <(microk8s.kubectl get crd | awk '/chaos-mesh.org/ {print $1}')
timeout 30 microk8s.kubectl delete crd "${args[@]}" || true
fi

if [ -n "${chaos_mesh_ns}" ] && sg snap_microk8s -c "microk8s.helm3 repo list --namespace=${chaos_mesh_ns}" | grep -q 'chaos-mesh'; then
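destroy_chaos_mesh.sh follows one pattern throughout: enumerate the chaos-mesh resources of a given kind, then delete them under `timeout 30 ... || true` so a stuck finalizer cannot hang the CI job. The same best-effort idea as a Python sketch (hypothetical helper, not in the PR):

# Hedged sketch of the script's best-effort cleanup pattern.
import subprocess

def cleanup_chaos_mesh(kubectl: str = "microk8s.kubectl") -> None:
    listing = subprocess.run([kubectl, "api-resources"],
                             capture_output=True, text=True, check=True).stdout
    kinds = [line.split()[0] for line in listing.splitlines()
             if "chaos-mesh" in line]
    for kind in kinds:
        try:
            # Mirrors `timeout 30 microk8s.kubectl delete ... || true`.
            subprocess.run([kubectl, "delete", kind, "--all",
                            "--all-namespaces"], timeout=30, check=False)
        except subprocess.TimeoutExpired:
            pass  # best effort, as in the script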
23 changes: 23 additions & 0 deletions tests/integration/high_availability/test_replication.py
@@ -31,6 +31,29 @@

TIMEOUT = 15 * 60

# Quick hack: force NO_PROXY env to have the IP/hostname of the cluster
from lightkube.core.generic_client import GenericClient
from lightkube.config.client_adapter import httpx_parameters
from lightkube.config.kubeconfig import SingleConfig
import os
import httpx
from urllib.parse import urlparse


def NoProxyExtendClient(config: SingleConfig, timeout: httpx.Timeout, trust_env=True) -> httpx.Client:
"""Reviews the NO_PROXY setting: it must contain the base_url's IP/hostname, otherwise add it."""
if "HTTP_PROXY" in os.environ.keys() or "HTTPS_PROXY" in os.environ.keys():
# urlparse returns an <ip>|<hostname>:<port>, we do not need the port
host = urlparse(config.cluster.server).netloc.split(":")[0]
print("TESTING" + os.environ["NO_PROXY"])
if host not in os.environ["NO_PROXY"].split(","): # compare with a list, as we want to avoid matching "192.168.0.1" "192.168.0.1/24,10.0.0.0/8" string
os.environ["NO_PROXY"] = ",".join([host, os.environ["NO_PROXY"] ])
print("TESTING" + os.environ["NO_PROXY"])
return httpx.Client(**httpx_parameters(config, timeout, trust_env))

# Override the AdapterClient
GenericClient.AdapterClient = staticmethod(NoProxyExtendClient)


@pytest.mark.group(1)
async def test_build_and_deploy(ops_test: OpsTest) -> None:
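The hack above works because lightkube constructs its httpx client through the GenericClient.AdapterClient hook: overriding that hook lets the test append the Kubernetes API server's host to NO_PROXY before any client exists, so cluster traffic bypasses the squid/aproxy proxy while everything else is still intercepted. Note that the PR's version reads os.environ["NO_PROXY"] unconditionally, which raises KeyError when the variable is unset; a hedged variant of the same logic that tolerates that case:

# Hedged sketch, same idea as the PR's NoProxyExtendClient but tolerant of
# an unset NO_PROXY and of duplicate entries.
import os
from urllib.parse import urlparse

def extend_no_proxy(server_url: str) -> None:
    if not ({"HTTP_PROXY", "HTTPS_PROXY"} & os.environ.keys()):
        return
    host = urlparse(server_url).netloc.split(":")[0]  # drop the port
    entries = [e for e in os.environ.get("NO_PROXY", "").split(",") if e]
    if host not in entries:  # exact-entry match, not a substring match
        os.environ["NO_PROXY"] = ",".join([host, *entries])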
5 changes: 4 additions & 1 deletion tests/integration/high_availability/test_upgrade.py
@@ -107,6 +107,9 @@ async def test_upgrade_from_edge(ops_test: OpsTest, continuous_writes) -> None:

logger.info("Build charm locally")
charm = await ops_test.build_charm(".")
await application.local_refresh(path=charm, resources=resources)
async with ops_test.fast_forward("60s"):
await ops_test.model.wait_for_idle(apps=[mysql_app_name], status="active", timeout=TIMEOUT)

logger.info("Refresh the charm")
await application.refresh(path=charm, resources=resources)
@@ -127,7 +130,7 @@ async def test_upgrade_from_edge(ops_test: OpsTest, continuous_writes) -> None:
await action.wait()

logger.info("Wait for upgrade to complete")
async with ops_test.fast_forward("60s"):
async with ops_test.fast_forward("60s", fast_interval="60s"):
await ops_test.model.wait_for_idle(
apps=[MYSQL_APP_NAME], status="active", idle_period=30, timeout=TIMEOUT
)
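In the upgrade test, the charm is now refreshed from the locally built artifact and the fast-forward interval is widened: ops_test.fast_forward temporarily lowers the model's update-status-hook-interval for the duration of the block, and a 60s interval keeps the busier self-hosted runners from being flooded with update-status events while still letting wait_for_idle converge. A hedged usage sketch (the pytest-operator signature is assumed):

# Hedged sketch of the fast_forward pattern used above.
import pytest
from pytest_operator.plugin import OpsTest

TIMEOUT = 15 * 60

@pytest.mark.abort_on_fail
async def test_settles_quickly(ops_test: OpsTest) -> None:
    # update-status fires every 60s inside this block; the model's previous
    # interval is restored when the context manager exits.
    async with ops_test.fast_forward(fast_interval="60s"):
        await ops_test.model.wait_for_idle(status="active", timeout=TIMEOUT)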
2 changes: 1 addition & 1 deletion tests/integration/relations/test_mysql_root.py
@@ -150,8 +150,8 @@ async def test_deploy_and_relate_osm_bundle(ops_test: OpsTest) -> None:
)


@pytest.mark.abort_on_fail
@pytest.mark.group(1)
@pytest.mark.abort_on_fail
async def test_osm_pol_operations(ops_test: OpsTest) -> None:
"""Test the existence of databases and tables created by osm-pol's migrations."""
show_databases_sql = [