Merge pull request #156 from asmorodskyi/eks_cleanup

Improve running clusters reporting

asmorodskyi authored Aug 3, 2022
2 parents 42ac2a8 + fbdef5c commit 685a5bb

Showing 12 changed files with 117 additions and 27 deletions.
23 changes: 23 additions & 0 deletions Dockerfile_dev
@@ -0,0 +1,23 @@
FROM registry.suse.com/bci/python:3.10

ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 UWSGI_WSGI_FILE=/pcw/webui/wsgi.py UWSGI_MASTER=1
ENV UWSGI_HTTP_AUTO_CHUNKED=1 UWSGI_HTTP_KEEPALIVE=1 UWSGI_LAZY_APPS=1 UWSGI_WSGI_ENV_BEHAVIOR=holy

## System preparation steps ################################################# ##

# !!! Runtime changes won't affect requirements.txt
COPY requirements.txt /tmp/
# * Install system requirements
# * Install pip requirements
# * Empty system cache to conserve some space
RUN zypper -n in python310-devel python310-pip gcc libffi-devel && pip3.10 install -r /tmp/requirements.txt && rm -rf /var/cache

WORKDIR /pcw

## Finalize ################################################################# ##

VOLUME /pcw/db

EXPOSE 8000/tcp

ENTRYPOINT ["sh", "-c", "/pcw/container-startup", "$@"]
3 changes: 2 additions & 1 deletion Makefile
@@ -24,4 +24,5 @@ docker-container:
docker build . -t ${CONT_TAG}
podman-container:
podman build . -t ${CONT_TAG}

podman-container-devel:
podman build -f Dockerfile_dev -t pcw-devel
16 changes: 13 additions & 3 deletions README.md
@@ -94,7 +94,7 @@ The PCW container supports two volumes to be mounted:

To create a container using e.g. the data directory `/srv/pcw` for both volumes and expose port 8000, run the following:

podman create --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -p 8000:8000/tcp ghcr.io/suse/pcw:latest
podman create --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -v <local creds storage>:/var/pcw -p 8000:8000/tcp ghcr.io/suse/pcw:latest
podman start pcw

For usage in docker simply replace `podman` by `docker` in the above command.
@@ -103,11 +103,21 @@ The `pcw` container runs by default the `/pcw/container-startup` startup helper

podman exec pcw /pcw/container-startup help

podman run -ti --rm --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -p 8000:8000/tcp ghcr.io/suse/pcw:latest /pcw/container-startup help
podman run -ti --rm --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v <local creds storage>:/var/pcw -v /srv/pcw/db:/pcw/db -p 8000:8000/tcp ghcr.io/suse/pcw:latest /pcw/container-startup help

To create the admin superuser within the created container named `pcw`, run

podman run -ti --rm -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -p 8000:8000/tcp ghcr.io/suse/pcw:latest /pcw/container-startup createsuperuser --email admin@example.com --username admin
podman run -ti --rm -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -v <local creds storage>:/var/pcw -p 8000:8000/tcp ghcr.io/suse/pcw:latest /pcw/container-startup createsuperuser --email admin@example.com --username admin

## Devel version of container

There is a [devel version](Dockerfile_dev) of the container file. The main difference is that the source files are not copied into the image but are expected to be mounted via a volume. This eases development in an environment that is as close as possible to a production run.

The expected usage would be:

make podman-container-devel
podman run -v <local path to ini file>:/etc/pcw.ini -v <local creds storage>:/var/pcw -v <path to this folder>:/pcw -t pcw-devel <any target from container-startup>


## Codecov

1 change: 0 additions & 1 deletion container-startup
@@ -51,4 +51,3 @@ case "$CMD" in
*)
python3 manage.py $@
esac

51 changes: 47 additions & 4 deletions ocw/lib/EC2.py
@@ -4,7 +4,7 @@
import boto3
from botocore.exceptions import ClientError
import re
from datetime import date, datetime, timedelta
from datetime import date, datetime, timedelta, timezone
from ocw.lib.emailnotify import send_mail
import traceback
import time
@@ -21,6 +21,10 @@ def __init__(self, namespace: str):
self.all_regions = ConfigFile().getList('default/ec2_regions')
else:
self.all_regions = self.get_all_regions()
if PCWConfig.has('clusters/ec2_regions'):
self.cluster_regions = ConfigFile().getList('clusters/ec2_regions')
else:
self.cluster_regions = self.get_all_regions()

def __new__(cls, vault_namespace):
if vault_namespace not in EC2.__instances:
@@ -69,10 +73,11 @@ def eks_client(self, region):
return self.__eks_client[region]

def all_clusters(self):
clusters = list()
for region in self.all_regions:
clusters = dict()
for region in self.cluster_regions:
response = self.eks_client(region).list_clusters()
[clusters.append(cluster) for cluster in response['clusters']]
if len(response['clusters']):
clusters[region] = response['clusters']
return clusters

@staticmethod
@@ -158,6 +163,44 @@ def delete_instance(self, region, instance_id):
else:
raise ex

def wait_for_empty_nodegroup_list(self, region, clusterName, timeout_minutes=20):
if self.dry_run:
self.log_info("Skip waiting due to dry-run mode")
return True
self.log_info("Waiting empty nodegroup list in {}", clusterName)
end = datetime.now(timezone.utc) + timedelta(minutes=timeout_minutes)
resp_nodegroup = self.eks_client(region).list_nodegroups(clusterName=clusterName)

while datetime.now(timezone.utc) < end and len(resp_nodegroup['nodegroups']) > 0:
time.sleep(20)
resp_nodegroup = self.eks_client(region).list_nodegroups(clusterName=clusterName)
if len(resp_nodegroup['nodegroups']) > 0:
self.log_info("Still waiting for {} nodegroups to disappear", len(resp_nodegroup['nodegroups']))

def delete_all_clusters(self):
self.log_info("Deleting all clusters!")
for region in self.cluster_regions:
response = self.eks_client(region).list_clusters()
if len(response['clusters']):
self.log_info("Found {} cluster(s) in {}", len(response['clusters']), region)
for cluster in response['clusters']:
resp_nodegroup = self.eks_client(region).list_nodegroups(clusterName=cluster)
if len(resp_nodegroup['nodegroups']):
self.log_info("Found {} nodegroups for {}", len(resp_nodegroup['nodegroups']), cluster)
for nodegroup in resp_nodegroup['nodegroups']:
if self.dry_run:
self.log_info("Skipping {} nodegroup deletion due to dry-run mode", nodegroup)
else:
self.log_info("Deleting {}", nodegroup)
self.eks_client(region).delete_nodegroup(
clusterName=cluster, nodegroupName=nodegroup)
self.wait_for_empty_nodegroup_list(region, cluster)
if self.dry_run:
self.log_info("Skipping {} cluster deletion due to dry-run mode", cluster)
else:
self.log_info("Finally deleting {} cluster", cluster)
self.eks_client(region).delete_cluster(name=cluster)

def parse_image_name(self, img_name):
regexes = [
# openqa-SLES12-SP5-EC2.x86_64-0.9.1-BYOS-Build1.55.raw.xz
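With this change `all_clusters()` returns a mapping of region name to the EKS cluster names found there (regions without clusters are omitted) instead of a flat list, which is what lets the reporting below group results per region. A minimal sketch of the new shape, using made-up region and cluster names:

    # Hypothetical example of the value returned by EC2.all_clusters()
    # after this change: region name -> list of EKS cluster names.
    clusters = {
        "eu-central-1": ["openqa-test-cluster"],
        "us-east-1": ["caasp-cluster", "kube-leftover"],
    }

    # An empty dict means no clusters were found in any configured region.
    total = sum(len(names) for names in clusters.values())
    for region, names in clusters.items():
        print("{}: {}".format(region, ", ".join(names)))
    print("{} clusters in total".format(total))
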
2 changes: 1 addition & 1 deletion ocw/lib/azure.py
@@ -42,7 +42,7 @@ def check_credentials(self):
raise AuthenticationError("Invalid Azure credentials")

def bs_client(self):
if(self.__blob_service_client is None):
if (self.__blob_service_client is None):
storage_account = PCWConfig.get_feature_property(
'cleanup', 'azure-storage-account-name', self._namespace)
storage_key = self.get_storage_key(storage_account)
6 changes: 4 additions & 2 deletions ocw/lib/cleanup.py
@@ -34,8 +34,10 @@ def list_clusters():
for namespace in PCWConfig.get_namespaces_for('clusters'):
try:
clusters = EC2(namespace).all_clusters()
logger.info("%d clusters found", len(clusters))
send_cluster_notification(namespace, clusters)
quantity = sum(len(clusters_list) for clusters_list in clusters.values())
logger.info("%d clusters found", quantity)
if quantity > 0:
send_cluster_notification(namespace, clusters)
except Exception as e:
logger.exception("[{}] List clusters failed!".format(namespace))
send_mail('{} on List clusters in [{}]'.format(type(e).__name__, namespace), traceback.format_exc())
9 changes: 6 additions & 3 deletions ocw/lib/emailnotify.py
@@ -56,10 +56,13 @@ def send_leftover_notification():

def send_cluster_notification(namespace, clusters):
if len(clusters) and PCWConfig.has('notify'):
clusters_str = ' '.join([str(cluster) for cluster in clusters])
clusters_str = ''
for region in clusters:
clusters_list = ' '.join([str(cluster) for cluster in clusters[region]])
clusters_str = '{}\n{} : {}'.format(clusters_str, region, clusters_list)
logger.debug("Full clusters list - %s", clusters_str)
send_mail("EC2 clusters found", clusters_str,
receiver_email=PCWConfig.get_feature_property('cluster.notify', 'to', namespace))
send_mail("[{}] EC2 clusters found".format(namespace), clusters_str,
receiver_email=PCWConfig.get_feature_property('notify', 'to', namespace))


def send_mail(subject, message, receiver_email=None):
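The notification body now lists the clusters grouped by region and the mail subject carries the namespace. A short sketch of the assembled message for the hypothetical dict above (the `qac` namespace is made up as well):

    # Mirrors the string building in send_cluster_notification() for a
    # hypothetical clusters dict; the subject would read "[qac] EC2 clusters found".
    clusters = {
        "eu-central-1": ["openqa-test-cluster"],
        "us-east-1": ["caasp-cluster", "kube-leftover"],
    }

    clusters_str = ''
    for region in clusters:
        clusters_list = ' '.join(str(cluster) for cluster in clusters[region])
        clusters_str = '{}\n{} : {}'.format(clusters_str, region, clusters_list)

    # Resulting body:
    #
    # eu-central-1 : openqa-test-cluster
    # us-east-1 : caasp-cluster kube-leftover
    print(clusters_str)
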
2 changes: 0 additions & 2 deletions ocw/lib/provider.py
@@ -20,9 +20,7 @@ def __init__(self, namespace: str):
def read_auth_json(self):
authcachepath = Path('/var/pcw/{}/{}.json'.format(self._namespace, self.__class__.__name__))
if authcachepath.exists():
self.log_info('Loading credentials')
with authcachepath.open() as f:
self.log_info("Try loading auth from file {}".format(f.name))
return json.loads(f.read())
else:
self.log_err('Credentials not found in {}. Terminating', authcachepath)
11 changes: 11 additions & 0 deletions ocw/management/commands/rmclusters.py
@@ -0,0 +1,11 @@
from django.core.management.base import BaseCommand
from webui.settings import PCWConfig
from ocw.lib.EC2 import EC2


class Command(BaseCommand):
help = 'Delete all EKS clusters (and their nodegroups) in the configured regions (according to pcw.ini)'

def handle(self, *args, **options):
for namespace in PCWConfig.get_namespaces_for('clusters'):
EC2(namespace).delete_all_clusters()
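The new `rmclusters` management command simply runs `delete_all_clusters()` for every namespace configured under `clusters` in pcw.ini. Because `container-startup` falls through to `python3 manage.py $@`, it can presumably be invoked inside the container as `/pcw/container-startup rmclusters`; the sketch below triggers it programmatically through Django's standard `call_command`, assuming `DJANGO_SETTINGS_MODULE` points at `webui.settings` as the regular manage.py entry point does:

    # Minimal sketch: run the new command programmatically
    # (equivalent to `python3 manage.py rmclusters`).
    import os
    import django
    from django.core.management import call_command

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "webui.settings")
    django.setup()
    call_command("rmclusters")
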
2 changes: 1 addition & 1 deletion ocw/models.py
@@ -63,7 +63,7 @@ def age_formated(self):
return format_seconds(self.age.total_seconds())

def ttl_formated(self):
return format_seconds(self.ttl.total_seconds()) if(self.ttl) else ""
return format_seconds(self.ttl.total_seconds()) if (self.ttl) else ""

def all_time_fields(self):
all_time_pattern = "(age={}, first_seen={}, last_seen={}, ttl={})"
18 changes: 9 additions & 9 deletions requirements.txt
@@ -1,18 +1,18 @@
boto3
azure-mgmt-compute==12.0.0
azure-mgmt-storage==10.0.0
azure-mgmt-resource==10.0.0
azure-storage-blob==12.4.0
azure-mgmt-compute==27.2.0
azure-mgmt-storage==20.0.0
azure-mgmt-resource==21.1.0
azure-storage-blob==12.13.0
msrestazure==0.6.4
uwsgi==2.0.20
requests==2.25.1
requests==2.28.1
django==4.0.6
django-tables2==2.4.1
django-filter==22.1
django-bootstrap4==22.1
texttable
oauth2client==4.1.3
google-api-python-client==2.0.2
google-cloud-storage==1.37.0
python-dateutil==2.8.1
oauth2client
google-api-python-client==2.55.0
google-cloud-storage==2.4.0
python-dateutil
apscheduler
