Merge branch 'master' into dependabot/pip/azure-mgmt-storage-21.2.1
asmorodskyi authored Oct 11, 2024
2 parents c971767 + 5a169f9 commit 25f1c09
Showing 24 changed files with 191 additions and 419 deletions.
12 changes: 6 additions & 6 deletions .github/workflows/container.yml
@@ -31,7 +31,7 @@ jobs:
uses: actions/checkout@v4

- name: Log in to the Container registry
-uses: docker/login-action@0d4c9c5ea7693da7b068278f7b52bda2a190a446
+uses: docker/login-action@9780b0c442fbb1117ed29e0efdff1e18412f7567
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
@@ -45,7 +45,7 @@ jobs:

- name: Build Docker image (PCW)
if: ${{ matrix.suffix == 'main' }}
-uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25
+uses: docker/build-push-action@32945a339266b759abcbdc89316275140b0fc960
with:
context: .
file: containers/Dockerfile
@@ -54,7 +54,7 @@
labels: ${{ steps.meta.outputs.labels }}
- name: Build Docker image (K8S)
if: ${{ matrix.suffix == 'k8s' }}
-uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25
+uses: docker/build-push-action@32945a339266b759abcbdc89316275140b0fc960
with:
context: .
file: containers/Dockerfile_${{ matrix.suffix }}
@@ -77,7 +77,7 @@
uses: actions/checkout@v4

- name: Log in to the Container registry
-uses: docker/login-action@0d4c9c5ea7693da7b068278f7b52bda2a190a446
+uses: docker/login-action@9780b0c442fbb1117ed29e0efdff1e18412f7567
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
@@ -91,7 +91,7 @@

- name: Build and push Docker image (PCW)
if: ${{ matrix.suffix == 'main' }}
-uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25
+uses: docker/build-push-action@32945a339266b759abcbdc89316275140b0fc960
with:
context: .
file: containers/Dockerfile
@@ -100,7 +100,7 @@
labels: ${{ steps.meta.outputs.labels }}
- name: Build and push Docker image (K8S)
if: ${{ matrix.suffix == 'k8s' }}
-uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25
+uses: docker/build-push-action@32945a339266b759abcbdc89316275140b0fc960
with:
context: .
file: containers/Dockerfile_${{ matrix.suffix }}
85 changes: 39 additions & 46 deletions README.md
@@ -7,7 +7,7 @@
> **Anton Smorodskyi**: YES, constantly! :partygeeko: I see it in every palm on the beach!
PublicCloud-Watcher (PCW) is a web app which monitors, displays and deletes resources on various Cloud Service Providers (CSPs).
-PCW has two main flows :
+PCW has three main flows:

1. **Update run (implemented in [ocw/lib/db.py](ocw/lib/db.py))**: executed every 45 minutes. Concentrates on deleting VMs (in the Azure case, Resource Groups).
   - Each update scans the accounts defined in the configuration file and writes the results into a local SQLite database. Newly discovered entities are assigned a mandatory time-to-live value (TTL). The TTL is taken from the `openqa_ttl` tag if the entity carries one; if not, PCW checks `pcw.ini` for the `updaterun/default_ttl` setting, and if that is not defined either, it uses the hard-coded value from [webui/settings.py](webui/settings.py). The database has a web UI where you can manually trigger deletion of individual entities.
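The TTL precedence just described can be sketched roughly as follows (function and constant names here are illustrative, not PCW's actual code; the fallback value is a placeholder for the real one in [webui/settings.py](webui/settings.py)):

```python
# Illustrative sketch of the TTL precedence described above:
# 1. the entity's openqa_ttl tag, 2. updaterun/default_ttl from pcw.ini,
# 3. a hard-coded fallback (stand-in for the value in webui/settings.py).
HARDCODED_DEFAULT_TTL = 28800  # assumed placeholder, in seconds


def resolve_ttl(tags: dict, pcw_ini: dict) -> int:
    if "openqa_ttl" in tags:
        return int(tags["openqa_ttl"])
    default = pcw_ini.get("updaterun", {}).get("default_ttl")
    if default is not None:
        return int(default)
    return HARDCODED_DEFAULT_TTL


print(resolve_ttl({"openqa_ttl": "3600"}, {}))                  # 3600 (tag wins)
print(resolve_ttl({}, {"updaterun": {"default_ttl": "7200"}}))  # 7200 (ini setting)
print(resolve_ttl({}, {}))                                      # 28800 (fallback)
```

The point is only the lookup order; the real implementation naturally reads the tag from the CSP API and the setting via PCW's config layer.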
@@ -19,45 +19,47 @@ PCW has two main flows :
2. **Cleanup (implemented in [ocw/lib/cleanup.py](ocw/lib/cleanup.py))**: executed via a Django command. Concentrates on everything except VM deletion. This varies a lot per CSP, so the details are listed per provider.
   - For Azure the following entities are monitored (see details in [ocw/lib/azure.py](ocw/lib/azure.py)):
     a. bootdiagnostics
-b. Blobs in `sle-images` container
-c. Disks assigned to certain resource groups
-d. Images assigned to certain resource groups
+b. Blobs in all containers
+c. Disks assigned to the resource group defined in `pcw.ini` (`azure-storage-resourcegroup`)
+d. Images assigned to the resource group defined in `pcw.ini` (`azure-storage-resourcegroup`)
+e. Image versions assigned to the resource group defined in `pcw.ini` (`azure-storage-resourcegroup`)
   - For EC2 the following entities are monitored (see details in [ocw/lib/ec2.py](ocw/lib/ec2.py)):
     a. Images in all defined regions
     b. Snapshots in all defined regions
     c. Volumes in all defined regions
     d. VPCs (deleting a VPC means first deleting all entities assigned to it: security groups, networks etc.)
   - For GCE: deleting disks, images & network resources (see details in [ocw/lib/gce.py](ocw/lib/gce.py))
-   - For Openstack: deleting instances, images & keypairs (see details in [ocw/lib/openstack.py](ocw/lib/openstack.py))
+3. **Dump entities quantity (implemented in [ocw/lib/dump_state.py](ocw/lib/dump_state.py))**: to be able to react quickly to possible bugs in PCW and/or the unexpected creation of many resources, PCW can dump real-time data from each CSP into a configured InfluxDB instance. This allows building real-time dashboards and/or setting up a notification flow.
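As noted for EC2 above, a VPC can only be deleted after its dependent entities are gone. A rough sketch of that ordering (the categories and function names here are assumptions for illustration, not PCW's actual implementation; see [ocw/lib/ec2.py](ocw/lib/ec2.py) for the real logic):

```python
# Hypothetical sketch of the "dependents first, VPC last" ordering described
# above. Category names are illustrative, not PCW's actual code.
DELETE_BEFORE_VPC = ["security_groups", "networks"]


def deletion_plan(vpc_id: str, dependents: dict) -> list:
    """Return (category, resource_id) deletion steps; the VPC itself comes last."""
    plan = [
        (category, resource_id)
        for category in DELETE_BEFORE_VPC
        for resource_id in dependents.get(category, [])
    ]
    plan.append(("vpc", vpc_id))  # only now is the VPC free of dependents
    return plan


print(deletion_plan("vpc-123", {"security_groups": ["sg-1"], "networks": ["net-a"]}))
# [('security_groups', 'sg-1'), ('networks', 'net-a'), ('vpc', 'vpc-123')]
```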
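Conceptually, each dump-state sample is one measurement point per provider and namespace. A minimal, hypothetical sketch in InfluxDB line-protocol terms (the measurement and tag names are assumptions, not PCW's actual schema in [ocw/lib/influx.py](ocw/lib/influx.py)):

```python
# Hypothetical sketch: serialize one resource count as an InfluxDB
# line-protocol point. Names are illustrative, not PCW's real schema.
def to_line_protocol(metric: str, provider: str, namespace: str, count: int) -> str:
    # Format: measurement,tag=value,tag=value field=value ("i" marks an integer)
    return f"{metric},provider={provider},namespace={namespace} value={count}i"


print(to_line_protocol("vpc_quantity", "EC2", "example_namespace", 7))
# vpc_quantity,provider=EC2,namespace=example_namespace value=7i
```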


The fastest way to run PCW is via the provided containers, as described in the [Running a container](#running-a-container) section.

-## Install
+# Usage

## Python virtualenv

### Requirements files

PCW has 3 sets of virtualenv requirements files:

- [requirements.txt](requirements.txt): common usage, for everything except K8S-related cleanups
- [requirements_k8s.txt](requirements_k8s.txt): kept as an independent category because of the high volume of dependencies needed only in a single use case (k8s cleanups)
- [requirements_test.txt](requirements_test.txt): contains the dependencies needed to run PCW's unit tests

It's recommended to set up `pcw` in a virtual environment to avoid package collisions:

```bash
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
```

## Configure and run

### Configuration
Configuration of PCW happens via a global config file in `/etc/pcw.ini`. See [templates/pcw.ini](templates/pcw.ini) for a configuration template. To start, copy the template over:

```bash
cp templates/pcw.ini /etc/pcw.ini
```

### CSP credentials
To be able to connect to a CSP, PCW needs Service Principal details. Depending on the namespaces defined in `pcw.ini`, PCW expects some JSON files to be created
-under `/var/pcw/[namespace name]/[Azure/EC2/GCE/Openstack].json`. See [templates/var/example_namespace/](templates/var/example_namespace/) for examples.
+under `/var/pcw/[namespace name]/[Azure/EC2/GCE].json`. See [templates/var/example_namespace/](templates/var/example_namespace/) for examples.

PCW supports email notifications about left-over instances. See the `notify` section in `pcw.ini` and its corresponding comments.

### Build and run

```bash
# Setup virtual environment
virtualenv env
@@ -79,20 +81,22 @@ python manage.py runserver

By default, PCW runs on http://127.0.0.1:8000/

-## Building PCW containers
+## PCW in a container

### Available containers

In the [containers](containers/) folder you may find several Dockerfiles for building different images:

- [Dockerfile](containers/Dockerfile): image based on [bci-python3.11](https://registry.suse.com/categories/bci-devel/repositories/bci-python311); can be used to run all PCW functionality except the k8s cleanup
- [Dockerfile_k8s](containers/Dockerfile_k8s): image based on [bci-python3.11](https://registry.suse.com/categories/bci-devel/repositories/bci-python311); can be used to run the k8s cleanup
- [Dockerfile_k8s_dev](containers/Dockerfile_k8s_dev) and [Dockerfile_dev](containers/Dockerfile_dev): images that contain the same set of dependencies as [Dockerfile](containers/Dockerfile) and [Dockerfile_k8s](containers/Dockerfile_k8s) but expect the PCW source code to be mounted as a volume. Very useful for development experiments.

-## Running a container
+### Execution

You can use the already built containers from [this repository](https://github.com/orgs/SUSE/packages?repo_name=pcw):

```bash
-podman pull ghcr.io/suse/pcw:latest
+podman pull ghcr.io/suse/pcw_main:latest
+podman pull ghcr.io/suse/pcw_k8s:latest
```

@@ -104,7 +108,7 @@ The PCW container supports two volumes to be mounted:
To create a container using e.g. the data directory `/srv/pcw` for both volumes and expose port 8000, run the following:

```bash
-podman create --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -v <local creds storage>:/var/pcw -p 8000:8000/tcp ghcr.io/suse/pcw:latest
+podman create --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v /srv/pcw/db:/pcw/db -v <local creds storage>:/var/pcw -p 8000:8000/tcp ghcr.io/suse/pcw_main:latest
podman start pcw
```

@@ -113,7 +117,7 @@ The `pcw` container runs by default the [/pcw/container-startup](containers/cont
```bash
podman exec pcw /pcw/container-startup help

-podman run -ti --rm --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v <local creds storage>:/var/pcw -v /srv/pcw/db:/pcw/db -p 8000:8000/tcp ghcr.io/suse/pcw:latest /pcw/container-startup help
+podman run -ti --rm --hostname pcw --name pcw -v /srv/pcw/pcw.ini:/etc/pcw.ini -v <local creds storage>:/var/pcw -v /srv/pcw/db:/pcw/db -p 8000:8000/tcp ghcr.io/suse/pcw_main:latest /pcw/container-startup help
```

To create a user within the created container named `pcw`, run
@@ -122,33 +126,36 @@
podman exec pcw /pcw/container-startup createuser admin USE_A_STRONG_PASSWORD
```

-## Devel version of container
+### Devel version

There is a [devel version](containers/Dockerfile_dev) of the container file. The main difference is that source files are not copied into the image but are expected to be mounted via a volume. This eases development in an environment as close as possible to a production run.

Expected use would be:

```bash
-make podman-container-devel
+make container-devel
podman run -v <local path to ini file>:/etc/pcw.ini -v <local creds storage>:/var/pcw -v <path to this folder>:/pcw -t pcw-devel "python3 manage.py <any command available>"
```

+## Test and debug
-## Codecov

-Running codecov locally require installation of `pytest pytest-cov codecov`.
-Then you can run it with
+### Testing

```bash
-BROWSER=$(xdg-settings get default-web-browser)
-pytest -v --cov=./ --cov-report=html && $BROWSER htmlcov/index.html
+virtualenv .
+source bin/activate
+pip install -r requirements_test.txt
+make test
```

-and explore the results in your browser
+The tests contain a Selenium test for the webUI that uses Podman. Make sure that you have the latest [geckodriver](https://github.com/mozilla/geckodriver/releases) installed anywhere in your `PATH` and that the `podman.socket` is enabled:
+`systemctl --user enable --now podman.socket`

+Set the `SKIP_SELENIUM` environment variable when running `pytest` or `make test` to skip the Selenium test.

-## Debug
+### Debug

-To simplify problem investigation pcw has two [django commands](https://docs.djangoproject.com/en/3.1/howto/custom-management-commands/) :
+To simplify problem investigation, PCW has several [django commands](https://docs.djangoproject.com/en/3.1/howto/custom-management-commands/):

[cleanup](ocw/management/commands/cleanup.py)

@@ -160,17 +167,3 @@

Those allow triggering the core functionality without the web UI. It is highly recommended to use `dry_run = True` in `pcw.ini` in such cases.
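For such experiments, a minimal `pcw.ini` might look like the sketch below. Only the keys quoted in this README (`updaterun/default_ttl`, `azure-storage-resourcegroup`, the `notify` section, `dry_run`) come from the text; their exact section placement and all values are assumptions, so treat [templates/pcw.ini](templates/pcw.ini) as the authoritative template.

```ini
; Illustrative sketch only - see templates/pcw.ini for the real template.
; Section placement of azure-storage-resourcegroup and dry_run is assumed.
[default]
namespaces = example_namespace
dry_run = True                      ; recommended while debugging cleanup/updaterun

[updaterun]
default_ttl = 28800                 ; fallback TTL (seconds) when no openqa_ttl tag is set

[cleanup]
azure-storage-resourcegroup = my-rg ; scopes Azure disk/image/image-version cleanup

[notify]
to = team@example.com               ; email notifications about left-over instances
```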

-## Testing
-
-```bash
-virtualenv .
-source bin/activate
-pip install -r requirements_test.txt
-make test
-```
-
-The tests contain a Selenium test for the webUI that uses Podman. Make sure that you have the latest [geckodriver](https://github.com/mozilla/geckodriver/releases) installed anywhere in your `PATH` and that the `podman.socket` is enabled:
-`systemctl --user enable --now podman.socket`
-
-Set the `SKIP_SELENIUM` environment variable when running `pytest` or `make test` to skip the Selenium test.
2 changes: 1 addition & 1 deletion containers/Dockerfile_k8s
@@ -1,6 +1,6 @@
FROM registry.suse.com/bci/python:3.11

-RUN zypper -n in gcc tar gzip kubernetes1.24-client aws-cli && zypper clean && rm -rf /var/cache
+RUN zypper -n in gcc tar gzip kubernetes1.28-client aws-cli && zypper clean && rm -rf /var/cache

# Google cli installation
RUN curl -sf https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-415.0.0-linux-x86_64.tar.gz | tar -zxf - -C /opt \
2 changes: 1 addition & 1 deletion containers/Dockerfile_k8s_dev
@@ -1,6 +1,6 @@
FROM registry.suse.com/bci/python:3.11

-RUN zypper -n in gcc tar gzip kubernetes1.24-client aws-cli && zypper clean && rm -rf /var/cache
+RUN zypper -n in gcc tar gzip kubernetes1.28-client aws-cli && zypper clean && rm -rf /var/cache

# Google cli installation
RUN curl -sf https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-415.0.0-linux-x86_64.tar.gz | tar -zxf - -C /opt \
3 changes: 0 additions & 3 deletions ocw/enums.py
@@ -20,7 +20,6 @@ class ProviderChoice(ChoiceEnum):
GCE = 'Google'
EC2 = 'EC2'
AZURE = 'Azure'
-OSTACK = 'Openstack'

@staticmethod
def from_str(provider):
@@ -30,8 +29,6 @@ def from_str(provider):
return ProviderChoice.EC2
if provider.upper() == ProviderChoice.AZURE:
return ProviderChoice.AZURE
-if provider.upper() == ProviderChoice.OSTACK:
-return ProviderChoice.OSTACK
raise ValueError(f"{provider} is not convertable to ProviderChoice")


4 changes: 0 additions & 4 deletions ocw/lib/cleanup.py
@@ -4,7 +4,6 @@
from ocw.lib.azure import Azure
from ocw.lib.ec2 import EC2
from ocw.lib.gce import GCE
-from ocw.lib.openstack import Openstack
from ocw.lib.eks import EKS
from ocw.lib.emailnotify import send_mail, send_cluster_notification
from ocw.enums import ProviderChoice
@@ -26,9 +25,6 @@ def cleanup_run():
if ProviderChoice.GCE in providers:
GCE(namespace).cleanup_all()

-if ProviderChoice.OSTACK in providers:
-Openstack(namespace).cleanup_all()

except Exception as ex:
logger.exception("[%s] Cleanup failed!", namespace)
send_mail(f'{type(ex).__name__} on Cleanup in [{namespace}]', traceback.format_exc())
3 changes: 1 addition & 2 deletions ocw/lib/db.py
@@ -8,7 +8,7 @@
from ocw.apps import getScheduler
from webui.PCWConfig import PCWConfig
from ..models import Instance, StateChoice, ProviderChoice, CspInfo
-from .emailnotify import send_mail, send_leftover_notification
+from .emailnotify import send_mail
from .azure import Azure
from .ec2 import EC2
from .gce import GCE
@@ -155,7 +155,6 @@ def update_run() -> None:
traceback.format_exc())

auto_delete_instances()
-send_leftover_notification()
RUNNING = False
if not error_occured:
LAST_UPDATE = datetime.now(timezone.utc)
38 changes: 38 additions & 0 deletions ocw/lib/dump_state.py
@@ -4,6 +4,7 @@
from webui.PCWConfig import PCWConfig
from ocw.lib.azure import Azure
from ocw.lib.ec2 import EC2
+from ocw.lib.gce import GCE
from ocw.enums import ProviderChoice
from ocw.lib.influx import Influx

@@ -65,6 +66,43 @@ def dump_state():
namespace,
EC2(namespace).count_all_volumes
)
+Influx().dump_resource(
+ProviderChoice.EC2.value,
+Influx.VPC_QUANTITY,
+namespace,
+EC2(namespace).count_all_vpc
+)
+if ProviderChoice.GCE in providers:
+Influx().dump_resource(
+ProviderChoice.GCE.value,
+Influx.VMS_QUANTITY,
+namespace,
+GCE(namespace).count_all_instances
+)
+Influx().dump_resource(
+ProviderChoice.GCE.value,
+Influx.IMAGES_QUANTITY,
+namespace,
+GCE(namespace).count_all_images
+)
+Influx().dump_resource(
+ProviderChoice.GCE.value,
+Influx.DISK_QUANTITY,
+namespace,
+GCE(namespace).count_all_disks
+)
+Influx().dump_resource(
+ProviderChoice.GCE.value,
+Influx.BLOB_QUANTITY,
+namespace,
+GCE(namespace).count_all_blobs
+)
+Influx().dump_resource(
+ProviderChoice.GCE.value,
+Influx.NETWORK_QUANTITY,
+namespace,
+GCE(namespace).count_all_networks
+)
except Exception:
logger.exception(
"[%s] Dump state failed!: \n %s", namespace, traceback.format_exc()
14 changes: 13 additions & 1 deletion ocw/lib/ec2.py
@@ -325,7 +325,12 @@ def vpc_can_be_deleted(self, resource_vpc, vpc_id) -> bool:

def report_cleanup_results(self, vpc_errors: list, vpc_notify: list, vpc_locked: list) -> None:
if len(vpc_errors) > 0:
-send_mail(f'Errors on VPC deletion in [{self._namespace}]', '\n'.join(vpc_errors))
+# this is most common error message which we can not fix.
+# So no point to spam us with notifications about it
+known_error = "An error occurred (DependencyViolation) when calling the DeleteVpc operation"
+filtered = [x for x in vpc_errors if known_error not in x]
+if len(filtered) > 0:
+send_mail(f'Errors on VPC deletion in [{self._namespace}]', '\n'.join(vpc_errors))
if len(vpc_notify) > 0:
send_mail(f'{len(vpc_notify)} VPC\'s should be deleted, skipping due vpc-notify-only=True', ','.join(vpc_notify))
if len(vpc_locked) > 0:
@@ -345,6 +350,13 @@ def count_all_volumes(self) -> int:
all_volumes_cnt += len(response['Volumes'])
return all_volumes_cnt

+def count_all_vpc(self) -> int:
+all_vpcs = 0
+for region in self.all_regions:
+response = self.ec2_client(region).describe_vpcs(Filters=[{'Name': 'isDefault', 'Values': ['false']}])
+all_vpcs += len(response['Vpcs'])
+return all_vpcs

def cleanup_images(self, valid_period_days: float) -> None:
self.log_dbg('Call cleanup_images')
for region in self.all_regions:
