Merge pull request #1 from FelipeMezzarana/Add-CI/DC-Pipeline
Add Github Actions workflow
FelipeMezzarana authored Mar 2, 2024
2 parents cdc60c4 + a235517 commit 9909428
Showing 45 changed files with 2,154 additions and 552 deletions.
36 changes: 36 additions & 0 deletions .github/workflows/run_tests.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
name: run-unit-integration-tests-and-check-style

on:
  workflow_call:
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest

    permissions:
      contents: write

    steps:
      - name: checkout-code
        uses: actions/checkout@v4

      - name: build-linter-image
        run: docker build -f Dockerfile.linting -t app-tox-worker .

      - name: check-linting-tests
        run: docker run app-tox-worker tox

      - name: build-unit-test-image
        run: docker build --file Dockerfile.unit_tests --tag app_unit_tests .

      - name: run-unit-tests
        run: docker run --network=host --volume="$PWD/coverage/":/var/coverage/ app_unit_tests

      - name: build-integration-test-image
        run: docker build --file Dockerfile.integration_tests --tag app_integration_tests .

      - name: run-integration-tests
        run: docker run app_integration_tests
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
.ipynb_checkpoints
*.pyc
.DS_Store
.vscode/launch.json
Binary file removed Database/currency_exchange_db.db
10 changes: 10 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
FROM python:3.10

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy the application code in
COPY src /src
WORKDIR /

CMD python3 -m src.main
12 changes: 12 additions & 0 deletions Dockerfile.integration_tests
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
FROM python:3.10

# Install requirements
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy everything in
COPY . /repository
WORKDIR /repository

# Run integration tests by default
CMD ["python3", "-m", "pytest", "-s", "tests/integration"]
8 changes: 8 additions & 0 deletions Dockerfile.linting
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
FROM python:3.11

RUN pip install tox

COPY . /repository
WORKDIR /repository

CMD tox
12 changes: 12 additions & 0 deletions Dockerfile.unit_tests
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
FROM python:3.10

# Install requirements
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy everything in
COPY . /repository
WORKDIR /repository

# Run unit tests by default
CMD ["python3", "-m", "pytest", "tests/unit", "--cov", "src", "--cov-config", "tox.ini", "--cov-report", "html", "--cov-report", "term"]
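The `--cov-config tox.ini` flag above implies that `tox.ini` carries the coverage settings. A plausible fragment (an assumption for illustration, not taken from the repository) would look like:

```ini
[coverage:run]
source = src

[coverage:report]
; README states 100% coverage is required
fail_under = 100
show_missing = True
```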
78 changes: 59 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
@@ -4,12 +4,11 @@ ETL process to:
+ Extract exchange-rate data for 273 currencies from a public API
+ Organize, transform and store data in two tables (Dollar and Euro based rates) in a SQLite DB
+ Generate a Customized Excel report (decision-making tool)
+ Write unit tests to ensure data quality and availability
+ Orchestrate a job with Airflow to recurrently run all steps
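The extract-and-transform flow above can be sketched roughly as follows. The payload shape mimics the API's documented daily JSON; the function name `to_rows` and the row layout are illustrative assumptions, not the repository's actual code:

```python
def to_rows(payload: dict, base: str) -> list[tuple]:
    """Flatten one day's API payload into (date, base, currency, rate) rows."""
    day = payload["date"]
    rates = payload[base]
    return [(day, base, code, rate) for code, rate in sorted(rates.items())]

# Example payload mimicking the API's daily JSON for a USD base
sample = {"date": "2024-03-02", "usd": {"eur": 0.92, "gbp": 0.79}}
rows = to_rows(sample, "usd")
# rows -> [("2024-03-02", "usd", "eur", 0.92), ("2024-03-02", "usd", "gbp", 0.79)]
```

Rows in this shape can then be bulk-inserted into the dollar- and euro-based SQLite tables.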

# Step 1: ETL

Data is extracted from a public API: https://github.com/fawazahmed0/currency-api. The API returns a daily updated JSON with all the exchange rates for the selected base currency and date.
Data is extracted from a public API: https://github.com/fawazahmed0/exchange-api/tree/main. The API returns a daily updated JSON with all the exchange rates for the selected base currency and date.

The script [update_currency_exchange.py](update_currency_exchange.py) is responsible for the whole ETL process. In short, this script contains functions to:

@@ -34,20 +33,61 @@ it's easier to show than to describe:
![png](readme_files/report_print.PNG)


# Step 3: Unit Tests

Although I'm calling this "step 3" (just to keep the logical order), it will be the first step executed in the pipeline.

The script [test_api_input.py](test_api_input.py) will run a few tests and save a .txt log file in the folder [log](log). The tests check:
+ API connection
+ DB connection
+ API latest date (test if API is being updated)
+ Quantity of currencies the API returned, testing if new currencies are available
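The freshness and coverage checks above can be sketched like this; the function names and thresholds are illustrative assumptions, not the repository's actual test code:

```python
from datetime import date, timedelta

def api_is_fresh(latest: str, today: date, max_lag_days: int = 2) -> bool:
    """True if the API's latest published date is within the allowed lag."""
    return (today - date.fromisoformat(latest)) <= timedelta(days=max_lag_days)

def new_currencies(returned: set[str], known: set[str]) -> set[str]:
    """Currencies present in the API response but not yet stored in the DB."""
    return returned - known
```

Checks like these catch a stale API feed or a newly listed currency before the ETL step runs against bad data.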

# Step 4: Orchestrate a Job / Run the Pipeline

There are two ways to run the pipeline responsible for the three stages of the process (unit tests, ETL, Excel report):

The first and simplest is through the script [main.py](main.py); running it from the CLI will execute all the steps in the pipeline.

The second option is to orchestrate a job with Airflow. The DAG [dag_currency_exchange_etl.py](dag_currency_exchange_etl.py) will also run all the steps in the pipeline; the only requirement is an active Airflow server.
# Step 3: Orchestrate a Job / Run the Pipeline

There are two ways to run the pipeline responsible for all stages of the process:

The first and simplest is through the script [main.py](main.py); running it from the CLI, or via Docker with [run.sh](run.sh), will execute all the steps in the pipeline.

The second option is to orchestrate a job with Airflow. The DAG [dag_currency_exchange_etl.py](src/airflow/dag_currency_exchange_etl.py) will also run all the steps in the pipeline; the only requirement is an active Airflow server.
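A minimal sketch of such a pipeline entry point, with hypothetical step functions standing in for the real modules (the names below are assumptions for illustration, not the repository's code):

```python
def run_pipeline(steps) -> list[str]:
    """Run each pipeline step in order; return the names of completed steps."""
    completed = []
    for step in steps:
        step()
        completed.append(step.__name__)
    return completed

def extract_and_load():
    """Stand-in for the ETL step (API extract + SQLite load)."""

def generate_report():
    """Stand-in for the Excel report step."""

steps_done = run_pipeline([extract_and_load, generate_report])
# steps_done -> ["extract_and_load", "generate_report"]
```

An Airflow DAG expresses the same ordering declaratively, with each step as a task and the scheduler handling recurrence.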


# Usage

App:
```shell
# Run through Docker
./run.sh
# Run through Python
python3 -m src.main
```

Linting:
```shell
./run_linting.sh
```

Unit tests:
```shell
./run_unit_tests.sh
```
Integration tests:
```shell
./run_integration_tests.sh
```


# Structure

```bash
├── coverage
├── readme_files
├── src
│   ├── airflow
│   ├── database
│   ├── modules
│   └── reports
└── tests
    ├── integration
    └── unit
        └── sample_data
```

- `coverage` (not present in GitHub) is created when you run unit tests and contains the HTML coverage report; open `coverage/index.html` to view it. 100% coverage is required.
- [readme_files](readme_files) Images used in readme.
- [src](src) Contains the application code.
- [src/airflow](src/airflow/) DAG file to run the app through Airflow.
- [src/database](src/database) SQLite DB.
- [src/modules](src/modules) Modules to run the ETL pipeline and generate the Excel report.
- [src/reports](src/reports/) Contains the generated Excel reports.
- [tests](tests) unit tests, integration tests and data samples.
200 changes: 0 additions & 200 deletions create_report.py

This file was deleted.
