Merge pull request #21 from ivdatahub/feature/UpdateProject
chore: update docs + package project
IvanildoBarauna authored Sep 20, 2024
2 parents 02306b2 + deae037 commit 6065014
Showing 3 changed files with 82 additions and 23 deletions.
65 changes: 65 additions & 0 deletions .github/workflows/deploy-image.yml
@@ -0,0 +1,65 @@
#
name: Docker deploy

# Configures this workflow to run every time a change is pushed to the branch called `main`.
on:
  push:
    branches:
      - main
  workflow_dispatch:

# Defines two custom environment variables for the workflow. These are used for the Container registry domain, and a name for the Docker image that this workflow builds.
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

# There is a single job in this workflow. It's configured to run on the latest available version of Ubuntu.
jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    # Sets the permissions granted to the `GITHUB_TOKEN` for the actions in this job.
    permissions:
      contents: read
      packages: write
      attestations: write
      id-token: write
    #
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      # Uses the `docker/login-action` action to log in to the Container registry using the account and password that will publish the packages. Once published, the packages are scoped to the account defined here.
      - name: Log in to the Container registry
        uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # This step uses [docker/metadata-action](https://github.com/docker/metadata-action#about) to extract tags and labels that will be applied to the specified image. The `id` "meta" allows the output of this step to be referenced in a subsequent step. The `images` value provides the base name for the tags and labels.
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      # This step uses the `docker/build-push-action` action to build the image, based on your repository's `Dockerfile`. If the build succeeds, it pushes the image to GitHub Packages.
      # It uses the `context` parameter to define the build's context as the set of files located in the specified path. For more information, see "[Usage](https://github.com/docker/build-push-action#usage)" in the README of the `docker/build-push-action` repository.
      # It uses the `tags` and `labels` parameters to tag and label the image with the output from the "meta" step.
      - name: Build and push Docker image
        id: push
        uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

      # This step generates an artifact attestation for the image, which is an unforgeable statement about where and how it was built. It increases supply chain security for people who consume the image. For more information, see "[Using artifact attestations to establish provenance for builds](https://docs.github.com/actions/security-guides/using-artifact-attestations-to-establish-provenance-for-builds)."
      - name: Generate artifact attestation
        uses: actions/attest-build-provenance@v1
        with:
          subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true

    environment:
      name: github-packages
      url: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
14 changes: 7 additions & 7 deletions CONTRIBUTING.md
@@ -1,20 +1,20 @@
# Contributing to GCP-streaming-pipeline
# Contributing to data-consumer-pipeline

Firstly, thank you very much for your interest in contributing to GCP-streaming-pipeline! This document provides guidelines to help ensure the contribution process is smooth and efficient for everyone involved.
Firstly, thank you very much for your interest in contributing to data-consumer-pipeline! This document provides guidelines to help ensure the contribution process is smooth and efficient for everyone involved.

## How to Contribute

### 1. Fork the Repository

1. Go to [repository page](https://github.com/IvanildoBarauna/GCP-streaming-pipeline).
1. Go to [repository page](https://github.com/ivdatahub/data-consumer-pipeline).
2. Click the "Fork" button in the top right corner to create a copy of the repository on your GitHub.

### 2. Clone the Repository

Clone the forked repository to your local machine using the command:

```sh
git clone https://github.com/seu-usuario/GCP-streaming-pipeline.git
git clone https://github.com/<your-username>/data-consumer-pipeline.git
```

### 3. Create a Branch
@@ -67,7 +67,7 @@ git push origin branchname

## Reporting Bugs

If you find a bug, please open an [issue](https://github.com/IvanildoBarauna/GCP-streaming-pipeline/issues) and provide as much information as possible, including:
If you find a bug, please open an [issue](https://github.com/ivdatahub/data-consumer-pipeline/issues) and provide as much information as possible, including:

- Detailed description of the problem.
- Steps to reproduce the issue.
@@ -76,8 +76,8 @@ If you find a bug, please open an [issue](https://github.com/IvanildoBarauna/GCP

## Improvement suggestions

If you have suggestions for improvements, please open an [issue](https://github.com/IvanildoBarauna/GCP-streaming-pipeline/issues) and describe your idea in detail.
If you have suggestions for improvements, please open an [issue](https://github.com/ivdatahub/data-consumer-pipeline/issues) and describe your idea in detail.

## Thanks

Thanks for considering contributing to GCP-streaming-pipeline! Every contribution is valuable and helps to improve the project.
Thanks for considering contributing to data-consumer-pipeline! Every contribution is valuable and helps to improve the project.
26 changes: 10 additions & 16 deletions README.md
@@ -1,49 +1,44 @@
## Data Consumer Pipeline: Data Pipeline for ingesting data in near real time

![Project Status](https://img.shields.io/badge/status-development-yellow?style=for-the-badge&logo=github)
![Python Version](https://img.shields.io/badge/python-3.9-blue?style=for-the-badge&logo=python)
![License](https://img.shields.io/badge/license-MIT-blue?style=for-the-badge&logo=mit)


![Black](https://img.shields.io/badge/code%20style-black-000000.svg?style=for-the-badge&logo=python)
![pylint](https://img.shields.io/badge/pylint-10.00-green?style=for-the-badge&logo=python)

[//]: # ([![CI-CD]&#40;https://img.shields.io/github/actions/workflow/status/IvanildoBarauna/data-consumer-pipeline/CI-CD.yaml?&style=for-the-badge&logo=githubactions&cacheSeconds=60&label=Tests&#41;]&#40;https://github.com/IvanildoBarauna/data-consumer-pipeline/actions/workflows/CI-CD.yml&#41;)

[//]: # ([![IMAGE-DEPLOY]&#40;https://img.shields.io/github/actions/workflow/status/IvanildoBarauna/data-consumer-pipeline/deploy-image.yml?&style=for-the-badge&logo=github&cacheSeconds=60&label=Registry&#41;]&#40;https://github.com/IvanildoBarauna/data-consumer-pipeline/actions/workflows/deploy-cloud-run.yaml&#41;)
[//]: # "[![CI-CD](https://img.shields.io/github/actions/workflow/status/ivdatahub/data-consumer-pipeline/CI-CD.yaml?&style=for-the-badge&logo=githubactions&cacheSeconds=60&label=Tests)](https://github.com/data-consumer-pipeline/data-consumer-pipeline/actions/workflows/CI-CD.yml)"
[//]: # "[![IMAGE-DEPLOY](https://img.shields.io/github/actions/workflow/status/data-consumer-pipeline/data-consumer-pipeline/deploy-image.yml?&style=for-the-badge&logo=github&cacheSeconds=60&label=Registry)](https://github.com/data-consumer-pipeline/data-consumer-pipeline/actions/workflows/deploy-cloud-run.yaml)"
[//]: # "[![GCP-DEPLOY](https://img.shields.io/github/actions/workflow/status/data-consumer-pipeline/data-consumer-pipeline/deploy-cloud-run.yaml?&style=for-the-badge&logo=google&cacheSeconds=60&label=Deploy)](https://github.com/data-consumer-pipeline/data-consumer-pipeline/actions/workflows/deploy-cloud-run.yaml)"

[//]: # ([![GCP-DEPLOY]&#40;https://img.shields.io/github/actions/workflow/status/IvanildoBarauna/data-consumer-pipeline/deploy-cloud-run.yaml?&style=for-the-badge&logo=google&cacheSeconds=60&label=Deploy&#41;]&#40;https://github.com/IvanildoBarauna/data-consumer-pipeline/actions/workflows/deploy-cloud-run.yaml&#41;)


[![Codecov](https://img.shields.io/codecov/c/github/IvanildoBarauna/data-consumer-pipeline?style=for-the-badge&logo=codecov)](https://app.codecov.io/gh/IvanildoBarauna/data-consumer-pipeline)
[![Codecov](https://img.shields.io/codecov/c/github/data-consumer-pipeline/data-consumer-pipeline?style=for-the-badge&logo=codecov)](https://app.codecov.io/gh/data-consumer-pipeline/data-consumer-pipeline)

## Project Summary

Pipeline for processing and consuming streaming data from Pub/Sub, integrating with Dataflow for real-time data processing



## Development Stack

[![My Skills](https://skillicons.dev/icons?i=pycharm,python,github,gcp&perline=7)](https://skillicons.dev)

## Cloud Stack (GCP)

<img src="docs/icons/pubsub.png" Alt="Pub/Sub" width="50" height="50"><img src="docs/icons/dataflow.png" Alt="Dataflow" width="50" height="50"><img src="docs/icons/bigquery.png" Alt="BigQuery" width="50" height="50">

- Pub/Sub: Messaging service provided by GCP for sending and receiving messages between FastAPI and Dataflow pipeline.
- Dataflow: Serverless data processing service provided by GCP for executing the ETL process.
- BigQuery: Fully managed, serverless data warehouse provided by GCP for storing and analyzing large datasets.
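
On the publishing side, a minimal sketch of sending a message to the Pub/Sub topic the pipeline consumes might look like the following. The project and topic names are placeholders, not values taken from this repository:

```python
# Minimal sketch: publish a JSON payload to the Pub/Sub topic consumed by the pipeline.
# "<gcp-project-id>" and "<topic-name>" are placeholders.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("<gcp-project-id>", "<topic-name>")

payload = {"event": "example", "value": 42}
# publish() takes bytes and returns a future; result() yields the server-assigned message id.
future = publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
print(f"Published message id: {future.result()}")
```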

## Continuous Integration and Continuous Deployment (CI/CD, DevOps)
![My Skills](https://skillicons.dev/icons?i=githubactions)


![My Skills](https://skillicons.dev/icons?i=githubactions)

## Contributing

See the following docs:

- [Contributing Guide](https://github.com/IvanildoBarauna/data-consumer-pipeline/blob/main/CONTRIBUTING.md)
- [Code Of Conduct](https://github.com/IvanildoBarauna/data-consumer-pipeline/blob/main/CODE_OF_CONDUCT.md)
- [Contributing Guide](https://github.com/ivdatahub/data-consumer-pipeline/blob/main/CONTRIBUTING.md)
- [Code Of Conduct](https://github.com/ivdatahub/data-consumer-pipeline/blob/main/CODE_OF_CONDUCT.md)

## Project Highlights:

@@ -59,9 +54,8 @@ See the following docs:

- Documentation: Creation of detailed documentation to facilitate the understanding and use of the application, including installation instructions, usage examples and troubleshooting guides.


# Data Pipeline Process:

1. Data Extraction: The data extraction process consists of making requests to the API to obtain the data. The requests are made in parallel workers using Cloud Dataflow to optimize the process. The data is extracted in JSON format.
2. Data Transformation: The data transformation process consists of converting the data to BigQuery Schema. The transformation is done using Cloud Dataflow in parallel workers to optimize the process.
3. Data Loading: The data loading process consists of loading the data into BigQuery. The data is loaded in parallel workers using Cloud Dataflow to optimize the process.
3. Data Loading: The data loading process consists of loading the data into BigQuery. The data is loaded in parallel workers using Cloud Dataflow to optimize the process.
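
A minimal Apache Beam sketch of these three steps (streaming read from Pub/Sub, a JSON-decoding transform, and a BigQuery write) is shown below; the subscription, dataset, and table names are placeholders and the real pipeline's transforms and options will differ:

```python
# Minimal sketch of the extract -> transform -> load flow described above.
# Subscription and table names are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def run() -> None:
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True  # Pub/Sub sources require streaming mode

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # 1. Data Extraction: read raw messages from a Pub/Sub subscription.
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/<gcp-project-id>/subscriptions/<subscription-name>"
            )
            # 2. Data Transformation: decode each message into a dict matching the BigQuery schema.
            | "DecodeJson" >> beam.Map(lambda message: json.loads(message.decode("utf-8")))
            # 3. Data Loading: append rows to the destination BigQuery table.
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="<gcp-project-id>:<dataset>.<table>",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```

To execute such a sketch on Cloud Dataflow rather than locally, the standard Beam pipeline options (`--runner DataflowRunner`, `--project`, `--region`, `--temp_location`) would be passed on the command line.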
