ivanildobarauna-dev/data-pipeline-sync-ingest

Data Consumer API Project: ETL Process for Currency Quotes Data

Project Stack

Python, Docker, Poetry, Pandas, Jupyter, Matplotlib, GitHub Actions

Project description

The "ETL Process for Currency Quotes Data" project is a complete solution for extracting, transforming and loading (ETL) currency quote data. It combines an MVC architecture, parallel processing and queue-based messaging to keep the ETL process efficient and robust.

Contributing

See the following docs:

Project Highlights:

  • MVC Architecture: Implementation of the Model-View-Controller (MVC) architecture, separating business logic, user interface and data manipulation for better organization and code maintenance.

  • Comprehensive Testing: Development of tests to ensure the quality and robustness of the code at various stages of the ETL process.

  • Parallelism in Models: Use of parallelism in the data transformation and loading stages, increasing efficiency and reducing processing time.

  • Fire-and-Forget Messaging: Use of messaging (queue.Queue) in the fire-and-forget model to manage files generated between the transformation and loading stages, ensuring a continuous and efficient data flow.

  • Parameter Validation: Only parameters validated against the request data source itself are sent, ensuring the integrity and accuracy of the information processed.

  • Configuration Management: Use of a configuration module to manage endpoints, retry times and number of attempts, providing flexibility and ease of adjustment.

  • Common Module: Implementation of a common module for code reuse across the project, promoting consistency and reducing redundancies.

  • Dynamic Views: Generation of views as index.html using nbconvert, based on a Jupyter Notebook that consolidates the generated files into a single dataset for exploration and analysis.
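
The fire-and-forget hand-off described in the highlights above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names, the sentinel object and the quote values are hypothetical, and the loader only records symbols instead of writing Parquet files.

```python
import queue
import threading

_SENTINEL = object()  # hypothetical marker that tells the loader to stop


def loader(work_queue, written):
    """Consume items until the sentinel arrives (stands in for the load stage)."""
    while True:
        item = work_queue.get()
        if item is _SENTINEL:
            work_queue.task_done()
            break
        # A real loader would write a Parquet file here; this sketch only
        # records which symbol it received.
        written.append(item["symbol"])
        work_queue.task_done()


def publish(work_queue, quotes):
    """Put each quote on the queue and return without waiting for the loader."""
    for quote in quotes:
        work_queue.put(quote)


work_queue = queue.Queue()
written = []
worker = threading.Thread(target=loader, args=(work_queue, written), daemon=True)
worker.start()

# Made-up quote values, for illustration only.
publish(work_queue, [{"symbol": "ETH-EUR", "bid": "1.0"},
                     {"symbol": "BTC-USD", "bid": "2.0"}])
work_queue.put(_SENTINEL)  # signal that nothing more will be published
worker.join()
```

The publisher never waits on the result of a single item, which is what makes the hand-off fire-and-forget. The sentinel is one common way to shut the worker down cleanly.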

ETL Process:

  • Extraction: A single request is made to a specific endpoint to obtain quotes for multiple currencies.
  • Transformation: The request response is processed, separating each currency quote and storing it in individual files in Parquet format, facilitating data organization and retrieval.
  • Load: Individual Parquet files are consolidated into a single dataset using a Jupyter Notebook, allowing for comprehensive analysis of the currency quotes.
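
The extraction and transformation steps above can be sketched as follows, combined with the configurable retries mentioned in the highlights. Everything here is an assumption made for illustration: the helper names, the default retry settings and the response shape (a dictionary keyed by currency pair) are hypothetical, and the HTTP call is replaced by an injected `fetch` callable so that the example runs without network access.

```python
import time


def extract_with_retry(fetch, attempts=3, wait_seconds=0.0):
    """Call fetch() up to `attempts` times, sleeping after each failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except Exception as error:  # a real pipeline would catch narrower errors
            last_error = error
            time.sleep(wait_seconds)
    raise RuntimeError(f"extraction failed after {attempts} attempts") from last_error


def split_by_currency(response):
    """Turn one multi-currency response into one record per currency pair."""
    return [{"symbol": symbol, **fields} for symbol, fields in response.items()]


calls = {"count": 0}


def fake_fetch():
    """Stand-in for the HTTP request: fails once, then returns made-up quotes."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated transient failure")
    return {"ETH-EUR": {"bid": "1.0"}, "BTC-USD": {"bid": "2.0"}}


records = split_by_currency(extract_with_retry(fake_fetch))
```

Each element of `records` corresponds to one currency pair and could then be published to the queue and written to its own Parquet file.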

In summary, this project offers a robust and efficient solution for collecting, processing and analyzing currency quote data, using advanced architecture and parallelism techniques to optimize each step of the ETL process.

Repository structure
  • data/: Stores raw data in Parquet format.
    • ETH-EUR-1713658884.parquet: Example: Raw data for ETH-EUR quotes. file_name = symbol + extraction unix timestamp
  • notebooks/: Contains the data_explorer.ipynb notebook for data exploration.
  • etl/: Contains the project's source code.
    • run.py: Entrypoint of the application
  • common/: Library for code reuse and standardization.
    • utils/
      • logs.py: Package for log management.
    • common.py: Package for common code tasks like output directory retrieval or default timestamp.
    • logs/: For storing debug logs.
  • controller/
    • pipeline.py: Receives data extraction requests and orchestrates the ETL models.
  • models/:
    • extract/
      • api_data_extractor.py: Receives the parameters from the controller, sends the request and returns the response as JSON.
    • transform/
      • publisher.py: Receives the JSON from the extractor, separates the dictionary by currency and publishes each of them to a queue to be processed individually.
    • load/
      • parquet_loader.py: Runs in a separate thread, receives each new dictionary that the transformer publishes to the queue, and generates .parquet files in the default directory.
  • views/: For storing data analysis and visualization.
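
The file naming rule given for data/ (symbol plus extraction Unix timestamp) can be sketched as a pair of helpers. The function names are hypothetical, and the hyphen separator is inferred from the example file name ETH-EUR-1713658884.parquet.

```python
import time
from pathlib import Path


def parquet_file_name(symbol, extracted_at=None):
    """Build '<symbol>-<unix timestamp>.parquet'; defaults to the current time."""
    timestamp = int(time.time()) if extracted_at is None else int(extracted_at)
    return f"{symbol}-{timestamp}.parquet"


def parse_parquet_file_name(file_name):
    """Recover (symbol, timestamp) from a name built by parquet_file_name."""
    stem = Path(file_name).stem  # drops the '.parquet' suffix
    # The symbol may itself contain hyphens, so split on the last one only.
    symbol, _, timestamp = stem.rpartition("-")
    return symbol, int(timestamp)


name = parquet_file_name("ETH-EUR", 1713658884)
symbol, extracted_at = parse_parquet_file_name(name)
```

Splitting on the last hyphen keeps symbols such as ETH-EUR intact when the name is parsed back.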
How to run the application locally

Step by Step

Ensure Python 3.10 or higher is installed on your machine.

  • Clone the repository:
$ git clone https://github.com/ivdatahub/data-consumer-api.git
  • Go to the project directory:
$ cd data-consumer-api
  • Install the dependencies and run the project:
$ poetry install && poetry run python etl/run.py

Learn more about poetry

ETL and Data Analysis Results:

The complete data analysis is available in the Jupyter Notebook, which is deployed on GitHub Pages.
