lochness-labs/rds-ingestion

Database Ingestion for Data Lakes

A Data Lake ingestion connector for databases

The main component of the repository is an AWS Glue job of type pythonshell. It uses a Glue connection to read from the database and AWS Wrangler to write the extracted data as Parquet.
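As a rough illustration of what such a job might do, the sketch below reads one table through a Glue connection and writes it to S3 as a Parquet dataset. The function names, the MySQL engine, and the s3://bucket/schema/table/ layout are assumptions made for this example; they are not taken from the repository's code.

```python
def build_s3_path(bucket: str, schema: str, table: str) -> str:
    """Return the S3 prefix for a table (hypothetical layout, not the repo's)."""
    return f"s3://{bucket}/{schema}/{table}/"


def ingest_table(connection_name: str, schema: str, table: str, bucket: str) -> str:
    """Copy one table to S3 as Parquet and return the destination prefix."""
    # Imported lazily so the path helper works without AWS dependencies installed.
    import awswrangler as wr

    # Assumption: the RDS instance runs MySQL. awswrangler also provides
    # wr.postgresql and wr.sqlserver modules with a similar connect() call.
    con = wr.mysql.connect(connection=connection_name)
    try:
        df = wr.mysql.read_sql_query(f"SELECT * FROM `{schema}`.`{table}`", con=con)
    finally:
        con.close()

    path = build_s3_path(bucket, schema, table)
    # dataset=True writes a folder of Parquet files; overwrite replaces old data.
    wr.s3.to_parquet(df=df, path=path, dataset=True, mode="overwrite")
    return path
```

Calling ingest_table requires AWS credentials and an existing Glue connection, so it is only meaningful inside the Glue job or an equivalent environment; build_s3_path can be exercised anywhere.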

The infrastructure is described (IaC) and deployed with Serverless Framework (https://www.serverless.com/framework/). The entry point is rds-ingest/serverless.yml.

The infrastructure has been developed on the AWS Cloud Platform.

Getting Started

Requirements

For local development only

Environments setup

The rds-ingest/env/ directory contains the environment configuration files, one for each of your AWS environments.

Each file is named after its environment. For example, copy example-environment.yml to dev.yml for a development environment.
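The file-name convention can be checked with a small helper. This is a minimal sketch that assumes only what is stated here: one .yml file per environment inside rds-ingest/env/. The function names are hypothetical and not part of the repository.

```python
from pathlib import Path


def available_stages(env_dir: str = "rds-ingest/env") -> list[str]:
    """Return the stage names derived from the YAML file names in env_dir."""
    return sorted(p.stem for p in Path(env_dir).glob("*.yml"))


def env_file_for(stage: str, env_dir: str = "rds-ingest/env") -> Path:
    """Return the configuration file for a stage, or raise if it is missing."""
    path = Path(env_dir) / f"{stage}.yml"
    if not path.is_file():
        raise FileNotFoundError(f"no environment file for stage {stage!r}: {path}")
    return path
```

Note that the example file itself also matches the *.yml pattern, so it shows up in available_stages until it is renamed or removed.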

Development environment setup

  1. Create virtualenv: virtualenv -p python3 venv
  2. Activate virtualenv: source venv/bin/activate
  3. Install requirements: pip install -r requirements.txt

Deployment instructions

  1. You need two AWS S3 buckets: one for the Glue code and one for the Data Lake. If you already have them, note their names for the next steps; otherwise, create them on S3.

  2. Make a copy of rds-ingest/env/example-environment.yml, name it after your environment (for example dev.yml or prod.yml) and replace:

    • example-data-s3-bucket-name with the name of your data lake AWS S3 bucket.
    • example-code-s3-bucket-name with the name of your code AWS S3 bucket.
    • eu-west-1 with your AWS region.
  3. Create a Glue Connection to your RDS instance and test that it works.

    • In this example, the connection is named test-connection.
  4. Deploy on AWS with: sls deploy --stage {stage}, replacing {stage} with one of the available stages. The stages are defined by the YAML files in the rds-ingest/env/ directory.
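The placeholder substitution and the deploy command from the steps above can be sketched as follows. The three placeholder strings and the sls deploy --stage invocation come from the instructions; the function names and the YAML keys used in the usage example are hypothetical, since the contents of the example file are not shown here.

```python
import re


def render_env(template: str, data_bucket: str, code_bucket: str, region: str) -> str:
    """Replace the documented placeholders in the example environment file."""
    replacements = {
        "example-data-s3-bucket-name": data_bucket,
        "example-code-s3-bucket-name": code_bucket,
        "eu-west-1": region,
    }
    # A single regex pass avoids re-replacing text inside already substituted
    # values (for instance a bucket name that itself contains "eu-west-1").
    pattern = re.compile("|".join(re.escape(key) for key in replacements))
    return pattern.sub(lambda match: replacements[match.group(0)], template)


def deploy_command(stage: str) -> list[str]:
    """Build the Serverless Framework deploy command for a stage."""
    return ["sls", "deploy", "--stage", stage]
```

For example, with a template whose keys are invented for illustration:

```python
template = "data_bucket: example-data-s3-bucket-name\nregion: eu-west-1\n"
print(render_env(template, "my-lake", "my-code", "us-east-1"))
print(" ".join(deploy_command("dev")))
```

The list returned by deploy_command can be passed to subprocess.run from the rds-ingest directory once the environment file is in place.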

Contributing

Feel free to contribute! Create an issue and submit PRs (pull requests) in the repository. Contributing to this project assumes a certain level of familiarity with AWS, the Python language and concepts such as virtualenvs, pip, modules, etc.

Try to keep commit messages within the rules of https://www.conventionalcommits.org/. The sailr.json file configures the commit hook, as described at https://github.com/craicoverflow/sailr.

License

This project is licensed under the Apache License 2.0.

See LICENSE for more information.

Acknowledgements

Many thanks to the maintainers of the open source libraries used in this project:

Serverless plugins

These are the Serverless plugins used in this project:

Contact us if we missed an acknowledgement to your library.


This is a project created by Linkalab and Talent Garden.
