Note: Includes links to downloads and instructions; steps vary by operating system (mine is Windows). This setup uses a single ETL cluster.
-
ZAGDB (the .sql file in this repo is included for reference)
-
PostgreSQL (https://www.postgresql.org/download/)
-
pgAdmin (https://www.pgadmin.org/download/pgadmin-4-windows/)
-
Create the ZAGDB database in pgAdmin
- Note down credentials
- CREATE TABLE
- INSERT VALUES
- PyCharm (https://www.jetbrains.com/pycharm/download/)
- Back-End coding!
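The CREATE TABLE and INSERT steps above can be sketched in Python. This is a minimal sketch, not the ZAGDB schema: the `customers` table, its columns, and the sample rows are hypothetical, and it uses Python's built-in `sqlite3` module as a local stand-in so it runs without a database server. Against PostgreSQL the same kind of statements can be run in pgAdmin's Query Tool, or through a driver such as psycopg2, which uses `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical table for illustration only; the real ZAGDB tables are in the repo's .sql file.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
)
"""

# Parameterized INSERT; sqlite3 uses "?" placeholders (psycopg2 would use "%s").
INSERT_SQL = "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)"


def create_and_load(conn, rows):
    """Create the example table, insert rows in one transaction, and return the row count."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(CREATE_SQL)
        conn.executemany(INSERT_SQL, rows)
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


if __name__ == "__main__":
    sample_rows = [(1, "Alice", "Dallas"), (2, "Bob", "Austin")]  # made-up sample data
    conn = sqlite3.connect(":memory:")
    print(create_and_load(conn, sample_rows))  # prints 2
    conn.close()
```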
Data extracted and saved
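One way to do the extract-and-save step is to run a query and write the result to a CSV file. The sketch below is an assumption about the approach, not the project's actual code: `extract_to_csv` is a hypothetical helper that works with any DB-API 2.0 connection, so it can be exercised locally with `sqlite3`.

```python
import csv
import sqlite3  # only used for local testing; any DB-API 2.0 connection works


def extract_to_csv(conn, query, path):
    """Run a query and write the rows, preceded by a header row, to a CSV file.

    Returns the number of data rows written.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        header = [col[0] for col in cur.description]  # column names reported by the cursor
        rows = cur.fetchall()
    finally:
        cur.close()
    # newline="" stops the csv module from writing blank lines on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For PostgreSQL, the connection could come from psycopg2, e.g. `psycopg2.connect(host="localhost", dbname=..., user=..., password=...)` with the credentials noted earlier.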
-
Amazon AWS (https://aws.amazon.com/)
-
AWS S3 (https://s3.console.aws.amazon.com/s3/home?region=us-east-2)
Data loaded into an AWS S3 bucket
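Uploading the saved file to S3 can be sketched with boto3's `upload_file(Filename, Bucket, Key)` method. This is a minimal sketch, assuming AWS credentials are already configured locally; the helper name `upload_to_s3`, the bucket name, and the key prefix are hypothetical. The client is passed in rather than created inside the function, so the logic can be tested without AWS.

```python
from pathlib import Path


def upload_to_s3(s3_client, local_path, bucket, prefix=""):
    """Upload a local file to S3 and return the s3:// URI it was written to.

    s3_client is expected to be a boto3 S3 client; any object with an
    upload_file(Filename, Bucket, Key) method works, which keeps this testable.
    """
    name = Path(local_path).name
    clean_prefix = prefix.strip("/")
    # Object key: "<prefix>/<file name>", or just the file name when no prefix is given
    key = f"{clean_prefix}/{name}" if clean_prefix else name
    s3_client.upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"
```

Usage with a real client might look like `upload_to_s3(boto3.client("s3", region_name="us-east-2"), "customers.csv", "your-bucket-name", "raw")`, where the region matches the S3 console link above.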
- AWS Redshift cluster: complete steps 1-5 of the Getting Started guide (https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html)
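Loading data from S3 into a Redshift table is typically done with Redshift's `COPY` command. The sketch below only builds the SQL string; it assumes the target table already exists, the file is a CSV with one header row, and an IAM role with S3 read access is attached to the cluster. The table name, S3 URI, and role ARN are hypothetical placeholders.

```python
def build_copy_sql(table, s3_uri, iam_role_arn, region=None):
    """Build a Redshift COPY statement that loads a headered CSV file from S3.

    Values are inserted into the SQL text as-is, so they should come from trusted
    configuration, never from user input.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS CSV",
        "IGNOREHEADER 1",  # skip the header row written by the extract step
    ]
    if region:
        parts.append(f"REGION '{region}'")  # needed only if the bucket is in another region
    return "\n".join(parts) + ";"
```

The resulting statement can be pasted into the Redshift Query Editor or executed from Python.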
Note down the username and password
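Rather than hardcoding the username and password in PyCharm scripts, they can be read from environment variables. This is a minimal sketch; the variable names (`REDSHIFT_HOST`, `REDSHIFT_USER`, and so on) and the helper `redshift_params_from_env` are hypothetical. It falls back to Redshift's default port (5439) and default database name (`dev`).

```python
import os

# Hypothetical environment variable names; set them to the values noted above.
REQUIRED_VARS = ("REDSHIFT_HOST", "REDSHIFT_USER", "REDSHIFT_PASSWORD")


def redshift_params_from_env(env=None):
    """Collect Redshift connection settings from environment variables.

    Pass a dict as env to override os.environ (useful for testing).
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "host": env["REDSHIFT_HOST"],  # cluster endpoint without the port
        "database": env.get("REDSHIFT_DB", "dev"),  # "dev" is Redshift's default database
        "port": int(env.get("REDSHIFT_PORT", "5439")),  # 5439 is Redshift's default port
        "user": env["REDSHIFT_USER"],
        "password": env["REDSHIFT_PASSWORD"],
    }
```

With the `redshift_connector` package installed, `redshift_connector.connect(**redshift_params_from_env())` opens a connection, and a cursor from that connection can execute queries from PyCharm.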
Ways to query:
- Redshift Query Editor (in the AWS console)
- PyCharm Execute Query
- Extra option if switching between different versions: use a Python virtual environment.
- Python virtual environments let developers control a project's software dependencies. They ensure that the same package/library versions are used every time the code runs, which also helps make results reproducible.
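A virtual environment can be created with Python's standard `venv` module (the same machinery behind `python -m venv`). This is a minimal sketch; the helper name `make_venv` and the `.venv` directory are hypothetical choices.

```python
import sys
import venv
from pathlib import Path


def make_venv(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its Python interpreter."""
    env_dir = Path(env_dir)
    # Mirror the `python -m venv` defaults: symlinks everywhere except Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(sys.platform != "win32"))
    # Windows keeps executables in Scripts\, other operating systems in bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

After `py = make_venv(".venv")`, pinned dependencies can be installed with `subprocess.run([str(py), "-m", "pip", "install", "-r", "requirements.txt"], check=True)` (assuming a `requirements.txt` exists), and PyCharm can be pointed at that interpreter.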
If you would like to discuss my project or any new opportunities, please email me at p.ankur.715@gmail.com or connect with me on LinkedIn: https://www.linkedin.com/in/ankurpatel715/.