- Production-Grade Project
- Secure Data Handling with Encryption/Decryption
- Comprehensive Logging
- AWS Integration with Boto3
- Data Marts for Targeted Insights
- Reading Transactional Data from AWS S3 Bucket
- Validation of Input Data
- Transforming the Data Based on the Business Use Case
- Loading Data into the Data Mart
- In an effort to motivate the sales team, the business unit wants to provide incentives tied to total sales.
- In an effort to enhance customer engagement, the business wants to track its customer base and offer corresponding coupons or discounts.
- A report must be produced on a daily basis.
- A monthly report must be generated at the end of every month.
- Apache Spark is used for data transformation.
- Approximately 15 GB of data is generated per day.
- The Boto3 SDK is used to connect to Amazon S3 (a minimal connection sketch follows this list).
- Data is loaded into the data marts for further processing by the reporting tool.
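As a rough illustration of the S3 integration above, the `s3_client_object.py` utility presumably wraps a Boto3 client built from credentials that `encrypt_decrypt.py` has already decrypted. A minimal sketch, assuming the keys arrive as plain strings (the class name, bucket, and prefix are illustrative, not the project's actual API):

```python
import boto3


class S3ClientProvider:
    """Thin wrapper that builds a Boto3 S3 client from already-decrypted keys."""

    def __init__(self, aws_access_key: str, aws_secret_key: str):
        self.client = boto3.client(
            "s3",
            aws_access_key_id=aws_access_key,
            aws_secret_access_key=aws_secret_key,
        )

    def get_client(self):
        return self.client


# Illustrative usage: list the incoming transactional files under a prefix.
if __name__ == "__main__":
    s3_client = S3ClientProvider("ACCESS_KEY", "SECRET_KEY").get_client()
    response = s3_client.list_objects_v2(Bucket="my-transaction-bucket", Prefix="sales_data/")
    for obj in response.get("Contents", []):
        print(obj["Key"])
```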
Project structure:

my_project/
├── resources/
│   ├── __init__.py
│   ├── dev/
│   │   ├── config.py
│   │   └── requirement.txt
│   ├── qa/
│   │   ├── config.py
│   │   └── requirement.txt
│   ├── prod/
│   │   ├── config.py
│   │   └── requirement.txt
│   └── sql_scripts/
│       └── table_scripts.sql
├── src/
│   ├── main/
│   │   ├── __init__.py
│   │   ├── delete/
│   │   │   ├── aws_delete.py
│   │   │   ├── database_delete.py
│   │   │   └── local_file_delete.py
│   │   ├── download/
│   │   │   └── aws_file_download.py
│   │   ├── move/
│   │   │   └── move_files.py
│   │   ├── read/
│   │   │   ├── aws_read.py
│   │   │   └── database_read.py
│   │   ├── transformations/
│   │   │   └── jobs/
│   │   │       ├── customer_mart_sql_transform_write.py
│   │   │       ├── dimension_tables_join.py
│   │   │       ├── main.py
│   │   │       └── sales_mart_sql_transform_write.py
│   │   ├── upload/
│   │   │   └── upload_to_s3.py
│   │   ├── utility/
│   │   │   ├── encrypt_decrypt.py
│   │   │   ├── logging_config.py
│   │   │   ├── s3_client_object.py
│   │   │   ├── spark_session.py
│   │   │   └── my_sql_session.py
│   │   └── write/
│   │       ├── database_write.py
│   │       └── parquet_write.py
│   └── test/
│       ├── scratch_pad.py.py
│       └── generate_csv_data.py
└── readme.md
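The `utility/spark_session.py` module presumably centralizes SparkSession creation so every job is configured the same way. A minimal sketch, assuming local execution (the app name and master setting are illustrative):

```python
from pyspark.sql import SparkSession


def spark_session() -> SparkSession:
    # Single entry point for SparkSession creation; jobs import this instead
    # of building their own sessions with divergent settings.
    return (
        SparkSession.builder
        .master("local[*]")
        .appName("my_project_etl")
        .getOrCreate()
    )
```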
- Star Schema

Tables and Data Snapshots
- Data and schema received from the transactional table into S3

Sales (Fact) Table

Customer (Dimension) Table
- Customers Table Data

Store (Dimension) Table
- Store Table Data

Product (Dimension) Table
- Product Table Data

Sales Team (Dimension) Table
- Sales Team Table Data

Staging Table (For Auditing & Process Tracking)
- Status A - the process is active: it has started but has not completed, or it failed part-way through
- Status I - the process is inactive: it completed successfully
- The staging table tracks every file load so we can tell whether a stage started, failed, or completed (a sketch of this pattern follows).
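A sketch of the auditing pattern described above, assuming a MySQL backend accessed with `mysql-connector-python`; the table and column names are illustrative, not taken from `table_scripts.sql`:

```python
import mysql.connector

# Connection details are placeholders.
connection = mysql.connector.connect(
    host="localhost", user="etl_user", password="***", database="etl_db"
)
cursor = connection.cursor()


def mark_active(file_name: str) -> None:
    # Register the file with status 'A' before processing starts.
    cursor.execute(
        "INSERT INTO staging_table (file_name, status, created_date) "
        "VALUES (%s, 'A', NOW())",
        (file_name,),
    )
    connection.commit()


def mark_inactive(file_name: str) -> None:
    # Flip the status to 'I' once the file has been processed successfully.
    cursor.execute(
        "UPDATE staging_table SET status = 'I', updated_date = NOW() "
        "WHERE file_name = %s",
        (file_name,),
    )
    connection.commit()
```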
Customer Data Mart
- Using this data mart we can generate dynamic coupon codes (a sketch follows).
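A hedged sketch of what the coupon logic on top of this mart could look like in PySpark; the column names, mart path, and discount thresholds are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer_mart_coupon_sketch").getOrCreate()

# Path is a placeholder for wherever the customer data mart is written.
customer_mart_df = spark.read.parquet("s3a://my-bucket/customer_data_mart/")

monthly_spend = (
    customer_mart_df
    .withColumn("sales_month", F.date_format(F.col("sales_date"), "yyyy-MM"))
    .groupBy("customer_id", "sales_month")
    .agg(F.sum("total_cost").alias("monthly_spend"))
)

# Map the monthly spend onto a discount tier; thresholds are illustrative.
coupons = monthly_spend.withColumn(
    "coupon_discount_pct",
    F.when(F.col("monthly_spend") >= 500, 10)
     .when(F.col("monthly_spend") >= 250, 5)
     .otherwise(0),
)
```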
Sales Team Data Mart
- The incentive goes to the top-ranked salesperson each month; by default it is 1% of that salesperson's total sales for the month (a sketch follows).
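A hedged PySpark sketch of the incentive calculation; the column names and mart path are assumptions:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales_team_incentive_sketch").getOrCreate()

# Path is a placeholder for wherever the sales team data mart is written.
sales_team_mart_df = spark.read.parquet("s3a://my-bucket/sales_team_data_mart/")

monthly_totals = (
    sales_team_mart_df
    .groupBy("sales_person_id", "sales_month")
    .agg(F.sum("total_sales").alias("total_sales"))
)

# Rank salespeople within each month and give the top one 1% of their total.
rank_window = Window.partitionBy("sales_month").orderBy(F.col("total_sales").desc())

incentives = (
    monthly_totals
    .withColumn("rank", F.rank().over(rank_window))
    .withColumn(
        "incentive",
        F.when(F.col("rank") == 1, F.round(F.col("total_sales") * 0.01, 2)).otherwise(F.lit(0.0)),
    )
)
```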