Cloud: Google Cloud Platform (GCP)
Version Control System:
Programming Language: Python
Big Data Tools and Software: Apache Spark (PySpark), HDFS, Hive
Project Introduction:
So, I had this project where I wanted to analyze marketing campaign data. I decided to use Apache Spark, specifically PySpark, and I ran everything in the cloud on Google Cloud Platform (GCP).
Data Loading:
Loading Data into HDFS:
- To get started, I needed to bring in the data. So, I loaded three JSON files - ad_campaigns_data.json, user_profile_data.json, and store_data.json - into HDFS on GCP.
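Here's a minimal sketch of this step: after pushing the files into HDFS (for example with `hdfs dfs -put`), reading them into PySpark DataFrames can look like the following. The HDFS base path and session setup are placeholders, not the exact ones I used:

```python
from pyspark.sql import SparkSession

# On a GCP Dataproc cluster a SparkSession is usually preconfigured;
# enableHiveSupport() is assumed here so the Hive steps later work too.
spark = SparkSession.builder \
    .appName("MarketingCampaignAnalysis") \
    .enableHiveSupport() \
    .getOrCreate()

# Assumed HDFS base path - adjust to wherever the files were put.
base = "hdfs:///user/data"
ad_campaigns = spark.read.json(f"{base}/ad_campaigns_data.json")
user_profiles = spark.read.json(f"{base}/user_profile_data.json")
stores = spark.read.json(f"{base}/store_data.json")

ad_campaigns.printSchema()  # sanity-check the inferred schema
```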
PySpark Data Analysis:
Analyzing Data with PySpark:
- This was the exciting part. I used a Jupyter notebook and wrote PySpark code to tackle some specific analytical challenges (sketched after this list). Here's what I did:
- I grouped the events by campaign_id, date, hour, os_type, and value, gathering and counting all the events in each group.
- I repeated the same aggregation for campaign_id, date, hour, store_name, and value.
- I did it once more for campaign_id, date, hour, gender_type, and value, again producing event counts.
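Here's a minimal sketch of those three aggregations. The `events` DataFrame is an assumption - a view of the campaign events joined with the user-profile and store attributes so that all five grouping columns are available in one place:

```python
from pyspark.sql import functions as F

# Assumed: `events` holds campaign events joined with user/store attributes,
# so all of the grouping columns below exist on one DataFrame.
groupings = [
    ["campaign_id", "date", "hour", "os_type", "value"],
    ["campaign_id", "date", "hour", "store_name", "value"],
    ["campaign_id", "date", "hour", "gender_type", "value"],
]

results = []
for cols in groupings:
    # Gather all events per combination of the grouping columns and count them.
    counts = events.groupBy(*cols).agg(F.count("*").alias("event_count"))
    results.append(counts)

results[0].show(5)  # peek at the os_type breakdown
```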
Data Storage:
Storing Processed Data:
- To keep things organized, I stored the processed JSON data from each of these analytical problems in separate HDFS output directories.
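Continuing the sketch above, writing each result set to its own HDFS output directory as JSON could look like this; the directory names are placeholders:

```python
# One output directory per analytical problem - names are placeholders.
output_dirs = [
    "hdfs:///user/output/events_by_os_type",
    "hdfs:///user/output/events_by_store_name",
    "hdfs:///user/output/events_by_gender_type",
]

for counts, out_dir in zip(results, output_dirs):
    # Each directory ends up holding the JSON part-files for one result set.
    counts.write.mode("overwrite").json(out_dir)
```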
Hive Table Creation:
Creating Hive Tables:
- Once I had the output data comfortably sitting in HDFS, I took the next step: I created external Hive tables on top of those output directories. These tables let me run SQL-like queries directly on the data. To parse the JSON, I used a JSON SerDe (serializer/deserializer), as sketched below.
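As a sketch, the DDL for one of those external tables might look like this, issued through spark.sql (this relies on the SparkSession's Hive support from earlier, and the built-in Hive JsonSerDe needs the hive-hcatalog-core jar on the classpath). The table name, columns, and location are assumptions matching the sketches above:

```python
# Assumed table name, schema, and location - adjust to the actual output.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events_by_os_type (
        campaign_id STRING,
        `date`      STRING,
        `hour`      INT,
        os_type     STRING,
        value       STRING,
        event_count BIGINT
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION 'hdfs:///user/output/events_by_os_type'
""")

# The JSON files under that location are now queryable with SQL.
spark.sql("""
    SELECT os_type, SUM(event_count) AS total_events
    FROM events_by_os_type
    GROUP BY os_type
""").show()
```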
So, that's the breakdown of my project. It involved data loading, PySpark analysis, data storage, and creating Hive tables for convenient querying.
- Key Takeaway:
- This project demonstrated a practical application of Apache Spark (PySpark) in a cloud environment, using Google Cloud Platform (GCP) to analyze marketing campaign data. It walked through the crucial stages of data loading, PySpark analysis, organized data storage, and the creation of external Hive tables for effective querying - showcasing the power of big data tools and cloud computing in solving real-world analytical challenges.