Batch Pipeline On Docker To Easily Know Customer Purchasing Behaviors

Business Case

Our customers (subscribers) want help building the skills to deploy simple, viable batch pipelines entirely on Docker, using the following relational and NoSQL databases:

Cassandra
MySQL
Redis

Results

I successfully engineered 3 batch data processing pipelines with PySpark, with the databases running entirely on Docker.

I ingested, pre-processed and visualized the data in these databases to validate their successful deployment.

I also analyzed customer purchasing behavior.

Deployment

I plan to write a blog post soon about how to deploy these 3 batch pipelines on Docker. Stay tuned!

Data

I chose the "eCommerce behavior data from multi category store" dataset, available on Kaggle, so that I could focus on successfully implementing the 3 batch pipelines.

Real business data requires more pre-processing than the transformations I performed on this data.

Properties of data

The data file contains customer behavior data from a large multi-category online store's website for 1 month (November 2019).

Each row in the file represents an event.

All events are related to products and users.
There are 3 different types of events: view, cart and purchase.

The 2 purchase funnels are:

view → cart → purchase
view → purchase
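
As a minimal sketch of how the two funnels could be told apart, the plain-Python example below counts event types and labels each user/product pair that ended in a purchase. The sample events and the `classify_funnels` helper are hypothetical, and the `user_id`, `product_id` and `event_type` field names are assumptions about the data file's columns; this is not the PySpark code used in the pipelines.

```python
from collections import Counter, defaultdict

# Hypothetical sample events as (user_id, product_id, event_type) tuples.
# The field names are assumptions about the data file's columns.
events = [
    (1, 100, "view"),
    (1, 100, "cart"),
    (1, 100, "purchase"),
    (2, 200, "view"),
    (2, 200, "purchase"),
    (3, 300, "view"),
]


def classify_funnels(events):
    """Label each (user_id, product_id) pair that ended in a purchase."""
    seen = defaultdict(set)
    for user_id, product_id, event_type in events:
        seen[(user_id, product_id)].add(event_type)

    funnels = {}
    for key, types in seen.items():
        if "purchase" not in types:
            continue  # the pair never converted, so it belongs to no funnel
        if "cart" in types:
            funnels[key] = "view -> cart -> purchase"
        else:
            funnels[key] = "view -> purchase"
    return funnels


# Distribution of events by type, and the funnel taken by each purchase.
event_counts = Counter(event_type for _, _, event_type in events)
funnels = classify_funnels(events)
```

On the full data file the same grouping would more likely be expressed as a PySpark aggregation, since one month of events is too large to hold comfortably in Python lists.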

Here's the distribution of events in the data:

Batch Pipelines on Docker

Implementation

Storage

Cassandra

MySQL

Redis

Analysis

I performed the following analyses on the pre-processed (transformed) data in storage:

Views by category

Purchase category vs Volume

Top 20 brands purchased

Purchase conversion volume
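
The analyses listed above can be sketched in plain Python as follows. The sample rows are hypothetical, the `category_code` and `brand` field names are assumptions about the data file's columns, and treating "purchase conversion" as purchases divided by views is one possible interpretation rather than the definition used in the original analysis.

```python
from collections import Counter

# Hypothetical rows as (event_type, category_code, brand) tuples.
rows = [
    ("view", "electronics.smartphone", "samsung"),
    ("view", "electronics.smartphone", "apple"),
    ("view", "appliances.kitchen", "bosch"),
    ("view", "appliances.kitchen", "bosch"),
    ("purchase", "electronics.smartphone", "apple"),
    ("purchase", "electronics.smartphone", "apple"),
    ("purchase", "appliances.kitchen", "bosch"),
]


def top_level_category(category_code):
    """Return the first segment of a dotted category code."""
    return category_code.split(".")[0]


# Views by category.
views_by_category = Counter(
    top_level_category(code) for event, code, _ in rows if event == "view"
)

# Purchase category vs volume.
purchases_by_category = Counter(
    top_level_category(code) for event, code, _ in rows if event == "purchase"
)

# Top 20 brands purchased (fewer are returned if fewer brands exist).
top_brands = Counter(
    brand for event, _, brand in rows if event == "purchase"
).most_common(20)

# Purchase conversion, assumed here to mean purchases divided by views.
total_views = sum(views_by_category.values())
total_purchases = sum(purchases_by_category.values())
conversion = total_purchases / total_views if total_views else 0.0
```

Each result is a small mapping or list that could be handed directly to a plotting library to draw the corresponding chart.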

Acknowledgement

All the data I based my analysis on was collected by, and belongs to, the Open CDP project.

Connect with me

Prakash Dontaraju LinkedIn Twitter Medium
