This project is based on Vishal Bulbule's sales data pipeline project.
- Project Name: Create data pipeline for sales data
- Project Description: This project provides a smooth process for uploading sales data and visualizing it in a report, allowing company executives to make better-informed decisions.
I used the E-Commerce Data dataset from Kaggle. The file contains data in the sample format shown below.
- I manually split the file into three sub-files, each containing data for:
- USA
- UK
- France
- Other than that, there are no data transformations: the data is loaded into the sales table exactly as it appears in the .csv files.
- BigQuery dataset for storing tables
- Looker for visualizing data
All of the data flows into one table (sales):
From that table, a few reporting-level views are created:
| Technology | Used for |
|---|---|
| Python Flask | Creating a simple webpage for uploading the data |
| Google Cloud Storage (GCS) | Storing the files uploaded by users |
| Cloud Functions | Creating a "sensor" script which loads files from GCS to BigQuery |
| BigQuery | Storing the data (and its views) |
| Looker | Visualizing the data |
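The Flask piece of the table above can be sketched in a few lines. This is a minimal sketch, not the project's actual main.py: the route, form markup, and the bucket name `sales-data` (matching the bucket created below) are assumptions, and the GCS client is imported inside the handler so the module loads even without google-cloud-storage installed.

```python
from flask import Flask, request

app = Flask(__name__)

# Bare-bones upload form served on GET
UPLOAD_FORM = """
<form method="post" enctype="multipart/form-data" action="/">
  <input type="file" name="file">
  <input type="submit" value="Upload">
</form>
"""

@app.route("/", methods=["GET", "POST"])
def upload():
    if request.method == "POST":
        # Deferred import: only needed (and only works) with ADC configured
        from google.cloud import storage
        f = request.files["file"]
        bucket = storage.Client().bucket("sales-data")
        bucket.blob(f.filename).upload_from_file(f)
        return f"Uploaded {f.filename}"
    return UPLOAD_FORM

if __name__ == "__main__":
    app.run(debug=True)
```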
- No specific local setup required
- Google Cloud Platform:
- free trial account is sufficient
- User with owner role
- Enabled APIs: Storage API & BigQuery API (Google might ask for a few additional APIs, just enable them when prompted)
Locally:
- configure your ADC credentials (instruction):
- Here's what to do if you're having any access issues:
```shell
# Verify that the below file shows the right project
nano /home/szymon/.config/gcloud/application_default_credentials.json
# If not:
gcloud auth application-default revoke
gcloud auth application-default login --project=morning-report-428716
```
- Run main.py. It will serve a simple web page for uploading the data:
Create Bucket:
- sales-data (for storing uploaded files)
Create Cloud Function sales-data-load
- Set the trigger on your GCS bucket
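The "sensor" function's entry point can be sketched as below. This is a hypothetical sketch, not the deployed code: it assumes the 1st-gen GCS-trigger event shape (`bucket` and `name` keys), the `sales.sales` table from the BigQuery steps below, and defers the google-cloud-bigquery import so the module loads without the library.

```python
def gcs_uri(bucket: str, name: str) -> str:
    """Build the gs:// URI for the uploaded object."""
    return f"gs://{bucket}/{name}"

def sales_data_load(event, context=None):
    """Entry point for a GCS-triggered Cloud Function (1st-gen event shape)."""
    from google.cloud import bigquery  # deferred; needs credentials to run
    uri = gcs_uri(event["bucket"], event["name"])
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,          # skip the CSV header row
        autodetect=True,              # BigQuery creates the table on first load
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client = bigquery.Client()
    client.load_table_from_uri(
        uri, f"{client.project}.sales.sales", job_config=job_config
    ).result()
```

`autodetect=True` is what makes the manual table creation below unnecessary: the first uploaded file defines the schema.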
In BigQuery:
- Create dataset sales
- Don't create a table. It will be auto-created once you upload your first file.
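If you prefer code over the console for this step, a one-off dataset-creation sketch might look like this. The function name and the `EU` location default are assumptions; it requires google-cloud-bigquery and working ADC, so the import is deferred into the function.

```python
def create_sales_dataset(dataset_id="sales", location="EU"):
    """Create the sales dataset in the current ADC project (idempotent)."""
    from google.cloud import bigquery  # deferred; needs credentials to run
    client = bigquery.Client()
    dataset = bigquery.Dataset(f"{client.project}.{dataset_id}")
    dataset.location = location
    # exists_ok=True makes re-runs safe
    return client.create_dataset(dataset, exists_ok=True)
```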
Locally:
- Go to the webpage for uploading data (http://127.0.0.1:5000/)
- Upload france.csv, usa.csv & uk.csv
In BigQuery:
- Verify that the sales table was created
- Create views (see bigquery.sql)
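The real view definitions live in bigquery.sql; as an illustration only, here is the shape such a reporting-level view might take. The view name is hypothetical, and the column names (`Country`, `Quantity`, `UnitPrice`) are assumed from the Kaggle e-commerce schema.

```python
# Hypothetical example of a reporting-level view over the sales table
EXAMPLE_VIEW = """
CREATE OR REPLACE VIEW sales.v_revenue_by_country AS
SELECT Country, SUM(Quantity * UnitPrice) AS revenue
FROM sales.sales
GROUP BY Country
"""

def create_view(sql):
    """Run a view-definition statement against BigQuery."""
    from google.cloud import bigquery  # deferred; needs credentials to run
    bigquery.Client().query(sql).result()
```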
In Looker:
- Create a report on top of the sales table and its views

Your Cloud Function will add new data to the sales table every time a sales rep uploads data via your webpage, and your Looker report will update automatically.