⛹🏾♀️ This repository contains the setup_nba_data_lake.py script, which automates the creation of a data lake for NBA analytics using AWS services.
⛹🏾♀️ The script integrates Amazon S3, AWS Glue, and Amazon Athena
⛹🏾♀️ It also sets up the infrastructure needed to store and query NBA-related data.
The setup_nba_data_lake.py script performs the following actions:
🔹 Creates an Amazon S3 bucket to store raw and processed data.
🔹 Uploads sample NBA data (JSON format) to the S3 bucket.
🔹 Creates an AWS Glue database and an external table for querying the data.
🔹 Configures Amazon Athena for querying data stored in the S3 bucket.
🔹 Cloud Provider: AWS
🔹 Core Services: S3, Glue, Athena
🔹 Command-Line Tool: AWS CloudShell
🔹 Programming Language: Python 3.x
🔹 Infrastructure as Code: Python script for automating AWS resource creation
🔹 External API: SportsData.io NBA API
🔹 Environment Management: Environment variables for secure API key storage
🔹 IAM Security: Least privilege policies for S3, Glue, and Athena
🔹 Data Querying: Amazon Athena
🔹 Data Storage: Amazon S3 (raw and processed data)
🔹 Data Cataloging: AWS Glue
🔹 Prerequisites 🔹
🔹 Go to Sportsdata.io and create a free account
🔹 IAM Role/Permissions: Ensure the user or role running the script has the following permissions:
🔹 S3: s3:CreateBucket, s3:PutObject, s3:DeleteBucket, s3:ListBucket Glue: glue:CreateDatabase, glue:CreateTable, glue:DeleteDatabase, glue:DeleteTable Athena: athena:StartQueryExecution, athena:GetQueryResults
Steps: ➡️❗ Click Here To View Detailed Visual Steps ❗⬅️
-
Log into AWS and Open CloudShell Console
-
Create the setup_nba_data_lake.py file
-
In another window, go to GitHub and copy the contents inside the setup_nba_data_lake.py file located in the SRC file to paste in the file you created in the cloudshell console.
-
Configure API key in the Python script.
-
Create .env file to create variables that store your API Key and NBA endpoint for your scripts.
-
Run the script
-
Manually Check For The Resources, validate the data is there.
-
Head over to Amazon Athena to try a query.
🔹 Securing AWS services with least privilege IAM policies.
🔹 Automating the creation of services with a script.
🔹 Integrating external APIs into cloud-based workflows.
🔹 Automate data ingestion with AWS Lambda
🔹 Implement a data transformation layer with AWS Glue ETL
🔹 Add advanced analytics and visualizations (AWS QuickSight)
