MhmedRjb/MillionSonganalysis


Million Songs analysis

This project is a comprehensive analysis of song data spanning several decades, aimed at uncovering trends in the music industry over time. The analyzed data includes fields such as the artist’s name, the song’s release year, the singer’s gender, and the country of origin.
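To make this kind of trend analysis concrete, here is a minimal sketch of a decade-level aggregation over those fields. The `SongRecord` class, the function names, and the sample rows are hypothetical. They mirror the fields listed above but are not taken from the project's actual dataset or column names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SongRecord:
    """Hypothetical record shape; field names are illustrative only."""
    artist: str
    year: int      # release year
    gender: str    # singer's gender
    country: str   # country of origin


def songs_per_decade(records: list[SongRecord]) -> dict[int, int]:
    """Count songs per decade, e.g. 1987 -> 1980."""
    counts = Counter((r.year // 10) * 10 for r in records)
    return dict(sorted(counts.items()))


def gender_counts_by_decade(records: list[SongRecord]) -> dict[int, dict[str, int]]:
    """Count songs per decade, split by the singer's gender."""
    table: dict[int, Counter] = {}
    for r in records:
        decade = (r.year // 10) * 10
        table.setdefault(decade, Counter())[r.gender] += 1
    return {decade: dict(c) for decade, c in sorted(table.items())}


if __name__ == "__main__":
    # Made-up sample rows, not project data
    sample = [
        SongRecord("Artist A", 1975, "female", "Sweden"),
        SongRecord("Artist B", 1979, "male", "United Kingdom"),
        SongRecord("Artist C", 1984, "female", "United States"),
    ]
    print(songs_per_decade(sample))         # {1970: 2, 1980: 1}
    print(gender_counts_by_decade(sample))  # {1970: {'female': 1, 'male': 1}, 1980: {'female': 1}}
```

Bucketing by decade keeps the output small enough to chart directly, which suits a spreadsheet-backed dashboard.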

[Figure: pipeline flowchart]

Tech Stack & Tools

  • Infrastructure: Terraform & Docker
  • Orchestration: MAGEai
  • Database Storage: local
  • Data Processing: Apache Spark
  • ETL Scripts: Python
  • Serving Layer: Google Sheets & Looker Studio

Pipeline Overview

The pipeline starts by ingesting raw data from CSV files and collecting additional data from the Wikidata API. The ETL (Extract, Transform, Load) workflow is orchestrated with MAGEai, and the results are saved to a Google Sheets file in the cloud. Finally, the insights derived from the processed data are visualized in Looker Studio.
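The README does not show how the Wikidata enrichment works. One plausible approach, sketched here under stated assumptions, is to look each artist up by English label through the public SPARQL endpoint (https://query.wikidata.org/sparql) and read property P21 (sex or gender) and P27 (country of citizenship, used as a stand-in for country of origin). The function names are hypothetical, and the network call is left out so the query building and parsing can be checked offline.

```python
import json
import urllib.parse

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_artist_query(artist_name: str) -> str:
    """SPARQL for an artist's gender (P21) and country of citizenship (P27)."""
    # json.dumps yields a quoted, escaped string literal that SPARQL accepts
    literal = json.dumps(artist_name)
    return f"""
SELECT ?genderLabel ?countryLabel WHERE {{
  ?artist rdfs:label {literal}@en .
  OPTIONAL {{ ?artist wdt:P21 ?gender . }}
  OPTIONAL {{ ?artist wdt:P27 ?country . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 1
"""


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL asking for SPARQL JSON results."""
    return WIKIDATA_SPARQL + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def parse_first_row(response: dict) -> dict[str, str]:
    """Flatten the first result row into {variable: value}; {} if no match."""
    rows = response.get("results", {}).get("bindings", [])
    if not rows:
        return {}
    return {var: cell["value"] for var, cell in rows[0].items()}
```

Label matching is ambiguous (several entities can share a name), so a real pipeline would likely also filter by occupation or entity type. When actually sending the request, Wikimedia asks clients to set a descriptive User-Agent header.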

Getting Started

This section will guide you through getting the project up and running on your local machine for development and testing purposes.

Prerequisites

  • Docker
  • Docker Compose
  • Terraform

Installation & Setup

  1. Clone or download the repository.
  2. Open the Google Cloud console.
  3. Create a service account in your project.
  4. Generate a JSON key for the service account and save it in the project directory as Serviceaccounts.json, replacing the existing file.
  5. Copy this Google Sheet to your account, keeping the same name.
  6. Copy the client_email value from Serviceaccounts.json and share the Google Sheet with that address as an Editor.
  7. Open Looker Studio and make a copy of this report.
  8. Set the report's data source to the Google Sheet you copied.
  9. Build the infrastructure by running terraform apply, then type 'yes' when prompted.
  10. Trigger the pipeline by running:
      curl -X POST http://localhost:6789/api/pipeline_schedules/1/pipeline_runs/5266e37a5e6545bb8d96531bf70471d5
  11. If the pipeline doesn't start automatically, open the MAGEai server at localhost:6789, click MillionSongsanalysis, and select 'Run once'.
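The client_email lookup and the curl trigger above can also be scripted. This is a minimal standard-library sketch, not part of the project: the function names are hypothetical, and the schedule ID `1` and the token are copied from the curl command, so they are likely specific to this repository's Mage trigger.

```python
import json
import urllib.request
from pathlib import Path

# Copied from the curl trigger command; the schedule ID and token are
# likely specific to this repository's Mage trigger and may differ for you.
MAGE_HOST = "http://localhost:6789"
SCHEDULE_ID = 1
TRIGGER_TOKEN = "5266e37a5e6545bb8d96531bf70471d5"


def client_email_from_key(key_text: str) -> str:
    """Extract the service-account e-mail from the key file's JSON text."""
    return json.loads(key_text)["client_email"]


def read_client_email(key_path: str = "Serviceaccounts.json") -> str:
    """Read the key file and return the address to add as a sheet Editor."""
    return client_email_from_key(Path(key_path).read_text(encoding="utf-8"))


def build_trigger_request(
    host: str = MAGE_HOST,
    schedule_id: int = SCHEDULE_ID,
    token: str = TRIGGER_TOKEN,
) -> urllib.request.Request:
    """Build a body-less POST, mirroring `curl -X POST <url>`."""
    url = f"{host}/api/pipeline_schedules/{schedule_id}/pipeline_runs/{token}"
    return urllib.request.Request(url, method="POST")
```

With Mage running locally, `print(read_client_email())` shows the address to share the sheet with, and `urllib.request.urlopen(build_trigger_request(), timeout=30)` sends the trigger.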

If you get a "module not found" error, install the packages listed in requirements.txt (for example, pip install -r requirements.txt).
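A "module not found" error (Python's `ModuleNotFoundError`) means an imported package is not installed in the active environment. As a rough diagnostic, this sketch lists requirement names that have no installed distribution, using `importlib.metadata`. It only understands simple `name==version`-style lines, and the function names are hypothetical.

```python
import re
from importlib import metadata


def requirement_names(requirements_text: str) -> list[str]:
    """Extract distribution names from simple requirements.txt lines.

    Comments and option lines (-r, -e, --index-url) are skipped.
    """
    names = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name ends at the first extra, version specifier, marker or space
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_requirements(requirements_text: str) -> list[str]:
    """Return the requirement names with no installed distribution."""
    missing = []
    for name in requirement_names(requirements_text):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_requirements(open("requirements.txt").read())` prints nothing useful by itself; wrap it in `print(...)` to see which packages still need installing.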

Once these steps are complete, the setup should be functional.

Visualizations

[Figure: Looker Studio report (singers_data_visualization_page-0001)]
