This repository contains the code and data for a comprehensive big data analysis project focused on Eminem. The project leverages various data sources, including Twitter, Spotify, and YouTube, to analyze Eminem's presence on social media, his music catalog, and audience engagement.
- Eminem - Wikipedia: Visit Eminem's Wikipedia page for more information about his life and career.
- Overview
- Data Sources
- Tools and Technologies
- Setup
- Directory Structure
- Key Findings
- Graph Images
- Contributing
Eminem (Marshall Mathers) is one of the most influential and successful artists in the history of hip-hop. This project analyzes several aspects of his career and music using big data techniques: sentiment analysis of social media conversations, exploration of his discography on Spotify, and measurement of audience engagement on YouTube.
The project utilizes data from the following sources:
- Twitter API: Collects tweets related to Eminem for sentiment analysis and network analysis.
- Spotify API: Retrieves information about Eminem's albums, tracks, and audio features for analysis.
- YouTube API: Gathers data on Eminem's videos, comments, and audience engagement for further analysis.
The analysis is performed using the following tools and technologies:
- R Programming Language: Used for data collection, preprocessing, analysis, and visualization.
- vosonSML: A package for social media and web data collection and analysis.
- Rspotify: An R wrapper for the Spotify Web API, used for retrieving Eminem's music data.
- tuber: An R package for interacting with the YouTube Data API to collect video and comment data.
- igraph: A package for network analysis and visualization.
- tidytext: A package for text mining and sentiment analysis.
To replicate the analysis, follow these steps:
- Clone this repository to your local machine.
- Install the required R packages listed in the `install_packages.R` file.
- Obtain API keys for Twitter, Spotify, and YouTube.
- Update the authentication variables in the R scripts with your API keys.
- Run the main analysis script (`main.R`) to perform data collection, preprocessing, analysis, and visualization.
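If you would rather not write API keys directly into the scripts, one option is to read them from environment variables. The sketch below is only an illustration of that idea: the variable names and the `read_api_keys()` helper are hypothetical and are not part of this repository, so rename them to match whatever the scripts expect.

```r
# Hypothetical environment variable names; this repository does not define them.
key_vars <- c(
  twitter        = "TWITTER_BEARER_TOKEN",
  spotify_id     = "SPOTIFY_CLIENT_ID",
  spotify_secret = "SPOTIFY_CLIENT_SECRET",
  youtube        = "YOUTUBE_API_KEY"
)

# Read every key and stop with a clear message if any is unset or empty.
read_api_keys <- function(vars = key_vars) {
  keys <- Sys.getenv(vars, unset = NA)
  names(keys) <- names(vars)
  missing <- vars[is.na(keys) | keys == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(keys)
}
```

The variables can be set in a `.Renviron` file that is kept out of version control. Calling `keys <- read_api_keys()` near the top of a script then gives `keys$youtube`, `keys$spotify_id`, and so on.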
The repository is organized as follows:
- main.R: The main R script for data collection, preprocessing, analysis, and visualization.
- Data/: Contains the collected data files.
- Gephi/: Contains the graph files.
- RStudio/: Contains R Markdown notebooks documenting the analysis process.
- GraphImages/: Contains PNG images of the graphs.
- Eminem's Music Career:
  - Active for 24 years.
  - Released 4 albums and numerous songs.
  - Collaborated with artists like Rihanna, Snoop Dogg, and Juice WRLD.
- YouTube Engagement:
  - Identified top-viewed and top-liked videos, including "Love The Way You Lie" ft. Rihanna and "Rap God."
  - Observed a correlation between views and likes, indicating higher engagement for popular videos.
- Twitter Sentiment Analysis:
  - Analyzed public sentiments towards Eminem, revealing a mix of positive and negative emotions.
  - Identified key terms frequently associated with Eminem, including "Slim Shady," "rap," and "music."
- Decision Tree Modeling:
  - Built a decision tree model to predict whether a song is by Eminem, achieving accurate predictions.
- LDA Topic Modeling:
  - Identified significant groups of words related to Eminem, such as "Slim Shady," "rap," and "music heart."
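The relationship between views and likes noted under YouTube Engagement can be quantified with a correlation coefficient. The following is a minimal sketch in base R, not the project's actual code: the `videos` data frame, its column names, and the `views_likes_cor()` helper are assumptions, and the numbers are made-up placeholders rather than collected data.

```r
# Placeholder values for illustration only; these are not results from this project.
videos <- data.frame(
  title = c("video_a", "video_b", "video_c", "video_d", "video_e"),
  views = c(1200, 5400, 860, 15000, 9800),
  likes = c(90, 410, 120, 1300, 700)
)

# Correlate views with likes. Spearman's rank correlation is the default here
# because view counts are usually heavy-tailed, and ranks are less sensitive
# to a handful of extremely popular videos than Pearson's coefficient.
views_likes_cor <- function(df, method = "spearman") {
  stopifnot(all(c("views", "likes") %in% names(df)))
  cor(df$views, df$likes, method = method)
}

rho <- views_likes_cor(videos)
```

A value of `rho` close to 1 would support the observation that more-viewed videos also collect more likes. Passing `method = "pearson"` instead measures the linear relationship on the raw counts.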
- Explore additional data sources beyond Spotify, YouTube, and Twitter to gather more comprehensive insights.
- Experiment with different parameters and algorithms to improve the accuracy and relevance of the analysis results.
- Incorporate user feedback and domain expertise to refine the analytical approach and interpretation of findings.
- Extend the analysis to include other social media platforms, such as Instagram and Facebook, to capture a broader spectrum of audience interactions.
- Explore advanced machine learning models and natural language processing techniques to extract deeper insights from social media conversations.
- Collaborate with domain experts and stakeholders to further refine the analysis methodology and address specific research questions.
The social media analytics conducted for Eminem provide valuable insights into his music career, public reception, and online engagement. By leveraging diverse data sources and analytical techniques, we gain a comprehensive understanding of Eminem's influence and impact in the music industry. This project serves as a foundation for future research and exploration in the field of social media analytics.
Contributions to this project are welcome! If you have any ideas, suggestions, or improvements, feel free to open an issue or submit a pull request.