The project aims to understand the underlying reasons for the success and failure of data science initiatives by analyzing a collection of news articles related to Data Science, Machine Learning, and Artificial Intelligence.
Tools and techniques used:
- Python programming language
- Natural Language Processing
- Text cleaning
- Named Entity Recognition
- Topic Modeling
- Sentiment Analysis
1. Data Collection and Preprocessing:
- A dataset of news articles on Data Science, Machine Learning, and AI was provided by my university (the University of Chicago).
- Noise cleaning:
  - Lowercasing
  - Removed HTML tags, URLs and web crawl remnants
  - Removed punctuation and digits
  - Removed symbols and non-printable characters
  - Removed newlines, tabs and extra whitespace
- Pre-processing:
  - Removed stopwords
  - Lemmatization using NLTK's WordNetLemmatizer
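The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the regular expressions, the function names `clean_text` and `preprocess`, and the tiny `STOPWORDS` set are all hypothetical stand-ins (the project removed stopwords with a full list and lemmatized with WordNetLemmatizer, which is not shown here).

```python
import html
import re

# Hypothetical, deliberately tiny stopword list for illustration; a real run
# would use a full list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "was", "were",
             "of", "to", "in", "on", "for", "with", "this", "that", "it"}

HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"(?:https?://|www\.)\S+")
NON_LETTER = re.compile(r"[^a-z\s]")  # punctuation, digits, symbols, control chars
WHITESPACE = re.compile(r"\s+")       # newlines, tabs, repeated spaces


def clean_text(text):
    """Noise cleaning: lowercase, strip HTML and URLs, drop non-letters, collapse whitespace."""
    text = html.unescape(text).lower()  # decode crawl remnants such as &amp; and &nbsp;
    text = HTML_TAG.sub(" ", text)
    text = URL.sub(" ", text)
    text = NON_LETTER.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def preprocess(text):
    """Pre-processing: tokenize the cleaned text and drop stopwords.
    Lemmatization (WordNetLemmatizer in this project) would be applied to the result."""
    return [tok for tok in clean_text(text).split() if tok not in STOPWORDS]
```

For example, `preprocess("<b>The</b> model\tfailed in 2021: see https://x.io/a")` keeps only `model`, `failed` and `see`. Note that this letter-only filter also drops accented characters, which may or may not be acceptable for a given corpus.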
2. Topic Detection:
- Use topic modeling techniques to categorize articles into major themes or topics.
- Assign each article to the appropriate topic for analysis.
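The README does not name a specific topic-modeling technique. As one possibility, the sketch below assumes Latent Dirichlet Allocation (LDA) fitted with collapsed Gibbs sampling, written in pure Python so it runs without extra dependencies; in practice a library implementation (for example gensim or scikit-learn) would be the usual choice. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the choice of assigning each article to its most frequent topic are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists, e.g. the output of the preprocessing step.
    Returns (dominant topic index per document, top words per topic).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Assign each article to the topic that claims most of its tokens.
    dominant = [max(range(K), key=row.__getitem__) for row in doc_topic]
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -topic_word[k][v])[:top_n]]
        for k in range(K)
    ]
    return dominant, top_words
```

The top words per topic are what a reader would inspect to name each theme (for example "data quality" or "project delivery"); the number of topics is a modeling choice that usually needs tuning.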
3. Sentiment Analysis:
- Perform sentiment analysis to classify the sentiment expressed in each article as positive or negative.
- Customize sentiment analysis to fit the context of data science initiatives.
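One way to customize sentiment analysis for this domain is a lexicon that adds data-science-specific terms (such as "overfitting" or "biased") to general sentiment words. The sketch below assumes that approach; `LEXICON`, `NEGATORS`, `sentiment_score` and `classify` are hypothetical names, and the word weights are illustrative rather than taken from the project. One practical caveat: common stopword lists (NLTK's English list among them) include "not" and "no", so negation handling like this only works if scoring runs before stopword removal or those words are kept.

```python
# Hypothetical domain lexicon: general sentiment words plus terms that carry
# sentiment specifically in coverage of data science projects.
LEXICON = {
    "success": 1, "successful": 1, "improved": 1, "breakthrough": 1,
    "accurate": 1, "efficient": 1, "adopted": 1, "scalable": 1,
    "failure": -1, "failed": -1, "bias": -1, "biased": -1, "overfitting": -1,
    "delayed": -1, "abandoned": -1, "breach": -1, "costly": -1,
}
NEGATORS = {"not", "no", "never", "without"}


def sentiment_score(tokens):
    """Sum lexicon polarities; a negator flips the polarity of the next token only."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0)
        score += -polarity if negate else polarity
        negate = False
    return score


def classify(tokens):
    """Binary label; a score of exactly zero defaults to positive (an arbitrary choice)."""
    return "negative" if sentiment_score(tokens) < 0 else "positive"
```

For instance, `classify("the model was not successful".split())` returns `"negative"` because the negator flips the positive weight of "successful".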
4. Reasons for Failure:
- Identify articles with negative sentiment discussing failures in data science projects.
- Extract reasons for these failures, such as technology issues, data challenges, or project management problems.
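The failure analysis can be approximated by filtering for negative articles that actually talk about failure and then tagging each with reason categories. In this minimal sketch the three categories follow the README, but every keyword, the `FAILURE_TERMS` filter and the function name `failure_reasons` are hypothetical; it also assumes each article is already tokenized and labeled by the sentiment step.

```python
from collections import Counter

# Hypothetical terms signalling that an article discusses a failure.
FAILURE_TERMS = {"fail", "failed", "failure", "failures", "abandoned", "scrapped"}

# Categories follow the README; the keywords are illustrative only.
REASON_KEYWORDS = {
    "technology": {"infrastructure", "scalability", "legacy", "integration", "latency"},
    "data": {"quality", "missing", "bias", "labeling", "privacy", "silos"},
    "project management": {"budget", "deadline", "scope", "stakeholders", "communication"},
}


def failure_reasons(articles):
    """articles: iterable of (tokens, sentiment_label) pairs.

    Counts how many negative, failure-related articles mention each reason category.
    An article can count toward several categories.
    """
    counts = Counter()
    for tokens, label in articles:
        vocab = set(tokens)
        if label != "negative" or not (vocab & FAILURE_TERMS):
            continue
        for category, keywords in REASON_KEYWORDS.items():
            if vocab & keywords:
                counts[category] += 1
    return counts
```

Keyword matching is crude but transparent; the resulting counts give a first ranking of failure causes that can then be checked by reading the matching articles.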
5. Reasons for Success:
- Identify articles with positive sentiment discussing achievements in data science projects.
- Extract reasons for these achievements.
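Reasons for success are often stated explicitly in a single sentence ("... succeeded because ..."). A minimal sketch of pulling such sentences out of positive articles is shown below; it assumes access to the raw (uncleaned) text so sentence punctuation survives, and the cue lists and the name `success_reason_sentences` are hypothetical.

```python
import re

# Hypothetical cue phrases; substring matching means "success" also catches "successful".
CAUSE_CUES = ("because", "thanks to", "due to", "as a result of", "enabled by", "credited")
SUCCESS_TERMS = ("success", "succeeded", "achieved", "improved")


def success_reason_sentences(articles):
    """articles: iterable of (raw_text, sentiment_label) pairs.

    Returns sentences from positive articles that mention a success and contain a
    causal cue, as candidate explanations for that success.
    """
    found = []
    for text, label in articles:
        if label != "positive":
            continue
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            lower = sentence.lower()
            if any(t in lower for t in SUCCESS_TERMS) and any(c in lower for c in CAUSE_CUES):
                found.append(sentence.strip())
    return found
```

The extracted sentences can then be grouped with the same kind of category keywords used for failures, so that success and failure reasons are directly comparable.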
6. Sentiment Over Time Analysis:
- Create a timeline to visualize how sentiment changes over different time periods.
- Investigate whether sentiment patterns align with specific events or technological advancements.
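A sentiment timeline needs each article's score bucketed by period. The sketch below assumes every article carries a publication date (the README does not describe the dataset's fields) and uses monthly buckets; the function name `monthly_sentiment` is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, List, Tuple


def monthly_sentiment(articles: Iterable[Tuple[date, float]]) -> List[Tuple[str, float]]:
    """articles: (publication date, sentiment score) pairs.

    Returns (YYYY-MM, mean score) pairs in chronological order.
    """
    buckets = defaultdict(list)
    for published, score in articles:
        buckets[published.strftime("%Y-%m")].append(score)
    # "YYYY-MM" strings sort chronologically.
    return [(month, sum(scores) / len(scores)) for month, scores in sorted(buckets.items())]
```

Plotting these pairs as a line chart gives the timeline; months with very few articles produce noisy averages, so reporting the article count per bucket alongside the mean is worth considering.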
7. Entity Identification:
- Use Named Entity Recognition to identify organizations, people, and locations mentioned in the articles.
- Compile a list of these entities for further analysis.
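The README does not say which NER tool was used. The sketch below assumes spaCy with its small English model (`en_core_web_sm`), whose labels include `ORG`, `PERSON`, `GPE` and `LOC`; spaCy is imported lazily so the compiling step can be used on its own. NER should run on the raw article text, since lowercasing and punctuation removal destroy the capitalization cues the model relies on. The function names and the label grouping are hypothetical.

```python
from collections import Counter, defaultdict

# Map spaCy labels to the entity types the README asks for (grouping is an assumption).
KEEP_LABELS = {"ORG": "organization", "PERSON": "person", "GPE": "location", "LOC": "location"}


def extract_entities(texts, model="en_core_web_sm"):
    """Yield (entity text, spaCy label) pairs from raw article texts."""
    import spacy  # assumption: spaCy and the named model are installed

    nlp = spacy.load(model)
    for doc in nlp.pipe(texts):
        for ent in doc.ents:
            yield ent.text, ent.label_


def compile_entities(pairs, top_n=10):
    """Group (text, label) pairs into the most frequent entities per type."""
    by_type = defaultdict(Counter)
    for text, label in pairs:
        if label in KEEP_LABELS:
            by_type[KEEP_LABELS[label]][text] += 1
    return {kind: counter.most_common(top_n) for kind, counter in by_type.items()}
```

Typical usage would be `compile_entities(extract_entities(raw_texts))`; variant spellings of the same organization (for example with and without "Inc.") would still need normalizing before counting.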
8. Targeted Sentiment Analysis:
- Analyze the sentiment associated with specific entities mentioned in the articles.
- Determine how organizations and people are portrayed in the context of data science projects.
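A simple approximation of targeted sentiment is to score only the sentences that mention a given entity and average those scores. The sketch below assumes that sentence co-occurrence approach (dedicated aspect-based sentiment models are more precise); the small `LEXICON` and the name `entity_sentiment` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon for illustration.
LEXICON = {"success": 1, "successful": 1, "improved": 1, "praised": 1,
           "failed": -1, "failure": -1, "biased": -1, "criticized": -1}


def entity_sentiment(texts, entities):
    """Mean lexicon score of the sentences that mention each entity.

    Entities never mentioned are omitted from the result.
    """
    scores = defaultdict(list)
    for text in texts:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            lower = sentence.lower()
            score = sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", lower))
            for entity in entities:
                if entity.lower() in lower:
                    scores[entity].append(score)
    return {entity: sum(s) / len(s) for entity, s in scores.items()}
```

Because a sentence's whole score is attributed to every entity it mentions, sentences that contrast two organizations will blur their portrayals; this is the main limitation of the co-occurrence shortcut.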
9. Insights and Recommendations:
- Analyze the reasons for failure and success to extract insights.
- Develop actionable recommendations to enhance the success rates of data science initiatives.