This repository highlights data science projects from my academic journey, focusing on fraud detection, credit and risk analysis, and lead categorization. Specializing in banking and financial data, I developed advanced machine learning models and conducted EDA using Python and SQL, honing essential skills for an analyst.
Description: This project aims to apply Exploratory Data Analysis (EDA) techniques to understand and mitigate the risks associated with loan approvals in the financial services industry. The objective is to analyze the patterns in customer and loan data to help a consumer finance company make informed decisions on loan approvals. By identifying the driving factors behind loan defaults, the company can reduce the risk of financial loss while ensuring that creditworthy applicants are not unfairly rejected. Repository Link
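One common EDA step for this kind of problem is comparing default rates across customer segments. The following is a minimal sketch of that idea, not the project's actual code; the column names `income_band` and `defaulted` and the toy records are hypothetical.

```python
from collections import defaultdict


def default_rate_by(records, key, target="defaulted"):
    """Return {category: share of records in that category with target == 1}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        defaults[row[key]] += row[target]
    return {category: defaults[category] / totals[category] for category in totals}


# Hypothetical toy records; the field names are illustrative only.
loans = [
    {"income_band": "low", "defaulted": 1},
    {"income_band": "low", "defaulted": 0},
    {"income_band": "high", "defaulted": 0},
    {"income_band": "high", "defaulted": 0},
]

rates = default_rate_by(loans, "income_band")
for band, rate in sorted(rates.items()):
    print(f"{band}: {rate:.0%} default rate")
```

Segments whose default rate sits well above the overall rate are candidates for closer inspection as driving factors.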
Description: RSVP Movies, a prominent Indian film production company known for producing several blockbuster movies, is planning to expand its audience by releasing a movie targeted at the global market in 2022. To ensure the success of this ambitious project, they are seeking data-driven insights to guide their decisions throughout the production and release process.
Acting as a data analyst and SQL expert, I analyzed data from movies released over the past three years and provided actionable recommendations that RSVP Movies can use to optimize its strategy for the upcoming global release. Repository Link
Description: X Education, an online education company, sells courses to industry professionals. The company receives a significant number of leads daily through various online marketing channels like websites and search engines. These leads are potential customers who have shown interest by browsing courses, filling out forms, or watching videos. However, despite the high volume of leads, the company struggles with a low lead conversion rate, typically around 30%.
To enhance efficiency and increase the conversion rate, X Education aims to identify the most promising leads—those with the highest likelihood of converting into paying customers. The goal is to create a model that assigns a lead score to each lead, helping the sales team focus their efforts on the most promising prospects. Repository Link
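A lead score of this kind is often derived from a predicted conversion probability. The sketch below assumes a logistic-regression-style model and scales its probability to a 0-100 score; the feature names, weights, and bias are hypothetical and are not taken from the project.

```python
import math


def lead_score(features, weights, bias):
    """Scale a logistic conversion probability to an integer score from 0 to 100."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return round(100 * probability)


# Hypothetical coefficients; a real model would learn these from labelled leads.
weights = {"time_on_site_min": 0.08, "filled_form": 1.2}
bias = -2.0

hot = lead_score({"time_on_site_min": 30, "filled_form": 1}, weights, bias)
cold = lead_score({"time_on_site_min": 2, "filled_form": 0}, weights, bias)
print(f"hot lead: {hot}, cold lead: {cold}")
```

The sales team could then work leads in descending score order, or contact only those above a chosen cutoff.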
Description: In the highly competitive telecom industry, customer retention is crucial as acquiring a new customer costs significantly more than retaining an existing one. The telecom industry typically experiences an annual churn rate of 15-25%. To stay ahead in the market, it is vital for telecom companies to predict which customers are at high risk of churning and take proactive measures to retain them.
This project focuses on analyzing customer-level data from a leading telecom firm and building predictive models to identify customers who are at high risk of churn. The goal is to improve customer retention by understanding the main indicators of churn and targeting high-value customers, who are responsible for generating the majority of the company's revenue. Repository Link
Description: In the banking industry, managing credit risk is a critical aspect of operations, especially in the context of regulatory frameworks like Basel norms. A key component of credit risk management is the computation of Expected Credit Loss (ECL), which is built from three components: the Probability of Default (PD), the Loss Given Default (LGD), and the Exposure at Default (EAD), commonly combined as ECL = PD × LGD × EAD. This project focuses on developing a model to estimate the LGD for defaulted loan accounts.
As a business analyst working for a bank, my objective in this project is to build a statistical model that predicts the LGD for borrowers. The LGD represents the proportion of the outstanding exposure that the bank expects to lose if a borrower defaults. A more accurate LGD prediction supports better credit risk management and provisioning, helping the bank comply with regulatory standards and improve its decision-making. Repository Link
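To make the quantities concrete, here is a small sketch of the standard relationships: LGD as one minus the recovery rate, and ECL as the product of PD, LGD, and EAD. The function names and the numbers are illustrative assumptions, and the project itself estimates LGD with a statistical model rather than from a single known recovery.

```python
def loss_given_default(exposure_at_default, recovered_amount):
    """LGD = 1 - recovery rate, clipped to the range [0, 1]."""
    if exposure_at_default <= 0:
        raise ValueError("exposure_at_default must be positive")
    lgd = 1.0 - recovered_amount / exposure_at_default
    return min(max(lgd, 0.0), 1.0)


def expected_credit_loss(probability_of_default, lgd, exposure_at_default):
    """ECL = PD x LGD x EAD."""
    return probability_of_default * lgd * exposure_at_default


# Hypothetical account: 10,000 outstanding at default, 6,000 recovered afterwards.
lgd = loss_given_default(exposure_at_default=10_000, recovered_amount=6_000)
ecl = expected_credit_loss(probability_of_default=0.05, lgd=lgd, exposure_at_default=10_000)
print(f"LGD = {lgd:.2f}, ECL = {ecl:.2f}")
```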
Description: The rise in digital payment channels has significantly increased the risk of fraudulent transactions, posing a substantial threat to both financial institutions and their customers. Finex, a leading financial service provider based in Florida, has been facing a critical challenge due to an alarming rise in unauthorized credit and debit card transactions. These fraudulent activities have not only led to considerable financial losses but have also undermined customer trust and the bank’s credibility.
To address this issue, this project focuses on developing a machine learning-based fraud detection system capable of identifying and preventing fraudulent transactions in real-time. The project involves understanding the pipeline of a typical transaction, identifying the challenges at each step, and creating a robust solution that minimizes revenue loss while maintaining a seamless customer experience. Repository Link
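The trade-off between revenue loss and customer experience can be read off a confusion matrix: missed fraud (false negatives) costs money, while wrongly flagged transactions (false positives) inconvenience legitimate customers. The sketch below counts both at a given score threshold; the scores, labels, and threshold are hypothetical and do not come from the project's data or model.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count (tp, fp, fn, tn) when transactions scoring >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1
        elif flagged:
            fp += 1
        elif is_fraud:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision: share of flags that are fraud. Recall: share of fraud that is flagged."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and true labels (1 = fraud, 0 = legitimate).
scores = [0.95, 0.80, 0.40, 0.10, 0.65]
labels = [1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_at_threshold(scores, labels, threshold=0.6)
precision, recall = precision_recall(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} tn={tn} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold generally reduces false positives at the cost of more missed fraud, so the operating point depends on the relative cost of each error.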
Explore each project to see the detailed code, methodologies, and results that demonstrate my capabilities as a data scientist.