ironhack-labs · senlerk · Sep 23, 2024
diff --git a/IronHack Payments Cohort Analysis Report.md b/IronHack Payments Cohort Analysis Report.md
@@ -0,0 +1,61 @@
+IronHack Payments Cohort Analysis Report
+1. Exploratory Data Analysis (EDA) Report
+Overview of the Dataset:
+Total Records: 26,598
+Columns: 28, including transaction amounts, statuses, user IDs, dates, and more.
+Key Findings:
+Distribution of Transaction Amounts:
+
+The distribution is right-skewed: most transactions are for smaller amounts, with a long tail of larger ones.
+The most common transaction amounts are concentrated in the lower range (around $50 to $100).
+This may indicate a preference for smaller cash advances, possibly due to user caution or short-term financial needs.
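One quick way to check the right skew described above is to compare the mean with the median and compute the sample skewness. This is a minimal sketch on a handful of made-up amounts, not the real amount column; with the actual data the same three calls would apply to that column.

```python
import pandas as pd

# Made-up cash advance amounts (illustrative only, not the real data):
# mostly small requests plus a few larger ones.
amounts = pd.Series([50, 50, 60, 75, 80, 100, 100, 100, 150, 200], name="amount")

mean_amount = amounts.mean()
median_amount = amounts.median()
skewness = amounts.skew()  # pandas' bias-corrected sample skewness

# Right skew: the long upper tail pulls the mean above the median,
# and the skewness coefficient is positive.
print(f"mean={mean_amount:.1f}  median={median_amount:.1f}  skew={skewness:.2f}")
```

A positive skewness together with a mean above the median is consistent with a long tail of larger advances.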
+Status of Cash Requests:
+
+Cash requests fall into several status categories, including rejected, accepted, and others.
+These statuses are unevenly distributed; some, such as money_back, occur more often than others.
+Time-Based Patterns:
+
+User activity is spread across many months, with some months showing higher engagement than others.
+2. Data Quality Analysis Report
+Data Quality Issues Identified:
+Missing Values:
+
+Several columns have missing values, notably:
+deleted_account_id: 24,494 missing values (about 92% of the 26,598 records).
+moderated_at, cash_request_received_date, and money_back_date: a large number of missing entries.
+Resolution: For critical columns such as amount, rows with missing values were dropped. For categorical columns such as status_cash_request, missing values were filled with 'unknown' so that those rows are kept and the missing status remains explicit.
+Data Consistency:
+
+The date fields contained time zone information, which was dropped to ensure consistency in date processing.
+Actions Taken:
+Rows with a missing amount were dropped to keep revenue-related analysis reliable.
+Missing values in status_cash_request were filled with 'unknown'.
+Time zone information was removed from datetime fields to simplify date analysis.
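The three actions above could be written in pandas roughly as follows. This is a minimal sketch, not the code behind this report: it runs on a tiny made-up DataFrame instead of merged_data.csv and shows only one date column, moderated_at. Parsing with utc=True before dropping the zone is a design choice that assumes raw values might carry different offsets; converting them to UTC first keeps them comparable.

```python
import pandas as pd

# Tiny made-up stand-in for merged_data.csv (illustrative only).
df = pd.DataFrame({
    "amount": [100.0, None, 50.0, 25.0],
    "status_cash_request": ["money_back", "rejected", None, "rejected"],
    "moderated_at": [
        "2020-05-01 10:00:00+00:00",
        "2020-05-20 09:15:00+00:00",
        "2020-06-02 08:30:00+00:00",
        None,
    ],
})

# 1. Drop rows with a missing amount (needed for revenue analysis).
df = df.dropna(subset=["amount"])

# 2. Replace missing statuses with an explicit 'unknown' label.
df["status_cash_request"] = df["status_cash_request"].fillna("unknown")

# 3. Parse as UTC-aware datetimes, then drop the time zone so all values
#    are naive timestamps; missing dates become NaT.
df["moderated_at"] = pd.to_datetime(df["moderated_at"], utc=True).dt.tz_localize(None)

print(df)
```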
+3. Cohort Analysis Insights
+Cohort Definition:
+Users are grouped into cohorts based on the month of their first cash advance request (cohort_month).
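Assigning cohorts can be sketched as below. The rows are made up, and the request timestamp column is assumed to be called created_at (a hypothetical name here); each user's earliest request month becomes the cohort_month on all of that user's rows.

```python
import pandas as pd

# Made-up cash requests (illustrative only); 'created_at' is an assumed
# name for the request timestamp column.
requests = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "created_at": pd.to_datetime([
        "2020-01-15", "2020-03-02", "2020-02-10",
        "2020-02-20", "2020-02-25", "2020-04-01",
    ]),
})

# Earliest request per user, broadcast back onto every one of their rows.
first_request = requests.groupby("user_id")["created_at"].transform("min")

# The cohort is the calendar month of that first request.
requests["cohort_month"] = first_request.dt.to_period("M")

print(requests)
```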
+Key Metrics and Insights:
+Average Usage per User Over Time:
+
+The average number of times a user in each cohort used the service.
+Observation: Average usage per user fluctuated slightly over time. Some cohorts show higher engagement, suggesting that certain periods may attract more frequent users.
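The report does not spell out the exact formula, so the sketch below uses one plausible reading, stated as an assumption: for each cohort and calendar month, count the cohort's requests and divide by the number of users in the cohort. The rows and the created_at column name are made up for illustration.

```python
import pandas as pd

# Made-up cash requests (illustrative only); 'created_at' is an assumed column name.
requests = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "created_at": pd.to_datetime([
        "2020-01-15", "2020-03-02", "2020-02-10",
        "2020-02-20", "2020-02-25", "2020-04-01",
    ]),
})
requests["request_month"] = requests["created_at"].dt.to_period("M")
first_request = requests.groupby("user_id")["created_at"].transform("min")
requests["cohort_month"] = first_request.dt.to_period("M")

# Number of requests per (cohort, calendar month); missing combinations are 0.
counts = pd.crosstab(requests["cohort_month"], requests["request_month"])

# Cohort size: distinct users whose first request fell in that month.
cohort_size = requests.groupby("cohort_month")["user_id"].nunique()

# Average uses per cohort member in each month (row-wise division).
avg_usage = counts.div(cohort_size, axis=0)

print(avg_usage)
```

Dividing by cohort size rather than by the users active in a given month keeps each row on a fixed base, so a drop in a later month reflects both churn and lower usage.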
+Incident Rates by Cohort:
+
+The proportion of each cohort's transactions that were rejected.
+Observation: Incident rates vary across cohorts. Some cohorts have higher rejection rates, which may reflect changes in user behavior or in service policies.
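Following the report's definition (rejected requests count as incidents), an incident rate per cohort could be computed as below. The status values come from the report; the rows and the created_at column name are made up.

```python
import pandas as pd

# Made-up requests (illustrative only); 'created_at' is an assumed column name.
requests = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "created_at": pd.to_datetime([
        "2020-01-15", "2020-03-02", "2020-02-10",
        "2020-02-20", "2020-02-25", "2020-01-05",
    ]),
    "status_cash_request": ["money_back", "rejected", "money_back",
                            "rejected", "rejected", "money_back"],
})

# Cohort = month of each user's first request.
first_request = requests.groupby("user_id")["created_at"].transform("min")
requests["cohort_month"] = first_request.dt.to_period("M")

# Incident rate = share of a cohort's requests whose status is 'rejected'.
requests["is_rejected"] = requests["status_cash_request"].eq("rejected")
incident_rate = requests.groupby("cohort_month")["is_rejected"].mean()

print(incident_rate)
```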
+Total Revenue by Cohort:
+
+The total revenue generated by each cohort over time.
+Observation: Revenue contributions vary by cohort, with some months generating markedly higher revenue. This may correlate with user engagement levels or with the overall economic climate during those periods.
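The report does not name the column used for revenue, so the sketch below assumes a hypothetical per-request fee_amount column on made-up rows. It builds a cohort-by-month revenue matrix with pivot_table and sums each row for the cohort total.

```python
import pandas as pd

# Made-up requests (illustrative only). 'fee_amount' is a hypothetical
# column standing in for whatever field records revenue per request;
# 'created_at' is an assumed timestamp column name.
requests = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "created_at": pd.to_datetime([
        "2020-01-15", "2020-03-02", "2020-02-10", "2020-02-20", "2020-03-25",
    ]),
    "fee_amount": [5.0, 5.0, 10.0, 5.0, 5.0],
})

requests["request_month"] = requests["created_at"].dt.to_period("M")
first_request = requests.groupby("user_id")["created_at"].transform("min")
requests["cohort_month"] = first_request.dt.to_period("M")

# Revenue matrix: rows are cohorts, columns are calendar months.
revenue = requests.pivot_table(
    index="cohort_month", columns="request_month",
    values="fee_amount", aggfunc="sum", fill_value=0,
)

# Total revenue per cohort across all months.
total_by_cohort = revenue.sum(axis=1)

print(revenue)
print(total_by_cohort)
```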
+Retention Rates by Cohort:
+
+The percentage of users in each cohort who made at least one more transaction after their first request.
+Observation: Retention rates varied across cohorts, with some cohorts showing more repeat usage. This could reflect how well the service retains users, or differences in user needs between cohorts.
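Retention as defined above (users with at least one further request after their first) can be sketched as follows, again on made-up rows with an assumed created_at column.

```python
import pandas as pd

# Made-up cash requests (illustrative only); 'created_at' is an assumed column name.
requests = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3, 4, 5, 5],
    "created_at": pd.to_datetime([
        "2020-01-15", "2020-03-02", "2020-02-10",
        "2020-02-20", "2020-02-25", "2020-04-01",
        "2020-01-05", "2020-02-05", "2020-05-01",
    ]),
})
first_request = requests.groupby("user_id")["created_at"].transform("min")
requests["cohort_month"] = first_request.dt.to_period("M")

# Number of requests per user within each cohort.
per_user = requests.groupby(["cohort_month", "user_id"]).size()

# Retained = made at least one request after the first one.
retained = per_user.gt(1)

# Share of retained users per cohort, then as a percentage.
retention_rate = retained.groupby(level="cohort_month").mean()
retention_pct = retention_rate.mul(100).round(1)

print(retention_pct)
```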
+Overall Insights:
+User Engagement: Certain cohorts show higher average usage and retention rates, which may point to periods of higher user satisfaction or more effective marketing.
+Risk and Incident Management: Variations in incident rates highlight the need for further investigation into factors influencing request rejections.
+Revenue Trends: Monitoring revenue by cohort helps identify the most profitable user groups and optimize marketing strategies accordingly.
+Recommendations:
+Further Analysis: Investigate cohorts with high incident rates to identify potential causes and improve user experience.
+Targeted Marketing: Focus on cohorts with high average usage and retention rates for targeted marketing campaigns.
+User Feedback: Collect feedback from cohorts with lower engagement to understand barriers and improve service offerings.
diff --git a/IronHack Payments Cohort Analysis Report.pdf b/IronHack Payments Cohort Analysis Report.pdf
diff --git a/Ironhack Payments Project.pdf b/Ironhack Payments Project.pdf
diff --git a/Lexique - Data Analyst.xlsx b/Lexique - Data Analyst.xlsx
diff --git a/Step1 - MergeFiles.ipynb b/Step1 - MergeFiles.ipynb
@@ -0,0 +1,64 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'merged_data.csv'"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Paths to the two raw extracts\n",
+    "file1_path = 'extract - cash request - data analyst.csv'\n",
+    "file2_path = 'extract - fees - data analyst - .csv'\n",
+    "\n",
+    "# Reading the CSV files\n",
+    "df1 = pd.read_csv(file1_path)\n",
+    "df2 = pd.read_csv(file2_path)\n",
+    "\n",
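+    "# Full outer join on the 'id' column: rows from both files are kept, and\n",
+    "# non-key columns present in both files get the '_cash_request' / '_fees' suffixes.\n",
+    "# Check that 'id' refers to the same cash request in both files; if the fees\n",
+    "# extract links to cash requests through a separate key, merge on that instead.\n",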
+    "merged_df = pd.merge(df1, df2, on='id', how='outer', suffixes=('_cash_request', '_fees'))\n",
+    "\n",
+    "# Saving the merged DataFrame to a CSV file\n",
+    "merged_file_path = 'merged_data.csv'\n",
+    "merged_df.to_csv(merged_file_path, index=False)\n",
+    "\n",
+    "# Provide the path to the saved merged file\n",
+    "merged_file_path\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.2"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}