Add files via upload #8

Open
wants to merge 1 commit into
base: main
61 changes: 61 additions & 0 deletions IronHack Payments Cohort Analysis Report.md
@@ -0,0 +1,61 @@
IronHack Payments Cohort Analysis Report
1. Exploratory Data Analysis (EDA) Report
Overview of the Dataset:
Total Records: 26,598
Columns: 28, including transaction amounts, statuses, user IDs, dates, and more.
Key Findings:
Distribution of Transaction Amounts:

The distribution is right-skewed: most transactions are for small amounts.
The most common amounts fall in the lower range (e.g., around $50 to $100).
This suggests a preference for smaller cash advances, possibly because users are cautious or have short-term financial needs.
Status of Cash Requests:

Cash requests fall into several status categories, including rejected, accepted, and others.
Status frequencies are uneven, and some statuses, such as money_back, are more prevalent than others.
Time-Based Patterns:

User activity spans multiple months, and request volume is higher in some months than in others.
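The three findings above can be checked with a few pandas calls. The following is a minimal sketch on toy data, not the report's actual code. The column name `created_at_cash_request` is an assumption based on the merge suffixes, and the values are illustrative only.

```python
import pandas as pd

# Toy stand-in for the merged dataset; values are illustrative only.
# Column names other than `amount` and `status_cash_request` are assumptions.
df = pd.DataFrame({
    "amount": [50, 50, 100, 100, 100, 25, 200, 50],
    "status_cash_request": [
        "money_back", "rejected", "money_back", "money_back",
        "rejected", "money_back", "canceled", "money_back",
    ],
    "created_at_cash_request": pd.to_datetime([
        "2020-05-03", "2020-05-20", "2020-06-01", "2020-06-15",
        "2020-06-28", "2020-07-04", "2020-07-09", "2020-06-30",
    ]),
})

# Positive sample skewness indicates a long right tail (right-skewed amounts).
amount_skew = df["amount"].skew()

# Frequency of each cash request status, most common first.
status_counts = df["status_cash_request"].value_counts()

# Number of requests per calendar month, in chronological order.
monthly_requests = (
    df["created_at_cash_request"].dt.to_period("M").value_counts().sort_index()
)

print(amount_skew, status_counts, monthly_requests, sep="\n")
```

Reporting the skewness value and the per-status counts alongside the prose would make the "right-skewed" and "more prevalent" claims verifiable.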
2. Data Quality Analysis Report
Data Quality Issues Identified:
Missing Values:

Several columns have missing values. Notably:
deleted_account_id has 24,494 missing values (about 92% of the 26,598 records).
moderated_at, cash_request_received_date, money_back_date have a significant number of missing entries.
Resolution: Rows with missing values in critical columns such as amount were dropped. Missing values in categorical columns such as status_cash_request were filled with 'unknown' so that those rows remain in the analysis.
Data Consistency:

The date fields contained time zone information, which was dropped to ensure consistency in date processing.
Actions Taken:
Rows with missing amount were dropped to ensure the reliability of revenue-related analysis.
Missing values in status_cash_request were filled with 'unknown'.
Time zone information was removed from datetime fields to simplify date analysis.
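The three actions above could look roughly like this in pandas. This is a sketch under stated assumptions rather than the project's actual cleaning code: the datetime column name `created_at_cash_request` is hypothetical, and the timestamps are assumed to be stored as strings with a UTC offset.

```python
import pandas as pd

# Toy data; the datetime column name and its format are assumptions.
raw = pd.DataFrame({
    "amount": [100.0, None, 50.0],
    "status_cash_request": ["money_back", "rejected", None],
    "created_at_cash_request": [
        "2020-05-03 10:00:00+00:00",
        "2020-05-04 11:30:00+00:00",
        "2020-06-01 09:15:00+00:00",
    ],
})

# 1. Drop rows where the amount is missing.
clean = raw.dropna(subset=["amount"]).copy()

# 2. Fill missing statuses with an explicit 'unknown' label.
clean["status_cash_request"] = clean["status_cash_request"].fillna("unknown")

# 3. Parse the timestamps as UTC, then drop the time zone information.
clean["created_at_cash_request"] = (
    pd.to_datetime(clean["created_at_cash_request"], utc=True).dt.tz_localize(None)
)

print(clean)
```

Converting to UTC before dropping the time zone keeps all timestamps on one clock, which matters if the source mixes offsets.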
3. Cohort Analysis Insights
Cohort Definition:
Users are grouped into cohorts based on the month of their first cash advance request (cohort_month).
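One way to derive `cohort_month` is to take each user's earliest request timestamp and truncate it to the month. The sketch below assumes hypothetical column names `user_id` and `created_at` and uses toy data.

```python
import pandas as pd

# Toy request log; `user_id` and `created_at` are assumed column names.
requests = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "created_at": pd.to_datetime([
        "2020-05-03", "2020-06-10", "2020-06-01", "2020-06-20", "2020-07-04",
    ]),
})

# Earliest request per user, broadcast back to every row of that user,
# then truncated to a monthly period.
first_request = requests.groupby("user_id")["created_at"].transform("min")
requests["cohort_month"] = first_request.dt.to_period("M")

print(requests)
```

Every row of a user carries the same `cohort_month`, so later group-bys can aggregate by cohort directly.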
Key Metrics and Insights:
Average Usage per User Over Time:

Measures the average number of times users in each cohort used the service.
Observation: The average usage per user fluctuated slightly over time. Some cohorts show higher engagement, indicating that certain periods may attract more frequent users.
Incident Rates by Cohort:

Measures the proportion of each cohort's transactions that were rejected.
Observation: Incident rates vary across cohorts. Some cohorts have higher rates of rejected requests, suggesting either a change in user behavior or modifications in service policies.
Total Revenue by Cohort:

The total revenue generated by each cohort over time.
Observation: Revenue contributions vary by cohort, with some months generating significantly higher revenue. This may correlate with user engagement levels or the overall economic climate during those periods.
Retention Rates by Cohort:

Measures the percentage of users in each cohort who made at least one additional transaction after their first request.
Observation: Retention rates varied across cohorts, with some cohorts showing higher rates of repeat usage. This could indicate the effectiveness of the service in retaining users or differences in user needs.
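The four metrics described above can be computed per cohort with group-bys. This is a minimal sketch on toy data, not the report's actual calculation. The `revenue` column is hypothetical (the report does not say which field revenue comes from), and retention is taken here to mean "made more than one request".

```python
import pandas as pd

# Toy transactions; `revenue` is a hypothetical column and values are illustrative.
tx = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3, 4],
    "cohort_month": ["2020-05"] * 4 + ["2020-06"] * 3,
    "status_cash_request": [
        "money_back", "rejected", "money_back", "money_back",
        "rejected", "rejected", "money_back",
    ],
    "revenue": [5.0, 0.0, 5.0, 5.0, 0.0, 0.0, 5.0],
})

# Requests per user within each cohort.
per_user = (
    tx.groupby(["cohort_month", "user_id"]).size().reset_index(name="n_requests")
)
per_user["repeat_user"] = per_user["n_requests"] > 1

# Flag rejected transactions so their mean gives the incident rate.
tx["is_rejected"] = tx["status_cash_request"].eq("rejected")

summary = pd.DataFrame({
    "avg_usage_per_user": per_user.groupby("cohort_month")["n_requests"].mean(),
    "incident_rate": tx.groupby("cohort_month")["is_rejected"].mean(),
    "total_revenue": tx.groupby("cohort_month")["revenue"].sum(),
    "retention_rate": per_user.groupby("cohort_month")["repeat_user"].mean(),
})

print(summary)
```

Publishing a table like `summary` next to each observation would let readers see which cohorts the statements refer to.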
Overall Insights:
User Engagement: Certain cohorts show higher average usage and retention rates, suggesting potential periods of high user satisfaction or effective service marketing.
Risk and Incident Management: Variations in incident rates highlight the need for further investigation into factors influencing request rejections.
Revenue Trends: Monitoring revenue by cohort helps identify the most profitable user groups and optimize marketing strategies accordingly.
Recommendations:
Further Analysis: Investigate cohorts with high incident rates to identify potential causes and improve user experience.
Targeted Marketing: Focus on cohorts with high average usage and retention rates for targeted marketing campaigns.
User Feedback: Collect feedback from cohorts with lower engagement to understand barriers and improve service offerings.
Binary file added IronHack Payments Cohort Analysis Report.pdf
Binary file not shown.
Binary file added Ironhack Payments Project.pdf
Binary file not shown.
Binary file added Lexique - Data Analyst.xlsx
Binary file not shown.
64 changes: 64 additions & 0 deletions Step1 - MergeFiles.ipynb
@@ -0,0 +1,64 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'merged_data.csv'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Load the files\n",
"file1_path = 'extract - cash request - data analyst.csv'\n",
"file2_path = 'extract - fees - data analyst - .csv'\n",
"\n",
"# Reading the CSV files\n",
"df1 = pd.read_csv(file1_path)\n",
"df2 = pd.read_csv(file2_path)\n",
"\n",
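"# Performing a full outer join on the 'id' column shared by df1 and df2.\n",
"# NOTE: this assumes 'id' identifies the same record in both files; verify before relying on the merged rows.\n",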
"merged_df = pd.merge(df1, df2, on='id', how='outer', suffixes=('_cash_request', '_fees'))\n",
"\n",
"# Saving the merged DataFrame to a CSV file\n",
"merged_file_path = 'merged_data.csv'\n",
"merged_df.to_csv(merged_file_path, index=False)\n",
"\n",
"# Provide the path to the saved merged file\n",
"merged_file_path\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
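The notebook's outer join on `id` can be sanity-checked with two `pd.merge` options: `indicator=True`, which records where each row came from, and `validate`, which raises if the key is not unique. The sketch below uses small in-memory frames instead of the CSV extracts. The column names other than `id` are illustrative, and whether `id` refers to the same entity in both extracts cannot be confirmed from this diff.

```python
import pandas as pd

# In-memory stand-ins for the two CSV extracts; columns other than `id` are illustrative.
cash = pd.DataFrame({"id": [1, 2, 3], "amount": [100.0, 50.0, 25.0]})
fees = pd.DataFrame({"id": [2, 3, 4], "total_amount": [5.0, 5.0, 5.0]})

merged = pd.merge(
    cash,
    fees,
    on="id",
    how="outer",
    suffixes=("_cash_request", "_fees"),
    indicator=True,          # adds a `_merge` column: left_only / right_only / both
    validate="one_to_one",   # raises MergeError if `id` is duplicated on either side
)

# How many rows matched on both sides versus came from only one file.
match_counts = merged["_merge"].value_counts()
print(match_counts)
```

A large share of `left_only` or `right_only` rows would suggest that the two files do not share the same key, in which case a different join column may be needed.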