7 changes: 7 additions & 0 deletions .env-defaults
@@ -0,0 +1,7 @@
HOST=0.0.0.0
PORT=8050
USER_FILE_PATH=change/me/user.json
MATCH_FILE_PATH=change/me/matches.json
MEDIA_PATH=change/me/media/
ASSETS_PATH=app/assets/
GEOLITE_DB_PATH=data/GeoLite2-City.mmdb
6 changes: 0 additions & 6 deletions .env_example

This file was deleted.

2 changes: 1 addition & 1 deletion .gitignore
@@ -8,4 +8,4 @@ data/
.vscode
.pytest_cache
app/assets/
scratch/
screenshots/
11 changes: 0 additions & 11 deletions BREAKING.md

This file was deleted.

2 changes: 1 addition & 1 deletion Dockerfile
@@ -17,4 +17,4 @@ COPY . .
EXPOSE 8050

# Command to run the application
CMD ["python", "app/app.py"]
CMD ["python", "app/main.py"]
228 changes: 132 additions & 96 deletions README.md
@@ -1,97 +1,133 @@
## Overview
Hinge allows users to request an export of their personal data that was collected while they were using the app. If you have a Hinge account, you can request your data by going to Settings -> Download My Data. It typically takes between 24 and 48 hours to fulfill this request, and once the data are ready, Hinge provides a `.zip` file with your personal data.
# Hinge Data Analysis
## Purpose
This project analyzes personal data exported from Hinge to provide valuable insights into the user's experiences on the platform. By examining the user's profile, dating preferences, and interactions with other users, the project aims to reveal patterns, trends, and meaningful statistics that enhance the understanding of how users engage with Hinge and make decisions based on their preferences.

### Getting Your Personal Data
Hinge allows active users to request an export of the personal data collected while they have had an account. If you have a Hinge account, you can request your data by going to Settings -> Download My Data. It typically takes between 24 and 48 hours to fulfill this request, and once the data are ready, Hinge provides a `.zip` file with your personal data.

### The Data Export Provided by Hinge
The data export provided by Hinge contains several files, but the main one is `index.html`, which renders a web page with tabs showing different data. The tabs provided by Hinge are labeled: User, Matches, Prompts, Media, Subscriptions, Fresh Starts, and Selfie Verification. Aside from viewing changes to your prompts or seeing which pictures you've uploaded, these data are not particularly useful; this is especially true of the Matches tab, which should be the most interesting part.

The Matches tab in the Hinge export contains a list of "Matches", or rather "interactions" as I call them in this project, like this:

**Match # 1**
2024-01-22 20:13:22
Like

**Match # 2**
2024-01-23 20:15:42
Like

**Match # 3**
2024-01-23 20:37:27
Match

2024-01-23 20:39:45
Chat: Hello, World!

2024-01-23 21:49:26
Remove

The list of Matches provided by Hinge leaves a lot to be desired, which is why I decided to build this project analyzing and visualizing interesting insights from the Hinge data export.

## How To Run The App

### Setting Up GeoLite2 Database
1. Create a free MaxMind account: [MaxMind Signup](https://www.maxmind.com/en/geolite2/signup)
2. Download **GeoLite2-City.mmdb** from [MaxMind](https://www.maxmind.com/en/accounts/current/downloads)
3. Place `GeoLite2-City.mmdb` in the project's `data` directory, or point `GEOLITE_DB_PATH` (see `.env-defaults`) to its location.


The application is a multi-page Plotly Dash application that runs in a Docker container on port `8050`. Build the Docker image with `docker compose build` and start the app with `docker compose up -d`. The app will be available at [http://0.0.0.0:8050/](http://0.0.0.0:8050/). To stop the container, use `docker compose down`.

The page will render with information about the app and instructions on how to use it.

The "Upload Files" section lets users upload a `matches.json` or `user.json` file for analysis. **At the moment, the program expects the files to be named `matches.json` or `user.json`, as they are in the export provided by Hinge.** After a file has been selected, the uploaded file name(s) should appear under the upload box.

[![Screenshot-2024-05-25-at-10-12-48.png](https://i.postimg.cc/KcV1SFcQ/Screenshot-2024-05-25-at-10-12-48.png)](https://postimg.cc/hhLDTkd7)
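
Dash's `dcc.Upload` component passes an uploaded file to a callback as a base64 data URI (`data:<mime>;base64,<payload>`) together with its file name. The following is a minimal sketch of how such a callback might decode and validate a Hinge file under the naming rule above; `decode_upload` is a hypothetical helper, not the project's own code.

```python
import base64
import json

# The README says the app currently expects these exact file names.
EXPECTED_NAMES = {"matches.json", "user.json"}


def decode_upload(contents: str, filename: str):
    """Decode a dcc.Upload payload (a base64 data URI) into parsed JSON.

    Hypothetical helper: raises ValueError for unexpected file names.
    """
    if filename not in EXPECTED_NAMES:
        raise ValueError(f"expected one of {sorted(EXPECTED_NAMES)}, got {filename!r}")
    # Drop the "data:application/json;base64" header and decode the payload.
    _header, payload = contents.split(",", 1)
    return json.loads(base64.b64decode(payload).decode("utf-8"))


if __name__ == "__main__":
    # Simulate what the browser sends for a tiny matches.json upload.
    body = base64.b64encode(json.dumps([{"like": []}]).encode("utf-8")).decode("ascii")
    print(decode_upload(f"data:application/json;base64,{body}", "matches.json"))
```
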

The "Data Insights" section contains links to pages that display match data or user data. Click "Matches" or "Users" to show the information and graphs for either topic. The visualizations initially show as blank graphs until a file has been uploaded and the graphs have been reloaded. Clicking the "Reload Graphs" button regenerates the graphs with the uploaded data.

## Match Analytics
The match analytics page contains several graphs that show different aspects of the match data. Hinge only provides data on the user's actions for privacy reasons, so most of the data pertains to how the user interacted with other users.

The first graph is the **Interaction Funnel**, which visualizes the different types of interactions that occurred between the user and other users. The outermost part of the funnel, "Distinct Interactions", is the total number of unique interactions. It combines likes the user received and did not reciprocate, likes the user sent that were not reciprocated, and likes the user sent that led to matches and chats.

The funnel is a good way to see how many interactions were initiated by the user and how many led to matches and conversations.

[![Screenshot-2024-05-25-at-10-17-24.png](https://i.postimg.cc/vHbZdBFr/Screenshot-2024-05-25-at-10-17-24.png)](https://postimg.cc/3WfTX3wN)
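
The funnel stages can be computed by counting how many interactions reach each step. The following is a minimal sketch, assuming (hypothetically) that each record in `matches.json` is a dict with optional `like`, `match`, `chats` and `block` lists, mirroring the columns of the scenario matrix; the stage names are illustrative, not the app's exact labels.

```python
def funnel_counts(interactions):
    """Count interactions reaching each funnel stage (hypothetical schema)."""
    return {
        "Distinct Interactions": len(interactions),
        "Outgoing Likes": sum(1 for i in interactions if i.get("like")),
        "Matches": sum(1 for i in interactions if i.get("match")),
        "Chats": sum(1 for i in interactions if i.get("chats")),
    }


# The three example interactions from the top of this README, in the assumed schema.
SAMPLE = [
    {"like": [{"timestamp": "2024-01-22 20:13:22"}]},
    {"like": [{"timestamp": "2024-01-23 20:15:42"}]},
    {
        "match": [{"timestamp": "2024-01-23 20:37:27"}],
        "chats": [{"body": "Hello, World!", "timestamp": "2024-01-23 20:39:45"}],
        "block": [{"block_type": "remove", "timestamp": "2024-01-23 21:49:26"}],
    },
]

print(funnel_counts(SAMPLE))
# {'Distinct Interactions': 3, 'Outgoing Likes': 2, 'Matches': 1, 'Chats': 1}
```

The resulting dict could then be plotted with `plotly.graph_objects.Funnel(y=list(counts), x=list(counts.values()))`.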

The **Outgoing Likes You've Sent** section contains charts that go into more detail about the user's outgoing likes. The first chart shows users on the app whom the user liked more than once. It is not clear how this can happen, but it does occur occasionally in the data. The pie chart to the right shows the proportion of outgoing likes the user sent with a comment.

[![Screenshot-2024-05-25-at-10-26-30.png](https://i.postimg.cc/SQwtX2N9/Screenshot-2024-05-25-at-10-26-30.png)](https://postimg.cc/XXkgmv5N)
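
Both charts in this section, and the comments table below them, can be derived from the like events. The following is a minimal sketch, assuming (hypothetically) that a repeated like shows up as a record with more than one entry in its `like` list and that each like event may carry an optional `comment` string.

```python
def like_stats(interactions):
    """Return (interactions liked more than once, share of likes with a comment, comments).

    Hypothetical schema: each interaction may hold a "like" list whose events
    carry a "timestamp" and, optionally, a "comment" string.
    """
    repeated = [i for i in interactions if len(i.get("like", [])) > 1]
    likes = [like for i in interactions for like in i.get("like", [])]
    comments = [like["comment"] for like in likes if like.get("comment")]
    share = len(comments) / len(likes) if likes else 0.0
    return repeated, share, comments


sample = [
    {"like": [{"timestamp": "2024-01-22 20:13:22", "comment": "Great hiking photo!"},
              {"timestamp": "2024-02-01 09:00:00"}]},
    {"like": [{"timestamp": "2024-01-23 20:15:42"}]},
]
repeated, share, comments = like_stats(sample)
print(len(repeated), round(share, 2), comments)  # 1 0.33 ['Great hiking photo!']
```
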

Underneath the pie charts, a table called **What You're Commenting When You Like Someone's Content** shows the comments the user left on other users' profiles when liking them.

The next section, **Frequency of Action Types by Day**, shows how often the user took each type of action on the app by day. This is useful for spotting activity patterns and seeing when the user was most active on the app.

[![Screenshot-2024-05-25-at-12-31-35.png](https://i.postimg.cc/nLfN53P0/Screenshot-2024-05-25-at-12-31-35.png)](https://postimg.cc/JsKTH5mk)
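
Counting actions per day amounts to parsing each event timestamp and tallying (day, action) pairs. The following is a minimal sketch under the same hypothetical record schema, reading "by day" as the calendar date and assuming timestamps look like the ones in the export example above; grouping by day of the week would use `.strftime("%A")` instead of `.date()`.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format, matching the dates shown in the export example.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
ACTIONS = ("like", "match", "chats", "block")


def actions_per_day(interactions):
    """Count events per (calendar date, action type) pair."""
    counts = Counter()
    for interaction in interactions:
        for action in ACTIONS:
            for event in interaction.get(action, []):
                day = datetime.strptime(event["timestamp"], TIMESTAMP_FORMAT).date()
                counts[(day, action)] += 1
    return counts


sample = [
    {"like": [{"timestamp": "2024-01-22 20:13:22"}]},
    {"match": [{"timestamp": "2024-01-23 20:37:27"}],
     "chats": [{"timestamp": "2024-01-23 20:39:45"}, {"timestamp": "2024-01-23 21:02:10"}]},
]
for (day, action), n in sorted(actions_per_day(sample).items()):
    print(day, action, n)
```
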

After that, a pie chart called **How Many People Did You Give Your Number To?** shows exactly that: of all the interactions that led to chats, the proportion in which the user gave out their phone number. This assumes the user shared their phone number in one of the common formats the analysis checks for.

[![Screenshot-2024-05-25-at-12-36-13.png](https://i.postimg.cc/MpqFmnMF/Screenshot-2024-05-25-at-12-36-13.png)](https://postimg.cc/gntsYkKV)
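
Detecting a shared number comes down to pattern-matching the user's chat messages. The following is a minimal sketch using a regular expression for a few North American formats; the pattern and the `body` field name are assumptions for illustration, not the formats or schema the project actually uses.

```python
import re

# Hypothetical patterns for North American style numbers, e.g.
# 555-123-4567, (555) 123-4567, 555.123.4567 or 5551234567.
PHONE_PATTERN = re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s]?)\d{3}[-.\s]?\d{4}\b")


def shared_number(chats):
    """True if any message body looks like it contains a phone number."""
    return any(PHONE_PATTERN.search(message.get("body", "")) for message in chats)


def phone_share_ratio(interactions):
    """Share of chatting interactions in which a phone number was sent."""
    with_chats = [i for i in interactions if i.get("chats")]
    if not with_chats:
        return 0.0
    return sum(shared_number(i["chats"]) for i in with_chats) / len(with_chats)


sample = [
    {"chats": [{"body": "Hello, World!"}]},
    {"chats": [{"body": "Text me at (555) 123-4567"}]},
    {"like": [{"timestamp": "2024-01-23 20:15:42"}]},  # no chats, so not counted
]
print(phone_share_ratio(sample))  # 0.5
```
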

The last section of the Match Analytics page shows **Outgoing Message per Chat**. This bar graph shows the distribution of how many messages the user sent in each interaction where messages were exchanged, which is useful for gauging how long the user's conversations typically were.

[![Screenshot-2024-05-25-at-12-39-54.png](https://i.postimg.cc/J7jxY1LV/Screenshot-2024-05-25-at-12-39-54.png)](https://postimg.cc/hhPVfRvp)
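
Because Hinge only exports the user's own actions, the length of a record's `chats` list can be read as the number of outgoing messages. The following is a minimal sketch of the distribution behind this bar graph, again under the hypothetical record schema used in these examples.

```python
from collections import Counter


def messages_per_chat(interactions):
    """Map 'number of outgoing messages' -> 'number of chats with that many'."""
    return Counter(len(i["chats"]) for i in interactions if i.get("chats"))


sample = [
    {"chats": [{"body": "Hello, World!"}]},
    {"chats": [{"body": "Hi!"}, {"body": "How was your weekend?"}, {"body": "Coffee?"}]},
    {"chats": [{"body": "Hey there"}]},
    {"like": [{"timestamp": "2024-01-23 20:15:42"}]},
]
distribution = messages_per_chat(sample)
print(sorted(distribution.items()))  # [(1, 2), (3, 1)]
```
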

## User Analytics
This tab is currently under construction and will be available in a future release.

## Caveats
Hinge updates the schema of the data export from time to time, which may break the current analysis code or make parts of it obsolete. So far, I haven't encountered any schema changes that have broken my code, but I assume that changes will occur over time and things will stop working. I haven't yet found a way to stay up to date with their schema changes.
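
Even without a way to track upstream changes, the code can at least warn when an export looks different from what it expects. The following is a minimal sketch that flags unrecognized top-level keys; `KNOWN_KEYS` is the hypothetical schema used in these examples, not a documented Hinge format.

```python
import logging

# Top-level keys this sketch expects in each matches.json record (an assumption).
KNOWN_KEYS = {"like", "match", "chats", "block"}


def unknown_keys(interactions):
    """Return keys not in KNOWN_KEYS and log a warning if any are found."""
    seen = set()
    for interaction in interactions:
        seen.update(interaction.keys())
    extra = seen - KNOWN_KEYS
    if extra:
        logging.warning("Unrecognized keys in the match export: %s", sorted(extra))
    return extra


print(unknown_keys([{"like": []}, {"match": [], "new_field": []}]))  # {'new_field'}
```
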

## Assumptions
Since there is no documentation provided by Hinge, here are some assumptions I am making about the data:
1. Blocks, or "un-matches" (`where block_type = 'remove'`), could go either direction: a block could represent the other person removing the match with the user, or the user removing the match with someone else.
   1. I assume this also includes people the user came across while swiping and wanted to remove from their deck.
2. Matches without a like in the same event mean that someone liked the user first and the user matched with them (i.e., no outgoing like was sent first).

## Scenario Matrix
The export data contains several possible scenarios under what Hinge refers to as "matches". Not all of these are matches, because some events are simply outgoing likes that were not reciprocated. This is why I refer to them as **interactions**, where an interaction represents the encounters (likes, matches, chats, blocks) that occurred between the user and another person.

Here are the different scenarios of interactions that occur in the data:

| Like | Match | Chats | Block | Meaning |
| ---- | ----- | ----- | ----- | ------- |
| X | | | | The user sent an outgoing like; the other person did not like them back |
| X | X | X | | The user sent an outgoing like, the other person liked them back, and at least one message was exchanged |
| | X | X | | The user received an incoming like, liked the other person back, and at least one message was exchanged |
| | | | X | The match was removed or "unmatched"; it is not possible to tell who unmatched whom. For some reason, many of these exist without any other information, and there is no way to tell which interaction each was originally linked to |
| | X | | X | The user received an incoming like, liked the other person back, no messages were exchanged, and the match was removed |

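
The matrix can be turned into a small classifier that assigns each record to a row. The following is a minimal sketch under the hypothetical record schema used in these examples; it reads blank Like, Match and Chats cells as "absent" and only checks Block for the last two rows, which is one interpretation of the table rather than the project's own logic.

```python
def classify(interaction):
    """Map one interaction onto a row of the scenario matrix (hypothetical schema)."""
    like, match, chats, block = (
        bool(interaction.get(key)) for key in ("like", "match", "chats", "block")
    )
    if like and not match and not chats:
        return "outgoing like, not reciprocated"
    if like and match and chats:
        return "outgoing like, matched, chatted"
    if not like and match and chats:
        # No like in the same event: the other person liked the user first.
        return "incoming like, matched, chatted"
    if block and not (like or match or chats):
        return "removed, origin unknown"
    if not like and match and not chats and block:
        return "incoming like, matched, removed without messages"
    return "unclassified"


examples = [
    {"like": [{"timestamp": "2024-01-22 20:13:22"}]},
    {"match": [{"timestamp": "2024-01-23 20:37:27"}],
     "chats": [{"timestamp": "2024-01-23 20:39:45"}],
     "block": [{"block_type": "remove", "timestamp": "2024-01-23 21:49:26"}]},
    {"block": [{"block_type": "remove", "timestamp": "2024-01-24 10:00:00"}]},
]
for record in examples:
    print(classify(record))
```
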
## Project Structure
This is the structure of the project and what each section does at a high level.
```text
app/
├── analytics/              # Core application logic and data analysis for users and matches
│   ├── __init__.py
│   ├── MatchAnalytics.py
│   └── UserAnalytics.py
│
├── assets/                 # Image files copied from the media folder in the export
│
├── pages/                  # Visualization rendering and user interface
│   ├── __init__.py
│   ├── HomePage.py
│   ├── InfoPage.py
│   ├── MatchPage.py
│   └── UserPage.py
│
├── tools/                  # Misc. tools
│   ├── __init__.py
│   └── Logger.py
│
├── utilities/              # Helper functions, constants, config, and utilities
│   ├── __init__.py
│   └── DataUtility.py
│
└── main.py                 # Application entry point

tests/                      # Unit and integration tests
├── __init__.py
└── analytics/
    ├── __init__.py
    ├── test_MatchAnalytics.py
    └── test_UserAnalytics.py

data/                       # Local storage for raw data from the personal export
└── export/
    ├── media/              # Images that were uploaded to Hinge
    ├── matches.json
    ├── user.json
    ├── media.json
    ├── prompts.json
    ├── prompt_feedback.json
    └── selfie_verification.json

README.md                   # Project overview and instructions
requirements.txt            # Python dependencies
.env                        # Environment variables (e.g., MATCH_FILE_PATH)
.env-defaults               # Default values for the environment variables
Dockerfile                  # Docker image definition
docker-compose.yml          # Docker Compose configuration
LICENSE                     # Project license
```

## Analysis Breakdown
The application is divided into two main sections, each providing distinct insights into the user's Hinge usage.

### User Analysis
This section contains insights into how the user's profile is presented, the preferences they've set, and how their interactions shape their experience on the app.

#### User Uploaded Photos, User Demographics & User Location
These slides show basic user information held by Hinge, including uploaded photos, demographic information about the user, and information about the user's location.

*Example visualization*
![User slides](screenshots/user_slides.png)

#### Profile Information Visibility
This compares attributes that are displayed versus hidden on the profile (ethnicity, religion, workplaces, dating intentions, etc.), which helps identify whether the user is open or private about certain topics.

*Example visualization*
![Profile Info Vis](screenshots/profile_info_vis.png)
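
One way to derive the displayed versus hidden split is to scan the profile for per-attribute visibility flags. The following is a minimal sketch that assumes, hypothetically, that `user.json` exposes boolean fields named `<attribute>_displayed`; check your own export for the real field names.

```python
def visibility_summary(profile):
    """Split attributes into (displayed, hidden) lists.

    Hypothetical schema: boolean flags named "<attribute>_displayed".
    """
    displayed, hidden = [], []
    for key, value in profile.items():
        if key.endswith("_displayed") and isinstance(value, bool):
            attribute = key[: -len("_displayed")]
            (displayed if value else hidden).append(attribute)
    return sorted(displayed), sorted(hidden)


profile = {
    "ethnicities_displayed": True,
    "religions_displayed": False,
    "workplaces_displayed": False,
    "dating_intention_displayed": True,
    "height": 180,  # ignored: not a visibility flag
}
print(visibility_summary(profile))
# (['dating_intention', 'ethnicities'], ['religions', 'workplaces'])
```
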

#### Comparison Between The User and Their Preferences
This shows potential alignment or misalignment between the user's profile and their preferences.

*Example visualization*
![User and Pref Comp](screenshots/com_bet_user_and_prefs.png)

#### Dating Preferences: Dealbreakers vs Open Choices
This bar chart compares the number of 'dealbreakers' versus 'open' preferences across different dating categories, highlighting which factors are most important or flexible in the user's online dating criteria.

*Example visualization*
![Dating Prefs](screenshots/dating_prefs.png)
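
The dealbreaker chart reduces to labelling each preference category and counting the labels. The following is a minimal sketch that assumes, hypothetically, boolean preference fields named `<category>_dealbreaker`; the real field names in `user.json` may differ.

```python
from collections import Counter


def dealbreaker_counts(preferences):
    """Label each category as 'dealbreaker' or 'open' and count the labels.

    Hypothetical schema: boolean flags named "<category>_dealbreaker".
    """
    labels = {
        key[: -len("_dealbreaker")]: ("dealbreaker" if value else "open")
        for key, value in preferences.items()
        if key.endswith("_dealbreaker") and isinstance(value, bool)
    }
    return labels, Counter(labels.values())


labels, totals = dealbreaker_counts({
    "age_dealbreaker": True,
    "distance_dealbreaker": False,
    "religion_dealbreaker": False,
    "age_min": 25,  # ignored: not a dealbreaker flag
})
print(labels["age"], totals["dealbreaker"], totals["open"])  # dealbreaker 1 2
```
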

### Match Analysis
#### Message Count Variability by Month (Last 12 Months)
This box plot shows how the number of messages exchanged per match varies across each month over the past year.

*Example visualization*
![Message Count Var](screenshots/msg_count_boxplot.png)
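
Each box in the plot is the list of per-match message counts for one month. The following is a minimal sketch that groups matches by the month in which the match occurred (an assumption; grouping by message month would also be reasonable), using the hypothetical record schema and timestamp format from the earlier examples.

```python
from collections import defaultdict
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed export timestamp format


def messages_by_month(interactions, now):
    """Group per-match message counts by match month, keeping the last 12 months."""
    groups = defaultdict(list)
    for interaction in interactions:
        if not interaction.get("match"):
            continue
        matched = datetime.strptime(interaction["match"][0]["timestamp"], TIMESTAMP_FORMAT)
        months_ago = (now.year - matched.year) * 12 + (now.month - matched.month)
        if 0 <= months_ago < 12:
            groups[matched.strftime("%Y-%m")].append(len(interaction.get("chats", [])))
    return dict(groups)


sample = [
    {"match": [{"timestamp": "2024-01-23 20:37:27"}], "chats": [{}, {}, {}]},
    {"match": [{"timestamp": "2024-01-30 18:00:00"}]},  # matched, no messages
    {"match": [{"timestamp": "2022-05-01 12:00:00"}], "chats": [{}]},  # too old
]
print(messages_by_month(sample, now=datetime(2024, 6, 1)))  # {'2024-01': [3, 0]}
```

Each month's list could then become one trace, e.g. `plotly.graph_objects.Box(y=counts, name=month)`.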

#### Response Latency between Match and First Message Sent
This graph visualizes the response latency, or the time delay between when a match occurs and when the first message is sent.

*Example visualization*
![Response Latency](screenshots/resp_latency.png)
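
Response latency is the difference between the match timestamp and the earliest chat timestamp. The following is a minimal sketch under the hypothetical record schema and assumed timestamp format from the earlier examples, applied to Match #3 from the export example at the top of this README.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed export timestamp format


def response_latency(interaction):
    """Time from the match to the first message, or None if either is missing."""
    if not interaction.get("match") or not interaction.get("chats"):
        return None
    matched = datetime.strptime(interaction["match"][0]["timestamp"], TIMESTAMP_FORMAT)
    first_message = min(
        datetime.strptime(chat["timestamp"], TIMESTAMP_FORMAT) for chat in interaction["chats"]
    )
    return first_message - matched


example = {
    "match": [{"timestamp": "2024-01-23 20:37:27"}],
    "chats": [{"body": "Hello, World!", "timestamp": "2024-01-23 20:39:45"}],
}
print(response_latency(example))  # 0:02:18
```
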

#### Duration of Time Between Match and Remove
This histogram visualizes how long each connection lasted between the match and its removal or block.

*Example visualization*
![Match Rm Duration](screenshots/duration_match_rm.png)

#### Match Duration vs. Message Count
This scatter plot explores the relationship between the number of messages exchanged in a match and the time until the match was removed or blocked.

*Example visualization*
![Duration V Count](screenshots/duration_v_count.png)
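
The histogram and the scatter plot above can share one computation: the time from match to removal, paired with the number of messages. The following is a minimal sketch under the hypothetical record schema and assumed timestamp format from the earlier examples, treating the first `block` event as the removal.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed export timestamp format


def duration_vs_count(interactions):
    """Return (hours from match to removal, message count) for removed matches."""
    points = []
    for interaction in interactions:
        if not (interaction.get("match") and interaction.get("block")):
            continue
        matched = datetime.strptime(interaction["match"][0]["timestamp"], TIMESTAMP_FORMAT)
        removed = datetime.strptime(interaction["block"][0]["timestamp"], TIMESTAMP_FORMAT)
        hours = (removed - matched).total_seconds() / 3600
        points.append((hours, len(interaction.get("chats", []))))
    return points


# Match #3 from the export example at the top of this README.
example = {
    "match": [{"timestamp": "2024-01-23 20:37:27"}],
    "chats": [{"body": "Hello, World!", "timestamp": "2024-01-23 20:39:45"}],
    "block": [{"block_type": "remove", "timestamp": "2024-01-23 21:49:26"}],
}
hours, messages = duration_vs_count([example])[0]
print(f"{hours:.2f} h, {messages} message(s)")  # 1.20 h, 1 message(s)
```

The first element of each pair feeds the histogram; the full pairs feed the scatter plot.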

## How to Use
1. Export your data from Hinge.
2. Install the dependencies (using a virtual environment is recommended):
   `pip install -r requirements.txt`
3. Create a root-level folder named `data` and copy the `export` folder from the Hinge data export into it. It should contain:
   - `media/`
   - `user.json`
   - `matches.json`

   The project uses all of these.
4. Create a `.env` file and set the following environment variables:
   - `USER_FILE_PATH`
   - `MATCH_FILE_PATH`

   Refer to the `.env-defaults` file for details.
5. The Dash app can be run in two ways:
   1. Locally: `python app/main.py`
   2. With Docker Compose: `docker compose build`, then `docker compose up -d`
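
The settings in `.env-defaults` can be layered under whatever is set in the environment. The following is a minimal sketch of that idea, reusing the `.json` check that `MatchAnalytics` performs on `MATCH_FILE_PATH`; `load_settings` is a hypothetical helper, and the app itself may read its settings differently.

```python
import os

# Defaults copied from .env-defaults; the app may load its settings differently.
DEFAULTS = {
    "HOST": "0.0.0.0",
    "PORT": "8050",
    "USER_FILE_PATH": "change/me/user.json",
    "MATCH_FILE_PATH": "change/me/matches.json",
    "MEDIA_PATH": "change/me/media/",
    "ASSETS_PATH": "app/assets/",
    "GEOLITE_DB_PATH": "data/GeoLite2-City.mmdb",
}


def load_settings(environ=None):
    """Merge environment variables over the defaults and validate the JSON paths."""
    environ = os.environ if environ is None else environ
    settings = {key: environ.get(key, default) for key, default in DEFAULTS.items()}
    for key in ("USER_FILE_PATH", "MATCH_FILE_PATH"):
        if not settings[key].endswith(".json"):
            raise ValueError(f"{key} needs to point to a JSON file, got {settings[key]!r}")
    settings["PORT"] = int(settings["PORT"])
    return settings


settings = load_settings({"MATCH_FILE_PATH": "data/export/matches.json"})
print(settings["MATCH_FILE_PATH"], settings["PORT"])  # data/export/matches.json 8050
```

If the `.env` file should be read when running locally, `load_dotenv()` from the `python-dotenv` package populates `os.environ` from it; whether this project relies on that package is an assumption.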
2 changes: 1 addition & 1 deletion app/analytics/MatchAnalytics.py
@@ -6,7 +6,7 @@ def __init__(self):
self.match_file_path = os.environ.get("MATCH_FILE_PATH")

if self.match_file_path is None:
raise Exception("MATCH_FILE_PATH environment varviable is not set.")
raise Exception("MATCH_FILE_PATH environment variable is not set.")

if '.json' not in self.match_file_path:
raise Exception("The match file needs to be a JSON file.")
15 changes: 12 additions & 3 deletions app/analytics/UserAnalytics.py
@@ -6,6 +6,9 @@
import json
import os
import shutil
import logging

logging.basicConfig(level=logging.INFO)

class UserAnalytics:
def __init__(self):
@@ -28,12 +31,12 @@ def __init__(self):
user_data = json.load(file)

self.user_data = user_data

def get_media_file_paths(self):

# need to copy the files from the media_path to the assets_dir
_copy_files(self.media_path, self.assets_path)

jpg_files = [f for f in os.listdir(self.assets_path) if f.endswith(".jpg")]
def get_media_file_paths(self):
jpg_files = [f for f in os.listdir(self.assets_path) if f.endswith(".jpg") or f.endswith(".jpeg") or f.endswith(".png")]
return jpg_files

def get_account_data(self):
@@ -167,6 +170,12 @@ def collect_location_from_ip(self):

def _copy_files(src_dir, dest_dir):
os.makedirs(dest_dir, exist_ok=True)
logging.info(f"Copying image files from media directory: {src_dir} to asset directory: {dest_dir}.")

# only proceed if the destination directory is empty
if os.listdir(dest_dir):
logging.info(f"Asset directory: '{dest_dir}' is not empty. Skipping copy...")
return

# loop through all files in source directory
for file_name in os.listdir(src_dir):
3 changes: 0 additions & 3 deletions app/main.py
@@ -15,8 +15,6 @@
import pages.HomePage as HomePage
import pages.InfoPage as InfoPage

from tools.Logger import logger

external_stylesheets = [dmc.theme.DEFAULT_COLORS]
server = Flask(__name__)
app = Dash(__name__, server=server, use_pages=True, external_stylesheets=external_stylesheets)
@@ -76,5 +74,4 @@ def get_additional_text(page_name):
host = os.environ.get("HOST")
port = int(os.environ.get("PORT", 8050))

logger.info(f"Running the Hinge Data Analysis app on {host}:{port}...")
app.run(debug=True, host=host, port=port)