This project generates synthetic social network data and applies graph theory to analyze relationships and interactions. It includes synthetic tables for tweets, users, and interactions, a directed graph for user interactions with key metrics, and a bipartite graph linking tweets to hashtags.

haroldeustaquio/SocialNet-Graphs

SocialNet Graphs

Overview

This repository generates synthetic social network data and applies graph theory to analyze user interactions and tweet-hashtag relationships. It includes synthetic tables for tweets, users, and interactions; a directed graph of user interactions that highlights key metrics such as total nodes, edges, in/out-degrees, and betweenness centrality; and an undirected bipartite graph linking tweets with hashtags to identify the most active tweets and popular hashtags. Underlying mathematical concepts from graph theory support the analyses.

Synthetic Data

The script generates and exports three synthetic tables that simulate data from a social network. Below is a description of each created table and its structure.

Tweets Table

This table contains information about the tweets published by users. Each tweet has a unique identifier, information about the creator, the content category, associated hashtags, and the publication date.

| Field | Description |
| --- | --- |
| tweet_id | Unique identifier of the tweet (starts at 1000 and increments sequentially). |
| creator | User who created the tweet, randomly selected from the list of users. |
| category | Category of the tweet (e.g., Politics, Economy, Sports, Culture, Technology, Health, World). |
| hashtags | String of associated hashtags, separated by commas. Can contain 0, 1, or 2 random hashtags. |
| tweet_date | Date and time of tweet publication in YYYY-MM-DD HH:MM:SS format (last 60 days). |
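
The field descriptions above can be sketched with the standard library alone. This is a minimal illustration, not the repository's script: the hashtag pool, the user list, and the `make_tweets` helper are assumptions introduced here.

```python
import random
from datetime import datetime, timedelta

CATEGORIES = ["Politics", "Economy", "Sports", "Culture",
              "Technology", "Health", "World"]
# Assumptions: the README does not list the real hashtag pool or user count.
HASHTAGS = ["#news", "#trending", "#update", "#live"]
USERS = [f"User_{i}" for i in range(1, 11)]


def make_tweets(n, now=None, rng=None):
    """Return n tweet rows (dicts) that follow the field descriptions."""
    rng = rng if rng is not None else random.Random()
    now = now if now is not None else datetime.now()
    rows = []
    for i in range(n):
        # 0, 1, or 2 distinct hashtags, comma-separated (empty string if none).
        tags = rng.sample(HASHTAGS, k=rng.randint(0, 2))
        # Random offset inside the last 60 days.
        offset = timedelta(seconds=rng.randint(0, 60 * 24 * 3600))
        rows.append({
            "tweet_id": 1000 + i,  # starts at 1000, increments by 1
            "creator": rng.choice(USERS),
            "category": rng.choice(CATEGORIES),
            "hashtags": ",".join(tags),
            "tweet_date": (now - offset).strftime("%Y-%m-%d %H:%M:%S"),
        })
    return rows


tweets = make_tweets(5, rng=random.Random(42))
```

Passing a seeded `random.Random` makes the output reproducible, which helps when the generated tables feed later graph analyses.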

Users Table

This table simulates the information of registered users on the social network, with details about their profile and activity.

| Field | Description |
| --- | --- |
| user_id | Unique identifier of the user (e.g., User_1, User_2, ...). |
| username | Username, same as the user_id. |
| account_creation_date | Account creation date, randomly generated in the last 5 years. |
| age | Age of the user, a random number between 18 and 60. |
| country | Country of origin of the user (e.g., USA, Spain, Mexico, Argentina, Colombia, Chile, Peru). |
| bio | Brief biography of the user, randomly selected from a list of descriptions. |
| followers_count | Number of followers of the user, randomly generated between 50 and 10,000. |

Interactions Table

This table stores the interactions that users perform on tweets, such as comments, likes, reposts, or mentions. Each interaction includes references to both the tweet and the users involved.

| Field | Description |
| --- | --- |
| interaction_id | Unique identifier of the interaction (starts at 2000 and increments sequentially). |
| tweet_id | Identifier of the tweet related to the interaction. |
| user_from | User who performs the interaction. |
| user_to | User who receives the interaction. Can be empty, depending on the type of interaction. |
| interaction_type | Type of interaction, which can be comment, like, repost, or mention. |
| interaction_date | Date and time the interaction was performed, in YYYY-MM-DD HH:MM:SS format (last 30 days). |
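
A comparable sketch for interaction rows follows. The README says only that `user_to` can be empty depending on the interaction type, so the 20% empty rate used here is a placeholder assumption, as is the `make_interactions` helper itself.

```python
import random
from datetime import datetime, timedelta

INTERACTION_TYPES = ["comment", "like", "repost", "mention"]


def make_interactions(n, tweet_ids, users, now=None, rng=None):
    """Return n interaction rows; needs at least two users."""
    rng = rng if rng is not None else random.Random()
    now = now if now is not None else datetime.now()
    rows = []
    for i in range(n):
        user_from = rng.choice(users)
        # Assumption: about 1 in 5 interactions has no recipient. The real
        # rule (tied to interaction type) is not given in the README.
        if rng.random() < 0.2:
            user_to = ""
        else:
            user_to = rng.choice([u for u in users if u != user_from])
        # Random offset inside the last 30 days.
        offset = timedelta(seconds=rng.randint(0, 30 * 24 * 3600))
        rows.append({
            "interaction_id": 2000 + i,  # starts at 2000, increments by 1
            "tweet_id": rng.choice(tweet_ids),
            "user_from": user_from,
            "user_to": user_to,
            "interaction_type": rng.choice(INTERACTION_TYPES),
            "interaction_date": (now - offset).strftime("%Y-%m-%d %H:%M:%S"),
        })
    return rows


interactions = make_interactions(
    8,
    tweet_ids=[1000, 1001, 1002],
    users=["User_1", "User_2", "User_3"],
    rng=random.Random(7),
)
```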

Context 1: Analysis of User Interactions in a Graph

In this scenario, a directed graph is created to represent the interactions between users of a social network. Each node in the graph corresponds to a user, while the edges indicate interactions between them, such as comments, likes, reposts, and mentions. The goal is to analyze the network structure, obtain important metrics, and visualize the distribution and direction of interactions.

Graph Type:

  • Directed Graph (DiGraph).

Processed Data:

  • Nodes: Users with attributes (username, creation date, age, country, bio, followers).
  • Edges: Interactions (comments, likes, reposts, mentions) between users.

Key Results:

| Metric | Description |
| --- | --- |
| Total users (nodes) | Total number of users in the network. |
| Total interactions (edges) | Total number of connections between users. |
| Top 3 In-degree | Users who receive the most interactions. |
| Top 3 Out-degree | Users who initiate the most interactions. |
| Top 3 Betweenness Centrality | Users who act as critical bridges in the network. |
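
The node, edge, and degree metrics can be illustrated without any graph library. The interaction pairs below are made up, and the sketch assumes a simple directed graph in which repeated (user_from, user_to) pairs collapse into one edge, which is how NetworkX's `DiGraph` behaves; whether the project counts repeats differently is not stated.

```python
from collections import Counter

# Hypothetical (user_from, user_to) pairs taken from an interactions table.
interactions = [
    ("User_1", "User_2"), ("User_3", "User_2"), ("User_4", "User_2"),
    ("User_1", "User_3"), ("User_2", "User_3"), ("User_1", "User_4"),
    ("User_1", "User_2"),  # repeated pair, collapses into one edge
    ("User_5", ""),        # no recipient, so no edge is created
]

# One edge per ordered pair; rows with an empty user_to are skipped.
edges = {(u, v) for u, v in interactions if u and v}
nodes = {name for pair in interactions for name in pair if name}

out_degree = Counter(u for u, _ in edges)  # interactions initiated
in_degree = Counter(v for _, v in edges)   # interactions received


def top(counter, k=3):
    """Top-k (name, count) pairs, ties broken alphabetically."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


print(len(nodes), len(edges))  # 5 6
print(top(in_degree))   # [('User_2', 3), ('User_3', 2), ('User_4', 1)]
print(top(out_degree))  # [('User_1', 3), ('User_2', 1), ('User_3', 1)]
```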

Adjacency Matrix: Generated to represent the connectivity between users numerically, with one row and one column per user.

Graph Visualization:

[Figure: Graph Visualization]

Context 2: Analysis of Bipartite Graph: Tweets and Hashtags

A bipartite graph connects tweets with the hashtags they use. Each node represents either a tweet or a hashtag, and an edge indicates that a tweet uses a hashtag. This structure makes it possible to identify which tweets use the most hashtags and which hashtags are the most popular.

Graph Type:

  • Undirected Graph (Graph) with a bipartite structure.

Processed Data:

  • Nodes:
    • Tweets (type "tweet", positioned in one set).
    • Hashtags (type "hashtag", positioned in the other set).
  • Edges:
    • Connection between a tweet and the hashtags it uses, with the relation "uses".

Key Results:

| Metric | Description |
| --- | --- |
| Top 3 Tweets by number of hashtags used | Tweets with the highest number of connections to hashtags. |
| Top 3 Hashtags by popularity | Most used hashtags (highest number of tweets using them). |
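
Both rankings reduce to counting node degrees on each side of the bipartite graph. A minimal sketch with made-up rows, assuming the comma-separated `hashtags` field described in the Tweets table:

```python
from collections import Counter

# Hypothetical rows with the tweet_id and hashtags fields.
tweets = [
    {"tweet_id": 1000, "hashtags": "#ai,#data"},
    {"tweet_id": 1001, "hashtags": "#ai"},
    {"tweet_id": 1002, "hashtags": ""},  # no hashtags, no edges
    {"tweet_id": 1003, "hashtags": "#data,#ai"},
    {"tweet_id": 1004, "hashtags": "#sports"},
]

# Each edge joins a tweet node to a hashtag node (the "uses" relation).
edges = [
    (row["tweet_id"], tag.strip())
    for row in tweets
    for tag in row["hashtags"].split(",")
    if tag.strip()
]

tweet_degree = Counter(tweet for tweet, _ in edges)
hashtag_degree = Counter(tag for _, tag in edges)


def top(counter, k=3):
    """Top-k (node, degree) pairs, ties broken by node order."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


print(top(tweet_degree))    # [(1000, 2), (1003, 2), (1001, 1)]
print(top(hashtag_degree))  # [('#ai', 3), ('#data', 2), ('#sports', 1)]
```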

Bipartite Graph Visualization:

[Figure: Bipartite Graph Visualization]

Mathematical Foundation

Graphs are a central structure in discrete mathematics and computer science used to model relationships between pairs of objects. A graph $G = (V, E)$ is defined as:

  • Vertices (V): A set of nodes or points.
  • Edges (E): A set of connections (pairs of vertices) that represent relationships.

Types of Graphs

  • Undirected Graphs:

    • The edges do not have a direction.
    • An edge between $u$ and $v$ is represented as the unordered pair $\{u, v\}$.
  • Directed Graphs (Digraphs):

    • Each edge has a direction.
    • An edge from $u$ to $v$ is represented as $(u, v)$.
    • Useful in modeling asymmetrical relationships (e.g., follower relationships in social networks).
  • Weighted Graphs:

    • Edges carry a numerical weight.
    • Weights can represent distance, cost, or capacity.
  • Bipartite Graphs:

    • The vertex set is divided into two disjoint sets $U$ and $W$ such that every edge connects a vertex in $U$ to one in $W$.
    • Commonly used in modeling relationships between two different types of entities (e.g., tweets and hashtags).
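
The bipartite definition can be checked mechanically: a graph is bipartite exactly when its vertices can be colored with two colors so that no edge joins two vertices of the same color. The sketch below uses a breadth-first 2-coloring; the `two_color` helper and the sample graphs are illustrative only.

```python
from collections import deque


def two_color(adj):
    """Return {vertex: 0 or 1} if the undirected graph is bipartite, else None.

    adj maps every vertex to a list of its neighbours (both directions listed).
    """
    color = {}
    for start in adj:
        if start in color:
            continue  # already reached from an earlier component
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # opposite side of the partition
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # an edge inside one side: not bipartite
    return color


# Tweets on one side, hashtags on the other.
tweet_graph = {
    "t1": ["#a", "#b"], "t2": ["#a"],
    "#a": ["t1", "t2"], "#b": ["t1"],
}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}

print(two_color(tweet_graph))  # {'t1': 0, '#a': 1, '#b': 1, 't2': 0}
print(two_color(triangle))     # None
```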

Graph Representations

  • Adjacency Matrix:

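    • A square matrix $A$ where $a_{ij}$ represents the presence (and sometimes the weight) of an edge between vertices $i$ and $j$; in a directed graph, $a_{ij}$ refers to the edge from $i$ to $j$.
    • Checking whether an edge exists is a single lookup, which suits dense graphs, but storage grows with the square of the number of vertices.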
  • Adjacency List:

    • Each vertex maintains a list of adjacent vertices.
    • More memory-efficient for sparse graphs and provides faster iteration over neighbors.
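
To make the two representations concrete, the sketch below builds both from the same small directed edge list. The user names and edges are made up for illustration.

```python
nodes = ["User_1", "User_2", "User_3"]
edges = [("User_1", "User_2"), ("User_1", "User_3"), ("User_3", "User_2")]

# Map each node name to a row/column position.
index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: entry [i][j] is 1 when there is an edge from i to j.
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] = 1

# Adjacency list: each node keeps only the neighbours it points to.
adj_list = {name: [] for name in nodes}
for u, v in edges:
    adj_list[u].append(v)

print(matrix)    # [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
print(adj_list)  # {'User_1': ['User_2', 'User_3'], 'User_2': [], 'User_3': ['User_2']}

# Row sums give out-degrees, column sums give in-degrees.
out_degrees = [sum(row) for row in matrix]        # [2, 0, 1]
in_degrees = [sum(col) for col in zip(*matrix)]   # [0, 2, 1]
```

The matrix stores all nine cells even though only three edges exist, while the list stores three entries; that gap is what makes adjacency lists attractive for sparse graphs.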

Key Graph Metrics and Concepts

  • Degree:

    • Undirected Graphs: The degree of a vertex is the number of edges incident to it.
    • Directed Graphs:
      • In-degree: Number of incoming edges.
      • Out-degree: Number of outgoing edges.
  • Path and Connectivity:

    • A path is a sequence of vertices where each adjacent pair is connected by an edge.
    • Connectivity assesses whether there is a path between any two vertices in the graph.
    • Connected Graph: There is a path between every pair of vertices (for undirected graphs).
  • Cycles:

    • A cycle is a path that starts and ends at the same vertex with no other repetitions.
    • Acyclic Graph: A graph with no cycles. Directed acyclic graphs (DAGs) are important in scenarios like dependency resolution.
  • Centrality Measures:

    • Degree Centrality: Based on the number of direct connections a vertex has.
    • Betweenness Centrality: Measures the frequency at which a vertex appears on the shortest paths between other vertices.
    • Closeness Centrality: Based on the reciprocal of the average shortest-path length from a vertex to all others, so vertices that are close to everyone score higher.
    • Eigenvector Centrality: Considers the influence of a vertex based on the centrality of its neighbors.
  • Graph Traversal Algorithms:

    • Breadth-First Search (BFS): Explores vertices in layers, useful for finding the shortest path in unweighted graphs.
    • Depth-First Search (DFS): Explores as far as possible along each branch before backtracking, useful for cycle detection and path finding.
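
Breadth-first search and betweenness centrality are closely linked: for unweighted graphs, Brandes' algorithm runs one BFS per source vertex and counts how many shortest paths pass through each vertex. The sketch below is a plain-Python illustration under the assumption that every vertex appears as a key in `adj`; it returns unnormalized scores, whereas NetworkX's `betweenness_centrality` normalizes by default, so values from that library would differ by a constant factor while the ranking stays the same.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def betweenness(adj):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# A simple chain A -> B -> C -> D: B and C are the only bridges.
chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(bfs_distances(chain, "A"))  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
print(betweenness(chain))         # {'A': 0.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}
```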

Applications of Graph Theory

  • Social Network Analysis:

    • Modeling and analyzing relationships and interactions among individuals.
    • Identifying influencers using centrality metrics.
  • Computer Networks:

    • Routing algorithms that optimize the flow of data across networks.
  • Biology:

    • Analyzing biological networks such as neural networks and protein-protein interaction networks.
  • Recommendation Systems:

    • Using bipartite graphs to model relationships between users and products (or tweets and hashtags) to provide personalized recommendations.
  • Optimization Problems:

    • Problems like the traveling salesman or network flow, where graphs model the underlying structure of the problem.
