This project contains a graph-based analysis of the football transfer market. The main goal is to infer the relationships in the transfer market: the most central teams in the networks, the most important actors in capital gain generation, and the communities of teams. Full details are provided in the file REPORT.pdf (in Italian). The same information is summarized in English in the file POSTER.pdf. This README also gives an overview of the most important points.
We aim to answer the following research questions:
- What are the main relationships between the teams of the 7 major European leagues in terms of the sports market?
- What are the main relationships between football clubs and agents in terms of the sports market?
- What are the relationships between the world's football leagues in terms of player transfers?
- Is there a relationship between the capital gains generated by the teams and their sports results?
The datasets were built with dedicated web scrapers, available in the repository's Jupyter notebooks, using data collected from the Transfermarkt website. In particular, the main scraper (scraper.ipynb) is an improvement of this software. We provide 4 datasets in CSV format:
- transfers_complete.csv: every transaction between the summer 2009 and winter 2024 transfer windows for the 7 main European football leagues (Premier League, Serie A, LaLiga, Ligue 1, Bundesliga, Eredivisie, Liga Portugal);
- transfers_with_agents.csv: transfers_complete with the footballer's agency added (when available);
- capital_gains.csv: capital gains calculated as the difference between the sale price and the purchase price (when available) for the same player and the same team;
- champ_performances_with_metrics.csv: team performances for every season in the period considered.
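The capital gain definition above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used in the notebooks: the field names (`player`, `from_team`, `to_team`, `fee`), the toy records, and the assumption that transfers are listed in chronological order are all hypothetical and may not match the real CSV columns.

```python
def capital_gains(transfers):
    """Compute gains as sale fee minus purchase fee for the same player and team.

    `transfers` is assumed to be in chronological order; each record is a dict
    with hypothetical keys: player, from_team, to_team, fee (None if unknown).
    """
    purchases = {}  # (player, team) -> fee the team paid for the player
    gains = []
    for t in transfers:
        player, seller, buyer, fee = t["player"], t["from_team"], t["to_team"], t["fee"]
        # The seller's purchase record is consumed whether or not the sale fee is known.
        bought_for = purchases.pop((player, seller), None)
        # A gain exists only when both the purchase and the sale fee are available.
        if bought_for is not None and fee is not None:
            gains.append({"player": player, "team": seller, "gain": fee - bought_for})
        if fee is not None:
            purchases[(player, buyer)] = fee
    return gains


# Hypothetical toy data (fees in million EUR), not taken from the real dataset.
transfers = [
    {"player": "Player X", "from_team": "Team A", "to_team": "Team B", "fee": 5.0},
    {"player": "Player X", "from_team": "Team B", "to_team": "Team C", "fee": 12.0},
    {"player": "Player Y", "from_team": "Team A", "to_team": "Team B", "fee": None},
    {"player": "Player Y", "from_team": "Team B", "to_team": "Team C", "fee": 8.0},
]
gains = capital_gains(transfers)
print(gains)
```

In this toy example only Player X produces a gain for Team B, because Player Y's purchase fee is unknown and the pair is therefore skipped, mirroring the "when available" condition.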
The analysis is carried out with the following techniques:
- Exploratory analysis: charts to explore the general relationships in the data;
- Graph analysis: graphs generated with networkx;
- Communities: identification of the main communities with Louvain;
- Metrics: alongside the graph and community analysis, we provide dedicated metrics to describe the main characteristics of the networks. We also developed a dedicated metric to evaluate the teams' performances (check the report for details).
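As a rough illustration of the graph and community steps listed above, the sketch below builds a weighted transfer network with networkx and runs Louvain on it. It makes several assumptions that are not confirmed by the report: the toy edge list is invented, degree centrality stands in for whichever centrality measure the analysis actually uses, and the installed networkx version is assumed to ship `louvain_communities`.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical toy transfers: (selling team, buying team, fee in million EUR).
transfers = [
    ("Team A", "Team B", 10.0),
    ("Team B", "Team A", 4.0),
    ("Team A", "Team C", 6.0),
    ("Team C", "Team B", 3.0),
    ("Team D", "Team E", 8.0),
    ("Team E", "Team F", 5.0),
    ("Team F", "Team D", 7.0),
    ("Team C", "Team D", 1.0),
]

# Directed graph: an edge seller -> buyer carries the total fee exchanged.
G = nx.DiGraph()
# Undirected copy for community detection, with fees summed in both directions.
U = nx.Graph()
for seller, buyer, fee in transfers:
    for graph in (G, U):
        if graph.has_edge(seller, buyer):
            graph[seller][buyer]["weight"] += fee
        else:
            graph.add_edge(seller, buyer, weight=fee)

# Degree centrality (assumed metric): share of other teams a team trades with.
centrality = nx.degree_centrality(G)
# Weighted out-degree: total fees received by each team as a seller.
out_strength = dict(G.out_degree(weight="weight"))

# Louvain returns a list of node sets; the seed makes the run repeatable.
communities = louvain_communities(U, weight="weight", seed=42)

print(sorted(centrality.items(), key=lambda kv: kv[1], reverse=True))
print(out_strength)
print(communities)
```

Summing the fees into an undirected graph is one possible design choice for the community step; it treats a pair of teams as close whenever money flows between them, regardless of direction.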
The most interesting results are summarized below:
- The most central teams in terms of capital gain are Chelsea, Roma and Juventus. The teams with the largest total capital gains are Benfica and Ajax.
- We identified 5 communities, each dominated by teams of a single nationality, except for community 3, which is the most heterogeneous.
- The main agencies are "Wasserman", "CAA Stellar" and "Unique Sports Group". A large number of players are recorded as free agents because agency data is unavailable for them.
- The most central leagues (nations) in the transfer network are the Netherlands, Portugal and Spain, with England as the wealthiest league.
- There is no consistent relationship between capital gain and performance.
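To make the last point concrete, the sketch below shows one way a performance-gain relationship could be tested: a simple least-squares line of performance on capital gain, with the coefficient of determination as a measure of fit. The regression models used in the report are not described here, so this is an assumed stand-in, and the numbers are invented toy values chosen to show what "no relationship" looks like.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 compares the residual error with the spread of y around its mean.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical toy values: seasonal capital gain and a performance score.
gain = [1.0, 2.0, 3.0, 4.0]
performance = [2.0, 1.0, 1.0, 2.0]

slope, intercept, r_squared = fit_line(gain, performance)
print(slope, intercept, r_squared)
```

A slope and R^2 close to zero, as in this toy case, indicate that capital gain explains essentially none of the variation in performance.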
Here are a few plots illustrating the results above:
Most central nodes by capital gain
Communities 0 (Italy) and 3 (Spain, Portugal, France)
Leagues' network (in transactions and out transactions)
Cagliari Calcio performance and gain
Performance-Gain regression models