The goal of this project is to generate a dataset of fencing matches pulled from the International Fencing Federation (FIE) website.
The International Fencing Federation (FIE) is recognized by the International Olympic Committee (IOC) as the world governing body of fencing and, as such, is charged with establishing the rules of international competition and overseeing their implementation.
While the FIE website is not the only source of fencing results (see AskFRED or USFA), it has several advantages over the other sources considered. To read more about the source comparisons, see the 'Data Source(s)' section of `docs/Data.md`.
The FIE website maintains a list of competition results and fencer bios. The following pages were used to collect data:
The output data, stored in `final_output/`, contains the following dataframes for each division collected (e.g. Women's Foil), each in its own CSV file within a division directory:
List of tournaments in the division listed on the FIE competition results page.
variable | type | description |
---|---|---|
competition_ID | int | FIE competition ID |
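season | int | year identifying the competition season (the sample below suggests the year in which the season ends, e.g. 2015 for an event held in November 2014) |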
name | string | Name of tournament |
category | string | age category for event (Cadet/Junior/Senior/Veterans) |
country | string | host country for tournament |
start_date | date | first day of event |
end_date | date | last day of event |
weapon | string | foil/epee/saber |
gender | string | event gender (Mens/Womens) |
timezone | string | timezone where the event took place |
url | string | FIE url for event page |
missing_results_flag | string | specifies missing (pools) results |
Sample of Dataframe (not all columns shown):
| | season | name | category | country | start_date | weapon | gender | competition_ID |
---|---|---|---|---|---|---|---|---|
0 | 2015 | Memorial de Martinengo | Junior | SLOVAKIA | 2014-11-21 | Foil | Womens | 37 |
1 | 2004 | Trophée Federico II | Junior | ITALY | 2003-11-29 | Foil | Womens | 36 |
2 | 2011 | Coupe du Monde | Junior | SERBIA | 2011-03-06 | Foil | Womens | 35 |
List of bouts from pools across all tournaments stored in the Tournament Dataframe.
variable | type | description |
---|---|---|
fencer_ID | int | ID of the fencer in the bout |
opp_ID | int | ID of the opponent in the bout |
fencer_age | int | age of fencer at the time of the event |
opp_age | int | age of opponent at the time of the event |
fencer_score | int | points scored in bout for fencer |
opp_score | int | points scored in bout for opponent |
winner_ID | int | ID matching the fencer who won |
fencer_curr_pts | double | fencer's points in the division at the start of the event |
opp_curr_pts | double | opponent's points in the division at the start of the event |
tournament_ID | string | tournament the pool occurred in |
upset | boolean | True if the fencer with fewer points won |
date | date | date of the pool |
Bouts are not double counted, so `fencer_ID` corresponds to the fencer with the lower number in the pools table ordering. If both fencers have no points, then `upset` is `False`. `winner_ID` is included to cover ties, where the scores will match.
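
The `upset` rule described above can be sketched as a small function. This is a minimal illustration, not the project's actual code: the function name `is_upset` is hypothetical, "no points" is assumed to mean a value of 0, and bouts where both fencers have equal points are assumed not to be upsets (the text does not cover that case).

```python
def is_upset(fencer_id: int, opp_id: int, winner_id: int,
             fencer_pts: float, opp_pts: float) -> bool:
    """Return True when the fencer with fewer points won the bout."""
    # Both fencers have no points: never an upset, per the rule above.
    if fencer_pts == 0 and opp_pts == 0:
        return False
    # Equal points: assumed not an upset (not specified in the source).
    if fencer_pts == opp_pts:
        return False
    # The underdog is whichever fencer entered the event with fewer points.
    underdog = fencer_id if fencer_pts < opp_pts else opp_id
    return winner_id == underdog


# First sample row: both fencers have 0 points, so upset is False.
print(is_upset(29240, 27947, 27947, 0.0, 0.0))
# Hypothetical IDs 1 and 2: the fencer with 4 points beats the one with 22.
print(is_upset(1, 2, 1, 4.0, 22.0))
```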
Sample of Dataframe (not all columns shown):
| | fencer_ID | opp_ID | fencer_age | fencer_score | opp_score | winner_ID | fencer_curr_pts | tournament_ID | pool_ID | upset | date |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 29240 | 27947 | 24 | 2 | 5 | 27947 | 0 | 2015-37 | 1 | False | 2014-11-21 |
1 | 29240 | 35149 | 24 | 5 | 3 | 29240 | 0 | 2015-37 | 1 | False | 2014-11-21 |
2 | 29240 | 28025 | 24 | 0 | 5 | 28025 | 0 | 2015-37 | 1 | False | 2014-11-21 |
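
In the samples above, the bout rows carry the `tournament_ID` value `2015-37`, while the first tournament row has `season` 2015 and `competition_ID` 37. This suggests, though the text does not state it, that `tournament_ID` is built as `season-competition_ID`. Under that assumption, the following sketch shows how the key could be composed or split in order to match bouts to tournaments; both helper names are hypothetical.

```python
from typing import Tuple


def make_tournament_id(season: int, competition_id: int) -> str:
    """Compose a key in the assumed 'season-competition_ID' format."""
    return f"{season}-{competition_id}"


def split_tournament_id(tournament_id: str) -> Tuple[int, int]:
    """Split an assumed 'season-competition_ID' key into its two integers."""
    season, competition_id = tournament_id.split("-", 1)
    return int(season), int(competition_id)


print(make_tournament_id(2015, 37))
print(split_tournament_id("2015-37"))
```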
Biographical information about each fencer stored by ID.
variable | type | description |
---|---|---|
id | int | FIE ID number for fencer |
name | string | Fencer's full name |
country | string | Fencer's country (when data was accessed) |
hand | string | whether the fencer is left or right handed |
age | int | Fencer's age (when data was accessed) |
date_accessed | datetime | Date and time when data was accessed |
Sample of Dataframe:
| | id | name | country | hand | age | date_accessed |
---|---|---|---|---|---|---|
0 | 20482 | YANAOKA Haruka | JAPAN | Right | 26 | 2021-05-11 18:15:32 |
1 | 28701 | CIPRESSA Erica | ITALY | Right | 24 | 2021-05-11 18:15:44 |
2 | 33865 | TANGHERLINI Elena | ITALY | Left | 23 | 2021-05-11 18:16:12 |
Historical data about each fencer's rankings and points in each division (weapon and age category).
variable | type | description |
---|---|---|
id | int | Fencer's FIE ID |
weapon | string | foil/epee/saber |
category | string | Age category of ranking (Cadet/Junior/Senior/Veterans) |
season | string | season for ranking in format YYYY/YYYY |
rank | int | ranking within division (weapon and category) for the given season |
points | double | points earned in the division (weapon and category) for the given season |
Sample of Dataframe (rendered with MultiIndex):
| id | weapon | category | season | rank | points |
|---|---|---|---|---|---|
| 32192 | Foil | Junior | 2013/2014 | 20 | 2 |
| | | | 2014/2015 | 180 | 4 |
| | | | 2015/2016 | 296 | 0 |
| | | | 2016/2017 | 226 | 4 |
| | | | 2017/2018 | 73 | 22 |
| | | Senior | 2016/2017 | 433 | 0 |
| | | | 2018/2019 | 312 | 0 |
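
Because the rankings table stores `season` as a `YYYY/YYYY` string while the tournament table stores it as a single integer, comparing the two requires parsing the string first. The sketch below shows one way to do that. The helper name `parse_season` is hypothetical, and the check that the two years are consecutive is an assumption based on the sample rows, not a documented guarantee.

```python
from typing import Tuple


def parse_season(season: str) -> Tuple[int, int]:
    """Parse a 'YYYY/YYYY' season string into (start_year, end_year)."""
    start, end = (int(part) for part in season.split("/"))
    # Assumption: a season always spans two consecutive calendar years.
    if end != start + 1:
        raise ValueError(f"unexpected season format: {season!r}")
    return start, end


print(parse_season("2013/2014"))
```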
To generate new division data, use the following steps:
- Fork and clone this repo to obtain a local copy
- Edit the following lines in `main.py` to specify the division parameters:

      # f=foil, e=epee, s=sabre
      weapon = 'f'
      # f=womens, m=mens
      gender = 'f'
      # c=cadet, j=junior, s=senior, v=veteran
      category = ''

- Add the path to `fie-fencing-dataset` to `$PYTHONPATH` (instructions below):
  - Open up Terminal
  - Type `open .bash_profile`
  - In the text file that pops up, add this line at the end: `export PYTHONPATH=$PYTHONPATH:[path goes here].../fie-fencing-dataset`
  - Save the file, restart the Terminal
  - Type `echo $PYTHONPATH` and check that the path to `fie-fencing-dataset` is there.
- From the terminal, run `main.py` using Python:

      $ python3 main.py

- The script will save the four dataframes into four CSV files. These files will be saved in a subdirectory of `output` named with the date the script was run (e.g. `output/May-13-2021/`).
- To load and work with the dataframes you have created, run the following command in Python or an interactive notebook like Jupyter:

      from load_csv import tournament_df, bout_df, fencer_bio_df, fencer_rankings_df
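
As a rough illustration of the steps above, the sketch below maps the one-letter division codes to readable names and builds a dated output path in the `output/May-13-2021/` style. It is not taken from `main.py`: the names `describe_division` and `dated_output_dir` are hypothetical, the `%b-%d-%Y` date format is an assumption inferred from the single example path, and since the text does not say how an empty `category` is handled, the sketch simply rejects any code it does not recognise.

```python
from datetime import date
from pathlib import Path

# Code tables copied from the comments in the configuration step above.
WEAPONS = {"f": "foil", "e": "epee", "s": "sabre"}
GENDERS = {"f": "womens", "m": "mens"}
CATEGORIES = {"c": "cadet", "j": "junior", "s": "senior", "v": "veteran"}


def describe_division(weapon: str, gender: str, category: str) -> str:
    """Turn one-letter codes into a readable division name."""
    try:
        return f"{GENDERS[gender]} {CATEGORIES[category]} {WEAPONS[weapon]}"
    except KeyError as err:
        raise ValueError(f"unknown division code: {err}") from None


def dated_output_dir(run_date: date, root: str = "output") -> Path:
    """Build a path such as output/May-13-2021 (assumed naming scheme)."""
    return Path(root) / run_date.strftime("%b-%d-%Y")


print(describe_division("f", "f", "j"))
print(dated_output_dir(date(2021, 5, 13)).as_posix())
```

Month abbreviations from `strftime` depend on the active locale, so the English month name shown in the example path assumes the default locale.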
- main
- load_csv
- /docs
- /fencers
- /pools
- /tournaments
- /final_output
- /helper
- demo
- initial_testing
- output
Webpage processing used Python's `requests` package and the `BeautifulSoup` package.