Trinh Viet Doan, Ljubica Pajevic, Vaibhav Bajpai, Jörg Ott
Technical University of Munich
IEEE Communications Magazine, November 2018. Publication →
Presented at MAT WG Meeting, RIPE 77, Amsterdam. Slides →
The dataset was collected from ~100 SamKnows probes.
The raw dataset is available at:
It is stored as a sqlite3 database, youtube-may-2016-2018.db. The schema of the tables can be found under ./data/youtube-traceroute-schema.sql.
This repository contains (most of) the metadata required to reproduce the results; see below for further instructions.
To read from the database (see above), sqlite3 is needed.
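As a quick check that the database can be read, the following minimal sketch lists its tables using Python's standard sqlite3 module. The helper name list_tables is hypothetical and not part of this repository; the path in the usage note assumes the database has already been placed under ./data/.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path.

    Hypothetical helper for illustration; not part of the repository.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_tables("./data/youtube-may-2016-2018.db") should return the table names defined in ./data/youtube-traceroute-schema.sql. Note that sqlite3.connect creates an empty database file if the path does not exist, so an empty result usually means the path is wrong.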
The analyses were performed using jupyter notebooks on Python 2.7.
Required Python dependencies are listed in requirements.txt and can be installed using pip install -r requirements.txt.
To calculate CDFs and draw the corresponding plots, Pmf.py and Cdf.py from Think Stats are used.
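To illustrate what a CDF calculation involves, here is a minimal, dependency-free sketch of an empirical CDF. It is not the Think Stats Cdf.py implementation used by the notebooks; the function name empirical_cdf is hypothetical.

```python
def empirical_cdf(values):
    """Return (xs, ps): sorted distinct values and P(X <= x) for each.

    Illustrative sketch only; the notebooks use Cdf.py from Think Stats.
    """
    ordered = sorted(values)
    n = len(ordered)
    xs, ps = [], []
    for i, x in enumerate(ordered, start=1):
        if xs and xs[-1] == x:
            # Repeated value: raise the probability of the existing point.
            ps[-1] = i / float(n)
        else:
            xs.append(x)
            ps.append(i / float(n))
    return xs, ps
```

The two returned lists can be passed directly to a step plot, with xs on the horizontal axis and ps on the vertical axis.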
Further, as_types.txt (downloaded from CAIDA's AS Classification) is used to assign types to the ASes seen in the traceroute measurements.
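The mapping from AS numbers to types could be loaded along the following lines. This is a sketch under an assumption: that as_types.txt uses pipe-separated lines of the form as|source|type, with comment lines starting with #. Check the header of the downloaded file before relying on it. The function name parse_as_types is hypothetical.

```python
def parse_as_types(lines):
    """Map AS number (int) to its type string.

    Assumes pipe-separated lines 'as|source|type' and '#' comment lines;
    this format is an assumption, not confirmed by this repository.
    """
    as_types = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("|")
        if len(fields) != 3:
            continue  # skip lines that do not match the assumed format
        asn, _source, as_type = fields
        as_types[int(asn)] = as_type
    return as_types
```

Since an open file iterates over its lines, the function can be called as parse_as_types(open("./metadata/as_types.txt")).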
Move the required datasets and modules to the right locations:
- youtube-may-2016-2018.db → ./data/
- Pmf.py → .
- Cdf.py → .
- as_types.txt → ./metadata/
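The placement above can also be scripted. The sketch below assumes all four files were downloaded into a single directory; the names place_files, download_dir and repo_root are hypothetical and not part of this repository.

```python
import os
import shutil

# Target directory (relative to the repository root) for each required file.
PLACEMENT = {
    "youtube-may-2016-2018.db": "./data/",
    "Pmf.py": ".",
    "Cdf.py": ".",
    "as_types.txt": "./metadata/",
}


def place_files(download_dir, repo_root, placement=PLACEMENT):
    """Move each known file from download_dir to its place under repo_root.

    Returns the sorted names of the files that were moved; files missing
    from download_dir are skipped.
    """
    moved = []
    for name, rel_dir in placement.items():
        src = os.path.join(download_dir, name)
        if not os.path.isfile(src):
            continue
        dest_dir = os.path.normpath(os.path.join(repo_root, rel_dir))
        if not os.path.isdir(dest_dir):
            os.makedirs(dest_dir)
        shutil.move(src, os.path.join(dest_dir, name))
        moved.append(name)
    return sorted(moved)
```

Comparing the returned list against the four expected names shows which downloads are still missing.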
Run the nb-create_tables.ipynb notebook to process and aggregate the raw dataset; the results are stored in a separate database. After that, the other notebooks (nb-*.ipynb) can be used to draw the plots presented in the paper.
All plots are saved under ./plots/.
Note: the metadata lookup has already been done; however, it can be repeated by running ./metadata/metadata_lookup.py.
For a previous version of the dataset (covering measurements from 05/2016 until 03/2017), more analyses and results can be found here →.
Please feel welcome to contact the authors for further details.
- Trinh Viet Doan (doan@in.tum.de)
- Ljubica Pajevic (kaerkkal@in.tum.de)
- Vaibhav Bajpai (bajpaiv@in.tum.de)
- Jörg Ott (ott@in.tum.de)