A SYSTRON Lab project
from the Department of Computer Science at the University of York
Heightened interest from nation states in performing content censorship makes it ever more critical to identify the impact of censorship efforts on the Internet. We undertake a study of Internet architecture, capturing the state of Internet topology with greater completeness than the existing state of the art.
A small number of nation states do not follow this trend; we provide an analysis and explanation for these cases, demonstrating a relationship with geographical factors in addition to geopolitics. In summary, our work provides a deeper understanding of how these censorship measures impact the overall functioning and dynamics of the Internet.
This is a previous version of our code. It forms the foundation of our current approach but does not include the complete codebase used in our analysis, which targets a specific HPC architecture. We intend to release a platform-agnostic version (which includes RIPE Atlas measurements) shortly.
- BGPKIT, an open-source BGP data toolkit. We use the pybgpkit Python bindings for BGPKIT.
- Requests, an HTTP library for Python.
- NetworkX, a network analysis library.
- Beautiful Soup, a library for extracting data from scraped HTML and XML files.
- Pandas and NumPy.
You can quickly create a copy of this project locally to perform your own analysis. Depending on the specified timeframes and the quantity of data, running this codebase can be memory-intensive: we have done little to optimise the memory requirements of our Pandas-based approach.
We collect BGP table data from the RIPE Routing Information Service (RIS), RouteViews and Packet Clearing House (PCH). This provides a snapshot of the routing table from a variety of geographic locations. Some data from these sources may be duplicated, either because the collectors see the same routes or because some route collectors are situated in the same or nearby physical locations.
- Run bgp_collector_ris_rv.py with a supplied timestamp range (TS_START and TS_END) to collect BGP data from the RIPE RIS and RouteViews collectors for that period. The code supplied uses RIB tables rather than UPDATE files, which means only the retained (router-determined locally optimal) paths are collected. To enlarge the dataset, UPDATE files can be used to observe routes beyond those retained, giving better visibility.
- Run bgp_collector_pch.py with a supplied date (YEAR, MONTH, DAY) to collect BGP data for that day. To cover a multi-day period, run this script multiple times with a different date each time. This script uses regular expressions rather than a Pandas converter to extract path information from the text-based files, so any information other than the as_path is ignored.
- Run bgp_adjacency_creator.py with an input file listing all of the CSV files generated in (1) and (2), as well as an output destination for an adjacencies.csv file. This file contains a long list of de-duplicated adjacencies derived from neighbouring ASes in each as_path, which lets us observe multiple adjacencies for one AS while ignoring AS-path prepending.
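The adjacency extraction in the last step can be sketched as follows. This is a minimal illustration, not the repository's bgp_adjacency_creator.py: it assumes each collector CSV has an as_path column holding space-separated ASNs and that the input file lists one CSV path per line, and the as1/as2 output header is hypothetical. Prepended ASNs are collapsed, and the chain is broken at non-numeric tokens such as AS sets so that no false link is created across them.

```python
import csv
from pathlib import Path


def path_to_adjacencies(as_path: str) -> set[tuple[int, int]]:
    """Return undirected AS adjacencies from one space-separated AS path.

    Repeated consecutive ASNs (prepending) produce no self-links, and
    non-numeric tokens (e.g. AS sets like "{64512,64513}") break the chain.
    """
    edges: set[tuple[int, int]] = set()
    prev = None
    for token in as_path.split():
        if not token.isdecimal():
            prev = None  # do not link the ASNs on either side of an AS set
            continue
        asn = int(token)
        if prev is not None and asn != prev:  # skip prepended repeats
            # Store the smaller ASN first so (a, b) and (b, a) de-duplicate.
            edges.add((min(prev, asn), max(prev, asn)))
        prev = asn
    return edges


def build_adjacencies(csv_list_file: str, output_csv: str,
                      column: str = "as_path") -> int:
    """Read every CSV listed in csv_list_file (one path per line), write the
    de-duplicated adjacencies to output_csv and return how many there are."""
    edges: set[tuple[int, int]] = set()
    for line in Path(csv_list_file).read_text().splitlines():
        path = line.strip()
        if not path:
            continue
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                edges |= path_to_adjacencies(row.get(column) or "")
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["as1", "as2"])  # hypothetical header names
        writer.writerows(sorted(edges))
    return len(edges)
```

For example, the paths "1 2 3" and "3 2 2 1" both reduce to the adjacencies (1, 2) and (2, 3), so they contribute only two rows to the output.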
The code supplied in this version only provides country-code and node-colouring capability, and relies on a pre-provided file named countries.csv, which is expected to contain the columns country_code (ISO 2-letter), country_name (common name), avg_long (country average longitude), avg_lat (country average latitude) and colour.
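A loader for countries.csv might look like the following sketch. The column names come from the description above; the function name load_countries, upper-casing the country codes and treating colour as an opaque string (for example a hex code) are assumptions of this sketch rather than details of the repository.

```python
import csv

# Column names as described for countries.csv.
REQUIRED_COLUMNS = ("country_code", "country_name", "avg_long", "avg_lat", "colour")


def load_countries(path: str) -> dict[str, dict]:
    """Load countries.csv into {ISO code: attributes}, failing early if a
    required column is missing."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"countries.csv is missing columns: {missing}")
        countries = {}
        for row in reader:
            code = row["country_code"].strip().upper()  # assumption: normalise case
            countries[code] = {
                "name": row["country_name"],
                "lat": float(row["avg_lat"]),
                "long": float(row["avg_long"]),
                "colour": row["colour"],  # kept as-is; format not specified
            }
    return countries
```

Validating the header up front means a malformed file fails before any RIPEstat queries or graph building are attempted.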
With this data, we can then query RIPEstat for registered ASN resources:
- Run ripe_stat.py, providing the lookup date, the countries.csv file and an output destination for resource_data.csv.
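The RIPEstat lookup can be approximated with the public RIPEstat Data API. This sketch assumes the country-resource-list endpoint, queried with resource set to the country code and time set to the lookup date, and assumes each entry under data.resources.asn is a single ASN. It is not the repository's ripe_stat.py; it uses the standard library's urllib instead of Requests to stay dependency-free, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed RIPEstat Data API endpoint for per-country registered resources.
RIPESTAT_URL = "https://stat.ripe.net/data/country-resource-list/data.json"


def country_resource_url(country_code: str, date: str) -> str:
    """Build the query URL for one country on one lookup date."""
    return f"{RIPESTAT_URL}?{urlencode({'resource': country_code, 'time': date})}"


def asns_from_response(payload: dict) -> list[int]:
    """Extract registered ASNs from a decoded JSON response.

    Assumes data.resources.asn is a list of single ASNs (strings or ints).
    """
    asns = payload.get("data", {}).get("resources", {}).get("asn", [])
    return sorted(int(a) for a in asns)


def fetch_country_asns(country_code: str, date: str) -> list[int]:
    """Perform the network request and return the sorted ASNs."""
    with urlopen(country_resource_url(country_code, date), timeout=60) as resp:
        return asns_from_response(json.load(resp))
```

For example, fetch_country_asns("GB", "2023-01-01") would query the ASNs registered to the United Kingdom on that date; each (country code, ASN) pair could then be written as a row of resource_data.csv.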
Using the collected data, bgp_topology_graph.py creates GraphML and GEXF format graphs for analysis in a tool such as Gephi.
- Run bgp_topology_graph.py with the adjacencies.csv file, the end timestamp (this allows for time-based analysis in Gephi), resource_data.csv, an output GEXF location and an output GraphML location.
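A minimal NetworkX sketch of the graph-building step follows. It assumes a list of adjacency pairs, an ASN-to-country mapping derived from resource_data.csv and a countries dictionary mapping each ISO code to lat, long and colour values. The attribute names, the ZZ placeholder for unmapped ASNs, the grey default colour and storing the end timestamp as a plain snapshot_end attribute are illustrative choices, not the repository's encoding. Attributes are kept as scalars (strings, floats and integers) because GraphML cannot store nested values such as GEXF viz dictionaries.

```python
import networkx as nx

UNKNOWN = "ZZ"  # hypothetical placeholder code for ASNs with no country mapping


def build_topology(adjacencies, asn_country, countries, end_timestamp):
    """Build an undirected AS-level graph annotated for Gephi.

    adjacencies:   iterable of (as1, as2) pairs
    asn_country:   {asn: ISO country code}
    countries:     {ISO code: {"lat": float, "long": float, "colour": str}}
    end_timestamp: integer timestamp stored on every node
    """
    g = nx.Graph()
    g.add_edges_from(adjacencies)
    for asn, attrs in g.nodes(data=True):
        code = asn_country.get(asn, UNKNOWN)
        info = countries.get(code, {})
        # Consistent scalar types on every node keep both writers happy.
        attrs["country"] = code
        attrs["lat"] = float(info.get("lat", 0.0))
        attrs["long"] = float(info.get("long", 0.0))
        attrs["colour"] = str(info.get("colour", "#808080"))  # assumed default
        attrs["snapshot_end"] = int(end_timestamp)
    return g


def export(g, gexf_path, graphml_path):
    """Write the same graph in both formats that Gephi can open."""
    nx.write_gexf(g, gexf_path)
    nx.write_graphml(g, graphml_path)
```

Using an undirected nx.Graph mirrors the de-duplicated, order-independent adjacencies; a directed graph would be needed only if path direction were retained.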
- (Preprint) Unveiling Internet Censorship: Analysing the Impact of Nation States’ Content Control Efforts on Internet Architecture and Routing Patterns
- (Abstract) From Internet to Emulator: A Virtual Testbed for Internet Routing Protocols
Josh Levett: @Levett_Josh / joshua.levett (at) york.ac.uk