Internet Topology Mapping

A SYSTRON Lab project
from the Department of Computer Science at the University of York

Table of Contents
  1. About The Project
  2. Getting Started
  3. Publications
  4. Contact

About The Project

Heightened interest among nation states in content censorship makes it ever more critical to identify the impact of censorship efforts on the Internet. We undertake a study of Internet architecture, capturing the state of Internet topology with greater completeness than the existing state of the art.

A small number of nation states do not follow this trend. For these, we provide an analysis and explanation, demonstrating that geographical factors play a role in addition to geopolitics. In summary, our work provides a deeper understanding of how these censorship measures impact the overall functioning and dynamics of the Internet.

This repository holds an earlier version of our code, which forms the foundation of our current approach. It does not include the complete codebase used in our analysis, which targets a specific HPC architecture. We intend to release a platform-agnostic version (including RIPE Atlas measurements) shortly.

(back to top)

Dependencies Overview

(back to top)

Getting Started

You can quickly create a local copy of this project to perform your own analysis. Depending on the specified timeframes and the quantity of data, running this codebase can be memory-intensive: we use Pandas throughout and have done little to optimise the memory requirements of our approach.

BGP Data Collection

We collect BGP table data from RIPE Routing Information Service (RIS), RouteViews and Packet Clearing House (PCH). This provides a snapshot of the routing table from a variety of geographic locations. Some data from these sources may be duplicated: either because the routes they see are the same, or because in some cases the route collectors may be situated in the same or nearby physical locations.

  1. Run bgp_collector_ris_rv.py with a timestamp range (TS_START and TS_END) to collect BGP data from the RIPE RIS and RouteViews collectors for that period. The supplied code uses RIB tables rather than UPDATE files, so it sees only the retained (router-determined, locally optimal) paths. Using UPDATE files to observe routes beyond those retained would enlarge the dataset and give better visibility.
  2. Run bgp_collector_pch.py with a date (YEAR, MONTH, DAY) to collect BGP data from PCH for that day. To cover a multi-day period, run the script once per day with a different date each time. This script uses a regular expression rather than a pandas converter to extract path information from the text-based files, so anything other than the as_path is ignored.
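
For a multi-day PCH collection, the per-day dates can be generated rather than typed by hand. The sketch below is illustrative only: the helper name daily_dates is hypothetical, and how YEAR, MONTH and DAY are actually passed to bgp_collector_pch.py is not specified here, so the loop simply prints the values for each run.

```python
from datetime import date, timedelta


def daily_dates(start: date, end: date):
    """Yield (year, month, day) for every day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not precede start")
    current = start
    while current <= end:
        yield current.year, current.month, current.day
        current += timedelta(days=1)


if __name__ == "__main__":
    # Example period; replace with the days you want to observe.
    for year, month, day in daily_dates(date(2023, 2, 27), date(2023, 3, 2)):
        # Supply these values as YEAR, MONTH, DAY to one run of bgp_collector_pch.py.
        print(f"YEAR={year} MONTH={month:02d} DAY={day:02d}")
```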

Adjacency Inferencing

  1. Run bgp_adjacency_creator.py with an input file listing all of the CSV files generated by steps 1 and 2 of BGP Data Collection, as well as an output destination for an adjacencies.csv file. The output contains a deduplicated list of adjacencies derived from neighbouring ASes in each as_path, which lets us observe multiple adjacencies for one AS while ignoring prepending.
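
The idea behind adjacency inference can be sketched in a few lines. This is a minimal illustration, not the code in bgp_adjacency_creator.py: the function names are hypothetical, paths are assumed to be space-separated AS numbers, adjacencies are treated as undirected, and any non-numeric token (such as an AS_SET) is assumed to break the chain rather than link its neighbours.

```python
def path_adjacencies(as_path: str):
    """Return the set of undirected AS adjacencies found in one AS path."""
    hops = []
    for token in as_path.split():
        # Non-numeric tokens (e.g. an AS_SET like "{64500,64501}") become a gap.
        asn = int(token) if token.isdigit() else None
        # Collapse prepending: skip an AS that repeats the previous hop.
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return {
        tuple(sorted((left, right)))
        for left, right in zip(hops, hops[1:])
        if left is not None and right is not None
    }


def collect_adjacencies(as_paths):
    """Merge the adjacencies of many paths into one sorted, deduplicated list."""
    adjacencies = set()
    for as_path in as_paths:
        adjacencies |= path_adjacencies(as_path)
    return sorted(adjacencies)


if __name__ == "__main__":
    # AS numbers from the documentation range (64496-64511).
    paths = ["64496 64497 64497 64497 64498", "64498 64497", "64496 64499"]
    for left, right in collect_adjacencies(paths):
        print(f"{left},{right}")
```

Storing each pair in sorted order means that the same link seen in opposite directions on two paths is counted once.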

Registry Data

The code supplied in this version provides only country-code and node-colouring capability. It relies on a pre-provided file named countries.csv, which is expected to contain columns for country_code (ISO 2-letter), country_name (common name), avg_long (country average longitude), avg_lat (country average latitude), and colour.
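
As a rough guide to the expected shape of countries.csv, the sketch below loads and checks such a file using only the standard library. The column names come from the description above; everything else is an assumption for illustration, including the comma delimiter, the header row, the hex colour format, the load_countries helper, and the sample coordinates, which are placeholders rather than authoritative values.

```python
import csv
import io

REQUIRED_COLUMNS = ("country_code", "country_name", "avg_long", "avg_lat", "colour")

# Illustrative rows only; coordinates and colours are placeholders.
SAMPLE = """country_code,country_name,avg_long,avg_lat,colour
GB,United Kingdom,-2.0,54.0,#1f77b4
FR,France,2.0,46.0,#ff7f0e
"""


def load_countries(handle):
    """Read countries.csv-style data into a dict keyed by country code."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"countries.csv is missing columns: {missing}")
    countries = {}
    for row in reader:
        code = row["country_code"].strip().upper()
        if len(code) != 2:
            raise ValueError(f"not a 2-letter country code: {code!r}")
        countries[code] = {
            "country_name": row["country_name"],
            "avg_long": float(row["avg_long"]),
            "avg_lat": float(row["avg_lat"]),
            "colour": row["colour"],
        }
    return countries


if __name__ == "__main__":
    for code, info in load_countries(io.StringIO(SAMPLE)).items():
        print(code, info)
```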

With this data, we can then query RIPEstat for registered ASN resources:

  1. Run ripe_stat.py, providing the lookup date, the countries.csv file and an output destination for resource_data.csv.
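
For orientation, a per-country ASN lookup against the RIPEstat Data API might look like the sketch below. It assumes the country-resource-list data call with its resource and time parameters, and a response that lists ASNs under data.resources.asn. Whether ripe_stat.py uses this particular call is not stated here, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the RIPEstat "country-resource-list" data call.
RIPESTAT_URL = "https://stat.ripe.net/data/country-resource-list/data.json"


def build_query_url(country_code: str, lookup_date: str) -> str:
    """Build the query URL for one country and one lookup date."""
    params = {"resource": country_code, "time": lookup_date}
    return f"{RIPESTAT_URL}?{urlencode(params)}"


def extract_asns(payload: dict):
    """Pull the ASN list out of a decoded response (assumed response shape)."""
    resources = payload.get("data", {}).get("resources", {})
    return sorted(int(asn) for asn in resources.get("asn", []) if str(asn).isdigit())


def fetch_asns(country_code: str, lookup_date: str):
    """Query RIPEstat over the network and return the registered ASNs."""
    with urlopen(build_query_url(country_code, lookup_date), timeout=30) as response:
        return extract_asns(json.load(response))


if __name__ == "__main__":
    # Requires network access; the date is an example value.
    print(len(fetch_asns("GB", "2023-03-01")), "ASNs registered to GB")
```

Keeping URL construction and response parsing separate from the network call makes both easy to check offline.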

Topology Graph

With the collected data, the following script creates the topology graph in both GraphML and GEXF formats for analysis in a tool like Gephi.

  1. Run bgp_topology_graph.py with adjacencies.csv, the end timestamp (which allows for time-based analysis in Gephi), resource_data.csv, an output GEXF location and an output GraphML location.
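
To show how adjacencies and country data combine into a graph file, here is a standard-library sketch that writes a minimal GraphML document. It is not the code in bgp_topology_graph.py, which may well use a graph library instead: the build_graphml helper and its inputs are hypothetical, and GEXF output, colours, coordinates and timestamps are left out for brevity.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(adjacencies, asn_country):
    """Return a GraphML string with one node per AS and one edge per adjacency."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare a string attribute on nodes that holds the country code.
    ET.SubElement(root, "key", {
        "id": "country_code",
        "for": "node",
        "attr.name": "country_code",
        "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", {"edgedefault": "undirected"})
    for asn in sorted({asn for pair in adjacencies for asn in pair}):
        node = ET.SubElement(graph, "node", {"id": str(asn)})
        data = ET.SubElement(node, "data", {"key": "country_code"})
        data.text = asn_country.get(asn, "")  # empty when the country is unknown
    for source, target in adjacencies:
        ET.SubElement(graph, "edge", {"source": str(source), "target": str(target)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example data using AS numbers from the documentation range.
    edges = [(64496, 64497), (64497, 64498)]
    countries = {64496: "GB", 64497: "FR"}
    with open("topology.graphml", "w", encoding="utf-8") as out:
        out.write(build_graphml(edges, countries))
```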

(back to top)

Publications

(back to top)

Contact

Josh Levett: @Levett_Josh / joshua.levett (at) york.ac.uk

(back to top)