ORION Network Telescope

Michalis Kallitsis edited this page Oct 26, 2021 · 2 revisions


Network telescopes collect and record unsolicited Internet-wide traffic destined to a routed but unused address space, usually referred to as "Darknet" or "blackhole" address space [1]. Darknets can provide a global perspective on Internet behavior and are one of the key data sources used by the networking and security communities to understand malware propagation [2–6], Distributed Denial of Service (DDoS) attacks [1, 7], network scanning [8, 9], routing misconfigurations [10] and Internet outages [11, 12]. Merit Network has been operating one of a small number of researcher-accessible network telescopes for more than 15 years; it has facilitated an array of empirical studies, including [2, 3, 9, 10, 13–21].

Figure 1. Scanning and backscatter traffic captured in the Darknet.

Figure 1 illustrates the two basic types of Darknet traffic captured by a network telescope, namely 1) scanning and 2) backscatter. Note that Darknet traffic is unidirectional, i.e., the Darknet is completely passive and never responds. Next, we explain in some detail the origins and different types of scanning and backscatter.

Scanning arises from several network activities. First, scanning includes reconnaissance by nefarious users or nation states that try to enumerate potential networking vulnerabilities of all Internet-accessible hosts. In their attempt to scan or "probe" the entire IPv4 address space, they eventually hit the dark IP space monitored by Merit's Darknet team; as a result, their scanning packets get recorded in PCAP format (i.e., a binary file format for storing captured network packets). The PCAP data thus provide valuable information such as the source IP of the scanning activity, the port (i.e., application) being targeted, and the exact time, duration and intensity of the scan.
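
To make the PCAP format more concrete, the sketch below walks a classic (microsecond-resolution) PCAP file using only Python's standard library and extracts the timestamp, source IP and IP protocol number of each IPv4 packet. It is an illustration, not ORION's Go parser; it assumes Ethernet framing (link type 1) and does not handle the nanosecond-resolution or pcapng variants. The function name is our own.

```python
import ipaddress
import struct
from typing import Iterator, Tuple

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic PCAP with microsecond timestamps
LINKTYPE_ETHERNET = 1
ETHERTYPE_IPV4 = 0x0800


def iter_ipv4_packets(data: bytes) -> Iterator[Tuple[float, str, int]]:
    """Yield (timestamp, source IP, IP protocol number) for each IPv4 packet."""
    if len(data) < 24:
        raise ValueError("truncated PCAP global header")
    # The magic number reveals the byte order the capture was written in.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC_USEC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC_USEC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    offset = 24
    while offset + 16 <= len(data):
        # Per-packet record header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        frame = data[offset:offset + incl_len]
        offset += incl_len
        # Skip non-IPv4 frames and frames too short for Ethernet + a minimal IPv4 header.
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
            continue
        ip_header = frame[14:]
        src = str(ipaddress.IPv4Address(ip_header[12:16]))  # source address field
        yield ts_sec + ts_usec / 1e6, src, ip_header[9]      # byte 9 = protocol
```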

A second type of scanning activity comes from malware that tries to propagate and "infect" more victims. Malware refers to malicious code running on compromised Internet hosts, which are usually members of large botnets (i.e., large groups of compromised hosts running the same malicious code and controlled remotely by a "command-and-control center"); an example of such a large botnet is Mirai [2], which first appeared in late 2016 and is still in operation. Like the Internet-wide reconnaissance activities discussed earlier, malware also attempts to enumerate the whole IPv4 space to discover and infect new botnet victims. Inevitably, it also falls into the Darknet "trap", and its packets are recorded in PCAP format as well.

There are also benign types of scanning captured in network telescopes. These usually include "research scanning" activities, such as the ones undertaken by … or …; these organizations (and many others, including several universities) constantly scan the whole IPv4 space and interact with many applications/ports, aiming to find misconfigured and/or insecure network hosts. This information is useful for understanding the Internet ecosystem and assessing the hygiene and security posture of various organizations.

Misconfigurations may also cause unsolicited traffic to appear in a Darknet, for example because of software bugs or mistyped (i.e., "fat-fingered") network addresses. See the works of Wustrow et al. [21] and Benson et al. [15] for some illustrative examples.

Backscatter traffic captured in the Darknet represents a completely different network event: it reveals victims of Distributed Denial of Service (DDoS) attacks, and was first studied by Moore et al. [22]. More recently, Jonker et al. [7] examined backscatter traffic from a large network telescope and identified millions of DDoS victims. As Figure 1 depicts, backscatter is a by-product of DDoS attacks that flood a targeted host with traffic carrying randomly spoofed source IPs, in order to render it incapable of serving its normal users. The attackers conceal their identity by using randomly selected source IPs instead of their true one, and transmit large volumes of traffic to the victim from these falsified addresses. Whenever a falsified IP happens to fall within the dark IP space monitored by the network telescope, the victim's response to that (spoofed) IP reaches the Darknet and is recorded in PCAP format. Hence, the Darknet captures the identity of the attacked host and the exact times at which its responses arrive.
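
The scanning/backscatter distinction can be read off the packet headers alone. Below is a minimal sketch that applies only the coarse rules used later on this page (bare TCP SYN, UDP and ICMP Echo Request count as scanning; TCP SYN/ACK, TCP RST and ICMP Echo Reply count as backscatter). The function name `classify` is hypothetical, and ORION's own classification may distinguish more cases.

```python
IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP = 1, 6, 17
TCP_SYN, TCP_RST, TCP_ACK = 0x02, 0x04, 0x10
ICMP_ECHO_REPLY, ICMP_ECHO_REQUEST = 0, 8


def classify(proto: int, tcp_flags: int = 0, icmp_type: int = -1) -> str:
    """Coarse scanning/backscatter label for a packet arriving at a dark IP."""
    if proto == IPPROTO_TCP:
        syn, ack, rst = tcp_flags & TCP_SYN, tcp_flags & TCP_ACK, tcp_flags & TCP_RST
        if syn and not ack:
            return "scanning"      # bare SYN: someone trying to open a connection
        if (syn and ack) or rst:
            return "backscatter"   # SYN/ACK or RST: a victim answering a spoofed packet
        return "other"
    if proto == IPPROTO_UDP:
        return "scanning"          # UDP probes (grouped with scanning on this page)
    if proto == IPPROTO_ICMP:
        if icmp_type == ICMP_ECHO_REQUEST:
            return "scanning"      # ping sweep
        if icmp_type == ICMP_ECHO_REPLY:
            return "backscatter"   # reply to a ping that carried a spoofed source
    return "other"
```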

The ORION Network Telescope

Figure 2. The ORION Network Telescope. PCAP Darknet data are processed using a Go-based parser that extracts meaningful events (such as scanning and backscatter), annotates them with several auxiliary datasets, and exports them to Google's BigQuery.

The architecture of Merit's network telescope is depicted in Figure 2. The Merit Darknet currently consists of 1856 /24 subnets (i.e., 1856 × 256 = 475,136, or roughly 475,000 dark IPs). The ORION (Observatory for cyber-Risk Insights and Outages of Networks) infrastructure, designed and engineered with support from the National Science Foundation [23], receives and records Darknet packets in PCAP format, as explained earlier. Packets are written to disk continuously in real time and organized into hourly PCAP files.
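
The size of the dark space determines how often a scanner shows up in it. Assuming a scanner picks IPv4 targets uniformly at random, each probe lands in the Darknet with probability 475,136 / 2^32, i.e., about 1 in 9,039. The sketch below computes this; the function names are illustrative.

```python
DARK_SUBNETS = 1856             # /24 subnets in the Merit Darknet (from the text)
DARK_IPS = DARK_SUBNETS * 256   # 475,136 dark addresses
IPV4_SPACE = 2 ** 32


def darknet_fraction() -> float:
    """Probability that a uniformly random IPv4 address is a dark IP."""
    return DARK_IPS / IPV4_SPACE


def expected_darknet_rate(scan_pps: float) -> float:
    """Expected packets/s reaching the Darknet from a uniformly random scanner."""
    return scan_pps * darknet_fraction()
```

For a scanner probing at 100 pps, `expected_darknet_rate(100)` is about 0.011 pps, i.e., roughly one Darknet packet every 90 seconds, which helps explain why an event has to be kept open for several minutes of silence before it can be considered finished.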

The hourly PCAP files are then processed using a single-pass algorithm developed in Go by our team [24] to extract Darknet events, such as scanning and backscatter. The code adopts the methodology of Durumeric et al. [9], defining events based on 1) the source IP of a packet appearing in the Darknet, 2) the port targeted and 3) fields from the transport-layer header for TCP/UDP, or from the ICMP header for ICMP. (All other protocols constitute a negligible fraction of Darknet traffic and are currently just grouped together.) An event groups the packets pertaining to the same activity (e.g., a scanning IP that targets a specific port and protocol) into a single record, which is kept in our software's memory (cache) until it "expires" and is then written to disk. Following Moore et al. [1], we expire an event once it has sent no packets for 627 seconds; this interval corresponds to the "typical longest gap" (see [1], Section III.E, "Flow Timeout Problem") expected for an event lasting 2 days with a target rate of 100 pps. Waiting this long before expiring an event makes it unlikely that a sequence of packets belonging to the same activity is split into two or more separate events.
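
The event-aggregation step can be sketched as a cache keyed by event identity with an idle timeout. This is a Python illustration, not the Go implementation [24]: the key layout (source IP, port, traffic type), the `Event` fields and the class names are our assumptions, and packets are assumed to arrive in timestamp order, as in a single pass over an hourly PCAP file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TIMEOUT_S = 627.0  # idle timeout following Moore et al. [1]

# Hypothetical event key: (source IP, targeted port, traffic type).
EventKey = Tuple[str, int, int]


@dataclass
class Event:
    first: float        # timestamp of the first packet
    last: float         # timestamp of the most recent packet
    packets: int = 0
    nbytes: int = 0


class EventCache:
    """Single-pass aggregation of packets into events with an idle timeout."""

    def __init__(self, timeout: float = TIMEOUT_S) -> None:
        self.timeout = timeout
        self.active: Dict[EventKey, Event] = {}
        self.expired: List[Tuple[EventKey, Event]] = []

    def add_packet(self, key: EventKey, ts: float, size: int) -> None:
        ev = self.active.get(key)
        if ev is not None and ts - ev.last > self.timeout:
            # Silence longer than the timeout: close the old event, start a new one.
            self.expired.append((key, self.active.pop(key)))
            ev = None
        if ev is None:
            ev = self.active[key] = Event(first=ts, last=ts)
        ev.last = ts
        ev.packets += 1
        ev.nbytes += size

    def flush_idle(self, now: float) -> None:
        """Expire every event that has been silent for longer than the timeout."""
        idle = [k for k, ev in self.active.items() if now - ev.last > self.timeout]
        for key in idle:
            self.expired.append((key, self.active.pop(key)))
```

A gap of 600 s keeps packets in the same event, while a gap of 700 s starts a new one. Events still active at the end of one hourly file may continue in the next, so a full pipeline also has to decide how to carry them over.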

All events are annotated with several summary statistics and other metadata, including the exact times the activity started and ended, the total bytes/packets sent into the Darknet, DNS information for the IP the event is associated with, geolocation information for the event's IP, routing and ASN information, and others, as shown in our schema below. These records are then uploaded to Google BigQuery (see Figure 2) into a table called 'events', whose schema is described in detail next. Making the Darknet events available via BigQuery eases data sharing and enables further analyses with standard SQL queries, as well as integration with other data sources hosted in BigQuery, such as data from Censys and others.

Schema description for Darknet events

The full schema for the BigQuery table is shown below.

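The traffic type field is an integer (as shown in the schema above). The codes used in the examples on this page are:

  • 0: ICMPEchoRequest (scanning)
  • 1: ICMPEchoReply (backscatter)
  • 11: TCP SYN (scanning)
  • 12: TCP SYN/ACK (backscatter)
  • 14: TCP RST (backscatter)
  • 16: UDP (scanning)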
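
Since each category maps to a small set of codes, the OR chains used in the queries below can equivalently be written with IN. A small helper sketch, using only the codes quoted on this page (the full encoding may contain more entries); `traffic_predicate` is a hypothetical name.

```python
# Codes quoted on this page only; the full encoding may contain more entries.
TRAFFIC_TYPES = {
    0: ("ICMPEchoRequest", "scanning"),
    1: ("ICMPEchoReply", "backscatter"),
    11: ("TCP SYN", "scanning"),
    12: ("TCP SYN/ACK", "backscatter"),
    14: ("TCP RST", "backscatter"),
    16: ("UDP", "scanning"),
}


def traffic_predicate(category: str) -> str:
    """Build a SQL filter such as 'traffic IN (0, 11, 16)' for one category."""
    codes = sorted(code for code, (_, cat) in TRAFFIC_TYPES.items() if cat == category)
    if not codes:
        raise ValueError(f"unknown category: {category!r}")
    return "traffic IN ({})".format(", ".join(str(c) for c in codes))
```

For example, `traffic_predicate("scanning")` yields `traffic IN (0, 11, 16)`, which selects the same rows as `traffic = 11 OR traffic = 16 OR traffic = 0`.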

Analyzing Darknet events in BigQuery: some illustrative examples

This section provides some examples of SQL queries for analyzing Darknet events.

Example 1 (Scanning): Number of unique IPs per Country

The following query returns the distribution of unique scanning IPs per country. The following filters are applied:

  • Focus on the last 24 hours
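  • Filter by large scans only, i.e., ones that reach more than 10% of the Darknet's 1856 × 256 dark IPs; assuming targets are chosen uniformly at random, this suggests the scan covered more than 10% of the whole IPv4 space (see [9])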
  • Focus on only scanning events, namely traffic types 11, 16 and 0 (or equivalently TCP SYN, UDP and ICMPEchoRequest).
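
SELECT
  country,
  COUNT(DISTINCT sourceip) AS unique
FROM
  `EVENTS_TABLE`  -- placeholder for the full name of the 'events' table
WHERE
  UniqueDests > 0.10 * 1856 * 256
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0)
  AND First >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)  -- assumes First is a TIMESTAMP
GROUP BY
  country
ORDER BY
  unique DESC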
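
The 10% threshold in the query is 0.10 × 1856 × 256 ≈ 47,514 dark IPs. Behind it is the assumption (following [9]) that a scanner choosing targets uniformly at random hits the same fraction of the Darknet as of the whole IPv4 space, so the fraction of dark IPs reached estimates overall coverage. A sketch with illustrative function names:

```python
DARK_IPS = 1856 * 256  # dark addresses monitored by the Merit Darknet


def estimated_ipv4_coverage(unique_dests: int) -> float:
    """Fraction of IPv4 a scanner probably covered, assuming uniformly random targets."""
    return unique_dests / DARK_IPS


def is_large_scan(unique_dests: int, threshold: float = 0.10) -> bool:
    """Same test as the query's 'UniqueDests > 0.10 * 1856 * 256'."""
    return unique_dests > threshold * DARK_IPS
```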

Example 2 (Scanning): Distribution of ports for a particular Country

Next, we examine the distribution of targeted ports, by number of scans, for a particular country. The following filters are applied:

  • Focus on scans originating from China (CN)
  • Consider all Darknet events that started on or after 2021-01-01
  • As above, focus only on scanning events, i.e., traffic types 11, 16 and 0.
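
SELECT
  PORT_COLUMN,  -- placeholder for the targeted-port column name
  COUNT(1) AS scans
FROM
  `EVENTS_TABLE`  -- placeholder for the full name of the 'events' table
WHERE
  (traffic = 11
    OR traffic = 16
    OR traffic = 0)
  AND country = 'CN'
  AND First >= "2021-01-01"
GROUP BY
  PORT_COLUMN
ORDER BY
  scans DESC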

Example 3 (Scanning): Distribution of scanning events by organization and ASN

The following query breaks down the number of scanning events by organization (as provided by MaxMind's service) and ASN. The following filters are applied:

  • Focus on scanning events that started between 2020-08-01 and 2021-08-01.
  • As above, focus only on scanning events, i.e., traffic types 11, 16 and 0.
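
SELECT
  ORG_COLUMN,  -- placeholder for the MaxMind organization column name
  asn,
  COUNT(1) AS scans
FROM
  `EVENTS_TABLE`  -- placeholder for the full name of the 'events' table
WHERE
  First >= "2020-08-01"
  AND First < "2021-08-01"
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0)
GROUP BY
  ORG_COLUMN,
  asn
ORDER BY
  scans DESC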

Example 4 (Scanning): Identify scanners from a particular ASN

The following query identifies scanners from a particular ASN. This is useful for spotting scanning hosts within "bulletproof" ASNs (i.e., ASNs that tolerate dubious Internet activities) or ASNs that are mismanaged and have poor network hygiene.

Furthermore, this type of query can also be employed for situational awareness, specifically for identifying compromised hosts within one's own network. Since traffic reaching the Darknet is unsolicited, a host from one's own network that shows up there (and is not a known research scanner) is suspicious, so this query is a quick way to validate that nothing unexpected appears in the Darknet.

The following filters are employed:

  • Searching events for the last 24 hours only
  • Focusing on entries where a reverse DNS (rDNS) entry exists
  • Focusing on a particular ASN (237 in this case, i.e., Merit Network)
  • As above, focus only on scanning events, i.e., traffic types 11, 16 and 0.
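
SELECT
  sourceip,
  dns,
  packets
FROM
  `EVENTS_TABLE`,  -- placeholder for the full name of the 'events' table
  UNNEST(rdns) AS dns
WHERE
  First >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)  -- assumes First is a TIMESTAMP
  AND dns IS NOT NULL
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0)
  AND asn = 237
ORDER BY
  packets DESC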
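
For the situational-awareness use case, an operator whose address space is only part of an ASN may want to keep just the sources inside their own prefixes. A minimal sketch with Python's `ipaddress` module, applied to the source IPs returned by the query; `in_own_networks` is a hypothetical helper, and the prefixes in the check below are documentation ranges standing in for your own.

```python
import ipaddress
from typing import Iterable, List


def in_own_networks(source_ips: Iterable[str], prefixes: Iterable[str]) -> List[str]:
    """Keep only the source IPs that fall inside one of the given prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
    return [ip for ip in source_ips
            if any(ipaddress.ip_address(ip) in net for net in networks)]
```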

Example 5 (Scanning): Join Darknet events with Censys data

The query below is slightly more complicated than the earlier ones, but highlights the advantage of hosting data in BigQuery and integrating/joining multiple data sources.

The query zooms into a suspicious scanning event originating from India and observed around mid-September 2020. (The event seems to be associated with a Mozi botnet outbreak; see [25].) It focuses on "Telnet" traffic (port 23) and looks for the distribution of host "tags" as provided by Censys. The tags provide useful insights about the types of devices infected during that particular botnet outbreak.

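WITH
  source_tag_table AS (
  SELECT
    tag,
    COUNT(1) AS count
  FROM
    `EVENTS_TABLE`  -- placeholder for the full name of the 'events' table
  JOIN
    `censys-io.ipv4_public.20200915`
  ON
    sourceIP = `censys-io.ipv4_public.20200915`.ip,
    UNNEST(tags) AS tag
  WHERE
    PORT_COLUMN = 23  -- placeholder for the targeted-port column name
    AND country = "IN"
    AND First >= "2020-09-15"
    AND First < "2020-09-16"
  GROUP BY
    tag
  ORDER BY
    count DESC )
SELECT
  tag,
  count / (
  SELECT
    SUM(count)
  FROM
    source_tag_table) * 100 AS fraction
FROM
  source_tag_table
ORDER BY
  count DESC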
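
The outer query turns tag counts into percentages by dividing each count by the total over all tags. The same computation in plain Python, e.g., on a list of tags already fetched from the join (`tag_fractions` is an illustrative name and the tag values in the check are made up):

```python
from collections import Counter
from typing import Dict, Iterable


def tag_fractions(tags: Iterable[str]) -> Dict[str, float]:
    """Percentage of rows per tag, largest first (mirrors count / SUM(count) * 100)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total * 100 for tag, n in counts.most_common()}
```

Because a host can carry several tags, these percentages are shares of tag occurrences rather than of hosts; the SQL above behaves the same way.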

Example 6 (Backscatter): Find victims of DDoS attacks

The following example queries the 'events' table to extract all backscatter events that identify victims of DDoS attacks (see [7, 22]). The following filters are applied:

  • Focus on backscatter events captured within the last 24 hours
  • Focus on only backscatter events, namely traffic types 12, 14 and 1 (or equivalently TCP SYN/ACK, TCP RST and ICMPEchoReply).
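
SELECT
  *
FROM
  `EVENTS_TABLE`  -- placeholder for the full name of the 'events' table
WHERE
  First >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)  -- assumes First is a TIMESTAMP
  AND (traffic = 12
    OR traffic = 14
    OR traffic = 1)
ORDER BY
  packets DESC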
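
Backscatter can also be used to gauge attack intensity. Following the reasoning of Moore et al. [22], if the attacker spoofs source addresses uniformly at random, the Darknet sees a fraction 475,136 / 2^32 of the victim's responses, so scaling the observed rate by 2^32 / 475,136 estimates the victim's total response rate. A sketch under that assumption; the function name is illustrative, and the duration would come from an event's start and end times.

```python
DARK_IPS = 1856 * 256   # dark addresses monitored by the Merit Darknet
IPV4_SPACE = 2 ** 32


def estimated_response_rate(backscatter_packets: int, duration_s: float) -> float:
    """Extrapolate a victim's total response rate (packets/s) from Darknet backscatter,
    assuming the attacker spoofs source addresses uniformly at random."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    observed_pps = backscatter_packets / duration_s
    return observed_pps * IPV4_SPACE / DARK_IPS
```

For instance, 100 backscatter packets over 60 seconds would suggest the victim sent roughly 15,000 responses per second in total.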

BigQuery API for programmatic access to data

The queries shown above can be readily executed in the BigQuery web console. To execute the same queries programmatically and incorporate the results into your own pipeline, Google provides several client libraries, a command-line utility and a REST API [26].

For completeness, we provide two methods to obtain the results of the query shown in Example 4. First, we show Python code to obtain the query results as a Pandas dataframe object. The code, shown below, also exports the dataframe object as a CSV file for further processing / analysis.

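from google.cloud import bigquery
from google.cloud import bigquery_storage  # replaces the removed bigquery_storage_v1beta1 module
from google.oauth2 import service_account

# Main program

# Same query as Example 4; EVENTS_TABLE is a placeholder for the 'events' table's full name.
query_string = """
SELECT
  sourceip,
  dns,
  packets
FROM
  `EVENTS_TABLE`,
  UNNEST(rdns) AS dns
WHERE
  First >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0) AND dns IS NOT NULL AND asn = 237
ORDER BY
  packets DESC
"""

creds = "/etc/creds.json"  # Add the path of your JSON creds here
project = "BQ_PROJECT_NAME"  # Add the project name here

credentials = service_account.Credentials.from_service_account_file(creds)

# Make clients. Both share one credentials object.
bqclient = bigquery.Client(credentials=credentials, project=project)
# The BigQuery Storage Read API speeds up downloading large result sets.
bqstorageclient = bigquery_storage.BigQueryReadClient(credentials=credentials)

# Run the query, wait for it to finish and load the rows into a pandas DataFrame.
dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)

dataframe.to_csv('/tmp/out.csv', float_format="%.2f")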

The second method utilizes the bq command-line utility provided by Google [27]. The command is as follows:

cat query.sql | bq query --service_account_credential_file '/etc/creds.json' --nouse_legacy_sql \
--format csv -n 100000000 > /tmp/out.csv

In the command above, we assume that the query of interest is defined in query.sql.
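
Either method leaves the results in /tmp/out.csv. As one example of further processing with only the standard library, the sketch below groups reverse-DNS names by source IP. It assumes the result set has columns named `sourceip` and `dns` (adjust to the actual column names), and `rdns_by_source` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from typing import Dict, IO, Set


def rdns_by_source(f: IO[str]) -> Dict[str, Set[str]]:
    """Group reverse-DNS names by scanning source IP from an exported CSV."""
    names: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(f):  # the header row supplies the column names
        names[row["sourceip"]].add(row["dns"])
    return dict(names)
```

Typical use: `with open("/tmp/out.csv", newline="") as f: groups = rdns_by_source(f)`. Extra columns, such as the unnamed index column written by pandas, are simply ignored.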


