ORION Network Telescope

Michalis Kallitsis edited this page Oct 26, 2021 · 2 revisions

Introduction

Network telescopes collect and record unsolicited Internet-wide traffic destined for a routed but unused address space, usually referred to as "Darknet" or "blackhole" address space [1]. Darknets can provide a global perspective on Internet behavior and are one of the key data sources used by the networking and security communities to understand malware propagation [2--6], Distributed Denial of Service (DDoS) attacks [1, 7], network scanning [8, 9], routing misconfigurations [10], and Internet outages [11, 12]. Merit Network has been operating one of a small number of researcher-accessible network telescopes for more than 15 years; it has facilitated an array of empirical studies, including [2, 3, 9, 10, 13--21].

Figure 1. Scanning and backscatter traffic captured in the Darknet.

Figure 1 illustrates the two basic types of Darknet traffic captured by a network telescope, namely 1) scanning and 2) backscatter. Note that Darknet traffic is unidirectional, i.e., the Darknet is completely passive and does not respond back. Next, we explain in some detail the origins and different types of scanning and backscatter.

Scanning arises as a result of multiple network activities. First, scanning includes reconnaissance activities from nefarious users or nation states that try to enumerate potential networking vulnerabilities of all Internet-accessible hosts. In their attempt to scan or "probe" the entire IPv4 address space they will eventually hit the dark IP space monitored by Merit's Darknet team; as a result, the scanning packets sent by these nefarious users get recorded in PCAP format (i.e., a binary format for encoding information for network packets). The PCAP data thus provide valuable information such as the origin IP of the scanning activity, the port (i.e., application) that was being targeted, the exact time, duration and intensity of the scanning activities, etc.

A second type of scanning activity comes from malware that tries to propagate and "infect" more victims. Malware refers to malicious code running on compromised Internet hosts, which are usually members of large botnets (i.e., large groups of compromised hosts running the same malicious code and controlled remotely by a "command-and-control center"); an example of such a large botnet is the Mirai botnet [2], which first appeared in late 2016 and is still in operation. Similar to the Internet-wide reconnaissance activities discussed earlier, the malware also attempts to enumerate the whole IPv4 space to discover and infect new botnet victims. Inevitably, it also falls into the Darknet "trap" and its network actions are recorded in PCAP format as well.

There are also benign types of scanning captured in network telescopes. These usually include "research scanning" activities such as the ones undertaken by Censys.io or Shodan.io. These organizations (and many others, including several universities) constantly scan the whole IPv4 space and interact with several applications/ports, aiming to find misconfigured and/or insecure network hosts. This information is useful for understanding the Internet ecosystem and assessing the hygiene and security posture of various organizations.

Misconfigurations may also contribute to unsolicited Internet traffic appearing in a Darknet. This might be due to software bugs or inaccurate typing (i.e., "fat-fingered" typos) of network addresses. See the works of Wustrow et al. [21] and Benson et al. [15] for some illustrative examples.

Backscatter traffic captured in the Darknet represents a completely different network event. Backscatter traffic reveals victims of Distributed Denial of Service (DDoS) attacks and was first studied by Moore et al. [22]. Recently, Jonker et al. [7] examined backscatter traffic from a large network telescope and identified millions of DDoS victims. As Figure 1 depicts, backscatter emerges as a consequence of randomly spoofed attacks that attempt to overwhelm a targeted host with heavy traffic in order to render it incapable of serving its normal users. The attackers try to conceal their identity by replacing their true source IP with a randomly selected one. They then transmit large volumes of traffic to the victim IP using the falsified source IP. If the falsified IP happens to fall within the dark IP space monitored by the network telescope, the victim's response to that (spoofed) IP reaches the Darknet and gets recorded in PCAP format. Hence, the Darknet can capture both the time of the attack and the identity of the attacked host.

The ORION Network Telescope

Figure 2. The ORION Network Telescope. PCAP Darknet data are processed using a Go-based parser that extracts meaningful events (such as scanning and backscatter) and exports them, after being annotated with several auxiliary datasets, to Google's BigQuery.

The architecture of Merit's network telescope is depicted in Figure 2. The Merit Darknet currently consists of 1856 /24 subnets (i.e., 475,136 dark IPs). The ORION (Observatory for cyber-Risk Insights and Outages of Networks) infrastructure, designed and engineered with support from the National Science Foundation [23], receives and records Darknet packets in PCAP format, as explained earlier. Packets are written to disk continuously, in real time, and organized into hourly PCAP files.
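
As a quick sanity check of the Darknet size quoted above, the following Python sketch computes the number of dark IPs and the share of the IPv4 space they represent. The helper scale_to_internet is hypothetical and rests on an assumption that this page does not make: that a source picks its targets uniformly at random across IPv4, so that activity seen in the Darknet can be scaled up by the inverse of the monitored fraction.

```python
IPV4_SPACE = 2 ** 32   # total number of IPv4 addresses
SUBNETS_24 = 1856      # /24 subnets in the Merit Darknet (from the text)
IPS_PER_24 = 256       # addresses in a single /24

dark_ips = SUBNETS_24 * IPS_PER_24   # 475136
coverage = dark_ips / IPV4_SPACE     # fraction of IPv4 that is monitored


def scale_to_internet(observed: float, monitored_ips: int = dark_ips) -> float:
    """Extrapolate a Darknet observation to the whole IPv4 space.

    Assumption (ours): targets are chosen uniformly at random.
    """
    if not 0 < monitored_ips <= IPV4_SPACE:
        raise ValueError("monitored_ips must be between 1 and 2**32")
    return observed * IPV4_SPACE / monitored_ips


if __name__ == "__main__":
    print(f"dark IPs: {dark_ips}")
    print(f"IPv4 coverage: {coverage:.6%}")
    print(f"scale factor: {scale_to_internet(1.0):.1f}")
```

Real scanners and attack tools do not always select addresses uniformly, so any such extrapolation should be read as a rough estimate only.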

The hourly PCAP files are then processed using a single-pass algorithm developed in Go by our team [24] to extract Darknet events, such as scanning and backscatter. The code adopts the methodology of Durumeric et al. [9] for defining events based on 1) the source IP of a packet appearing in the Darknet, 2) the port targeted and 3) fields from the transport protocol in the case of TCP/UDP, or fields from the ICMP header in the case of ICMP. (All other protocols constitute a negligible fraction of Darknet traffic and for now are simply grouped together.) An event essentially corresponds to a grouping of packets pertaining to the same activity (e.g., a scanning IP that targets a specific port and protocol) into a single record that is kept in our software's memory (cache) until it "expires" and is written to disk. We follow the approach of Moore et al. [1] and expire events that have not sent any packets for 627 seconds; this interval corresponds to the "typical longest gap" (see [1], Section III.E, "Flow Timeout Problem") expected to occur for an event lasting 2 days and having a target rate of 100 pps. Waiting for this timeout interval before expiring an event ensures that we do not split a sequence of packets belonging to the same event into two or more separate events.
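
To make the grouping and expiry logic concrete, here is a minimal Python sketch of single-pass event aggregation. It is not the Go parser from [24]: the Event class, the aggregate function and the packet tuple layout are all hypothetical, and events are expired lazily, i.e., only when a later packet with the same key arrives (a production parser would presumably also sweep its cache periodically).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

EXPIRY_SECONDS = 627  # inactivity timeout quoted in the text

# Event key: (source IP, targeted port, traffic type code)
Key = Tuple[str, int, int]
# Packet: (timestamp in seconds, source IP, port, traffic type, size in bytes)
Packet = Tuple[float, str, int, int, int]


@dataclass
class Event:
    source_ip: str
    port: int
    traffic: int
    first: float        # timestamp of the first packet
    last: float         # timestamp of the most recent packet
    packets: int = 0
    n_bytes: int = 0


def aggregate(packets: Iterable[Packet]) -> Iterator[Event]:
    """Group time-ordered packets into events in a single pass."""
    active: Dict[Key, Event] = {}
    for ts, src, port, traffic, size in packets:
        key = (src, port, traffic)
        event = active.get(key)
        if event is not None and ts - event.last > EXPIRY_SECONDS:
            yield event   # gap exceeded the timeout: close the old event
            event = None
        if event is None:
            event = active[key] = Event(src, port, traffic, first=ts, last=ts)
        event.last = ts
        event.packets += 1
        event.n_bytes += size
    # End of input: flush whatever is still in the cache.
    yield from active.values()
```

For instance, three packets from the same source, port and traffic type at t = 0, 100 and 1000 seconds would yield two events, because the 900-second gap before the last packet exceeds the timeout.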

All events are annotated with several summary statistics and other metadata, including the exact times the activity started and ended, the total bytes/packets transmitted into the Darknet, DNS information of the IP the event is associated with, geolocation information for the event's IP, routing and ASN information, and others, as shown in our schema below. These records are then uploaded to Google BigQuery (see Figure 2) to a table called orion_network_telescope.events, whose schema is described in detail next. Making the Darknet events available via BigQuery enables ease of data sharing, further analyses with standard SQL queries and integration with other data sources also in BigQuery such as data from Censys.io and others.

Schema description for Darknet events

The full schema for the BigQuery table orion_network_telescope.events is shown below.

{
  "description": "the source IP associated with this event",
  "name": "SourceIP",
  "type": "STRING"
},
{
  "description": "the port associated with this event",
  "name": "Port",
  "type": "INTEGER"
},
{
  "description": "traffic type (i.e., protocol and/or protocol fields) associated with this event",
  "name": "Traffic",
  "type": "INTEGER"
},
{
  "description": "start time (UTC) for this event",
  "name": "First",
  "type": "TIMESTAMP"
},
{
  "description": "end time (UTC) for this event",
  "name": "Last",
  "type": "TIMESTAMP"
},
{
  "description": "total number of bytes for this event",
  "name": "Bytes",
  "type": "INTEGER"
},
{
  "description": "total number of packets for this event",
  "name": "Packets",
  "type": "INTEGER"
},
{
  "description": "number of unique /24 Darknet subnets associated with this event",
  "name": "UniqueDest24s",
  "type": "INTEGER"
},
{
  "description": "number of unique Darknet IPs associated with this event",
  "name": "UniqueDests",
  "type": "INTEGER"
},
{
  "description": "flag indicating whether the Mirai signature is present in this event",
  "name": "Mirai",
  "type": "BOOLEAN"
},
{
  "description": "flag indicating whether the ZMAP signature is present in this event",
  "name": "Zmap",
  "type": "BOOLEAN"
},
{
  "description": "flag indicating whether the Masscan signature is present in this event",
  "name": "Masscan",
  "type": "BOOLEAN"
},
{
  "description": "Geolocation information from MaxMind for this event: latitude",
  "name": "Lat",
  "type": "FLOAT"
},
{
  "description": "Geolocation information from MaxMind for this event: longitude",
  "name": "Long",
  "type": "FLOAT"
},
{
  "description": "Geolocation information from MaxMind for this event: country",
  "name": "Country",
  "type": "STRING"
},
{
  "description": "Geolocation information from MaxMind for this event: city",
  "name": "City",
  "type": "STRING"
},
{
  "description": "Autonomous System Number (ASN) provided by MaxMind",
  "name": "ASN",
  "type": "INTEGER"
},
{
  "description": "Organization name provided by MaxMind",
  "name": "Org",
  "type": "STRING"
},
{
  "description": "Routing prefix of source IP; information obtained using CAIDA's pfx2as dataset.",
  "name": "Prefix",
  "type": "STRING"
},
{
  "description": "An array containing (up to) 3 full packets for this event",
  "name": "Samples",
  "type": "BYTES",
  "mode": "REPEATED"
},
{
  "description": "The reverse DNS record (if one exists) for the source IP of this event",
  "name": "RDNS",
  "type": "STRING",
  "mode": "REPEATED"
},
{
  "description": "For traffic types associated with TCP, a mnemonic description thereof (e.g., TCP SYN/ACK)",
  "name": "TCP",
  "type": "STRING"
},
{
  "description": "For traffic types associated with ICMP, a mnemonic description thereof (e.g., ICMP Echo Request)",
  "name": "ICMP",
  "type": "STRING"
}

The traffic type field is an integer (as shown in the schema above) with the following encoding:

Type Code
ICMPEchoRequest 0
ICMPEchoReply 1
ICMPDestinationUnreachable 2
ICMPSourceQuench 3
ICMPRedirect 4
ICMPTimeExceeded 5
ICMPParameterProblem 6
ICMPTimestampReply 7
ICMPInfoReply 8
ICMPAddressMaskReply 9
ICMPOther 10
TCPSYN 11
TCPSYNACK 12
TCPACK 13
TCPRST 14
TCPOther 15
UDP 16
UnknownTraffic 17
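
When post-processing query results in code, it can help to mirror this encoding in an enumeration. The Python sketch below copies the codes from the table above; the class name Traffic, the two sets and the classify helper are our own hypothetical names, and the scanning/backscatter groupings simply follow the traffic types used by the query examples on this page (11, 16, 0 for scanning; 12, 14, 1 for backscatter).

```python
from enum import IntEnum


class Traffic(IntEnum):
    """Traffic type codes, copied from the table above."""
    ICMPEchoRequest = 0
    ICMPEchoReply = 1
    ICMPDestinationUnreachable = 2
    ICMPSourceQuench = 3
    ICMPRedirect = 4
    ICMPTimeExceeded = 5
    ICMPParameterProblem = 6
    ICMPTimestampReply = 7
    ICMPInfoReply = 8
    ICMPAddressMaskReply = 9
    ICMPOther = 10
    TCPSYN = 11
    TCPSYNACK = 12
    TCPACK = 13
    TCPRST = 14
    TCPOther = 15
    UDP = 16
    UnknownTraffic = 17


# Groupings used by the example queries on this page.
SCANNING = frozenset({Traffic.TCPSYN, Traffic.UDP, Traffic.ICMPEchoRequest})
BACKSCATTER = frozenset({Traffic.TCPSYNACK, Traffic.TCPRST, Traffic.ICMPEchoReply})


def classify(code: int) -> str:
    """Label a traffic code as 'scanning', 'backscatter' or 'other'."""
    traffic = Traffic(code)  # raises ValueError for codes outside 0..17
    if traffic in SCANNING:
        return "scanning"
    if traffic in BACKSCATTER:
        return "backscatter"
    return "other"
```

Keeping the mapping in one place avoids scattering bare integers such as 11 or 16 throughout analysis code.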

Analyzing Darknet events in BigQuery: some illustrative examples

This section provides some examples of SQL queries for analyzing Darknet events.

Example 1 (Scanning): Number of unique IPs per Country

The following query returns the distribution of unique scanning IPs per country. The following filters are given:

  • Focus on the last 24 hours
  • Filter by large scans only, i.e., ones estimated to scan at least 10% of the whole IPv4 space, approximated here as hitting more than 10% of the dark IPs (see [9])
  • Focus on only scanning events, namely traffic types 11, 16 and 0 (or equivalently TCP SYN, UDP and ICMPEchoRequest).
SELECT
  country,
  COUNT(DISTINCT sourceip) AS unique
FROM
  `orion_network_telescope.events`
WHERE
  UniqueDests > 0.10 * 1856 * 256
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0)
  AND First > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  AND First <= CURRENT_TIMESTAMP()
GROUP BY
  country
ORDER BY
  unique DESC

Example 2 (Scanning): Distribution of ports for a particular Country

Next, we examine the distribution of ports, with respect to number of scans, for a particular country. The following filters are given:

  • Focus on scans originating from China (CN)
  • Consider all Darknet events that started on or after 2021-01-01
  • As above, focus only on scanning events, i.e. traffic types 11, 16 and 0.
SELECT
  port,
  COUNT(1) AS scans
FROM
  `orion_network_telescope.events`
WHERE
  (traffic = 11
    OR traffic = 16
    OR traffic = 0)
  AND country = 'CN'
  AND First >= "2021-01-01"
GROUP BY
  port
ORDER BY
  scans DESC

Example 3 (Scanning): Distribution of scanning events by organization and ASN

The following query breaks down the number of scanning events by organization (as provided by MaxMind's service). The following filters are provided:

  • Focus on scanning events between 2020-08-01 and 2021-08-01.
  • As above, focus only on scanning events, i.e. traffic types 11, 16 and 0.
SELECT
  org,
  asn,
  COUNT(1) AS scans
FROM
  `orion_network_telescope.events`
WHERE
  First > "2020-08-01"
  AND First <= "2021-08-01"
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0)
GROUP BY
  org,
  asn
ORDER BY
  scans DESC

Example 4 (Scanning): Identify scanners from a particular ASN

The following query identifies scanners from a particular ASN. It is useful for finding scanning hosts within "bulletproof" ASNs (i.e., ASNs that tolerate dubious Internet activities) or ASNs that are mismanaged and have poor network hygiene.

Furthermore, this type of query can also be employed for situational awareness and specifically for identifying compromised hosts within one's own network. Recall that anything reaching the Darknet is inherently suspicious and thus this query is a quick way to validate that nothing unexpected appears in the Darknet.

The following filters are employed:

  • Searching events for the last 24 hours only
  • Focusing on entries where an RDNS entry exists
  • Focusing on a particular ASN (237 in this case, i.e., Merit Network)
  • As above, focus only on scanning events, i.e. traffic types 11, 16 and 0.
SELECT
  sourceip,
  dns,
  port,
  traffic,
  packets
FROM
  `orion_network_telescope.events`,
  UNNEST(rdns) AS dns
WHERE
  First > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  AND First <= CURRENT_TIMESTAMP()
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0)
  AND dns IS NOT NULL
  AND asn = 237
ORDER BY
  packets DESC
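
Since this query is meant to be reused for other networks, one might wrap it in a small helper. The Python sketch below is one possible way to do so; scanners_query is a hypothetical name, the look-back window is made configurable as an illustration, and the IN (...) clause is merely a shorter equivalent of the three OR conditions above. Both arguments are checked to be positive integers before being placed into the query text.

```python
SCANNING_TYPES = (11, 16, 0)  # TCP SYN, UDP, ICMP Echo Request


def scanners_query(asn: int, hours: int = 24) -> str:
    """Return the text of the query above for a given ASN and time window."""
    for name, value in (("asn", asn), ("hours", hours)):
        # Reject anything that is not a plain positive integer.
        if type(value) is not int or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    types = ", ".join(str(t) for t in SCANNING_TYPES)
    return f"""
SELECT
  sourceip,
  dns,
  port,
  traffic,
  packets
FROM
  `orion_network_telescope.events`,
  UNNEST(rdns) AS dns
WHERE
  First > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours} HOUR)
  AND First <= CURRENT_TIMESTAMP()
  AND traffic IN ({types})
  AND dns IS NOT NULL
  AND asn = {asn}
ORDER BY
  packets DESC
""".strip()


if __name__ == "__main__":
    print(scanners_query(237))
```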

Example 5 (Scanning): Join Darknet events with Censys.io data

The query below is slightly more complicated than the earlier ones but highlights the advantage of hosting data in BigQuery and integrating / joining multiple data sources together.

The query zooms into a suspicious scanning event originating from India and observed around mid-September 2020. (The event seems to be associated with a Mozi botnet outbreak; see [25].) It focuses on "Telnet" traffic (port 23) and looks for the distribution of host "tags" as provided by Censys. The tags provide useful insights about the types of devices infected during that particular botnet outbreak.

WITH
  source_tag_table AS (
  SELECT
    tag,
    COUNT(1) AS count
  FROM
    `orion_network_telescope.events`
  INNER JOIN
    `censys-io.ipv4_public.20200915`
  ON
    sourceIP = `censys-io.ipv4_public.20200915`.ip,
    UNNEST(tags) AS tag
  WHERE
    port=23
    AND country = "IN"
    AND First >= "2020-09-15"
    AND First < "2020-09-16"
  GROUP BY
    tag
  ORDER BY
    count DESC )
SELECT
  tag,
  count,
  count / (
  SELECT
    SUM(count)
  FROM
    source_tag_table) * 100 AS fraction
FROM
  source_tag_table
ORDER BY
  count DESC  

Example 6 (Backscatter): Find victims of DDoS attacks

The following example queries the 'events' table to extract all backscatter events that identify victims of DDoS attacks (see [7, 22]). The following filters are applied:

  • Focus on backscatter events captured within the last 24 hours
  • Focus on only backscatter events, namely traffic types 12, 14 and 1 (or equivalently TCP SYN/ACK, TCP RST and ICMPEchoReply).
SELECT
  sourceip,
  port,
  traffic,
  first,
  packets,
  TCP, ICMP
FROM
  `orion_network_telescope.events`
WHERE
  First > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  AND First <= CURRENT_TIMESTAMP()
  AND (traffic = 12
    OR traffic = 14
    OR traffic = 1)
ORDER BY
  packets DESC

BigQuery API for programmatic access to data

The queries shown above can be readily executed in the BigQuery web console. However, for those interested in executing the same queries programmatically and incorporating the results into their own pipeline, Google provides several client libraries, a command-line utility and a REST API [26].

For completeness, we provide two methods to obtain the results of the query shown in Example 4. First, we show Python code to obtain the query results as a Pandas dataframe object. The code, shown below, also exports the dataframe object as a CSV file for further processing / analysis.

from google.cloud import bigquery
from google.cloud import bigquery_storage
from google.oauth2 import service_account

# Main program

# Query from Example 4: scanners from ASN 237 seen in the last 24 hours.
query_string = """
SELECT
  sourceip,
  dns,
  port,
  traffic,
  packets
FROM
  `orion_network_telescope.events`,
  UNNEST(rdns) AS dns
WHERE
  First > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  AND First <= CURRENT_TIMESTAMP()
  AND (traffic = 11
    OR traffic = 16
    OR traffic = 0)
  AND dns IS NOT NULL
  AND asn = 237
ORDER BY
  packets DESC
"""

creds = "/etc/creds.json"  # Add the path of your JSON creds here
project = "BQ_PROJECT_NAME"  # Add the project name here

credentials = service_account.Credentials.from_service_account_file(creds)

# Make clients. BigQueryReadClient is the Storage API client in recent
# releases of google-cloud-bigquery-storage; older releases exposed it as
# bigquery_storage_v1beta1.BigQueryStorageClient.
bqclient = bigquery.Client(
    credentials=credentials,
    project=project,
)
bqstorageclient = bigquery_storage.BigQueryReadClient(
    credentials=credentials
)

# Run the query and download the results as a Pandas dataframe.
dataframe = (
    bqclient.query(query_string)
    .result()
    .to_arrow(bqstorage_client=bqstorageclient)
    .to_pandas()
)

# Export the dataframe as a CSV file for further processing.
dataframe.to_csv("/tmp/out.csv", float_format="%.2f")

The second method utilizes the bq command-line utility provided by Google [27]. The command is as follows:

cat query.sql | bq query --service_account_credential_file '/etc/creds.json' --nouse_legacy_sql \
--format csv -n 100000000 > /tmp/out.csv

In the command above, we assume that the query of interest is defined in query.sql.
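
Both methods leave the results in a CSV file. As one illustration of the "further processing" step, the Python sketch below sums the packets column per source IP using only the standard library. The function names are hypothetical, and the code assumes the CSV header carries the column names of the query (sourceip, packets); header names are lowercased to tolerate differences in capitalization.

```python
import csv
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_scanners(rows: Iterable[Mapping[str, str]], n: int = 10) -> List[Tuple[str, int]]:
    """Sum the `packets` column per `sourceip` and return the n largest totals."""
    totals: Counter = Counter()
    for row in rows:
        # Normalize header names so that 'SourceIP' and 'sourceip' both work.
        record = {(key or "").lower(): value for key, value in row.items()}
        totals[record["sourceip"]] += int(record["packets"])
    return totals.most_common(n)


def top_scanners_from_csv(path: str, n: int = 10) -> List[Tuple[str, int]]:
    """Convenience wrapper that reads the rows from a CSV file on disk."""
    with open(path, newline="") as handle:
        return top_scanners(csv.DictReader(handle), n)


if __name__ == "__main__":
    for source_ip, packets in top_scanners_from_csv("/tmp/out.csv"):
        print(source_ip, packets)
```

Note that the query in Example 4 unnests the rdns array, so an event whose source IP has several reverse DNS names appears once per name; totals computed this way may therefore overcount packets for such hosts.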

References

[1] D. Moore, C. Shannon, G. Voelker, and S. Savage. Network Telescopes: Technical Report. Technical report, Cooperative Association for Internet Data Analysis (CAIDA), Jul 2004.

[2] Manos Antonakakis, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, Jaime Cochran, Zakir Durumeric, J. Alex Halderman, Luca Invernizzi, Michalis Kallitsis, Deepak Kumar, Chaz Lever, Zane Ma, Joshua Mason, Damian Menscher, Chad Seaman, Nick Sullivan, Kurt Thomas, and Yi Zhou. Understanding the mirai botnet. In 26th USENIX Security Symposium (USENIX Security 17), pages 1093–1110, Vancouver, BC, 2017. USENIX Association.

[3] Michael Bailey, Evan Cooke, Farnam Jahanian, Jose Nazario, and David Watson. The internet motion sensor: A distributed blackhole monitoring system. In Proceedings of Network and Distributed System Security Symposium (NDSS '05), pages 167–179, 2005.

[4] D. Inoue, M. Eto, K. Yoshioka, S. Baba, K. Suzuki, J. Nakazato, K. Ohtaka, and K. Nakao. nicter: An incident analysis system toward binding network monitoring with malware analysis. In 2008 WOMBAT Workshop on Information Security Threats Data Collection and Sharing, pages 58–66, April 2008.

[5] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver. Inside the slammer worm. IEEE Security & Privacy, 1(4):33–39, July 2003.

[6] C. Shannon and D. Moore. The spread of the witty worm. IEEE Security & Privacy, 2(4):46–50, July 2004.

[7] Mattijs Jonker, Alistair King, Johannes Krupp, Christian Rossow, Anna Sperotto, and Alberto Dainotti. Millions of targets under attack: A macroscopic characterization of the dos ecosystem. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, pages 100–113, New York, NY, USA, 2017. ACM.

[8] A. Dainotti, A. King, K. Claffy, F. Papale, and A. Pescapé. Analysis of a "/0" stealth scan from a botnet. IEEE/ACM Transactions on Networking, 23(2):341–354, April 2015.

[9] Zakir Durumeric, Michael Bailey, and J. Alex Halderman. An internet-wide view of internet-wide scanning. In Proceedings of the 23rd USENIX Conference on Security Symposium, SEC’14, pages 65–78, Berkeley, CA, USA, 2014. USENIX Association.

[10] Jakub Czyz, Kyle Lady, Sam G. Miller, Michael Bailey, Michael Kallitsis, and Manish Karir. Understanding ipv6 internet background radiation. In Proceedings of the 2013 Conference on Internet Measurement Conference, IMC ’13, pages 105–118, New York, NY, USA, 2013. ACM.

[11] Alberto Dainotti, Roman Amman, Emile Aben, and Kimberly C. Claffy. Extracting benefit from harm: Using malware pollution to analyze the impact of political and geophysical events on the internet. SIGCOMM Comput. Commun. Rev., 42(1):31–39, January 2012.

[12] Alberto Dainotti, Claudio Squarcella, Emile Aben, Kimberly C. Claffy, Marco Chiesa, Michele Russo, and Antonio Pescape. Analysis of country-wide internet outages caused by censorship. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC ’11, pages 1–18, New York, NY, USA, 2011. ACM.

[13] M. Bailey, E. Cooke, F. Jahanian, A. Myrick, and S. Sinha. Practical darknet measurement. In 2006 40th Annual Conference on Information Sciences and Systems, pages 1496–1501, March 2006.

[14] Michael Bailey, Evan Cooke, Farnam Jahanian, Niels Provos, Karl Rosaen, and David Watson. Data reduction for the scalable automated analysis of distributed darknet traffic. In Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, IMC ’05, pages 21–21, Berkeley, CA, USA, 2005. USENIX Association.

[15] Karyn Benson, Alberto Dainotti, kc claffy, Alex C. Snoeren, and Michael Kallitsis. Leveraging internet background radiation for opportunistic network analysis. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference, IMC ’15, pages 423–436, New York, NY, USA, 2015. ACM.

[16] Jakub Czyz, Michael Kallitsis, Manaf Gharaibeh, Christos Papadopoulos, Michael Bailey, and Manish Karir. Taming the 800 pound gorilla: The rise and decline of ntp ddos attacks. In Proceedings of the 2014 Conference on Internet Measurement Conference, IMC ’14, pages 435–448, New York, NY, USA, 2014. ACM.

[17] Alberto Dainotti, Karyn Benson, Alistair King, kc claffy, Michael Kallitsis, Eduard Glatz, and Xenofontas Dimitropoulos. Estimating internet address space usage through passive measurements. SIGCOMM Comput. Commun. Rev., 44(1):42–49, December 2013.

[18] Yang Liu, Armin Sarabi, Jing Zhang, Parinaz Naghizadeh, Manish Karir, Michael Bailey, and Mingyan Liu. Cloudy with a chance of breach: Forecasting cyber security incidents. In 24th USENIX Security Symposium (USENIX Security 15), pages 1009–1024, Washington, D.C., 2015. USENIX Association.

[19] A. Mirian, Z. Ma, D. Adrian, M. Tischer, T. Chuenchujit, T. Yardley, R. Berthier, J. Mason, Z. Durumeric, J. A. Halderman, and M. Bailey. An internet-wide view of ics devices. In 2016 14th Annual Conference on Privacy, Security and Trust (PST), pages 96–103, Dec 2016.

[20] Matthew Sargent, Jakub Czyz, Mark Allman, and Michael Bailey. On the power and limitations of detecting network filtering via passive observation. In Passive and Active Measurement, volume 8995 of Lecture Notes in Computer Science, pages 165–178. Springer International Publishing, 2015.

[21] Eric Wustrow, Manish Karir, Michael Bailey, Farnam Jahanian, and Geoff Huston. Internet background radiation revisited. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, IMC ’10, pages 62–74, New York, NY, USA, 2010. ACM.

[22] David Moore, Geoffrey M. Voelker and Stefan Savage, Inferring Internet Denial-of-Service Activity, USENIX Security, 2001.

[23] Michalis Kallitsis, Zakir Durumeric, Stilian Stoev, CNS-1823192, CRI: II-New: ORION: Observatory for Cyber-Risk Insights and Outages of Networks.

[24] Conrad Edwards, Mark Weiman, Zakir Durumeric, Michalis Kallitsis, "Go-based Darknet parser to extract Darknet events", https://github.com/Merit-Research/darknet-events

[25] Christian Dietrich, A detailed look into the Mozi P2P IoT botnet, https://www.youtube.com/watch?v=HGYpymyXvio

[26] Google BigQuery, BigQuery API Reference, https://cloud.google.com/bigquery/docs/reference/libraries

[27] Google, Inc., The bq command-line utility, https://cloud.google.com/bigquery/docs/bq-command-line-tool