NAT_paper_ACM_short.tex

\documentclass{acm_proc_article-sp}
% http://www.acm.org/sigs/publications/proceedings-templates


% ------
% Fonts and typesetting settings
\usepackage[sc]{mathpazo}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
% \linespread{1.05} % Palatino needs more space between lines
\usepackage{microtype}


% ------
% Page layout
% \usepackage[hmarginratio=1:1,top=32mm,columnsep=20pt]{geometry}
\usepackage[font=it]{caption}
\usepackage{paralist}
% \usepackage{multicol}

% ------
% Lettrines
% \usepackage{lettrine}


% ------
% Abstract
% \usepackage{abstract}
% 	\renewcommand{\abstractnamefont}{\normalfont\bfseries}
% 	\renewcommand{\abstracttextfont}{\normalfont\small\itshape}

% ------
% Clickable URLs (optional)
\usepackage{hyperref}


% ------
% BibTex for bibliography
\usepackage[square,numbers]{natbib}
%\usepackage[square,sort&compress,authoryear]{natbib}

% ------
% Math
% \usepackage{amsthm}
\usepackage{amsmath, mathtools}
\usepackage{fixltx2e}
\usepackage{wrapfig}
% \usepackage{tikz}
\usepackage{float}
\usepackage{multirow}
%\usepackage{appendices}
\usepackage{subcaption}

\usepackage{algorithm}% http://ctan.org/pkg/algorithm
\usepackage{algpseudocode}% http://ctan.org/pkg/algorithmicx
\usepackage[compatibility=false]{caption}% http://ctan.org/pkg/caption

\newtheorem{mydef}{Definition}
\newcommand{\ignore}[1]{}
% \setlength{\parskip}{0cm plus4mm minus3mm}


%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\title{Traversing symmetric NAT with predictable port allocation}

\numberofauthors{1}
\author{
% 1st. author
\alignauthor
Du\v{s}an Klinec, Vashek Maty\'{a}\v{s}\\
       \affaddr{Faculty of Informatics, Masaryk University}\\
       \affaddr{Brno, Czech Republic}\\
       \email{\{xklinec,matyas\}@fi.muni.cz}
% % 2nd. author
% \alignauthor
% G.K.M. Tobin\titlenote{The secretary disavows
% any knowledge of this author's actions.}\\
%        \affaddr{Institute for Clarity in Documentation}\\
%        \affaddr{P.O. Box 1212}\\
%        \affaddr{Dublin, Ohio 43017-6221}\\
%        \email{webmaster@marysville-ohio.com}
% % 3rd. author
}

\maketitle
\begin{abstract}
\noindent Network Address Translators often cause trouble for VoIP and other P2P services since central servers are needed for communication.
The presence of such potentially malicious hosts in a communication path is not desired, mainly due to security consequences, poor link quality 
and increased cost.
Several solutions exist for traversing NAT, but a symmetric one is still problematic. We propose algorithms using a single source port for 
symmetric NAT traversal. Each with different properties and applicability. 
% Our paper investigates the NAT traversal problem by stating the problem formally and modeling a network connected to 
% NAT as a random process, in order to evaluate algorithms in different settings.
\end{abstract}

\section{Introduction}
Network Address Translators (NATs) have been increasing in popularity in recent years, 
mainly as a means of slowing down IPv4 address space depletion. On one hand, NAT offers
the ability to share a public, globally routable IP address among many internal network hosts. On 
the other hand, it breaks the basic end-to-end principle of the Internet and thus favors server-client 
communication model for clients connected behind NAT.

In many applications, a peer-to-peer (P2P) connection is of crucial importance and the quality of a service depends on 
its parameters heavily. Main motivation for this work is Voice-over-IP (VoIP) where direct connection 
between communicating parties is necessary. If P2P connection is not possible to establish, relay 
servers have to be used (TURN~\citep{rfc5766} protocol is used). The presence 
of another node in the communication path can have negative effects on the communication channel quality 
such as increased latency, jitter, lower bandwidth. The relay server is a potential single point of failure and 
has to scale with increasing demands of the network, etc.
% Thus relay servers require non-negligible resources and become expensive infrastructure components for VoIP and other service 
% providers where establishing a direct connection between parties is of high importance.

In this paper, we focused on a symmetric NAT with a predictable port allocation function. 
The contributions of this paper are as follows:%
% \begin{compactitem}
% \item 
a) we propose new effective algorithms for establishing a direct connection between hosts behind such NATs; %
% \item 
b) we demonstrate the implementations of these algorithms; %, together with a simulation based on a theoretical approach from a queueing theory. These implementations we provide in public domain.%
% \item 
c) we tested our algorithms using data from a real network; % (of our mobile phone operator).%
% \item 
d) last but not least, we provide a formalization of the NAT traversal problem.
% \end{compactitem}
% We propose 
% new algorithms for establishing direct connection between hosts behind such NATs that are effective and 
% have low system requirements.

\section{Existing techniques}
Many different solutions for traversing non-symmetric NATs exist, e.g., NAT traversal in Internet Key Exchange~\citep{rfc3947} or 
Interactive Connectivity Establishment~(ICE)~\citep{rfc5245}. %Terredo\ignore{tunneling}~\citep{rfc4380}.

Some devices with NAT functionality provide special interface enabling traversal, for example NAT Port Mapping Protocol~\citep{rfc6886}. However, in general, 
it is not widely deployed and it cannot be relied upon in a general case. 
Another solution is ICE\ignore{~\citep{rfc5245}}. It is used as a technique to traverse NAT in most applications nowadays 
although it does not work with symmetric NATs. ICE mainly uses STUN~\citep{rfc5389} and TURN\ignore{~\citep{rfc5766}} for its operation.

% \section{Related work}
Interesting algorithms for traversing symmetric NAT were proposed by Y.~Takeda in \citep{takeda}, although does not cover the case with both clients using symmetric NAT
with a port sensitive allocation function. Takeda mentions that sending more packets does not increase a probability of establishing a new connection.
We show that it is actually possible. Wang~\emph{et~al.} \citep{Wang:2006:RSN:1156422.1156550} propose a NAT traversal algorithm for the scenario where two hosts are
behind symmetric NATs, based on changing of a source port, while we explore options with a constant source port.

RFC~5128 \citep{rfc5128} maps current state of the art in NAT traversal, but does not cover our case. 
The research by Wei~\emph{et~al.} in \citep{wei} is devoted to symmetric NAT traversal, but their algorithm is using a different from ours.
Disadvantage of their approach comes with the Time To Live~(TTL) parameter modification of the IP~packet. This is usually considered
as a privileged operation in the operating system and super-user privileges are required for this modification. 

In our work we study the situation where two parties, $A$ and $B$, try to establish a direct connection, both of them are behind 
a symmetric NAT (possibly a cascade) with a port sensitive, predictable allocation function. 
%Other scenarios with predictable allocation function are quite easy to solve and covered in previous research. 

\section{Network Address Translation}
The notation used for the rest of our paper follows.

% \paragraph{Notation.}
% Symmetric NAT could be defined as follows.

\begin{mydef}
% \begin{align*}
$\mathbb{SC} = \mathbb{IP} \times \mathbb{PORT}$
% \end{align*}%
 is a Socket. $\mathbb{IP}$ is the set of all possible IP addresses and $\mathbb{PORT}$ 
is the set of all possible ports. %, basically $\mathbb{PORT} = \{1,2,\dots,65535\}$.
% \begin{align*}
$\mathbb{SP} = \mathbb{IP} \times \mathbb{PORT} \times \mathbb{IP} \times \mathbb{PORT}$
% \end{align*}%
 is a Socket Pair.
\end{mydef}
% 
% \begin{mydef}
% \begin{align*}
% \mathbb{AT} \subseteq \mathbb{SP} \times \mathbb{PORT}
% \end{align*} is an allocation table of the NAT. Second component is external port allocated for particular socket pair.
% \end{mydef}
% Allocation table is a set of all socket pairs and corresponding allocated external port.

% \paragraph{NAT and communication.}
\subsection{NAT and communication}
NAT connects two networks, denoted as {\em internal} and {\em external}, connected to an internal and an external NAT interface, respectively.
Hosts have addresses $\text{IP\textsubscript{in}} \in \mathbb{IP}_{in} \subseteq \mathbb{IP}$ 
and $\text{IP\textsubscript{ex}} \in \mathbb{IP}_{ex} \subseteq \mathbb{IP}$ in the internal and the external network,
respectively, where $\mathbb{IP}_{in} \cap \mathbb{IP}_{ex} = \emptyset$.
Unless it is not stated otherwise the external NAT address\footnote{We assume NAT has only one external address for simplicity.} is denoted as 
$\text{IP\textsubscript{nat}} \in \mathbb{IP}_{ex}$.

Packet received on the external interface has the socket pair {(IP\textsubscript{ex},PORT\textsubscript{ex},IP\textsubscript{nat},PORT\textsubscript{nat})}.
Packet received on the internal interface has the socket pair {(IP\textsubscript{in},PORT\textsubscript{in},IP\textsubscript{ex},PORT\textsubscript{ex})}.

% \paragraph{Filtering rule.} Let assume packet with socket pair {(IP\textsubscript{out},PORT\textsubscript{out},IP\textsubscript{ex},PORT\textsubscript{ex})} 
% received on NAT external interface (we'll use this notation in further text). Packet is allowed to pass to internal network iff 
% \begin{align*}
% &  \exists \; \text{IP\textsubscript{in}} \in \mathbb{IP}, \; \text{PORT\textsubscript{in}} \in \mathbb{PORT}:\\
% & ((\text{IP\textsubscript{in}}, \text{PORT\textsubscript{in}}, \text{IP\textsubscript{out}}, \text{PORT\textsubscript{out}}), \text{PORT\textsubscript{ex}} ) \in \mathbb{AT}
% \end{align*}
% 
% If packet is allowed to pass, it is then forwarded to host {(IP\textsubscript{in}, PORT\textsubscript{in})}. A new record 
% to the allocation table is added when internal host sends packet to the external host. 
% 
% \paragraph{Allocation rule.} ToDo

\subsection{NAT classes}\label{sec:nat}
NAT is classified with respect to a \emph{filtering rule} and a \emph{allocation rule}, which state what happens after receiving
packet from an external and an internal network, respectively.

Common NAT categorization is according to~\citep{rfc3489}: a) Full Cone;
b) IP Restricted Cone; c) Port Restricted Cone; d) Symmetric.

% MOVED TO APPENDIX
% \paragraph{Full cone NAT.} ~\\
% Allocation table $\mathbb{AT} \subseteq \{ \mathbb{IP}_{in} \times \mathbb{PORT} \} \times \mathbb{PORT}$ holds associations between internal 
% hosts socket and mapped external NAT port. 
% % Rule is created once the connection from the internal host is initiated to an arbitrary external address.
% Filtering rule is:
% \begin{align*}
% & PASS((\text{IP\textsubscript{ex}}, \text{PORT\textsubscript{ex}}, \text{IP\textsubscript{nat}}, \text{PORT\textsubscript{nat}} )) \Leftrightarrow \\
% &  \exists \; \text{IP\textsubscript{in}} \in \mathbb{IP}_{in}, \; \text{PORT\textsubscript{in}} \in \mathbb{PORT}:\\
% & ((\text{IP\textsubscript{in}}, \text{PORT\textsubscript{in}}), \text{PORT\textsubscript{nat}} ) \in \mathbb{AT}
% \end{align*}
%  
% \paragraph{Address restricted cone NAT.} ~\\
% Allocation table $\mathbb{AT} \subseteq \{\mathbb{IP}_{in} \times \mathbb{PORT} \times \mathbb{IP}_{ex}\} \times \mathbb{PORT}$ 
% holds associations between internal hosts socket, external host IP address and mapped external NAT port. 
% % Rule is created once the connection from the internal host is initiated to an arbitrary external address.
% Filtering rule is:
% \begin{align*}
% & PASS((\text{IP\textsubscript{ex}}, \text{PORT\textsubscript{ex}}, \text{IP\textsubscript{nat}}, \text{PORT\textsubscript{nat}} )) \Leftrightarrow \\
% &  \exists \; \text{IP\textsubscript{in}} \in \mathbb{IP}, \; \text{PORT\textsubscript{in}} \in \mathbb{PORT}:\\
% & ((\text{IP\textsubscript{in}}, \text{PORT\textsubscript{in}}, \text{IP\textsubscript{ex}}), \text{PORT\textsubscript{nat}} ) \in \mathbb{AT}
% \end{align*}

% \paragraph{Port Restricted Cone NAT} ~\\
\par\smallskip
\noindent\textbf{Port Restricted Cone NAT.} 
Allocation table $\mathbb{AT} \subseteq \{\mathbb{IP}_{in} \times \mathbb{PORT} \times \mathbb{IP}_{ex} \times \mathbb{PORT}\} \times \mathbb{PORT}$ 
holds associations between the internal host socket, external host socket and mapped external NAT port. 
% Rule is created once the connection from the internal host is initiated to an arbitrary external address.
The filtering rule is:
\begin{align*}
& PASS((\text{IP\textsubscript{ex}}, \text{PORT\textsubscript{ex}}, \text{IP\textsubscript{nat}}, \text{PORT\textsubscript{nat}} )) \Leftrightarrow \\
&  \exists \; \text{IP\textsubscript{in}} \in \mathbb{IP}, \; \text{PORT\textsubscript{in}} \in \mathbb{PORT}:\\
& ((\text{IP\textsubscript{in}}, \text{PORT\textsubscript{in}}, \text{IP\textsubscript{ex}}, \text{PORT\textsubscript{ex}}), \text{PORT\textsubscript{nat}} ) \in \mathbb{AT}
\end{align*}

% \paragraph{Symmetric NAT} ~\\
\par\smallskip
\noindent\textbf{Symmetric NAT.} 
Allocation table $\mathbb{AT}$ and filtering rule are the same as in the Port Restricted Cone NAT. The difference lies in the way 
the new entries in the allocation table are created. For the Cone NAT holds that multiple entries in the allocation table 
can have the same external port assigned, while the Symmetric NAT creates new allocation entries under certain circumstances.

\emph{Allocation function (AF)} defines port allocation rules for new connections (not in $\mathbb{AT}$). Assume $\mathbb{AT}=\emptyset$, for simplicity. 
AF can be either \emph{predictable} or \emph{random}. In our setting, a predictable AF is assumed, e.g., \emph{incremental}. 
If the previously allocated port was $\text{PORT\textsubscript{nat}}$ then
the next port will be $\text{PORT\textsubscript{nat}}+\Delta$, where typically $\Delta=1$.
If the chosen port is already taken, AF iterates until a free one is found.
AF can have different \emph{sensitivity} determining the cases where new allocation is created or the previous one is used instead.

\emph{Address sensitive allocation function} creates a~new allocation if a destination address differs from the existing allocation 
from the same source:
\begin{align*}
& \exists s=(a_1,a_2,a_3,a_4) \in \mathbb{SP}, p \in \mathbb{PORT}: \\
& (s, p) \in \mathbb{AT} \Rightarrow ((\forall s^{\prime}=(a_1^{\prime},a_2^{\prime},a_3^{\prime},a_4^{\prime}) \in \mathbb{SP}, \\
& a_1 \neq a_1^{\prime}, a_2 \neq a_2^{\prime}, a_3 \neq a_3^{\prime}, \forall p^{\prime} \in \mathbb{PORT}: \\
& ((s^{\prime}, p^{\prime}) \in \mathbb{AT})) \Rightarrow p \neq p^{\prime})
\end{align*}

\emph{Port sensitive allocation function} creates a new allocation if a destination address and a port differs from the existing
allocations from the same source:
\begin{align*}
& \exists s \in \mathbb{SP}, p \in \mathbb{PORT}: (s, p) \in \mathbb{AT} \Rightarrow \\
& \Rightarrow ((\forall s^{\prime} \in \mathbb{SP}, s^{\prime} \neq s , \forall p^{\prime} \in \mathbb{PORT}: \\
& ((s^{\prime}, p^{\prime}) \in \mathbb{AT})) \Rightarrow p \neq p^{\prime})
\end{align*}
% 
%     PASS((\text{IP\textsubscript{out}}, \text{PORT\textsubscript{out}}, \text{IP\textsubscript{ex}}, \text{PORT\textsubscript{ex}} )) \Leftrightarrow \\
% &  \exists \; \text{IP\textsubscript{in}} \in \mathbb{IP}, \; \text{PORT\textsubscript{in}} \in \mathbb{PORT}:\\
% & ((\text{IP\textsubscript{in}}, \text{PORT\textsubscript{in}}, \text{IP\textsubscript{out}}, \text{PORT\textsubscript{out}}), \text{PORT\textsubscript{ex}} ) \in \mathbb{AT}
% \end{align*}
% \section{Related work II}
% \begin{compactitem}
% \item chineese\citep{Wang:2006:RSN:1156422.1156550}
% \item uPNP
% \item PCP~\citep{rfc6887}
% \end{compactitem}

\section{UDP hole punching with STUN}
All algorithms in our paper make use of the same principle, \emph{UDP hole punching}. 
A public internet host cannot send packets to the private host behind NAT/firewall directly. 
Communication has to be initiated from the inside\footnote{Not taking
pre-set port forwarding, DMZ, uPnP, etc. into consideration. }, what is problematic if both
communicating sides are private.

Assume two parties, A and B with addresses IP\textsuperscript{A} and IP\textsuperscript{B}, respectively.
A and B are behind a public NAT with addresses $\text{IP}^{\text{A}}_{\text{nat}}$ and $\text{IP}^{\text{B}}_{\text{nat}}$,
respectively. By default an internal host does not know its external address or NAT type, thus a technique like 
STUN\footnote{Core idea: publicly available service running on 2 IP addresses and 2 ports 
answering questions like: what is my IP address and port? Change IP address of response if Il get it, etc.} 
is used to learn it. In case of incremental NAT it is also necessary to determine $\Delta$ and the next port
that is likely to be allocated for a new connection $\text{PORT}^{\text{A}}_{\text{nat}}$ for A and $\text{PORT}^{\text{B}}_{\text{nat}}$
for B. The period between determining this information, sending it to the other party and starting the algorithm is
denoted as an \emph{initialization phase} or a \emph{silent period}\ignore{\footnote{Since the algorithm itself does nothing.}} of the NAT traversal algorithm.

UDP hole punching works as follows:\\
\begin{compactitem}
 \item [1.] $A: \; IP^A_{in}:PORT^A_{in} \longrightarrow IP^B_{nat}:PORT^B_{nat}$ \\
Locally a~new mapping is created ($PORT^A_{nat}$), packet is dropped on B's~NAT.
 \item [2.] $B: \; IP^A_{nat}:PORT^A_{nat} \longleftarrow  IP^B_{in}:PORT^B_{in}$ \\
Locally a~new mapping is created ($PORT^B_{nat}$), packet reaches A using allocation created in step~1.
 \item [3.] $A: \; IP^A_{in}:PORT^A_{in} \longrightarrow IP^B_{nat}:PORT^B_{nat}$ \\
Packet reaches B using allocation created in in step~2.
\end{compactitem}

% By default a packet 
% coming from the internet (\textit{IP\textsubscript{A},Port\textsubscript{A}}) to the internal 
% network is dropped on the NAT unless there is a valid mapping in NAT allocation table.

\section{Theoretical network model}
In order to evaluate algorithms in multiple environments with different parameters (workload, number of clients, etc.)
we decided to model network behavior with a stochastic process. In particular, we chose a \emph{Poisson process}, $\{N(t), t\geq0\}$ 
that is used in queueing theory to model arrivals of requests to the server queue~\citep{Nelson:1995:PSP:207382}. 

The core idea of NAT traversal is to predict the next allocated external port by NAT, thus from this perspective 
it is important to model how the NAT internal state changes over time, i.e., how many external ports 
were allocated in a given time interval. Thus the workload is an important parameter for predicting the allocated port.

We model the internal network connected to NAT with respect to newly created connections (a new port allocation) as a 
\emph{time homogeneous Poisson process} $Po(\lambda)$. Value of $\lambda$ models
different workloads of the network. This way we can evaluate algorithms with respect to various parameters, e.g., 
probability of establishing a connection. 

% Random variable $X_t \sim Po(\lambda t)$ says how many
% port allocations were made in time interval $[0,t]$

\begin{mydef}
$N(t)$ is a random variable of a number of new port allocations made in time 
interval~$[0,t]$ where $N(t) \sim Po(\lambda t)$.
\begin{center}                                                     
Then $P[N(t)=n] = \frac{(\lambda t)^n}{n!} e^{-\lambda t}$
\end{center}
\end{mydef}

Moreover, we assume an incremental NAT with $\Delta=1$ (without loss of generality) on both sides. If we sample the NAT 
state\footnote{STUN requests to determine current port number, it creates a new NAT allocation record.}
each $T=10$~milliseconds the NAT state can be modeled as a random process $\{C_i | C_i \in \mathbb{N}, C_i \geq i, i\geq0\}$ where:
\[
C_i = \begin{dcases*}
         0 & if $i=0$ \\
         C_{i-1} + 1 + X_i, \; X_i \sim Po(\lambda T) & otherwise 
        \end{dcases*}
\]
Rewritten as $C_i = i + \sum_{j=1}^{i}X_j$. Expected value $E[C_i] = i + E[\sum_{j=1}^{i}X_i] = i (1+\lambda T)$.

% \paragraph{Port pool exhaustion.} 
\par\smallskip
\noindent\textbf{Port pool exhaustion.} NAT timeout\footnote{Allocation is deleted if no packet is detected within this time interval.} is assumed to be 3~minutes, 
what is a typical setting in practice. This constrains $\lambda$ to the interval $[0, 0.36]$, intuitively from $0$ to $360$ new connections in $1$~sec on average. 
If it is higher then NAT is rendered as unusable since the port pool is exhausted\footnote{$P[X > 65535] = 0.0019, X \sim Po(180000 \cdot 0.36)$.} quickly.

\section{Generic algorithm structure}
Algorithms that we propose have a common structure. In order to maximize the probability of establishing
a connection (i.e., success rate) it proceeds in discrete \emph{steps}. In each step it 
tries to open a new hole in the local NAT and to predict the corresponding port allocated in the destination
NAT.

\begin{mydef}
The symmetric NAT traversal algorithm is a mapping 
$\mathcal{A}: \mathcal{U} \times \mathcal{S} \times \mathcal{M} \rightarrow \mathcal{P}_{src} \times \mathcal{P}_{dst}$, 
where 
\begin{compactitem}
\item $\mathcal{U}=\{0,1\}$ is the set of communicating parties.
\item $\mathcal{S} = \{0, 1, \dots, n-1\}, \; n \in \mathbb{N}$ is a step of the algorithm.
\item $\mathcal{M}$ is the set of vectors of parameters of the environment (i.e., network model 
parameters and additional information obtained in initial phase). 
In our setting $\mathcal{M} = \{ (\lambda, t_s) $ $| \lambda~>~0, t\geq~0 \}$. Value $t_s$ denotes the length of the silent period. %For simplicity $\mathcal{M} = \{ \lambda | \lambda > 0 \}$.
\item $\mathcal{P}_{src} \subseteq \mathbb{N}$ and $\mathcal{P}_{dst} \subseteq \mathbb{N}$
are sets of source and destination ports respectively.
\end{compactitem}
\end{mydef}
If $\mathcal{M}$ is not important for the algorithm it can be ommitted in the notation.
%If the third parameter is not important for the algorithm it can be ignored in the notation. 

Prior algorithm run it is needed to 
determine the NAT state (for UDP hole punching), this is assumed to be done by a technique like STUN. For simplicity, we assume that port numbers
start from $0$. The length of the silent period is also known to the algorithm prior execution in order to estimate the current state of 
the NAT, e.g., difference between the last measurement and just before actual algorithm execution.

% \paragraph{Notation.}
\par\smallskip
\noindent\textbf{Notation.} 
For instance, line $A(0,1) = (2,3)$ means that party $0$ in step $1$ sent packet from source port $2$ to destination port $3$
of the party~$1$.

% \paragraph{Time.} 
\par\smallskip
\noindent\textbf{Time.} 
%The algorithm proceeds in discrete time steps. 
Algorithm starts in time $t_0 = 0$~ms and each step $i \in \mathcal{S}$
happens in time $t_i = iT$ for some $T$. If not mentioned otherwise, $T = 10$~ms.

% \paragraph{Algorithm log \& successful run.} 
\par\smallskip
\noindent\textbf{Algorithm log \& successful run.} 
IP addresses of the hosts and NAT are known and fixed prior to the algorithm run, and are thus omitted from the notation for simplicity. For the following text we assume the maximum number of steps of the algorithm $s_{max}=1000$.

\begin{mydef}
\label{def:log}
Algorithm log $\mathcal{L}^A_{\mathcal{A}}$ is a set of tuples $(p_1, p_2, p_3) \in \mathcal{L}^A_{\mathcal{A}}$ produced
by the NAT traversal algorithm $\mathcal{A}$ execution on the host $A$, where: \\
%$\mathcal{A}(0, s, (\lambda, t_s)) = (p_1, p_2)$ 
%are generated by $\mathcal{A}$ for some parameters $\lambda, t_s$ in some step $s$. The $p_3$ is 
%an allocated port by $NAT^A$ for request $(p_1, p_2)$ in a given step. Algorithm log for host $B$ is defined analogically. 
%Furthermore from our setting the 
%following formula holds:\footnote{From definition of a port sensitive symmetric NAT.} %$\mathcal{L^{A}} \subseteq PORT_{in}^{A} \times PORT_{nat}^{B} \times PORT_{nat}^{A}$, 

\begin{compactitem}
 \item $(p_1, p_2) = \mathcal{A}(0, s, (\lambda, t_s)) $ are generated by the algorithm $\mathcal{A}$ for  
       parameters $(\lambda, t_s)$ in some step $s$ of the algorithm $\mathcal{A}$;
 \item $p_3$ is an allocated port by $NAT^A$ for request $(p_1, p_2)$ in a given step $s$. In our network model
       $p_3=C_s=C_{s-1} + 1 + X, X~\sim~Po(\lambda T)$;
 \item algorithm log for host $B$ is defined analogically;
 \item furthermore from our setting the following formula holds:\footnote{From definition of a port sensitive symmetric NAT.}
\end{compactitem} %
%
% \begin{compactitem}
% \item $p_1$ is a source port.
% \item $p_2$ is a destination port.
% \item $p_3$ is an allocated port by local NAT for this request.
% \item following formula holds\footnote{from definition of a port sensitive symmetric NAT}
% \end{compactitem}
%
\begin{align*}
& (p_1, p_2, p_3) \in \mathcal{L}^A_{\mathcal{A}}: \forall (p_1^{\prime}, p_2^{\prime}, p_3^{\prime}) \in \mathcal{L}^{A}_{\mathcal{A}} :\\
& p_3 = p_3^{\prime} \Leftrightarrow \left( p_1=p^{\prime}_1 \wedge p_2=p^{\prime}_2 \right)
\end{align*}
\end{mydef}

\begin{mydef}
\label{def:match}
Algorithm $\mathcal{A}$ run is successful, i.e., established a direct connection between $A,B$ if~\footnote{Derived from UDP hole punching.}:
\begin{align*}
& MATCH(\mathcal{L}^{A}_{\mathcal{A}}, \mathcal{L}^{B}_{\mathcal{A}}) \Leftrightarrow \exists(p_1, p_2, p_3) \in \mathcal{L}^{A}_{\mathcal{A}}: \\
& \exists(p_1^{\prime}, p_2^{\prime}, p_3^{\prime}) \in \mathcal{L}^{B}_{\mathcal{A}}: p_2 = p_3^{\prime} \wedge p_2^{\prime} = p_3.
%
%  \in \mathcal{L}^A_{\mathcal{A}}: \forall (p_1^{\prime}, p_2^{\prime}, p_3^{\prime}) \in \mathcal{L}^{A}_{\mathcal{A}} :\\
%& p_3 = p_3^{\prime} \left( \Leftrightarrow p_1=p^{\prime}_1 \wedge p_2=p^{\prime}_2 \right)
\end{align*}
\end{mydef}
The task is to find an algorithm $\mathcal{A}$ such that the probability that $MATCH(\mathcal{L}^{A}_{\mathcal{A}}, \mathcal{L}^{B}_{\mathcal{A}})$
holds is maximized for minimal number of steps needed.

\subsection{Algorithm I -- baby-step, giant-step}
We propose the algorithm with high success rate for small network loads, i.e.,
$\lambda \leq 0.03, T=10$~ms. When the error becomes larger, the
algorithm step is not fast enough to compensate errors and thus fails for a larger $\lambda$.

\begin{mydef}
Baby-step, giant-step
\[
u \in \mathcal{U}, s \in \mathcal{S}: 
\mathcal{A}(u, s) = \begin{dcases*}
         (0, s)  & \text{if} $u=0$\\
         (0, 2s) & \text{if} $u=1$
        \end{dcases*}
\]
\end{mydef}

Intuitively, the source port is kept constant, one side of the protocol scans ports sequentially, another
side of the protocol scans ports with step 2.

\subsection{Algorithm II -- fixed destination port}
Algorithm with fixed destination port described in by \citep{Wang:2006:RSN:1156422.1156550}
changes the source port in order to create new holes in NAT in each step.

\begin{mydef}
Fixed destination port algorithm
\[
u \in \mathcal{U}, s \in \mathcal{S}: \;
\mathcal{A}(u, s) = (s, \Delta_u)
\] for some constants $\Delta_0, \Delta_1$ based on the current network workload determined in initial phase of the protocol.
\end{mydef}

\subsection{Algorithm III -- expected port value}
Some more effective algorithms can be designed if the random process the network follows is known (and their
parameters).

\begin{mydef}
Expected port value algorithm
\begin{align*}
& u \in \mathcal{U}, s \in \mathcal{S}, (\lambda, t_s) \in \mathcal{M}:\\
& \mathcal{A}(u, s, (\lambda, t_s)) = (0, \lambda t_s + s(1 + \lambda T))
\end{align*}
\end{mydef}

In step $i$ the algorithm tries the expected value for $C_i$. Since the $C_i$ has unimodal distribution,
the expected value is also the most probable value and the algorithm uses maximum likelihood approach.
% \newpage
\subsection{Algorithm IV -- Poisson sampling}
Algorithm with the best success rate in our simulation is based on sampling Poisson distribution.
The main idea is to let the algorithm simulate the Poisson process that the network of the
other party follows. Duplicates are removed since it does not lead to a new port allocation.

\begin{mydef}
Poisson sampling algorithm 

% \captionof{algorithm}{Euclid’s algorithm}\label{algo1}

\begin{algorithmic}[1]
  \algtext*{EndWhile}% Remove "end while" text
  \algtext*{EndIf}% Remove "end if" text
  \algtext*{EndFunction}% Remove "end if" text
  \newcommand{\LineIf}[2]{ \State \algorithmicif\ {#1}\ \algorithmicthen\ {#2}}% \algorithmicend\ \algorithmicif }
  \Function{init4}{$\lambda, T$} %\Comment{The g.c.d. of a and b}
    \State $B\gets [], s \gets 0$
    %\State $s\gets 0$
    \While{$len(B)< s_{max}$}
      \State $x\gets \text{P}(\lambda T (1+s C(\lambda T))$
      \State $s\gets s+1$
      \LineIf{$x \notin B$}{$B.append(x)$} %\If {$x \notin B$} \State $B.append(x)$ \EndIf
    \EndWhile
    \State \textbf{return} $B$
    %\Return $B$
   \EndFunction\\
    
   \State $M\gets init4(\lambda,T)$\\
   \State $u \in \mathcal{U}, s \in \mathcal{S}, (\lambda, t_s) \in \mathcal{M}:$
%     \State $\mathcal{A}(u, s, (\lambda, t_s)) = (0, \lambda t_s + M[s])$
   \Function{A}{$u, s, (\lambda, t_s)$} %\Comment{The g.c.d. of a and b}
    %\LineIf{$s = 0$}{$M = init4(\lambda,T)$} %\If {$x \notin B$} \State $B.append(x)$ \EndIf
    \State \textbf{return} $(0, \lambda t_s + M[s])$
   \EndFunction
\end{algorithmic}
%\begin{align*}
%& u \in \mathcal{U}, s \in \mathcal{S}, (\lambda, t_s) \in \mathcal{M}:\\
%& \mathcal{A}(u, s, (\lambda, t_s)) = (0, \lambda t_s + \text{P}(\lambda T (1+s C(\lambda T)))
%\end{align*} 
where $\text{P}(\mu) = X \sim Po(\mu)$, samples Poisson distribution with
parameter $\mu$, coefficient function $C(\lambda T)$ gives a constant 
coefficient\footnote{This function was found empirically by maximizing the success rate in simulations.}.
Approximations of the $C(\lambda T)$ by an inverse logarithm and a quartic model:
\begin{align*}
C(x) = & ({0.163321 \cdot  ln(64.2568 \cdot x)})^{-1}\\
C(x) = & (0.172876+1.28162\cdot x-1.41256\cdot x^2 \\
       & +0.825093\cdot x^3-0.184726\cdot x^4)^{-1}
\end{align*}
\end{mydef}

% 

\subsection{Silent period in algorithms} If the length of the silent period is too big then
the algorithm can face problems with a gap that occurred in NAT ports. Thus 
Algorithms III and IV estimate the starting offset caused by the silent period as an
expected value $E[Po(\lambda \cdot t_1)] = \lambda\cdot t_1$.

% \vspace*{-8mm}
%\parskip=0pt
% \titlespacing{\subsection}{0pt}{*0}{*0}
\section{Evaluation}

\subsection{Comparison of Algorithms I and II} 
Algorithm~I has the following benefits over~II:
\begin{compactitem}
 \item Works with high success rate for $\lambda \in [0, 0.035] \sim 35$ new connections in 1~second on average.
 \item Can be stopped as soon as the connection is established, overhead is proportional to~$\lambda$.
 \item Single source port is used, only one listening thread is needed to implement
it\footnote{NAT is abused to multiplex multiple connections to one source port.} compared to Alg. II.
Due to this fact a practical implementation is simple and lightweight with respect to required system resources.
 \item Is $\lambda$-invariant. Provided network load is low and no additional measurements are needed.
\end{compactitem}

% \paragraph{Simulation.}
\subsection{Simulation}


\begin{figure}%[H]
{\scalebox{0.48}{\includegraphics[trim=40 20 0 0]{alg_1.pdf}}}
\caption{Algorithm evaluation in a simulation, $T=10~ms$. Number of a simulation rounds for one parameter setting is 1000.
 A, B, C correspond to the Algorithm I, II and III respectively.
 D, E correspond to the Algorithm IV with optimized $C(\lambda T)$ by an exhaustive search in D case, 
 in E case $C(\lambda T)$ is approximated by a polynomial.}
\label{fig:alg_eval}
\end{figure}
    
\begin{figure}%[H]
{\scalebox{0.48}{\includegraphics[trim=40 20 0 0]{alg_steps.pdf}}}
\caption{Steps of the algorithms needed to success on average with respect to $\lambda$. Algorithms denoted in a same way as in figure \ref{fig:alg_eval}.}
\label{fig:alg_steps}
\end{figure}


% % \begin{wrapfigure}{r}{0.5\textwidth}
% %  \begin{figure}
% \begin{wrapfigure}{r}{0.5\textwidth}%
% % \wrapfigure{r}{.5\linewidth}{
%      \centering{
%     \begin{subfigure}{0.5\textwidth} 
% 	{\scalebox{0.43}{\includegraphics[trim=40 0 0 0]{alg_1.pdf}}}
% 	\caption{Algorithm evaluation in a simulation, $T=10~ms$. Number of a simulation rounds for one parameter setting is 1000.
% 	A, B, C correspond to the Algorithm I, II and III respectively.
% 	D, E correspond to the Algorithm IV with optimized $C(\lambda T)$ by an exhaustive search in case D, 
% 	in case E is then$C(\lambda T)$ is approximated by a polynomial.}
% 	\label{fig:alg_eval}
%      \end{subfigure}
% %     \end{figure}
% % \end{wrapfigure}
% % \begin{wrapfigure}{r}{0.5\textwidth}    
%      \begin{subfigure}{0.5\textwidth} %\begin{figure}[H]
% 	{\scalebox{0.43}{\includegraphics[trim=40 0 0 0]{alg_steps.pdf}}}
% 	\caption{Steps of the algorithms needed to success on average with respect to $\lambda$. 
% 	Algorithms denoted in a same way as in figure \ref{fig:alg_eval}.}
% 	\label{fig:alg_steps}
%      \end{subfigure}}%\end{figure}
%     \vspace{-30pt}
% %  \end{figure}
% \end{wrapfigure}


Figure \ref{fig:alg_eval} shows algorithms success rate with respect to~$\lambda$. 
Figure \ref{fig:alg_steps} depicts number of steps needed 
to succeed on average. From figures it is visible that Alg. I is more suitable for networks with low 
workload since has $100\%$ success rate for $\lambda \in [0, 0.035]$. Outside this interval, 
Alg. I is not working and another solution has to be applied. The problem comes as new connections are
created faster than algorithm progresses.

Alg. II is set to have fixed $\Delta=900$, in order to maximize the success rate. The disadvantage of this
algorithm is that it waits on a particular destination port and if it is taken by another connection
the algorithm fails. 

Alg. IV has a high success rate for 
noisy networks, but is very sensitive for the $\lambda$ estimation -- this decreases practical usability 
if $\lambda$ changes quickly over time (non-homogeneous Poisson process). Closed form formula for $C(\lambda T)$ 
optimizing success rate and analytic derivation of the algorithm is still an open question. 

% \begin{itemize}
%  \item ToDo: fitting model to the real data from MU network
%  \item ToDo: evaluating algorithms on real data from MU network
% \end{itemize}

\subsection{Real data modeling}
Data collected in our university network by NetFlow probes were studied to test the correspondence to the Poisson process
with respect to newly created connections/allocations on NAT. NAT was simulated in our program using
NetFlow data to provide network traffic to the NAT. NetFlow data capture 1 hour of a network communication in two different
day hours (03 -- low peak, 13 -- high peak).

NAT state, e.i., newly opened connections in window $T=10$, $50$, $100$~ms,
was sampled multiple times in a row, Poisson and Negative Binomial distributions were fitted using Maximum Likelihood Estimation (MLE)
and goodness-of-fit test with Pearson $\chi^2$ test was performed with confidence level $\alpha=0.05$.

\begin{table*}[t]
  \centering
  % set, T, sample size, PoFit, NB Fit, sample mean, variance of sample mean, sample variance
  \begin{tabular}{| l | r | r | r | r | r | r | r |} % 
    \hline
    Set & T [ms] & Samples & Poisson rej. [\%] & NBinom rej. [\%] & $\overline{m}_{E[X]}$ &  $V[\overline{m}_{E[X]}]$  & $\overline{m}_{V[X]}$ \\ \hline \hline 
    \multirow{7}{*}{03} & \multirow{3}{*}{10}   & 100		& 2.977		& 1.667		& 0.5121        & 0.0228                    & 0.5321 \\ \cline{3-8}
      &      & 500      	& 5.637		& 0.920		& 0.4851        & 0.0086        & 0.5146 \\  \cline{3-8}
      &      & 1000     	& 10.294	& 1.579		& 0.4831        & 0.0068        & 0.5142 \\  \cline{2-8}
      & \multirow{3}{*}{50}     & 100     	& 6.543		& 2.276 	& 2.3707	& 0.2373 & 2.9969 \\ \cline{3-8}
      &      & 500     		& 25.180	& 7.194		& 2.3871        & 0.1053        & 3.1179 \\ \cline{3-8}
      &      & 1000    		& 52.174	& 23.188	& 2.3920        & 0.0696        & 3.1576 \\ \cline{2-8}
      & \multirow{2}{*}{100}  	& 100		& 8.914		& 3.911		& 4.7263        & 0.7031                    & 7.1496 \\ \cline{3-8}
      &      & 500     		& 57.746	& 9.859		& 4.7668	& 0.2876 	& 7.4667 					\\ \hline
    \multirow{2}{*}{13}  & \multirow{2}{*}{10}     & 100     & 5.191      & 1.824     & 2.6438       &  0.2695    & 3.0020 \\ \cline{3-8}
      &      & 500     & 20.368     & 2.829     & 2.6579       &  0.1025                    & 3.1495 \\ \hline
  \end{tabular}
  \caption{Results from NAT simulation on the real data captured from the university network. 
    The meaning of columns from the left is following: data set name, sampling time, number of samples collected,
    \% of cases rejecting hypothesis about Poisson distribution,  
    \% of cases rejecting hypothesis about Negative Binomial distribution,
    sample mean of sample mean of port number samples (pns),
    sample variance of sample mean of pns,
    sample mean of variance of pns.}
  \label{tab:1}
\end{table*}

Regarding data from set 13, which represents the high peak activity during the day, we conclude that if a number of samples 
is lower, e.g., 100, the process of creating a new connection on NAT can 
be modeled as a time homogeneous Poisson process successfully (hypothesis rejected in $5.191\%$ of cases).
The time window in which the network was monitored in this case is $T \cdot 100 = 1000~ms$. 

When the number of samples is higher, e.g., 500, the time window of the studied network traffic is longer and
in a real world scenario the probability that $\lambda$ is constant in time is lower with the increasing length of the time
window. Due to this the homogeneous model is not strong enough, data is 
overdispersed and this causes failure to fit Poisson distribution (hypothesis rejected in $20.368\%$ of cases). As suggested
in literature, Negative Binomial model fits better for overdispersed\footnote{The data overdispersion may be caused by dependent events on the network, e.g., by time-triggered updates downloading.}
data (hypothesis rejected in $2.829\%$ of cases). 
%The open question is how to make use of this model in the NAT traversal.

For 100 samples the match with the model is satisfactory. Provided the algorithm is able to run completely (including learning $\lambda$), the proposed algorithms should work like in a simulation.

Regarding data from set 03, which represents the low peak activity we can conclude that a lower network workload causes more stable mean of sampled ports (MLE for Poisson distribution), thus the Poisson distribution hypothesis is not rejected even in longer time intervals for an estimated $\lambda$. 


\section{Discussion and implementation}
For sake of the paper, we assumed that $\lambda$ is the same on both sides. However, new challenges 
are likely to arise if we allow for different parameters on both sides, and we have not studied this yet.

Another open question is to find a model that optimizes given task, i.e., establishing a direct
connection. We proposed a start step, to model the same process that occurs on the network, i.e.,
Poisson process, but this method requires setting a coefficient function $C(\lambda T)$ for which 
we do not have exact analytic explanation. This method is also very sensitive to parameter fluctuations
and thus not very practical for real world networks. 

In our setting, we assume $T=10~ms$ in order to minimize the speed of the evolution of a Poisson process.
Such a short sending interval may cause some packets to get dropped on the path to the destination. Yet the 
important issue for us is whether they reach the local NAT in order to create a mapping in the allocation table. As proposed 
in \citep{Wang:2006:RSN:1156422.1156550}, it is possible to run the algorithm with $T=10~ms$ to create a mapping
and then re-run the algorithm with higher $T$, e.g., $T=100~ms$ in order to deliver packets to the destination
using created mapping and holes created in the previous round.

Success rate of the algorithms strongly relies on accurate measurements of the network 
properties the client resides in. These properties can be measured prior to the connection establishment 
from the point the client connects to the network, as is usually done in ICE, and to gather 
more information, then to build better network model and to carry on in with a better performance afterwards.

% %We ignored the cases where there are already some allocations on NAT from older connections
% %on the port interval the NAT traversal algorithm is using.
% 
% Another problematic issue comes when this algorithm is followed by multiple host pairs on
% the same network in the same time. They would artificially increase the network workload
% causing it would not follow Poisson distribution rendering our algorithms not usable in
% this case.

We developed an application written in Python that simulates NAT and mentioned algorithms, computes
statistics for port distributions on NAT and simulates algorithms for different parameters. It
may be helpful for another researchers interested in this area so we published it in public domain
hosted on GitHub: \url{https://github.com/ph4r05/NATSimTools}.

To validate our results, we implemented Android application using our baby-step giant-step algorithm
to traverse symmetric carrier-grade NAT that our mobile phone operator was using. The implementation
works according to expectations.

\section{Conclusion}
The aim of our work was to study the problem of NAT traversal from a statistical point of view,
and to provide algorithms with a high success rate for networks with a low workload that would be easy to 
implement and would require little system resources.

We propose several algorithms solving this problem, yet with some open questions remaining. 
Our goal was to find the algorithms that maximize success rate using only one source port strategy, possibly using this
in further research, perhaps in combination with previous multiple source port strategies.

We modeled the network workload as a Poisson process and we verified our approach on real data by hypothesis testing.
Our testing application was published in public domain. We also validated our results by traversing symmetric 
carrier-grade NAT of our mobile phone provider.

We believe that by removing unnecessary intermediate nodes from
the communication path we restrict the attack and failure surfaces and increase the communication reliability.
Communication paths get less complex, transparency increases and this all support a more natural way of the 
interconnecting nodes on P2P basis.

\par\smallskip
\noindent\textbf{Acknowledgement.} 
This work was supported by the Czech Science Foundation, project GAP202/11/0422. 
Thanks to Computer Security Incident Response Team of Masaryk University (CSIRT-MU) for providing anonymized NetFlow datasets.
\vspace{1cm}


% \renewcommand{\bibfont}{\small}
% \bibliographystyle{IEEEtran}
\bibliographystyle{abbrv}
\bibliography{paper}

% \end{multicols}
\end{document}