Mullvad VPN app architecture

This document describes the code architecture and how everything fits together.

For security and anonymity properties, please see security.

Some components have specific documentation that goes into greater detail:

winfw

Mullvad vs talpid

Explain the differences between these layers and why the distinction exists. My thought was that after this section every aspect of the app is explained under either the Mullvad or the Talpid header, so it's clear which part each belongs to. I don't yet know if this makes sense though.

Mullvad part of daemon

Frontend <-> system service communication

Talking to api.mullvad.net

Selecting relay and bridge servers

See this document.

Problem reports

Talpid part of daemon

Tunnel state machine

The tunnel state machine is the part of Talpid that coordinates the events for establishing a VPN connection. It acts upon requests for establishing a secure VPN connection or for disconnecting an already established connection and returning the system to its initial state. This also involves using other parts of Talpid to configure the system so that the security policies are applied and the connection works correctly without any further manual configuration.

The tunnel state machine starts in an initial Disconnected state. In this state, no changes are made to the operating system and no security policies are applied. When a request is sent to the state machine to establish a connection, the state machine first progresses into a Connecting state, which configures the operating system and sets up a tunnel with a connection to a VPN server. Once the configuration is complete and the connection is verified to be working, the state machine proceeds to a Connected state.

A request can be made to close the VPN connection. Such a request will lead the state machine into a Disconnecting state, which will close the connection to the VPN server and restore the operating system to its original configuration. After the process is complete, the state machine returns to the Disconnected state.

If an error occurs in the Connecting or Connected states, the state machine may proceed to an Error state. It might reach this state either immediately (when an error occurs in the Connecting state) or after passing through another state to tear down the tunnel (when an error occurs in the Connected state). Either way, in this state the operating system is configured to block all connections to avoid leaking any data. The objective is to ensure no data leaks from the tunnel while the user has requested a secure connection, as defined in the security document.

A high-level overview of the tunnel state machine can be seen in the diagram below:

                +--------------+   Request to connect    +------------+
  Start ------->| Disconnected +------------------------>| Connecting |
                +--------------+                         +----+--+--+-+
                    ^                                      ^  |  ^  |
                    |           Will attempt to reconnect  |  |  |  |
                    |   .----------------------------------'  |  |  |
                    |   |                                     |  |  |
                    |   |                   .-----------------'  |  |
                    |   |                   | Unrecoverable      |  |
                    |   |                   |     error          |  |
                    |   |    Request to     V                    |  |
 System is restored |   |    disconnect +-------+                |  | Connection is configured
   to its initial   |   |   .-----------+ Error +----------------'  |       and working
    configuration   |   |   |           +-------+  Request to       |
                    |   |   |               ^       connect         |
                    |   |   |               |                       |
                    |   |   |  .------------'                       |
                    |   |   |  | Unrecoverable                      |
                    |   |   |  |  error while                       |
                    |   |   |  |  in connected                      |
                    |   |   V  |     state                          V
                 +--+---+------+-+                         +-----------+
                 | Disconnecting |<------------------------+ Connected |
                 +---------------+  Request to disconnect  +-----------+
                                      or unrecoverable
                                           error
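
The transitions in the diagram can be sketched as a pure function from a state and an input to the next state. This is a minimal illustration, not the Talpid implementation: the names `State`, `Input`, `AfterDisconnect` and `next` are hypothetical, `TunnelClosed` stands for "the tunnel has been torn down", and the choice of `TunnelDown` as the trigger for a reconnect is an assumption, since the diagram only says that a reconnect will be attempted.

```rust
/// Hypothetical, simplified model of the states shown in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Disconnected,
    Connecting,
    Connected,
    Disconnecting(AfterDisconnect),
    Error,
}

/// What the Disconnecting state does once the tunnel is torn down.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AfterDisconnect {
    Nothing,
    Block,
    Reconnect,
}

/// Hypothetical inputs; the real state machine reacts to more than these.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Input {
    Connect,
    Disconnect,
    TunnelUp,
    TunnelDown,
    UnrecoverableError,
    TunnelClosed,
}

/// Returns the next state. Inputs with no edge in the diagram leave the
/// state unchanged.
fn next(state: State, input: Input) -> State {
    match (state, input) {
        (State::Disconnected, Input::Connect) => State::Connecting,
        (State::Connecting, Input::TunnelUp) => State::Connected,
        (State::Connecting, Input::UnrecoverableError) => State::Error,
        (State::Connected, Input::Disconnect) => {
            State::Disconnecting(AfterDisconnect::Nothing)
        }
        (State::Connected, Input::UnrecoverableError) => {
            State::Disconnecting(AfterDisconnect::Block)
        }
        // Assumption: losing the tunnel while connected triggers a reconnect.
        (State::Connected, Input::TunnelDown) => {
            State::Disconnecting(AfterDisconnect::Reconnect)
        }
        (State::Disconnecting(AfterDisconnect::Nothing), Input::TunnelClosed) => {
            State::Disconnected
        }
        (State::Disconnecting(AfterDisconnect::Block), Input::TunnelClosed) => State::Error,
        (State::Disconnecting(AfterDisconnect::Reconnect), Input::TunnelClosed) => {
            State::Connecting
        }
        (State::Error, Input::Connect) => State::Connecting,
        (State::Error, Input::Disconnect) => State::Disconnecting(AfterDisconnect::Nothing),
        (unchanged, _) => unchanged,
    }
}
```

Note that an error in the Connected state does not jump straight to Error: it passes through Disconnecting first so the tunnel is torn down before the blocking policy is the only thing left in place.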

State machine inputs

There are two types of inputs that the tunnel state machine reacts to. The first is commands sent to the state machine, and the second is external events that the state machine listens to.

Tunnel commands

Besides the two main commands Connect and Disconnect, there are a few other commands that can be sent to the tunnel state machine. The following list includes all the commands the tunnel state machine can receive.

Connect: establish a secure VPN connection
Disconnect: tear down the active VPN connection and return the operating system to its initial configuration
Allow LAN: enable or disable local network sharing, changing the security policies for some of the states
Block when disconnected: configures whether the state machine should apply the security policy for blocking all connections when it's in the Disconnected state, effectively requesting the system to never allow connections outside the tunnel
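
As a rough sketch, the four commands listed above could be modelled as an enum, with the two settings-style commands updating shared policy settings while Connect and Disconnect are left for the current state to handle. The names `TunnelCommand`, `PolicySettings` and `apply` are illustrative assumptions, and the real command type may carry more data than a single flag.

```rust
/// Hypothetical, simplified version of the commands listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TunnelCommand {
    Connect,
    Disconnect,
    AllowLan(bool),
    BlockWhenDisconnected(bool),
}

/// Settings that influence the security policy of several states.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct PolicySettings {
    allow_lan: bool,
    block_when_disconnected: bool,
}

impl PolicySettings {
    /// Applies settings-style commands and returns `true` if the command
    /// was consumed here. Connect and Disconnect return `false` because
    /// they request a state change rather than a settings change.
    fn apply(&mut self, command: TunnelCommand) -> bool {
        match command {
            TunnelCommand::AllowLan(enabled) => {
                self.allow_lan = enabled;
                true
            }
            TunnelCommand::BlockWhenDisconnected(enabled) => {
                self.block_when_disconnected = enabled;
                true
            }
            TunnelCommand::Connect | TunnelCommand::Disconnect => false,
        }
    }
}
```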

External events

Depending on its current state, the state machine also listens for specific external events and acts on them, possibly by changing states. All of these events can be considered tunnel events, but they happen in different scenarios and have different causes.

Tunnel is Up: the tunnel monitor notifies that the tunnel is working correctly
Tunnel is Down: the tunnel monitor notifies that the tunnel has disconnected
Tunnel monitor stopped: communication to the tunnel monitor was lost
Is offline: notifies the tunnel state machine whether the operating system is connected to the network, so that it can safely wait for connectivity to be restored instead of endlessly retrying to establish the VPN connection. Some care needs to be taken to not get stuck in the offline state for too long on macOS.
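
To illustrate how a single state might react to these events, the sketch below models the Connecting state only. Everything here is an assumption made for illustration: the names `ExternalEvent`, `Reaction` and `react_while_connecting` are hypothetical, and treating a lost tunnel or a stopped monitor as a reason to retry is a guess at reasonable behaviour, not something the text above specifies.

```rust
/// Hypothetical representation of the external events listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExternalEvent {
    TunnelUp,
    TunnelDown,
    MonitorStopped,
    /// `true` when the device has lost network connectivity.
    IsOffline(bool),
}

/// What the Connecting state decides to do next in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reaction {
    EnterConnected,
    Retry,
    WaitForConnectivity,
}

/// Reaction of a (simplified) Connecting state to an external event.
fn react_while_connecting(event: ExternalEvent) -> Reaction {
    match event {
        ExternalEvent::TunnelUp => Reaction::EnterConnected,
        // Assumption: a failed attempt is simply retried.
        ExternalEvent::TunnelDown | ExternalEvent::MonitorStopped => Reaction::Retry,
        // Waiting while offline avoids endlessly retrying the connection.
        ExternalEvent::IsOffline(true) => Reaction::WaitForConnectivity,
        ExternalEvent::IsOffline(false) => Reaction::Retry,
    }
}
```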

State machine outputs

Every time the state machine changes state, it will output a TunnelStateTransition. This is an enum type representing which state the tunnel state machine has entered and any associated metadata that might be useful.

Disconnected
Connecting: includes the information of the endpoint it is trying to connect to
Connected: includes the information of the endpoint it is connected to
Disconnecting: includes the state it will transition to once successfully disconnected, represented as the action it will take after disconnecting, listed below:
- Nothing: proceed to the Disconnected state
- Block: proceed to the Error state
- Reconnect: proceed to the Connecting state
Error: includes the cause of the error and whether the operating system was successfully configured to block all connections
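
The shape of this output type can be sketched as follows. This is a simplified stand-in, not the real definition: the endpoint is reduced to a `SocketAddr` and the error cause to a `String`, and the `describe` helper and the field name `is_blocking` are hypothetical, added only to show how a consumer might match on the transition.

```rust
use std::net::SocketAddr;

/// Action taken once the Disconnecting state has finished.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ActionAfterDisconnect {
    Nothing,
    Block,
    Reconnect,
}

/// Simplified stand-in for the transition type described above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TunnelStateTransition {
    Disconnected,
    Connecting(SocketAddr),
    Connected(SocketAddr),
    Disconnecting(ActionAfterDisconnect),
    Error { cause: String, is_blocking: bool },
}

/// Hypothetical helper that renders a transition for a log or a UI.
fn describe(transition: &TunnelStateTransition) -> String {
    match transition {
        TunnelStateTransition::Disconnected => "disconnected".to_string(),
        TunnelStateTransition::Connecting(endpoint) => format!("connecting to {endpoint}"),
        TunnelStateTransition::Connected(endpoint) => format!("connected to {endpoint}"),
        TunnelStateTransition::Disconnecting(ActionAfterDisconnect::Nothing) => {
            "disconnecting".to_string()
        }
        TunnelStateTransition::Disconnecting(ActionAfterDisconnect::Block) => {
            "disconnecting, then blocking".to_string()
        }
        TunnelStateTransition::Disconnecting(ActionAfterDisconnect::Reconnect) => {
            "disconnecting, then reconnecting".to_string()
        }
        TunnelStateTransition::Error { cause, is_blocking } => {
            format!("error: {cause} (blocking: {is_blocking})")
        }
    }
}
```

Carrying the blocking flag in the Error variant lets a frontend warn the user when the fallback policy of blocking all connections could not be applied.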

System DNS management

Firewall integration

Detecting device offline

The tunnel state machine has an offline monitor that tries to detect when a device will certainly not be able to connect to a tunnel or reach the API. In doing so, the offline monitor cannot send any traffic. In general, this involves either relying on platform APIs specifically designed for this purpose or querying the system's networking config to enumerate network interface state or the routing table.

Windows

On Windows, connectivity is inferred if there exists an enabled interface with either an IPv4 or an IPv6 address and the machine is not suspended. The suspend/wakeup events matter because previously the TAP driver, and currently Wintun, might not work correctly early after boot. As such, the offline mode is used to enforce a grace period.

The conditions are affirmed by doing the following:

Listening for changes in the network adapter state via NotifyIpInterfaceChange - receiving callbacks whenever a network interface is changed or added via winnet.
Checking whether the machine is suspended by creating a window and listening for power state broadcast messages.
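
Once the interface and power state have been gathered, the rule itself is a simple predicate. The sketch below shows only that decision logic and makes no Windows API calls; the `Interface` struct and `is_online` function are hypothetical names introduced for illustration.

```rust
use std::net::IpAddr;

/// Hypothetical snapshot of a network adapter, as it might be assembled
/// from interface change callbacks.
struct Interface {
    enabled: bool,
    addresses: Vec<IpAddr>,
}

/// Connectivity is inferred only when the machine is not suspended and at
/// least one enabled interface has an IPv4 or IPv6 address.
fn is_online(interfaces: &[Interface], suspended: bool) -> bool {
    !suspended
        && interfaces
            .iter()
            .any(|interface| interface.enabled && !interface.addresses.is_empty())
}
```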

Linux

On Linux, connectivity is inferred by checking if there exists a route to a public IP address. Currently the Mullvad API IP is used, but the actual IP does not matter as long as it's not a local one. This is done via Netlink and the route is queried via the exclusion firewall mark - otherwise, when a tunnel is connected, the address would always be routable as it'd be routed through the tunnel interface. As such, the offline monitor is somewhat coupled to routing and split tunneling on Linux.
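
The reason for querying with the exclusion mark can be shown with a toy routing model. This sketch does not use Netlink and is not how the monitor is implemented: it assumes, purely for illustration, that tunnel routes live in a separate table which marked traffic bypasses, and the names `Table`, `Route`, `matches` and `has_route` are hypothetical.

```rust
use std::net::Ipv4Addr;

/// Toy model: which routing table a route belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Table {
    Main,
    Tunnel,
}

/// A simplified IPv4 route. `prefix_len` is expected to be at most 32.
struct Route {
    network: Ipv4Addr,
    prefix_len: u8,
    table: Table,
}

/// Returns true if `dst` falls inside the network of `route`.
fn matches(route: &Route, dst: Ipv4Addr) -> bool {
    let shift = 32u32.saturating_sub(u32::from(route.prefix_len));
    // A shift of 32 (prefix length 0) yields an all-zero mask.
    let mask = u32::MAX.checked_shl(shift).unwrap_or(0);
    (u32::from(route.network) & mask) == (u32::from(dst) & mask)
}

/// Looks up a route to `dst`. With the exclusion mark set, routes in the
/// tunnel table are ignored, so a connected tunnel cannot hide the fact
/// that the underlying network is gone.
fn has_route(routes: &[Route], dst: Ipv4Addr, exclusion_mark: bool) -> bool {
    routes
        .iter()
        .filter(|route| !(exclusion_mark && route.table == Table::Tunnel))
        .any(|route| matches(route, dst))
}
```

In this model a device whose only default route goes through the tunnel looks reachable for unmarked lookups but offline for marked ones, which is the distinction the offline monitor needs.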

macOS

On macOS, the offline monitor uses SCNetworkReachability callbacks to detect changes in connectivity and then enumerates network interfaces via SCDynamicStore and assumes connectivity if an active physical interface exists. The interfaces are enumerated because sometimes SCNetworkReachability can trigger a callback signalling full connectivity even when there exists no default route.

Issues

After coming back from sleep, the network reachability callback won't be invoked until macOS has done some verification tasks, some of which may depend on DNS. Since our firewall will block DNS, the tasks will be delayed, and so will the callback. Circumventing the callback is of no use - until the timeouts are hit, macOS won't publish a default route to the routing table, which is needed for routing tunnel traffic.

Android

To detect connectivity on Android, the app relies on ConnectivityManager by listening for changes to the availability of non-VPN networks that provide internet connectivity. Connectivity is inferred if such a network exists.
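
The bookkeeping behind this rule can be sketched independently of the Android APIs. The model below assumes the platform callbacks have already been filtered to non-VPN networks that provide internet connectivity; `ConnectivityTracker`, its methods, and the use of a numeric network identifier are all hypothetical and only illustrate the "online if any such network exists" rule.

```rust
use std::collections::HashSet;

/// Hypothetical tracker fed by "network available" and "network lost"
/// notifications for non-VPN networks with internet connectivity.
#[derive(Debug, Default)]
struct ConnectivityTracker {
    available: HashSet<u64>,
}

impl ConnectivityTracker {
    /// Records that a matching network became available.
    fn on_available(&mut self, network_id: u64) {
        self.available.insert(network_id);
    }

    /// Records that a previously available network was lost.
    fn on_lost(&mut self, network_id: u64) {
        self.available.remove(&network_id);
    }

    /// Connectivity is inferred while at least one network remains.
    fn is_online(&self) -> bool {
        !self.available.is_empty()
    }
}
```

Tracking a set rather than a single flag matters when Wi-Fi and mobile data overlap: losing one network must not report the device as offline while the other is still available.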

iOS

The iOS app uses WireGuard kit's offline detection, which in turn uses NWPathMonitor to listen for changes to the route table and assumes connectivity if a default route exists.

OpenVPN plugin and communication back to system service

Split tunneling

Frontends

Desktop Electron app

Android

iOS

CLI