Vxlan SONiC

High Level Design Document

Rev 1.3

Table of Contents

Revision
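| Rev | Date | Author       | Change Description                |
|-----|------|--------------|-----------------------------------|
| 0.1 |      | Prince Sunny | Initial version                   |
| 1.0 |      | Prince Sunny | Review comments/feedback          |
| 1.1 |      | Prince Sunny | Review comments                   |
| 1.2 |      | Prince Sunny | Design change for VNET Table flow |
| 1.3 |      | Prince Sunny | VNet and Route Delete flow        |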

About this Manual

This document provides general information about the Vxlan feature implementation in SONiC.

Scope

This document describes the high level design of the Vxlan feature. Kernel VRF (L3mdev) programming for VNET peering is beyond the scope of this document.

Definitions/Abbreviation

Table 1: Abbreviations
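| Abbreviation | Meaning                        |
|--------------|--------------------------------|
| VNI          | Vxlan Network Identifier       |
| VTEP         | Vxlan Tunnel End Point         |
| VM           | Virtual Machine                |
| VRF          | Virtual Routing and Forwarding |
| VNet         | Virtual Network                |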

1 Requirements Overview

1.1 Functional requirements

This section describes the SONiC requirements for the Vxlan feature, primarily in the context of VNet.

At a high level the following should be supported:

Phase #1

  • Should be able to perform the role of Vxlan Tunnel End Point (VTEP)
  • VNet peering between customer VMs and Baremetal servers (see VNet Requirements)
  • Distributed Vxlan routing with Symmetric IRB model (RIOT)

Phase #2

  • Integration with BGP EVPN
  • Should support untagged or tagged traffic (Overlay layer 2 networks over layer 3 underlay)
  • Should be able to do head-end replication (HER) for unicast traffic based on the configured flood list
  • CLI commands to configure Vxlan

1.2 Orchagent requirements

Vxlan orchagent:

  • Should be able to create VRF/BRIDGE/VLAN to VNI mapping.
  • Should be able to create NH Tunnel and Tunnel termination tables.
  • Should be able to create tunnels and encap/decap mappers.

Vnet orchagent:

  • Should be able to create VRFs per VNET tables.
  • Should be able to track peering configurations.
  • Should be VNet/VRF aware

Vnet Route orchagent:

  • Should be able to handle routes within a VNet
  • Should be able to create NH tunnels for the endpoints
  • Should be VNet/VRF aware

FDB orchagent:

  • Should be VTEP aware
  • Should support static configuration of FDB entries learnt on remote VTEP

INTFs orchagent:

  • Should be VRF aware
  • Should be able to create router interfaces in a specific VRF

1.3 CLI requirements

  • User should be able to get FDB learnt per VNI
  • User should be able to configure Vxlan tunnels and VTEPs (Overlay)

In summary:

	- config vxlan <vxlan_name> vlan <vlan_id> vni <vni_id>
	- config vxlan <vxlan_name> src_if <interface>
	- config vxlan <vxlan_name> vlan <vlan_id> flood vtep <ip1, ip2, ip3>
	- show mac vxlan <vxlan_name> <vni_id>
	- show vxlan <vxlan_name>

Configuring VNet peering via CLI is beyond the scope of this document.

1.4 Scalability requirements

1.4.1 VNet Peering

Table 2: VNet peering scalability
| Vxlan component | Expected value |
|-----------------|----------------|
| VNI             | 8k             |
| Tunnel encaps   | 128k           |
| VMs             | 512k           |
| VRFs            | 128            |
| Routes          | 512k           |

1.5 Warm Restart requirements

Phase #1 shall not include warm restart capabilities. SAI VR objects are not compliant with warm restart currently. This shall be revisited in Phase #2.

2 Modules Design

2.1 Config DB

The following new tables will be added to Config DB. Unless otherwise stated, the attributes are mandatory.

2.1.1 VXLAN Table

VXLAN_TUNNEL|{{tunnel_name}} 
    "src_ip": {{ip_address}} 
    "dst_ip": {{ip_address}} (OPTIONAL)

VXLAN_TUNNEL_MAP|{{tunnel_name}}|{{tunnel_map}}
    "vni": {{ vni_id}}
    "vlan": {{ vlan_id }}

2.1.2 VNET/Interface Table

VNET|{{vnet_name}} 
    "vxlan_tunnel": {{tunnel_name}}
    "vni": {{vni}} 
    "peer_list": {{vnet_name_list}} (OPTIONAL)

INTERFACE|{{intf_name}} 
    "vnet_name": {{vnet_name}} 
    
INTERFACE|{{intf_name}}|{{prefix}}  
    { }
    
VLAN_INTERFACE|{{intf_name}} 
    "vnet_name": {{vnet_name}} 
    
VLAN_INTERFACE|{{intf_name}}|{{prefix}}  
    { }
    
NEIGH_TABLE|{{intf_name}}|{{ip_address}}
    "family": "IPv4" 

2.1.3 ConfigDB Schemas

; Defines schema for VXLAN Tunnel configuration attributes
key                                   = VXLAN_TUNNEL:name             ; Vxlan tunnel configuration
; field                               = value
SRC_IP                                = ipv4                          ; Ipv4 source address, lpbk address for tunnel term
DST_IP                                = ipv4                          ; Ipv4 destination address, for P2P

;value annotations
ipv4          = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet     = DIGIT                     ; 0-9
                  / %x31-39 DIGIT         ; 10-99
                  / "1" 2DIGIT            ; 100-199
                  / "2" %x30-34 DIGIT     ; 200-249
                  / "25" %x30-35          ; 250-255
		  
; Defines schema for VXLAN Tunnel map configuration attributes
key                                   = VXLAN_TUNNEL_MAP:tunnel_name:name ; Vxlan tunnel map configuration
; field                               = value
VNI                                   = DIGITS                        ; 1 to 16 million values
VLAN                                  = 1*4DIGIT                      ; 1 to 4094 Vlan id
; Defines schema for VNet configuration attributes
key                                   = VNET:name                     ; Vnet name
; field                               = value
VXLAN_TUNNEL                          = tunnel_name                   ; refers to the Vxlan tunnel name
VNI                                   = DIGITS                        ; 1 to 16 million VNI values
PEER_LIST                             = *vnet_name                    ; vnet names separated by ","
                                                                      ; (empty indicates no peering)
; Defines schema for VNet Interface configuration attributes
key                                   = INTERFACE:name                ; Vnet interface name. This can be a port, vlan
                                                                      ; or port-channel interface
; field                               = value
VNET_NAME                             = vnet_name                     ; vnet name where the interface belongs to

; Defines schema for VNet Interface configuration attributes
key                                   = INTERFACE:name:prefix         ; Vnet interface name with IP prefix. No change to
                                                                      ; existing schema.
; field                               = value

; Defines schema for VNet Neighbor configuration attributes
key                                   = NEIGH_TABLE:name:ip_address   ; Vnet neighbor with IP address. Swss shall resolve
                                                                      ; the mac address for this configuration
; field                               = value
family                                = IPv4/IPv6                     ; Address family

Please refer to the schema document for details on value annotations.
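The value annotations above can be illustrated with a small Python sketch. The helper names are hypothetical and not part of SONiC. The VNI upper bound of 2^24 - 1 assumes the standard 24-bit VNI field, and the VLAN range comes from the schema comment.

```python
import ipaddress


def is_valid_ipv4(value: str) -> bool:
    """Return True if value parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def is_valid_vni(value: str) -> bool:
    """VNI: ASCII decimal digits, 1 .. 2**24 - 1 (assumes a 24-bit VNI)."""
    return value.isascii() and value.isdigit() and 1 <= int(value) <= 2**24 - 1


def is_valid_vlan(value: str) -> bool:
    """VLAN: 1 to 4 ASCII decimal digits, 1 .. 4094."""
    return (value.isascii() and value.isdigit()
            and len(value) <= 4 and 1 <= int(value) <= 4094)


def validate_tunnel(fields: dict) -> list:
    """Check a VXLAN_TUNNEL entry: src_ip is mandatory, dst_ip is optional."""
    errors = []
    if not is_valid_ipv4(fields.get("src_ip", "")):
        errors.append("src_ip must be a valid IPv4 address")
    if "dst_ip" in fields and not is_valid_ipv4(fields["dst_ip"]):
        errors.append("dst_ip must be a valid IPv4 address when present")
    return errors
```

A call such as `validate_tunnel({"src_ip": "10.10.10.10"})` returns an empty list, while a missing or malformed `src_ip` yields one error string.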

2.2 APP DB

Two new tables are introduced to specify routes and tunnel endpoints in the VNet domain. A further table, VXLAN_FDB_TABLE, holds FDB entries learnt on remote VTEPs.

VNET_ROUTE_TABLE:{{vnet_name}}:{{prefix}} 
    "nexthop": {{ip_address}} (OPTIONAL) 
    "ifname": {{intf_name}} 
 
VNET_ROUTE_TUNNEL_TABLE:{{vnet_name}}:{{prefix}} 
    "endpoint": {{ip_address}} 
    "mac_address": {{mac_address}} (OPTIONAL) 
    "vni": {{vni}} (OPTIONAL) 

VXLAN_FDB_TABLE:{{tunnel_name}}:{{vni_id}}:{{mac_address}}
    "remote_vtep": {{ip_address}} 

VrfMgrD creates the following VNET table in APP DB:

VNET_TABLE:{{vnet_name}}
    "vxlan_tunnel": {{tunnel_name}}
    "vni": {{vni}} 
    "peer_list": {{ vnet_name_list }}

2.2.1 APP DB Schemas

; Defines schema for VNet Route table attributes
key                                   = VNET_ROUTE_TABLE:vnet_name:prefix ; Vnet route table with prefix
; field                               = value
NEXTHOP                               = ipv4                          ; Nexthop IP address
IFNAME                                = ifname                        ; Interface name
; Defines schema for VNet Route tunnel table attributes
key                                   = VNET_ROUTE_TUNNEL_TABLE:vnet_name:prefix ; Vnet route tunnel table with prefix
; field                               = value
ENDPOINT                              = ipv4                          ; Host VM IP address
MAC_ADDRESS                           = 12HEXDIG                      ; Inner dest mac in encapsulated packet (Optional)
VNI                                   = DIGITS                        ; VNI value in encapsulated packet (Optional)
; Defines FDB entries for remote VTEP
key                                   = VXLAN_FDB_TABLE:tunnel_name:vni_id:mac_address ; Remotely learnt mac-address
REMOTE_VTEP                           = ipv4                          ; Remote VTEP where the host resides
; Defines schema for VXLAN VRF Tunnel map attributes
key                                   = VXLAN_TUNNEL:tunnel_name:name ; Vxlan tunnel map
; field                               = value
VNI                                   = DIGITS                        ; 1 to 16 million values
VRF                                   = vrf_name                      ; VRF name 
; Defines schema for VNET Table attributes
key                                   = VNET_TABLE:name               ; VNet table name
; field                               = value
VXLAN_TUNNEL                          = tunnel_name                   ; refers to the Vxlan tunnel name
VNI                                   = DIGITS                        ; 1 to 16 million VNI values
PEER_LIST                             = *vnet_name                    ; vnet names separated by ","
                                                                      ; (empty indicates no peering)
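As a sketch of how a consumer might take the VNet route keys above apart, the following Python helper splits a key into table, VNet name and prefix. The function and type names are hypothetical. It assumes VNet names never contain ":", and it limits the split so that a prefix containing ":" (for example an IPv6 prefix, should one ever be used) stays intact.

```python
from typing import NamedTuple


class VnetRouteKey(NamedTuple):
    table: str
    vnet_name: str
    prefix: str


def parse_vnet_route_key(key: str) -> VnetRouteKey:
    """Split 'TABLE:vnet_name:prefix' into its three parts.

    maxsplit=2 keeps any further ':' characters inside the prefix.
    """
    parts = key.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed key: {key!r}")
    return VnetRouteKey(*parts)
```

For example, `parse_vnet_route_key("VNET_ROUTE_TUNNEL_TABLE:Vnet_2000:100.100.1.1/32")` yields the table name, `Vnet_2000`, and `100.100.1.1/32` as separate fields.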

2.3 Orchestration Agent

The following orchagents shall be modified. Flow diagrams are captured in a later section.

VxlanOrch

This is the major subsystem for Vxlan and handles configuration requests. VxlanOrch creates the tunnel and attaches encap and decap mappers. Separate tunnels are created for L2 Vxlan and L3 Vxlan, and each tunnel can have different VLAN/VNI or VRF/VNI mappings attached.

VrfMgrD

VrfMgrD gets the VNET table config and creates the L3mdev interface in the kernel. VrfMgrD then writes VNET_TABLE to APP_DB for later use by VnetOrch. VrfMgrD also updates STATE_DB with the status of the VRF it created.

VrfOrch

VrfOrch creates VRFs in SAI based on the APP_DB updates that VrfMgrD writes for regular VRF configurations. RouterOrch fetches this information to program routes per VRF.

VnetOrch/VnetRouteOrch

VnetOrch is another major component, introduced for the VNet use case. VnetOrch creates an ingress or egress (based on context) VRF or BRIDGE in SAI for a VNet and also maintains the peering list. VnetOrch calls the VxlanOrch API to create the encap/decap mappers for the VNet. VnetRouteOrch fetches the VRF and peering information to replicate the routes, if applicable. When app-route-table has new updates for the VNet, VnetRouteOrch gets the VNet objects (VRF or BRIDGE) from VnetOrch and programs SAI.
- VNET_ROUTE_TABLE is translated to create subnet/local route entries
- VNET_ROUTE_TUNNEL_TABLE is translated to create routes with tunnel nexthop
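The two translations above, together with route replication for peering, can be modelled roughly as follows. This is a plain-Python sketch with hypothetical function names, not the orchagent code. In particular, the replication rule is an assumption made for illustration: a route is installed in its own VNet and copied into every VNet that names it in `peer_list`. The document does not spell out the exact rule.

```python
def peer_names(peer_list: str) -> list:
    """Split a comma-separated peer_list; an empty string means no peering."""
    return [p.strip() for p in peer_list.split(",") if p.strip()]


def target_vnets(vnet_name: str, vnets: dict) -> list:
    """VNets whose VRF receives a route that belongs to vnet_name.

    ASSUMPTION: the owning VNet plus every VNet listing it as a peer.
    """
    targets = [vnet_name]
    for name, cfg in vnets.items():
        if name != vnet_name and vnet_name in peer_names(cfg.get("peer_list", "")):
            targets.append(name)
    return targets


def translate(table: str, vnet_name: str, prefix: str, fields: dict, vnets: dict) -> list:
    """Turn one APP DB entry into (vnet, prefix, nexthop) tuples."""
    if table == "VNET_ROUTE_TABLE":
        # Subnet/local route: resolved via a local interface.
        nexthop = {"type": "local",
                   "ifname": fields["ifname"],
                   "nexthop": fields.get("nexthop")}
    elif table == "VNET_ROUTE_TUNNEL_TABLE":
        # Route whose nexthop is a Vxlan tunnel to the endpoint.
        nexthop = {"type": "tunnel",
                   "endpoint": fields["endpoint"],
                   "mac_address": fields.get("mac_address"),
                   "vni": fields.get("vni")}
    else:
        raise ValueError(f"unsupported table: {table}")
    return [(vnet, prefix, nexthop) for vnet in target_vnets(vnet_name, vnets)]
```

With `Vnet_3000` listing `Vnet_2000` as a peer, a tunnel route for `Vnet_2000` would, under the stated assumption, be returned for both VNets, while a route for `Vnet_3000` stays local to it.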

IntfMgrD

IntfMgrD creates the kernel routing interface and enslaves it to the VRF L3mdev. IntfMgrD waits for the VRF creation update in STATE_DB and then updates the APP_DB INTF_TABLE with the Vrf/VNet name.
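The wait-then-publish behaviour described above can be sketched with dictionaries standing in for STATE_DB and APP_DB. The class name, the `VRF_TABLE|<name>` state key and the `vnet_name` field written to INTF_TABLE are all assumptions made for illustration. The kernel enslaving step is omitted.

```python
class IntfMgrSketch:
    """Toy model: defer an interface until its VRF shows up in STATE_DB."""

    def __init__(self, state_db: dict, app_db: dict):
        self.state_db = state_db   # stands in for STATE_DB
        self.app_db = app_db       # stands in for APP_DB
        self.pending = {}          # intf_name -> vnet_name awaiting its VRF

    def on_interface_config(self, intf_name: str, vnet_name: str) -> None:
        """Record a new interface binding and try to process it."""
        self.pending[intf_name] = vnet_name
        self.process_pending()

    def process_pending(self) -> None:
        """Publish every pending interface whose VRF is now ready."""
        for intf_name, vnet_name in list(self.pending.items()):
            # Hypothetical STATE_DB key layout.
            if f"VRF_TABLE|{vnet_name}" not in self.state_db:
                continue  # VRF not created yet; keep the entry for a retry
            # (The real daemon would enslave the kernel interface here.)
            self.app_db[f"INTF_TABLE:{intf_name}"] = {"vnet_name": vnet_name}
            del self.pending[intf_name]
```

Calling `process_pending()` again after the VRF state appears moves the interface from `pending` into the APP_DB stand-in.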

IntfsOrch

Add VrfOrch as a member of IntfsOrch. IntfsOrch creates Router Interfaces based on the interface table (INTF_TABLE) and the VRF information. For the VNet use case, IntfsOrch calls the VnetOrch API to handle router interface creation.

FdbOrch

Add VxlanOrch as a member of FdbOrch. For FDB entries learnt on a remote VTEP, app-fdb-table shall be updated and programmed to SAI by getting the BridgeIf/RemoteVTEP mapping from VxlanOrch. (TBD)

The overall data flow diagram is captured below for all TABLE updates.

2.4 SAI

The table below lists the main SAI attributes which shall be used for Vxlan.

Table 3: VNet peering SAI attributes
| Vxlan component   | SAI attribute                                |
|-------------------|----------------------------------------------|
| Vxlan Tunnel type | SAI_TUNNEL_TYPE_VXLAN                        |
| Encap mapper      | SAI_TUNNEL_MAP_TYPE_VIRTUAL_ROUTER_ID_TO_VNI |
| Decap mapper      | SAI_TUNNEL_MAP_TYPE_VNI_TO_VIRTUAL_ROUTER_ID |
| Nexthop tunnel    | SAI_NEXT_HOP_TYPE_TUNNEL_ENCAP               |
| Tunnel term type  | SAI_TUNNEL_TERM_TABLE_ENTRY_TYPE_P2MP        |
| Vxlan MAC         | SAI_SWITCH_ATTR_VXLAN_DEFAULT_ROUTER_MAC     |
| Vxlan port        | SAI_SWITCH_ATTR_VXLAN_DEFAULT_PORT           |

2.5 CLI

Commands summary (Phase #2):

	- config vxlan <vxlan_name> vlan <vlan_id> vni <vni_id>
	- config vxlan <vxlan_name> src_if <interface>
	- config vxlan <vxlan_name> vlan <vlan_id> flood vtep <ip1, ip2, ip3>
	- show mac vxlan <vxlan_name> <vni_id>
	- show vxlan <vxlan_name>
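To relate the first command to the Config DB schema in section 2.1.1, the sketch below builds the VXLAN_TUNNEL_MAP entry that `config vxlan <vxlan_name> vlan <vlan_id> vni <vni_id>` might write. The function name and the `map_<vni>_<vlan>` naming scheme are hypothetical. The document does not define how the map name is chosen.

```python
def vxlan_map_entry(vxlan_name: str, vlan_id: int, vni_id: int):
    """Return (key, fields) for a VXLAN_TUNNEL_MAP entry in Config DB."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("vlan_id must be in 1..4094")
    if not 1 <= vni_id < 2**24:  # assumes a 24-bit VNI
        raise ValueError("vni_id must be in 1..16777215")
    map_name = f"map_{vni_id}_{vlan_id}"  # hypothetical naming scheme
    key = f"VXLAN_TUNNEL_MAP|{vxlan_name}|{map_name}"
    return key, {"vni": str(vni_id), "vlan": str(vlan_id)}
```

The range checks mirror the schema comments, so an out-of-range VLAN or VNI is rejected before anything would be written.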

2.5.1 Vxlan utility interface

vxlan
Usage: vxlan [OPTIONS] COMMAND [ARGS]...

  Utility to operate with Vxlan configuration.

Options:
  --help  Show this message and exit.

Commands:
  config   Set Vxlan configuration.
  show     Show Vxlan information.

2.5.2 Config CLI command

The config command should be extended to add the "vxlan" alias.

Usage: config [OPTIONS] COMMAND [ARGS]...

  SONiC command line - 'config' command

Options:
  --help  Show this message and exit.

Commands:
...
  vxlan               vxlan related configuration.

2.5.3 Show CLI command

The show command should be extended to add the "vxlan" alias.

show
Usage: show [OPTIONS] COMMAND [ARGS]...

  SONiC command line - 'show' command

Options:
  -?, -h, --help  Show this message and exit.

Commands:
  ...
  vxlan                   Show vxlan related information

3 Flows

3.1 Vxlan VNet peering

Layer 2 Vxlan

TBD

3.2 Vxlan CLI flow

TBD

4 Example configuration

Vnet Configurations

Vnet 1 
	□ VNI - 2000
	□ VMs
		VM1. CA: 100.100.1.1/32, PA: 10.10.10.1, MAC: 00:00:00:00:01:02
	□ BM1 
		Connected on Ethernet1 
		Ip: 100.100.3.2/24
		MAC: 00:00:AA:AA:AA:01

Vnet 2 
	□ VNI - 3000
	□ VMs
		VM2. CA: 100.100.2.1/32, PA: 10.10.10.2, MAC: 00:00:00:00:03:04
	□ BM2 
		Connected on Ethernet2 in Vlan2000
		Ip: 100.100.4.2/24
		MAC: 00:00:AA:AA:AA:02

ConfigDB objects: 

{
    "VXLAN_TUNNEL": {
        "tunnel1": {
            "src_ip": "10.10.10.10"
        }
    },

    "VNET": {
        "Vnet_2000": {
            "vxlan_tunnel": "tunnel1",
            "vni": "2000",
            "peer_list": ""
        },
        "Vnet_3000": {
            "vxlan_tunnel": "tunnel1",
            "vni": "3000",
            "peer_list": "Vnet_2000"
        }
    },

    "INTERFACE": {
        "Ethernet1": {
            "vnet_name": "Vnet_2000"
        },
        "Ethernet1|100.100.3.1/24": {}
    },

    "VLAN": {
        "Vlan2000": {
            "vlanid": 2000
        }
    },

    "VLAN_MEMBER": {
        "Vlan2000|Ethernet2": {
            "tagging_mode": "tagged"
        }
    },

    "VLAN_INTERFACE": {
        "Vlan2000": {
            "vnet_name": "Vnet_3000"
        },
        "Vlan2000|100.100.4.1/24": {}
    },

    "NEIGH": {
        "Ethernet1|100.100.3.2": {
            "family": "IPv4"
        },
        "Vlan2000|100.100.4.2": {
            "family": "IPv4"
        }
    }
}

APPDB Objects: 

{
    "VNET_ROUTE_TABLE:Vnet_2000:100.100.3.0/24": {
        "ifname": "Ethernet1"
    },

    "VNET_ROUTE_TABLE:Vnet_3000:100.100.4.0/24": {
        "ifname": "Vlan2000"
    },

    "VNET_ROUTE_TUNNEL_TABLE:Vnet_2000:100.100.1.1/32": {
        "endpoint": "10.10.10.1"
    },

    "VNET_ROUTE_TUNNEL_TABLE:Vnet_3000:100.100.2.1/32": {
        "endpoint": "10.10.10.2",
        "mac_address": "00:00:00:00:03:04"
    }
}
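
The VNET_ROUTE_TABLE keys in this example correspond to the subnets of the interface prefixes in the ConfigDB objects. The short Python sketch below shows that relationship using the standard `ipaddress` module. The helper name is hypothetical, and whether SONiC derives these entries automatically is not stated here; the sketch only illustrates the arithmetic.

```python
import ipaddress


def subnet_route_key(vnet_name: str, intf_prefix: str) -> str:
    """Build the VNET_ROUTE_TABLE key for an interface address such as
    '100.100.3.1/24' by masking it down to its network prefix."""
    network = ipaddress.ip_interface(intf_prefix).network
    return f"VNET_ROUTE_TABLE:{vnet_name}:{network}"
```

For `Ethernet1|100.100.3.1/24` in `Vnet_2000`, the helper produces `VNET_ROUTE_TABLE:Vnet_2000:100.100.3.0/24`, matching the first APPDB object.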