NDP.p4 trimming switch code targetting Tofino TNA #92
Open: dragosdmtrsc wants to merge 4 commits into p4lang:master from correctnetworks:master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# NDP trimming switch | ||
|
||
This directory hosts the `ndp.p4` trimming switch | ||
implementation targeting the TNA architecture. | ||
For more details, see our paper [Re-architecting datacenter networks and stacks for low latency and high performance](http://nets.cs.pub.ro/~costin/files/ndp.pdf). | ||
|
||
The p4 code, table population scripts and instructions for building and running | ||
NDP for the Tofino switch are in the `dev_root/` directory. | ||
|
||
# How it works | ||
|
||
To summarize, `ndp.p4` keeps an under-approximation of | ||
the buffer occupancy for each port in ingress (by means of a | ||
three-color meter). Whenever the meter turns red, it means | ||
that the buffer is full and the packet undergoes trimming. To achieve | ||
that, we mark the packet to be cloned to egress and set up | ||
Review comment: 'to be cloned from ingress to egress' |
||
the clone session to truncate the packet in such a way as to | ||
only keep the packet headers. | ||
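
To make the meter-based trimming decision above more concrete, here is a minimal Python sketch of a generic color-blind single-rate three-color marker in the style of RFC 2697. It is an illustration only and is not Tofino's meter implementation; the class name, parameter names and the byte-based units are assumptions made for this example.

```python
class SingleRateThreeColorMeter:
    """Generic color-blind single-rate three-color marker (RFC 2697 style).

    Hypothetical illustration; not the Tofino meter extern.
    """

    def __init__(self, cir_bytes_per_s, cbs_bytes, ebs_bytes):
        self.cir = cir_bytes_per_s  # committed information rate
        self.cbs = cbs_bytes        # committed burst size
        self.ebs = ebs_bytes        # excess burst size
        self.tc = cbs_bytes         # committed bucket starts full
        self.te = ebs_bytes         # excess bucket starts full
        self.last = 0.0             # time of the last refill, in seconds

    def _refill(self, now):
        # New tokens fill the committed bucket first and spill into the excess bucket.
        tokens = (now - self.last) * self.cir
        self.last = now
        to_committed = min(tokens, self.cbs - self.tc)
        self.tc += to_committed
        self.te = min(self.ebs, self.te + tokens - to_committed)

    def color(self, now, size_bytes):
        """Return the color of a packet of size_bytes arriving at time now."""
        self._refill(now)
        if self.tc >= size_bytes:
            self.tc -= size_bytes
            return "GREEN"
        if self.te >= size_bytes:
            self.te -= size_bytes
            return "YELLOW"
        return "RED"
```

A burst that exhausts both buckets is colored RED until enough time passes for tokens to accumulate again, which is the signal the description above associates with a full buffer.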
|
||
Since Tofino keeps per-pipeline meter state, we may | ||
end up in the situation where multiple ingress pipelines are | ||
flooding a single output port without any of the meters turning | ||
red. To solve this situation, we devise a three-level meter | ||
strategy and make use of the Deflect-on-Drop capabilities | ||
on the Tofino to make ingress meters trim more aggressively for | ||
the port which is experiencing drops. After a pre-defined | ||
interval, the meters switch to an intermediate level of trimming | ||
and after even more time, when the incast has passed, to their original trim rate. | ||
|
||
A more detailed description of the implementation: | ||
|
||
* on ingress | ||
|
||
0) the packet undergoes regular ipv4 forwarding with fwd decision to port egport | ||
1) if packet is ndp control => output packet to HIGH_PRIORITY_QUEUE | ||
2) if packet is ndp data => pass packet through meter[egport] | ||
|
||
2.1) if meter color is GREEN => output packet to LOW_PRIORITY_QUEUE | ||
|
||
2.2) if meter color != GREEN => clone packet to sid where (sid maps to egport, HIGH_PRIORITY_QUEUE, | ||
packet length = 80B) | ||
3) if packet is not ndp => proceed with forwarding on OTHERS_QUEUE | ||
|
||
* on egress: | ||
1) if packet is ndp data and comes in from DoD port (dropped due to congestion) | ||
2) when trimmed or normal packets come in => do rewrites (mac src and dst addresses) and set ndp trim flags | ||
3) when clone packet back to egress to session esid (esid maps to recirculation port, HIGH_PRIORITY_QUEUE, packet length = 80B) | ||
4) when packet comes back from egress clone => forward as-is (i.e. recirculate back into ingress) and notify all pipelines | ||
to transition into pessimistic mode | ||
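
The ingress steps listed above can be sketched as a small Python function. This is a hedged model of the decision logic only, not the P4 code: the `Decision` tuple, the `kind` strings and the function name are hypothetical, while the queue names and the 80B trimmed length come from the list.

```python
from collections import namedtuple

# Hypothetical result type: target queue, whether the packet is trimmed
# (cloned with truncation), and the truncated length in bytes.
Decision = namedtuple("Decision", ["queue", "trim", "clone_len"])

TRIMMED_LEN = 80  # packet length of the clone session, per the list above


def ingress_decision(kind, meter_color):
    """kind is one of 'ndp_control', 'ndp_data', 'other' (names are assumptions)."""
    if kind == "ndp_control":
        return Decision("HIGH_PRIORITY_QUEUE", False, None)
    if kind == "ndp_data":
        if meter_color == "GREEN":
            return Decision("LOW_PRIORITY_QUEUE", False, None)
        # Any non-GREEN color: clone to the session that truncates the packet.
        return Decision("HIGH_PRIORITY_QUEUE", True, TRIMMED_LEN)
    return Decision("OTHERS_QUEUE", False, None)
```

For example, `ingress_decision("ndp_data", "YELLOW")` yields a trimmed packet on the high-priority queue, while non-NDP traffic never consults the meter.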
|
||
### NDP modes: | ||
* Each egress port works in 3 modes: | ||
- optimistic | ||
- pessimistic | ||
- "halftimistic" | ||
|
||
The mode decides what meter will be used for NDP packets going out on the given port | ||
|
||
* In optimistic, we use meter_optimistic (line-rate) | ||
|
||
* In pessimistic, we use meter_pessimistic (1/4 * line-rate) | ||
|
||
* In halftimistic, we use meter_halftimistic (1/2 * line-rate) | ||
|
||
Initially, the switch starts in optimistic mode for all ports. | ||
|
||
Whenever a DoD packet is received in egress => all ingress pipelines are notified to | ||
trim more aggressively (i.e. transition into pessimistic mode). | ||
|
||
A port remains in pessimistic mode for T0 ns if no extra DoDs occur. | ||
After T0 ns, the port transitions into halftimistic mode. | ||
|
||
A port remains in halftimistic mode for T1 ns if no other DoDs occur. | ||
After T1 ns, the port transitions back into optimistic mode. | ||
|
||
NB: T0 and T1 are hardcoded into ndp.p4 and are currently set | ||
to 6us and 24us respectively. | ||
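
The mode transitions described above can be modeled as a small per-port state machine. The Python sketch below is illustrative only: the class and method names are hypothetical, and it assumes that T1 is measured from the moment the port enters halftimistic mode and that a DoD in any mode sends the port back to pessimistic mode.

```python
T0_NS = 6000    # 6us, time spent in pessimistic mode without further DoDs
T1_NS = 24000   # 24us, time spent in halftimistic mode without further DoDs

# Fraction of line-rate used by the meter in each mode.
RATE_FRACTION = {"optimistic": 1.0, "halftimistic": 0.5, "pessimistic": 0.25}


class PortMode:
    """Hypothetical model of the per-egress-port mode; not the P4 implementation."""

    def __init__(self):
        self.mode = "optimistic"
        self.since_ns = 0  # time at which the current mode was entered

    def on_dod(self, now_ns):
        # A Deflect-on-Drop packet always (re)starts pessimistic mode.
        self.mode = "pessimistic"
        self.since_ns = now_ns

    def tick(self, now_ns):
        """Advance timers up to now_ns and return the current mode."""
        if self.mode == "pessimistic" and now_ns - self.since_ns >= T0_NS:
            self.mode = "halftimistic"
            self.since_ns += T0_NS
        if self.mode == "halftimistic" and now_ns - self.since_ns >= T1_NS:
            self.mode = "optimistic"
        return self.mode
```

With these constants, a port that sees a single DoD at time t uses a quarter of line-rate until t + 6us, half of line-rate until t + 30us, and full line-rate afterwards.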
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
P4C ?= $(SDE_INSTALL)/bin/bf-p4c | ||
|
||
all: .ndp.ts | ||
|
||
.ndp.ts: ndp.p4 common/util.p4 common/headers.p4 | ||
ifndef SDE_INSTALL | ||
$(error SDE_INSTALL is undefined) | ||
endif | ||
$(P4C) $(OPT) $< --bf-rt-schema ndp.tofino/bfrt.json | ||
cp -r ndp.tofino $(SDE_INSTALL)/ | ||
@touch $@ | ||
|
||
package: | ||
tar -cvf ndp.tar -T package.txt | ||
|
||
clean: | ||
rm -rf .ndp.ts ndp.tar | ||
rm -rf .dodtest.ts | ||
|
||
.PHONY: all clean package |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
# ndp.p4 | ||
|
||
This repo contains TNA P4 code for running NDP with trimming. | ||
|
||
* ndp.p4 contains the P4 source code | ||
* setup_ndp.py contains the control-plane code which populates | ||
NDP's tables | ||
* samples/ - contains configuration samples for Tofino | ||
|
||
## Compiling | ||
|
||
`ndp.p4` was tested against version `9.2.0` of the Intel SDE | ||
(formerly `bf-sde`). We assume that the following environment variables are set prior to building and running: `$SDE`, `$SDE_INSTALL`. | ||
|
||
``` | ||
make | ||
``` | ||
The output of this command is `$SDE_INSTALL/ndp.tofino/` | ||
|
||
## Running | ||
Deploying the P4 switch on hardware: | ||
``` | ||
$SDE/run_switchd.sh -p ndp -c $SDE_INSTALL/ndp.tofino/ndp.conf | ||
``` | ||
|
||
## Control plane | ||
The control plane consists of the script `setup_ndp.py` | ||
which takes as input a configuration file and populates | ||
the entries for NDP. The current configuration is static | ||
(i.e. no dynamic routing, no ARP etc.). | ||
|
||
`setup_ndp.py` works in two modes: | ||
* single-pipe: no extra CLI arguments - the input is a *single-pipe* json - it will set all tables as symmetric (see below) | ||
* multi-pipe: requires `-multi-pipe` CLI option. Expects as input a | ||
multi-pipe json file which consists of a dictionary where keys are | ||
strings representing pipe_ids and values are objects with the sole | ||
attribute "file" whose value points to the single-pipe input json | ||
for the particular pipe_id. The *single-pipe* input format is | ||
described below | ||
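
As an illustration of the multi-pipe format described above, the following Python sketch builds such a dictionary and prints it as JSON. The helper name and the file names `pipe0.json` and `pipe1.json` are hypothetical; only the shape (string pipe_id keys, objects with a single "file" attribute) comes from the description.

```python
import json


def multi_pipe_config(files_by_pipe):
    """Map each pipe_id (as a string key) to an object with the sole attribute "file"."""
    return {str(pipe_id): {"file": path} for pipe_id, path in files_by_pipe.items()}


# Hypothetical per-pipe single-pipe json files.
cfg = multi_pipe_config({0: "pipe0.json", 1: "pipe1.json"})
print(json.dumps(cfg, indent=2, sort_keys=True))
```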
|
||
The *single-pipe* input to `setup_ndp.py` is a json file with the following contents: | ||
- arp - the contents of the ARP table of the switch (maps IPv4 -> MAC) - a dictionary of **arp entry** objects whose keys are IPv4 addresses and whose values are MAC addresses | ||
- rates - a list of **rate** objects with the following attributes: | ||
- eg_port - dev_port for which the following attributes apply | ||
- rate_kbps - int - meter speed in kbps (required) | ||
- burst_kbits - int - meter "buffer" (burst size) - in kBits (required) | ||
- shaper_rate_kbps - int - shaper speed in kbps (optional: defaults to rate_kbps) - NB: shaper_rate_kbps = 0 ==> shaper is disabled for given port | ||
- shaper_burst_kbits - int - burst size of shaper (optional: defaults to burst_kbits) | ||
- port_speed - str - one of 10G, 25G, 40G, 50G, 100G (required) | ||
- port_bufsize - int | ||
- fec - str - one of NONE, RS (required) | ||
- entries - a list of **entry** objects with the following attributes (all of them are required): | ||
- smac - source MAC of outgoing port | ||
- dip - destination IPv4 | ||
- eg_port - dev_port - outgoing device port | ||
- nhop - IPv4 of next hop or 0 if the destination IP subnet is directly connected | ||
- allow_pessimism - bool - optional, defaults to True. When set to False, the pessimistic and halftimistic modes are disabled and only the optimistic meter is used | ||
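
To illustrate the *single-pipe* format above, here is a hedged Python sketch that builds a sample configuration and checks for the attributes marked as required. All values (addresses, port numbers, rates) are made up, the value types are guesses, and `missing_fields` is a hypothetical helper that is not part of `setup_ndp.py`. Treating `eg_port` as required in a rate object is also an assumption.

```python
import json

REQUIRED_RATE_FIELDS = ("eg_port", "rate_kbps", "burst_kbits", "port_speed", "fec")
REQUIRED_ENTRY_FIELDS = ("smac", "dip", "eg_port", "nhop")


def missing_fields(config):
    """Return a list of required attributes that are absent from config."""
    problems = []
    for i, rate in enumerate(config.get("rates", [])):
        for field in REQUIRED_RATE_FIELDS:
            if field not in rate:
                problems.append("rates[%d].%s" % (i, field))
    for i, entry in enumerate(config.get("entries", [])):
        for field in REQUIRED_ENTRY_FIELDS:
            if field not in entry:
                problems.append("entries[%d].%s" % (i, field))
    return problems


# Illustrative values only; none of them come from the repository samples.
sample = {
    "arp": {"10.0.0.2": "00:00:00:00:00:02"},
    "rates": [
        {"eg_port": 128, "rate_kbps": 10000000, "burst_kbits": 1000,
         "port_speed": "10G", "fec": "NONE"},
    ],
    "entries": [
        {"smac": "00:00:00:00:00:01", "dip": "10.0.0.2",
         "eg_port": 128, "nhop": "10.0.0.2"},
    ],
}

print(json.dumps(sample, indent=2, sort_keys=True))
print(missing_fields(sample))
```

Running a check like this before invoking the control-plane script can catch a missing attribute early, but the authoritative format remains whatever `setup_ndp.py` actually parses.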
|
||
## Examples | ||
|
||
First of all, set up the PYTHONPATH | ||
``` | ||
export PYTHONPATH=$SDE_INSTALL/lib/python2.7/site-packages/:$SDE_INSTALL/lib/python2.7/site-packages/tofino | ||
``` | ||
* multi-pipe | ||
``` | ||
python setup_ndp.py -multi-pipe samples/multi_pipe/r1.json | ||
``` | ||
|
||
* single-pipe | ||
``` | ||
python setup_ndp.py samples/single_pipe/r0_config.json | ||
``` | ||
|
||
* troubleshooting | ||
If running the script fails with something like `google.protobuf.internal` | ||
not found, run the following (assuming original python site-packages is in /usr/local/lib/python2.7/site-packages) | ||
``` | ||
cp -r /usr/local/lib/python2.7/site-packages/protobuf*/google/protobuf/ $SDE_INSTALL/lib/python2.7/site-packages/google/ | ||
``` | ||
|
||
Changing the running mode (single-pipe vs multi-pipe) between two | ||
consecutive runs may sometimes lead to errors. If this is the case, | ||
re-deploying ndp.p4 on the switch should solve the issue. | ||
|
||
## How it works | ||
|
||
Check out the original [NDP SIGCOMM'17 paper](https://dl.acm.org/doi/10.1145/3098822.3098825). | ||
|
||
### Current implementation | ||
* on ingress | ||
|
||
0) the packet undergoes regular ipv4 forwarding with fwd decision to port egport | ||
1) if packet is ndp control => output packet to HIGH_PRIORITY_QUEUE | ||
2) if packet is ndp data => pass packet through meter[egport] | ||
|
||
2.1) if meter color is GREEN => output packet to LOW_PRIORITY_QUEUE | ||
|
||
2.2) if meter color != GREEN => clone packet to sid where (sid maps to egport, HIGH_PRIORITY_QUEUE, | ||
packet length = 80B) | ||
3) if packet is not ndp => proceed with forwarding on OTHERS_QUEUE | ||
|
||
* on egress: | ||
1) if packet is ndp data and comes in from DoD port (dropped due to congestion) | ||
2) when trimmed or normal packets come in => do rewrites (mac src and dst addresses) and set ndp trim flags | ||
3) when clone packet back to egress to session esid (esid maps to recirculation port, HIGH_PRIORITY_QUEUE, packet length = 80B) | ||
4) when packet comes back from egress clone => forward as-is (i.e. recirculate back into ingress) and notify all pipelines | ||
to transition into pessimistic mode | ||
|
||
### NDP modes: | ||
* Each egress port works in 3 modes: | ||
- optimistic | ||
- pessimistic | ||
- "halftimistic" | ||
|
||
The mode decides what meter will be used for NDP packets going out on the given port | ||
|
||
* In optimistic, we use meter_optimistic (line-rate) | ||
|
||
* In pessimistic, we use meter_pessimistic (1/4 * line-rate) | ||
|
||
* In halftimistic, we use meter_halftimistic (1/2 * line-rate) | ||
|
||
Initially, the switch starts in optimistic mode for all ports. | ||
|
||
Whenever a DoD packet is received in egress => all ingress pipelines are notified to | ||
trim more aggressively (i.e. transition into pessimistic mode). | ||
|
||
A port remains in pessimistic mode for T0 ns if no extra DoDs occur. | ||
After T0 ns, the port transitions into halftimistic mode. | ||
|
||
A port remains in halftimistic mode for T1 ns if no other DoDs occur. | ||
After T1 ns, the port transitions back into optimistic mode. | ||
|
||
NB: T0 and T1 are hardcoded into ndp.p4 and are currently set | ||
to 6us and 24us respectively. |
suggestion: 'each port' -> 'each egress port'