Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP CCAs in Rust #1062

Draft
wants to merge 13 commits into
base: staging/rust-net
Choose a base branch
from
Draft

Conversation

vobst
Copy link
Member

@vobst vobst commented Feb 11, 2024

Hi all!

I've been experimenting with writing TCP congestion control algorithms (CCAs) in Rust. These patches contain abstractions for writing modules that define a single CCA, a minimal proof-of-concept CCA, and a reimplementation of BIC (the predecessor of the CUBIC algorithm that is the default choice in the kernel).

The end goal of this work is to eventually have a new CCA, i.e., one with no current upstream implementation, implemented in Rust at some point in the (probably distant) future. With that in mind, I'd appreciate if you could give my abstraction design a close look. I'm happy about all kinds of feedback (design, structure, implementation, comments, docs,...) but since this is a pretty early stage, high-level things may be more useful that tiny details :) (Note: The relevant patches are the last three.)

By now, I've been using these patches for a few days on my host system (these very characters are sent on a connection that uses bic_rust ;) ), but I lack the knowledge/setup to perform proper testing against the C implementation in a lab setup; I'd appreciate some help/suggestions on that. For what it helps, here is an Arch Linux package that the Arch-users among you can use to easily build a kernel package for your host system.

In general, these choices select Rust BIC as the default for all connections:

Symbol: RUST_TCP_ABSTRACTIONS [=y]                                                                                              │
  │ Type  : bool                                                                                                                    │
  │ Defined at net/ipv4/Kconfig:469                                                                                                 │
  │   Prompt: TCP: Rust abstractions                                                                                                │
  │   Depends on: NET [=y] && INET [=y] && RUST [=y]                                                                                │
  │   Location:                                                                                                                     │
  │     -> Networking support (NET [=y])                                                                                            │
  │       -> Networking options                                                                                                     │
  │         -> TCP/IP networking (INET [=y])                                                                                        │
  │ (1)       -> TCP: Rust abstractions (RUST_TCP_ABSTRACTIONS [=y])
Symbol: TCP_CONG_BIC_RUST [=y]                                                                                                  │
  │ Type  : tristate                                                                                                                │
  │ Defined at net/ipv4/Kconfig:505                                                                                                 │
  │   Prompt: Binary Increase Congestion (BIC) control (Rust rewrite)                                                               │
  │   Depends on: NET [=y] && INET [=y] && TCP_CONG_ADVANCED [=y] && RUST_TCP_ABSTRACTIONS [=y]                                     │
  │   Location:                                                                                                                     │
  │     -> Networking support (NET [=y])                                                                                            │
  │       -> Networking options                                                                                                     │
  │         -> TCP/IP networking (INET [=y])                                                                                        │
  │           -> TCP: advanced congestion control (TCP_CONG_ADVANCED [=y])                                                          │
  │ (4)         -> Binary Increase Congestion (BIC) control (Rust rewrite) (TCP_CONG_BIC_RUST [=y])
  │ Symbol: DEFAULT_BIC_RUST [=y]                                                                                                   │
  │ Type  : bool                                                                                                                    │
  │ Defined at net/ipv4/Kconfig:709                                                                                                 │
  │   Prompt: Bic (Rust)                                                                                                            │
  │   Depends on: <choice>                                                                                                          │
  │   Visible if: <choice> && TCP_CONG_BIC_RUST [=y]=y                                                                              │
  │   Location:                                                                                                                     │
  │     -> Networking support (NET [=y])                                                                                            │
  │       -> Networking options                                                                                                     │
  │         -> TCP/IP networking (INET [=y])                                                                                        │
  │           -> TCP: advanced congestion control (TCP_CONG_ADVANCED [=y])                                                          │
  │             -> Default TCP congestion control (<choice> [=y])                                                                   │
  │ (2)           -> Bic (Rust) (DEFAULT_BIC_RUST [=y])

And these sysctl knobs can be used to set it at runtime:

$ cat /proc/sys/net/ipv4/tcp_available_congestion_control
reno bic bic_rust cubic my_cca
$ echo bic_rust | sudo tee /proc/sys/net/ipv4/tcp_congestion_control

Note: There are a bunch of print statements that are obviously not intended to remain there forever, but I helps to see the algorithm "at work".

//! Binary Increase Congestion Control (BIC) for Fast Long-Distance
//! Networks - Lisong Xu, Khaled Harfoush, and Injong Rhee
//! IEEE INFOCOM 2004, Hong Kong, China, 2004, pp. 2514-2524 vol.4
//! doi: 10.1109/INFCOM.2004.1354672
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a link if available?

Copy link
Member Author

@vobst vobst Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There is a (dead) link in the C file. The Wayback Machine has a backup.

The DOI link would be the cleanest option, however, there the pdf is behind a paywall. Other sources for the pdf exist, but nothing I'd want to link from a kernel source file ;)

/// In binary search, go to point: `cwnd + (W_max - cwnd) / BICTCP_B`.
// SAFETY: Obvious.
// TODO: Convert to `new::(x).unwrap()` once 'const_option' is stabilised.
const BICTCP_B: NonZeroU32 = unsafe { NonZeroU32::new_unchecked(4) };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you will need a // SAFETY: ... comment for these, even though obvious.

However - since you use a lot of these constants as u32 rather than needing them to be NonZero*, it could be more clear to just make them u32 and do a static assert that they are nonzero.

Copy link
Member Author

@vobst vobst Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you will need a // SAFETY: ... comment for these, even though obvious.

In fact, it is not even going to compile when passing zero ... playground ... but I'll still add some more meaningful comment.

However - since you use a lot of these constants as u32 rather than needing them to be NonZero*, it could be
more clear to just make them u32 and do a static assert that they are nonzero.

Hmm, eventually I'd like to convert most of these constants to module parameters to match the C implementation. Ideally, there will be an option to have module parameters of type NonZeroXYZ because currently accidentally panicking your kernel is as easy as echo 0 > /sys/module/tcp_bic/parameters/max_increment. Therefore, even though it doesn't make much sense at present, I'd leave them as NonZeroU32 to make transitioning easier once there are the required module parameters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const BICTCP_B: NonZeroU32 = unsafe { NonZeroU32::new_unchecked(4) };
const BICTCP_B: NonZeroU32 = match NonZeroU32::new_unchecked(4) {
Some(n) => n,
None => panic!()
};

We can use const panic instead of unsafe.

net/ipv4/tcp_bic_rust.rs Outdated Show resolved Hide resolved
net/ipv4/tcp_bic_rust.rs Outdated Show resolved Hide resolved
@@ -66,7 +66,7 @@ const __LOG_PREFIX: &[u8] = b"rust_kernel\0";
/// The top level entrypoint to implementing a kernel module.
///
/// For any teardown or cleanup operations, your type may implement [`Drop`].
pub trait Module: Sized + Sync {
pub trait Module: Sized + Sync + Send {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should add this, I think some modules have the requirement that they must be set up and torn down from the same context. (I know this comes from Wedson's work)

fn cong_avoid(sk: &mut Sock<Self>, ack: u32, acked: u32);

/// Called before the sender's congestion state is changed.
fn set_state(sk: &mut Sock<Self>, new_state: State) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These default implementations should have a kernel::build_error(VTABLE_DEFAULT_ERROR). My explanation at https://lore.kernel.org/rust-for-linux/CALNs47svLHD3=LFrhi=zf4hdr--e6hGjTzT5X_U9yd8q5r7G7g@mail.gmail.com/ may help

rust/kernel/net/tcp/cong.rs Outdated Show resolved Hide resolved
/// - `inet_csk_ca(sk)` points to a valid instance of `T::Data`, which belongs
/// to the instance of the algorithm used by this socket. A callback has
/// exclusive, mutable access to this data.
pub struct Sock<T: Algorithm + ?Sized> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct lives in net/sock.h right? It should be in the same module here.

I feel like the C side sock vs. socket might not be something we want to keep, but I don't have any better suggestion.

Making this type generic over a CCA algorithm seems somewhat specialized.

rust/kernel/net/tcp/cong.rs Show resolved Hide resolved
rust/kernel/time.rs Show resolved Hide resolved
@vobst vobst force-pushed the tcp-cca-rfc branch 2 times, most recently from 0ea9f23 to 8399bea Compare February 18, 2024 21:44
@vobst vobst changed the base branch from rust-next to rust-net February 18, 2024 21:50
@vobst vobst changed the base branch from rust-net to rust-next February 18, 2024 21:51
@vobst vobst changed the base branch from rust-next to rust-net February 19, 2024 09:35
@vobst vobst marked this pull request as draft February 19, 2024 09:36
Valentin Obst added 2 commits February 19, 2024 10:46
Signed-off-by: Valentin Obst <kernel@valentinobst.de>
When generating the `rust-project.json`, also search for modules in the
`net/` folder.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
@vobst vobst force-pushed the tcp-cca-rfc branch 3 times, most recently from 257334e to 1986ac1 Compare February 25, 2024 20:35
wedsonaf and others added 11 commits March 2, 2024 15:41
This allows modules to be initialised in-place in pinned memory, which
enables the usage of pinned types (e.g., mutexes, spinlocks, driver
registrations, etc.) in modules without any extra allocations.

Drivers that don't need this may continue to implement `Module` without
any changes.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
[kernel@valentinobst.de: remove feature return_position_impl_trait_in_trait
as it is now stabilised]
[kernel@valentinobst.de: remove `Send` trait bound on `Module` and
`InPlaceModule`]
We'll need it, for example, when calling `register_filesystem` to
initialise a file system registration.

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
In net/tcp time values are usually 32bit wide unsigned integers, and
either in units of jiffies, microseconds or milliseconds. Add types,
constants, and functions to work with 32bit time values.

This is, for example, used in the CUBIC and BIC CCAs.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Add a macro to determine the size of a structure field at compile time.
This is used by the CCA abstractions to ensure that the private data of
every CCA will fit into the space that the kernel provides for it.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
# Describe the purpose of this series. The information you put here
# will be used by the project maintainer to make a decision whether
# your patches should be reviewed, and in what priority order. Please be
# very detailed and link to any relevant discussions or sites that the
# maintainer can review to better understand your proposed changes. If you
# only have a single patch in your series, the contents of the cover
# letter will be appended to the "under-the-cut" portion of the patch.

# Lines starting with # will be removed from the cover letter. You can
# use them to add notes or reminders to yourself. If you want to use
# markdown headers in your cover letter, start the line with ">#".

# You can add trailers to the cover letter. Any email addresses found in
# these trailers will be added to the addresses specified/generated
# during the b4 send stage. You can also run "b4 prep --auto-to-cc" to
# auto-populate the To: and Cc: trailers based on the code being
# modified.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 1,
    "change-id": "20240301-tcp-cca-rfc-1a0fbcaa533c",
    "prefixes": []
  }
}
Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Add an example that uses the `module_cca` macro and the `Algorithm`
trait to implement a minimal CCA.

IMPORTANT: This CCA is not compliant with the relevant RFCs and must not
be used outside of test environments.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reimplement the Binary Increase Congestion (BIC) control algorithm in
Rust. BIC is one of the smallest CCAs in the kernel and this mainly
serves as a minimal example for a real-world algorithm.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Signed-off-by: Valentin Obst <kernel@valentinobst.de>
CUBIC is the default CCA since 2.6.18.

Missing features compared to the C implementation:
- configuration via module parameters,
- exporting callbacks to BPF programs as kfuncs.

Changes compared to the C implementation:
- uses only SI units for time, i.e., no jiffies and `BICTCP_HZ`,

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants