From 0accab56fa0a75f8efe60590f61edc196cd7bf24 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 9 Sep 2024 12:22:19 +0100 Subject: [PATCH 01/45] Add epa-howto --- docs/src/snap/howto/epa.md | 1016 ++++++++++++++++++++++++++++++++++++ 1 file changed, 1016 insertions(+) create mode 100644 docs/src/snap/howto/epa.md diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md new file mode 100644 index 000000000..3e79486e6 --- /dev/null +++ b/docs/src/snap/howto/epa.md @@ -0,0 +1,1016 @@ +# How to set up Enhanced Platform Awareness + +This section explains how to set up the EPA features in a {{product}} cluster. + +The content starts with the setup of the environment (including steps for using [MAAS][]). Then the setup of {{product}}, including the Multus & SR-IOV/DPDK networking components. Finally, the steps needed to test every EPA feature: HugePages, Real-time Kernel, CPU Pinning / Numa Topology Awareness and SR-IOV/DPDK. + +## What you'll need + +- An Ubuntu Pro subscription (required for real-time kernel) +- Ubuntu instances **or** a MAAS environment to run {product} on + + +## Prepare the Environment + + +`````{tabs} +````{group-tab} Ubuntu + +First, run the `numactl` command to get the number of CPUs available for NUMA: + +``` +numactl -s +``` + +This example output shows that there are 32 CPUs available for NUMA: + +``` +policy: default +preferred node: current +physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 +cpubind: 0 1 +nodebind: 0 1 +membind: 0 1 +``` + +```{dropdown} Detailed explanation of output + +- `policy: default`: indicates that the system is using the default NUMA policy. The default policy typically tries to allocate memory on the same node as the processor executing a task, but it can fall back to other nodes if necessary. +- `preferred node: current`: processes will prefer to allocate memory from the current node (the node where the process is running). However, if memory is not available on the current node, it can be allocated from other nodes. +- `physcpubind: 0 1 2 3 ... 31 `: shows the physical CPUs that processes are allowed to run on. In this case, the system has 32 physical CPUs enabled for NUMA, and processes can use any of them. +- `cpubind: 0 1 `: indicates the specific CPUs that the current process (meaning the process “numactl \-s”) is bound to. It's currently using CPUs 0 and 1. +- `nodebind: 0 1 `: shows the NUMA nodes that the current process (meaning the process “numactl \-s”) is allowed to use for memory allocation. It has access to both node 0 and node 1. +- `membind`: 0 1 `: confirms that the current process (meaning the process “numactl \-s”) can allocate memory from both node 0 and node 1. +``` + +### Enable the real-time kernel + +``` +sudo pro attach +sudo apt update && sudo apt install ubuntu-advantage-tools +sudo pro enable realtime-kernel +``` + +This should produce output similar to: + +``` +One moment, checking your subscription first +Real-time kernel cannot be enabled with Livepatch. +Disable Livepatch and proceed to enable Real-time kernel? (y/N) y +Disabling incompatible service: Livepatch +The Real-time kernel is an Ubuntu kernel with PREEMPT_RT patches integrated. + +This will change your kernel. To revert to your original kernel, you will need +to make the change manually. + +Do you want to continue? 
[ default = Yes ]: (Y/n) Y +Updating Real-time kernel package lists +Updating standard Ubuntu package lists +Installing Real-time kernel packages +Real-time kernel enabled +A reboot is required to complete install. +``` + +First the Ubuntu system is attached to an Ubuntu Pro subscription +(needed to use the real-time kernel), requiring you to enter a token +associated with the subscription. After successful attachment, your +system gains access to the Ubuntu Pro repositories, including the one +containing the real-time kernel packages. Once the tools and +real-time kernel are installed, a reboot is required to start using +the new kernel. + +### Create a configuration file to enable HugePages and CPU isolation + +The bootloader will need a configuration file to enable the recommended +boot options (explained below) to enable HugePages and CPU isolation. + +In this example, the host has 128 CPUs, and 2M / 1G HugePages are enabled. +This is the command to update the boot options and reboot the system: + +``` +cat < /etc/default/grub.d/epa_kernel_options.cfg +GRUB_CMDLINE_LINUX_DEFAULT="${GRUB_CMDLINE_LINUX_DEFAULT} intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 crashkernel=auto softlockup_panic=0 audit=0 tsc=nowatchdog intel_pstate=disable mce=off hugepagesz=1G hugepages=1000 hugepagesz=2M hugepages=0 default_hugepagesz=1G kthread_cpus=0-31 irqaffinity=0-31 nohz=on nosoftlockup nohz_full=32-127 rcu_nocbs=32-127 rcu_nocb_poll skew_tick=1 isolcpus=managed_irq,32-127 console=tty0 console=ttyS0,115200n8" +EOF +sudo chmod 0644 /etc/netplan/99-sriov_vfs.yaml +update-grub +reboot +``` + +```{dropdown} Explanation of boot options + +- `intel_iommu=on`: Enables Intel's Input-Output Memory Management Unit (IOMMU), which is used for device virtualization and Direct Memory Access (DMA) remapping. +- `iommu=pt`: Sets the IOMMU to passthrough mode, allowing devices to directly access physical memory without translation. +- `usbcore.autosuspend=-1`: Disables USB autosuspend, preventing USB devices from being automatically suspended to save power. +- `selinux=0`: Disables Security-Enhanced Linux (SELinux), a security module that provides mandatory access control. +- `enforcing=0`: If SELinux is enabled, this option sets it to permissive mode, where policies are not enforced but violations are logged. +- `nmi_watchdog=0`: Disables the Non-Maskable Interrupt (NMI) watchdog, which is used to detect and respond to system hangs. +- `crashkernel=auto`: Reserves a portion of memory for capturing a crash dump in the event of a kernel crash. +- `softlockup_panic=0`: Prevents the kernel from panicking (crashing) on detecting a soft lockup, where a CPU appears to be stuck. +- `audit=0`: Disables the kernel auditing system, which logs security-relevant events. +- `tsc=nowatchdog`: Disables the Time Stamp Counter (TSC) watchdog, which checks for issues with the TSC. +- `intel_pstate=disable`: Disables the Intel P-state driver, which controls CPU frequency scaling. +- `mce=off`: Disables Machine Check Exception (MCE) handling, which detects and reports hardware errors. +- `hugepagesz=1G hugepages=1000`: Allocates 1000 huge pages of 1GB each. +- `hugepagesz=2M hugepages=0`: Configures huge pages of 2MB size but sets their count to 0\. +- `default_hugepagesz=1G`: Sets the default size for huge pages to 1GB. +- `kthread_cpus=0-31`: Restricts kernel threads to run on CPUs 0-31. +- `irqaffinity=0-31`: Restricts interrupt handling to CPUs 0-31. 
+- `nohz=on`: Enables the nohz (no timer tick) mode, reducing timer interrupts on idle CPUs. +- `nosoftlockup`: Disables the detection of soft lockups. +- `nohz_full=32-127`: Enables nohz\_full (full tickless) mode on CPUs 32-127, reducing timer interrupts during application processing. +- `rcu_nocbs=32-127`: Offloads RCU (Read-Copy-Update) callbacks to CPUs 32-127, preventing them from running on these CPUs. +- `rcu_nocb_poll`: Enables polling for RCU callbacks instead of using interrupts. +- `skew_tick=1`: Skews the timer tick across CPUs to reduce contention. +- `isolcpus=managed_irq,32-127`: Isolates CPUs 32-127 and assigns managed IRQs to them, reducing their involvement in system processes and dedicating them to specific workloads. +- `console=tty0`: Sets the console output to the first virtual terminal. +- `console=ttyS0,115200n8`: Sets the console output to the serial port ttyS0 with a baud rate of 115200, 8 data bits, no parity, and 1 stop bit. +``` + +Once the reboot has taken place, ensure the HugePages configuration has been applied: + +``` +grep HugePages /proc/meminfo +``` +This should generate output indicating the number of pages allocated + +``` +HugePages_Total: 1000 +HugePages_Free: 1000 +HugePages_Rsvd: 0 +HugePages_Surp: 0 +``` + + +Next, create a configuration file to configure the network interface +to use SR-IOV (so it can create virtual functions afterwards) using +Netplan. In the example below the file is created first, then the configuration is +applied and then the 128 virtual functions are available for use in the environment: + +``` +cat < /etc/netplan/99-sriov_vfs.yaml + network: + ethernets: + enp152s0f1: + virtual-function-count: 128 +EOF +sudo chmod 0600 /etc/netplan/99-sriov_vfs.yaml +sudo netplan apply +ip link show enp152s0f1 +``` +The output of the last command should indicate the device is working and has generated the expected +virtual functions. + +``` +5: enp152s0f1: mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000 + link/ether 40:a6:b7:96:d8:89 brd ff:ff:ff:ff:ff:ff + vf 0 link/ether ae:31:7f:91:09:97 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off + vf 1 link/ether 32:09:8b:f7:07:4b brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off + vf 2 link/ether 12:b9:c6:08:fc:36 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off + .......... + vf 125 link/ether 92:10:ff:8a:e5:0c brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off + vf 126 link/ether 66:fe:ad:f2:d3:05 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off + vf 127 link/ether ca:20:00:c6:83:dd brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off +``` +```{dropdown} Explanation of steps + * Breakdown of the content of the file /etc/netplan/99-sriov\_vfs.yaml : + * path: /etc/netplan/99-sriov\_vfs.yaml: This specifies the location of the configuration file. The "99" prefix in the filename usually indicates that it will be processed last, potentially overriding other configurations. + * enp152s0f1: This is the name of the physical network interface you want to create VFs on. This name may vary depending on your system. + * virtual-function-count: 128: This is the key line that instructs Netplan to create 128 virtual functions on the specified physical interface. Each of these VFs can be assigned to a different virtual machine or container, effectively allowing them to share the physical adapter's bandwidth. 
+ * permissions: "0600": This is an optional line that sets the file permissions to 600 (read and write access only for the owner). + * Breakdown of the output of ip link show enp152s0f1 command: + * Main interface: + * 5: The index number of the network interface in the system. + * enp152s0f1: The name of the physical network interface. + * \: The interface's flags indicating its capabilities (e.g., broadcast, multicast) and current status (UP). + * mtu 9000: The maximum transmission unit (MTU) is set to 9000 bytes, larger than the typical 1500 bytes, likely for jumbo frames. + * qdisc mq: The queuing discipline (qdisc) is set to "mq" (multi-queue), designed for multi-core systems. + * state UP: The interface is currently active and operational. + * mode DEFAULT: The interface is in the default mode of operation. + * qlen 1000: The maximum number of packets allowed in the transmit queue. + * link/ether 40:a6:b7:96:d8:89: The interface's MAC address (a unique hardware identifier). + * Virtual functions: + * vf \: The index number of the virtual function. + * link/ether \: The MAC address assigned to the virtual function. + * spoof checking on: A security feature to prevent MAC address spoofing (pretending to be another device). + * link-state auto: The link state (up/down) is determined automatically based on the physical connection. + * trust off: The interface doesn't trust the incoming VLAN (Virtual LAN) tags. + * Results: + * Successful VF Creation: The output confirms a success creation of 128 VFs (numbered 0 through 127\) on the enp152s0f1 interface. + * VF Availability: Each VF is ready for use, and they can be assigned i.e. to {{product}} containers to give them direct access to the network through this physical network interface. + * MAC Addresses: Each VF has its own unique MAC address, which is essential for network communication. +``` + + +* Now let’s enable DPDK, first by cloning the DPDK repo, and then placing the script that will bind the VFs to the VFIO-PCI driver in the location that will run automatically each time the system boots up, so the VFIO (Virtual Function I/O) bindings are applied consistently: + +``` +git clone https://github.com/DPDK/dpdk.git /home/ubuntu/dpdk +cat < /var/lib/cloud/scripts/per-boot/dpdk_bind.sh + #!/bin/bash + if [ -d /home/ubuntu/dpdk ]; then + modprobe vfio-pci + vfs=$(python3 /home/ubuntu/dpdk/usertools/dpdk-devbind.py -s | grep drv=iavf | awk '{print $1}' | tail -n +11) + python3 /home/ubuntu/dpdk/usertools/dpdk-devbind.py --bind=vfio-pci $vfs + fi +sudo chmod 0755 /var/lib/cloud/scripts/per-boot/dpdk_bind.sh +``` +```{dropdown} Explanation + * Load VFIO Module (modprobe vfio-pci): If the DPDK directory exists, the script loads the VFIO-PCI kernel module. This module is necessary for the VFIO driver to function. + * The script uses the dpdk-devbind.py tool (included with DPDK) to list the available network devices and their drivers. + * It filters this output using grep drv=iavf to find devices that are currently using the iavf driver (a common driver for Intel network adapters), excluding the physical network interface itself and just focusing on the virtual functions (VFs). + * Bind VFs to VFIO: The script uses dpdk-devbind.py again, this time with the \--bind=vfio-pci option, to bind the identified VFs to the VFIO-PCI driver. This step essentially tells the kernel to relinquish control of these devices to DPDK. 
+``` + +To test that the VFIO Kernel Module and DPDK are enabled: + +``` +lsmod | grep -E 'vfio' +``` + +...should indicate the kernel module is loaded + +``` +vfio_pci 16384 0 +vfio_pci_core 94208 1 vfio_pci +vfio_iommu_type1 53248 0 +vfio 73728 3 vfio_pci_core,vfio_iommu_type1,vfio_pci +iommufd 98304 1 vfio +irqbypass 12288 2 vfio_pci_core,kvm + +``` + +Running the helper script: + +``` +python3 /home/ubuntu/dpdk/usertools/dpdk-devbind.py -s +``` + +...should return a list of network devices using DPDK: + +``` +Network devices using DPDK-compatible driver +============================================ +0000:98:12.2 'Ethernet Adaptive Virtual Function 1889' drv=vfio-pci unused=iavf +0000:98:12.3 'Ethernet Adaptive Virtual Function 1889' drv=vfio-pci unused=iavf +0000:98:12.4 'Ethernet Adaptive Virtual Function 1889' drv=vfio-pci unused=iavf +.... +``` + +With these preparation steps we have enabled the features of EPA: + +* NUMA and CPU Pinning are available to the first 32 CPUs +* Real-Time Kernel is enabled +* HugePages are enabled and 1000 1G huge pages are available +* SRIOV is enabled in the enp152s0f1 interface, with 128 virtual function interfaces bound to the vfio-pci driver (that could also use the iavf driver) +* DPDK is enabled in all the 128 virtual function interfaces + +```` + +````{group-tab} MAAS + +To prepare a machine for CPU isolation, Hugepages, real-time kernel, SRIOV and DPDK we leverage cloud-init through MAAS. + +``` +#cloud-config + +apt: + sources: + rtk.list: + source: "deb https://:@private-ppa.launchpadcontent.net/canonical-kernel-rt/ppa/ubuntu jammy main" + +write_files: + # set kernel option with hugepages and cpu isolation + - path: /etc/default/grub.d/100-telco_kernel_options.cfg + content: | + GRUB_CMDLINE_LINUX_DEFAULT="${GRUB_CMDLINE_LINUX_DEFAULT} intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 crashkernel=auto softlockup_panic=0 audit=0 tsc=nowatchdog intel_pstate=disable mce=off hugepagesz=1G hugepages=1000 hugepagesz=2M hugepages=0 default_hugepagesz=1G kthread_cpus=0-31 irqaffinity=0-31 nohz=on nosoftlockup nohz_full=32-127 rcu_nocbs=32-127 rcu_nocb_poll skew_tick=1 isolcpus=managed_irq,32-127 console=tty0 console=ttyS0,115200n8" + permissions: "0644" + + # create sriov VFs + - path: /etc/netplan/99-sriov_vfs.yaml + content: | + network: + ethernets: + enp152s0f1: + virtual-function-count: 128 + permissions: "0600" + + # ensure VFs are bound to vfio-pci driver (so they can be consumed by pods) + - path: /var/lib/cloud/scripts/per-boot/dpdk_bind.sh + content: | + #!/bin/bash + if [ -d /home/ubuntu/dpdk ]; then + modprobe vfio-pci + vfs=$(python3 /home/ubuntu/dpdk/usertools/dpdk-devbind.py -s | grep drv=iavf | awk '{print $1}' | tail -n +11) + python3 /home/ubuntu/dpdk/usertools/dpdk-devbind.py --bind=vfio-pci $vfs + fi + permissions: "0755" + + # set proxy variables + - path: /etc/environment + content: | + HTTPS_PROXY=http://10.18.2.1:3128 + HTTP_PROXY=http://10.18.2.1:3128 + NO_PROXY=10.0.0.0/8,192.168.0.0/16,127.0.0.1,172.16.0.0/16,.svc,localhost + https_proxy=http://10.18.2.1:3128 + http_proxy=http://10.18.2.1:3128 + no_proxy=10.0.0.0/8,192.168.0.0/16,127.0.0.1,172.16.0.0/16,.svc,localhost + append: true + + # add rtk ppa key + - path: /etc/apt/trusted.gpg.d/rtk.asc + content: | + -----BEGIN PGP PUBLIC KEY BLOCK----- + Comment: Hostname: + Version: Hockeypuck 2.2 + + xsFNBGAervwBEADHCeEuR7WKRiEII+uFOu8J+W47MZOcVhfNpu4rdcveL4qe4gj4 + nsROMHaINeUPCmv7/4EXdXtTm1VksXeh4xTeqH6ZaQre8YZ9Hf4OYNRcnFOn0KR+ + 
aCk0OWe9xkoDbrSYd3wmx8NG/Eau2C7URzYzYWwdHgZv6elUKk6RDbDh6XzIaChm + kLsErCP1SiYhKQvD3Q0qfXdRG908lycCxgejcJIdYxgxOYFFPcyC+kJy2OynnvQr + 4Yw6LJ2LhwsA7bJ5hhQDCYZ4foKCXX9I59G71dO1fFit5O/0/oq0xe7yUYCejf7Z + OqD+TzEK4lxLr1u8j8lXoQyUXzkKIL0SWEFT4tzOFpWQ2IBs/sT4X2oVA18dPDoZ + H2SGxCUcABfne5zrEDgkUkbnQRihBtTyR7QRiE3GpU19RNVs6yAu+wA/hti8Pg9O + U/5hqifQrhJXiuEoSmmgNb9QfbR3tc0ZhKevz4y+J3vcnka6qlrP1lAirOVm2HA7 + STGRnaEJcTama85MSIzJ6aCx4omCgUIfDmsi9nAZRkmeomERVlIAvcUYxtqprLfu + 6plDs+aeff/MAmHbak7yF+Txj8+8F4k6FcfNBT51oVSZuqFwyLswjGVzWol6aEY7 + akVIrn3OdN2u6VWlU4ZO5+sjP4QYsf5K2oVnzFVIpYvqtO2fGbxq/8dRJQARAQAB + zSVMYXVuY2hwYWQgUFBBIGZvciBDYW5vbmljYWwgS2VybmVsIFJUwsGOBBMBCgA4 + FiEEc4Tsv+pcopCX6lNfLz1Vl/FsjCEFAmAervwCGwMFCwkIBwIGFQoJCAsCBBYC + AwECHgECF4AACgkQLz1Vl/FsjCF9WhAAnwfx9njs1M3rfsMMuhvPxx0WS65HDlq8 + SRgl9K2EHtZIcS7lHmcjiTR5RD1w+4rlKZuE5J3EuMnNX1PdCYLSyMQed+7UAtX6 + TNyuiuVZVxuzJ5iS7L2ZoX05ASgyoh/Loipc+an6HzHqQnNC16ZdrBL4AkkGhDgP + ZbYjM3FbBQkL2T/08NcwTrKuVz8DIxgH7yPAOpBzm91n/pV248eK0a46sKauR2DB + zPKjcc180qmaVWyv9C60roSslvnkZsqe/jYyDFuSsRWqGgE5jNyIb8EY7K7KraPv + 3AkusgCh4fqlBxOvF6FJkiYeZZs5YXvGQ296HTfVhPLOqctSFX2kuUKGIq2Z+H/9 + qfJFGS1iaUsoDEUOaU27lQg5wsYa8EsCm9otroH2P3g7435JYRbeiwlwfHMS9EfK + dwD38d8UzZj7TnxGG4T1aLb3Lj5tNG6DSko69+zqHhuknjkRuAxRAZfHeuRbACgE + nIa7Chit8EGhC2GB12pr5XFWzTvNFdxFhbG+ed7EiGn/v0pVQc0ZfE73FXltg7et + bkoC26o5Ksk1wK2SEs/f8aDZFtG01Ys0ASFICDGW2tusFvDs6LpPUUggMjf41s7j + 4tKotEE1Hzr38EdY+8faRaAS9teQdH5yob5a5Bp5F5wgmpqZom/gjle4JBVaV5dI + N5rcnHzcvXw= + =asqr + -----END PGP PUBLIC KEY BLOCK----- + permissions: "0644" + +# install the snap +snap: + commands: + 00: 'snap install k8s --classic --channel=1.30-moonray/beta' + +runcmd: +# fetch dpdk driver binding script +- su ubuntu -c "git config --global http.proxy http://10.18.2.1:3128" +- su ubuntu -c "git clone https://github.com/DPDK/dpdk.git /home/ubuntu/dpdk" +- apt update +- DEBIAN_FRONTEND=noninteractive apt install -y linux-headers-6.8.1-1004-realtime linux-image-6.8.1-1004-realtime linux-modules-6.8.1-1004-realtime linux-modules-extra-6.8.1-1004-realtime + +# enable kernel options +- update-grub + +# reboot to activate realtime-kernel and kernel options +power_state: + mode: reboot +``` + +Notes: + +* In the above, realtime kernel 6.8 is installed from a private ppa. It was recently backported from 24.04 to 22.04 and is still going through some validation stages. Once it is officially released, it will be installable via the Ubuntu Pro cli. + + +```` +````` + + + + + +## {{product}} setup + +{{product}} is delivered as a [snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/). + +This chapter explains how to set up a dual node {{product}} cluster for testing EPA capabilities. + +### Control plane and worker node + +1. [Install the snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/howto/install/snap/) from the moonray track. The [beta channel](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/) is used at this point as the end configuration of the k8s snap is not finalised yet. + +``` +sudo snap install k8s --classic --channel=1.30-moonray/beta +``` + +2. Create a file called *configuration.yaml*. In this configuration file we let the snap start with its default CNI (calico), with CoreDNS deployed and we also point k8s to the external etcd. 
+ +``` +cluster-config: + network: + enabled: true + dns: + enabled: true local-storage: + enabled: true +extra-node-kubelet-args: + --reserved-cpus: "0-31" + --cpu-manager-policy: "static" + --topology-manager-policy: "best-effort" +``` + +3. Bootstrap {{product}} using the above configuration file. + +``` +sudo k8s bootstrap --file configuration.yaml +``` + +#### Verify control plane node is up + +After a few seconds you can query the API server with: + +``` +sudo k8s kubectl get all -A +``` + +### Second k8s node as worker + +1. Install the k8s snap on the second node + +``` +sudo snap install k8s --classic --channel=1.30-moonray/beta +``` + +2. On the control plane node generate a join token to be used for joining the + second node + +``` +sudo k8s get-join-token --worker +``` + +3. On the worker node create the configuration.yaml file + +``` +extra-node-kubelet-args: + --reserved-cpus: "0-31" + --cpu-manager-policy: "static" + --topology-manager-policy: "best-effort" +``` + +4. On the worker node use the token to join the cluster + +``` +sudo k8s join-cluster --file configuration.yaml +``` + + +### Verify the two node cluster is ready + +After a few seconds the second worker node will register with the control +plane. You can query the available workers from the first node: + +``` +sudo k8s kubectl get nodes +``` + +The output should list the connected nodes: + +``` +NAME STATUS ROLES AGE VERSION +pc6b-rb4-n1 Ready control-plane,worker 22h v1.30.2 +pc6b-rb4-n3 Ready worker 22h v1.30.2 +``` + +### Multus and SRIOV setup + +Get the thick plugin (in case of resource scarcity we can consider deploying +the thin flavor) + +``` +sudo k8s kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml +``` + +Note: the memory limits for the multus pod spec in the DaemonSet should be +increased (i.e. to 500Mi instead 50Mi) to avoid OOM issues when deploying +multiple workload pods in parallel. 
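+
+If the DaemonSet has already been applied, the limit can also be raised in
+place rather than by editing the manifest. This is only a sketch: it assumes
+the object names used by the upstream thick-plugin manifest (DaemonSet
+`kube-multus-ds` and container `kube-multus` in the `kube-system` namespace).
+
+```
+# Bump the Multus container memory limit (for example from 50Mi to 500Mi)
+sudo k8s kubectl -n kube-system set resources daemonset/kube-multus-ds \
+  --containers=kube-multus --limits=memory=500Mi
+```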
+ +#### SRIOV Network Device Plugin + +* Create sriov-dp.yaml configMap + +``` +cat <TAbort- SERR- + Kernel driver in use: vfio-pci + Kernel modules: iavf +``` + +Now, let’s create a test pod that will claim a network interface from the DPDK network: + +``` +$ cat < + +[MAAS]: \ No newline at end of file From 34347b2db6d07d9dd1cb2cd08c6dade41bd1c037 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 9 Sep 2024 12:23:57 +0100 Subject: [PATCH 02/45] add to index --- docs/src/snap/howto/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/src/snap/howto/index.md b/docs/src/snap/howto/index.md index 3d78213aa..c7bf9da3b 100644 --- a/docs/src/snap/howto/index.md +++ b/docs/src/snap/howto/index.md @@ -20,6 +20,7 @@ networking/dualstack storage/index external-datastore proxy +epa backup-restore refresh-certs restore-quorum From 2db591946c576e23f6c216c4e7b33d0e54219b19 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 9 Sep 2024 14:12:45 +0100 Subject: [PATCH 03/45] add maas link --- docs/src/snap/howto/epa.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 3e79486e6..19b9fb9fe 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -1002,7 +1002,7 @@ $ sudo k8s kubectl describe pod sriov-test-pod -# References +## References * [How to enable Real-time Ubuntu](https://canonical-ubuntu-pro-client.readthedocs-hosted.com/en/latest/howtoguides/enable\_realtime\_kernel/\#how-to-enable-real-time-ubuntu) * [Manage HugePages](https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/) @@ -1013,4 +1013,4 @@ $ sudo k8s kubectl describe pod sriov-test-pod -[MAAS]: \ No newline at end of file +[MAAS]: maas.io \ No newline at end of file From b19c2226a18db17c97dc0ae0739f5fd9d32b97e9 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 9 Sep 2024 14:23:07 +0100 Subject: [PATCH 04/45] code update --- docs/src/snap/howto/epa.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 19b9fb9fe..66578224b 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -966,13 +966,15 @@ EOF ``` Finally, describe the pod to confirm the DPDK network assignment and also the -virtual function PCI ID (in this case, 0000:98:1f.2) that was assigned -automatically to the net1 interface: +virtual function PCI ID (in this case, `0000:98:1f.2`) that was assigned +automatically to the `net1` interface: ``` -$ sudo k8s kubectl describe pod sriov-test-pod +sudo k8s kubectl describe pod sriov-test-pod +``` -### Expected Output ### + +``` ... 
k8s.v1.cni.cncf.io/network-status: [{ @@ -1002,7 +1004,7 @@ $ sudo k8s kubectl describe pod sriov-test-pod -## References +## Further reading * [How to enable Real-time Ubuntu](https://canonical-ubuntu-pro-client.readthedocs-hosted.com/en/latest/howtoguides/enable\_realtime\_kernel/\#how-to-enable-real-time-ubuntu) * [Manage HugePages](https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/) From 50c59745761c1309895f1fb627efeacb732cf529 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Tue, 10 Sep 2024 11:46:21 +0100 Subject: [PATCH 05/45] testing format --- docs/src/snap/howto/epa.md | 51 +++++++++++++++++++++++++++++--------- 1 file changed, 39 insertions(+), 12 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 66578224b..5ae483a34 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -597,20 +597,27 @@ sudo k8s kubectl create -f ./dpdk-nad.yaml ## Testing -## Testing HugePages in {{product}} +It is important to verify that all of these enabled features are working as +expected before relying on them. This section deals with how to verify +everything is working as expected. -Verify that HugePages are allocated on your Kubernetes nodes. You can do this by checking the node's capacity and allocatable resources: +### Testing HugePages +Verify that HugePages are allocated on your Kubernetes nodes. You can do this +by checking the node's capacity and allocatable resources: + +``` +sudo k8s kubectl get nodes ``` -$ sudo k8s kubectl get nodes -### Expected Output ### +``` +``` NAME STATUS ROLES AGE VERSION pc6b-rb4-n1 Ready control-plane,worker 22h v1.30.2 pc6b-rb4-n3 Ready worker 22h v1.30.2 -$ sudo k8s kubectl describe node pc6b-rb4-n3 | grep -E 'hugepages' +``` -### Expected Output ### +``` hugepages-1Gi: 1000Gi hugepages-2Mi: 0 hugepages-1Gi: 1000Gi @@ -871,7 +878,7 @@ Based on the output, the sleep infinity process (PID 1\) is indeed being pinned to specific CPU cores (0 and 32). This indicates that the CPU pinning is working correctly. -## Testing SR-IOV & DPDK in {{product}} +### Testing SR-IOV & DPDK First check if SR-IOV Device Plugin pod is running and healthy in the cluster, if SR-IOV is allocatable in the worker node and the PCI IDs of the VFs @@ -879,14 +886,24 @@ available in the node (describing one of them to get further details): ``` sudo k8s kubectl get pods -n kube-system | grep sriov-device-plugin +``` -### Expected Output ### +This should indicate some running pods: + +``` kube-sriov-device-plugin-7mxz5 1/1 Running 0 7m31s kube-sriov-device-plugin-fjzgt 1/1 Running 0 7m31s +``` +Now check the VFs: + +``` sudo k8s kubectl describe node pc6b-rb4-n3 +``` -### Expected Output ### +This should indicate the presence of the SRIOV device: + +``` ... Allocatable: cpu: 96 @@ -898,8 +915,15 @@ Allocatable: memory: 1064530020Ki pods: 110 .... 
+``` + +The virtual functions should also appear on th +``` lspci | grep Virtual +``` + +``` 98:11.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) 98:11.1 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) 98:11.2 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) @@ -907,8 +931,10 @@ lspci | grep Virtual 99:00.5 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) 99:00.6 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) 99:00.7 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) +``` -$ lspci -s 98:1f.2 -vv +``` +lspci -s 98:1f.2 -vv 98:1f.2 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) Subsystem: Intel Corporation Ethernet Adaptive Virtual Function Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- @@ -922,10 +948,10 @@ $ lspci -s 98:1f.2 -vv Kernel modules: iavf ``` -Now, let’s create a test pod that will claim a network interface from the DPDK network: +Now, create a test pod that will claim a network interface from the DPDK network: ``` -$ cat < Date: Tue, 10 Sep 2024 15:04:22 +0100 Subject: [PATCH 06/45] linting --- docs/src/snap/howto/epa.md | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 5ae483a34..ada939c9b 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -2,7 +2,11 @@ This section explains how to set up the EPA features in a {{product}} cluster. -The content starts with the setup of the environment (including steps for using [MAAS][]). Then the setup of {{product}}, including the Multus & SR-IOV/DPDK networking components. Finally, the steps needed to test every EPA feature: HugePages, Real-time Kernel, CPU Pinning / Numa Topology Awareness and SR-IOV/DPDK. +The content starts with the setup of the environment (including steps for using +[MAAS][MAAS]). Then the setup of {{product}}, including the Multus & SR-IOV/DPDK +networking components. Finally, the steps needed to test every EPA feature: +HugePages, Real-time Kernel, CPU Pinning / Numa Topology Awareness and +SR-IOV/DPDK. ## What you'll need @@ -386,19 +390,28 @@ Notes: ## {{product}} setup -{{product}} is delivered as a [snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/). +{{product}} is delivered as a +[snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/). -This chapter explains how to set up a dual node {{product}} cluster for testing EPA capabilities. +This section explains how to set up a dual node {{product}} cluster for testing +EPA capabilities. ### Control plane and worker node -1. [Install the snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/howto/install/snap/) from the moonray track. The [beta channel](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/) is used at this point as the end configuration of the k8s snap is not finalised yet. +1. [Install the + snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/howto/install/snap/) + from the moonray track. The [beta + channel](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/) + is used at this point as the end configuration of the k8s snap is not + finalised yet. 
``` sudo snap install k8s --classic --channel=1.30-moonray/beta ``` -2. Create a file called *configuration.yaml*. In this configuration file we let the snap start with its default CNI (calico), with CoreDNS deployed and we also point k8s to the external etcd. +2. Create a file called *configuration.yaml*. In this configuration file we let + the snap start with its default CNI (calico), with CoreDNS deployed and we + also point k8s to the external etcd. ``` cluster-config: @@ -617,6 +630,7 @@ pc6b-rb4-n1 Ready control-plane,worker 22h v1.30.2 pc6b-rb4-n3 Ready worker 22h v1.30.2 ``` + ``` hugepages-1Gi: 1000Gi hugepages-2Mi: 0 @@ -703,10 +717,10 @@ The output of cyclictest will provide statistics like: Create a pod that will run cyclictest tool with specific options: -* -l 1000000: Sets the number of test iterations to 1 million. -* -m: Measures the maximum latency. -* -p 80: Sets the real-time scheduling priority to 80 (a high priority, typically used for real-time tasks). -* -t 1: Specifies CPU core 1 to be used for the test. +- `-l 1000000`: Sets the number of test iterations to 1 million. +- `-m`: Measures the maximum latency. +- `-p 80`: Sets the real-time scheduling priority to 80 (a high priority, typically used for real-time tasks). +- `-t 1`: Specifies CPU core 1 to be used for the test. ``` cat < Date: Tue, 10 Sep 2024 15:07:16 +0100 Subject: [PATCH 07/45] format fixes --- docs/src/snap/howto/epa.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index ada939c9b..82d02544a 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -744,7 +744,9 @@ Get the pod logs to verify that the test is running: ``` sudo k8s kubectl logs realtime-kernel-test -f -### Expected Output ### +``` + +``` ... # /dev/cpu_dma_latency set to 0us policy: fifo: loadavg: 7.92 8.34 9.32 1/3698 2965 @@ -754,15 +756,15 @@ T: 0 ( 2965) P:80 I:1000 C: 241486 Min: 3 Act: 4 Avg: 3 Max: 18 ```{dropdown} Explanation of output -* /dev/cpu\_dma\_latency set to 0us: This line indicates that the CPU DMA (Direct Memory Access) latency has been set to 0 microseconds. This setting is relevant for real-time systems as it controls how long a device can hold the CPU bus during a DMA transfer. -* policy: fifo: This means the scheduling policy for the cyclictest thread is set to FIFO (First In, First Out). In FIFO scheduling, the highest priority task that is ready to run gets the CPU first and continues running until it is blocked or voluntarily yields the CPU. -* loadavg: 7.92 8.34 9.32 1/3698 2965: This shows the load average of your system over the last 1, 5, and 15 minutes. The numbers are quite high, indicating that your system is under significant load. This can potentially affect the latency measurements. -* T: 0 ( 2965\) P:80 I:1000 C: 241486: - * T: 0: The number of the CPU core the test was run on (CPU 0 in this case). - * (2965): The PID (Process ID) of the cyclictest process. - * P:80: The priority of the cyclictest thread. - * I:1000: The number of iterations (loops) the test ran for (1000 in this case). - * C: 241486: The number of cycles per second that the test has aimed for. +- `/dev/cpu_dma\_latency set to 0us`: This line indicates that the CPU DMA (Direct Memory Access) latency has been set to 0 microseconds. This setting is relevant for real-time systems as it controls how long a device can hold the CPU bus during a DMA transfer. 
+- `policy: fifo`: This means the scheduling policy for the cyclictest thread is set to FIFO (First In, First Out). In FIFO scheduling, the highest priority task that is ready to run gets the CPU first and continues running until it is blocked or voluntarily yields the CPU. +- `loadavg: 7.92 8.34 9.32 1/3698 2965:` This shows the load average of your system over the last 1, 5, and 15 minutes. The numbers are quite high, indicating that your system is under significant load. This can potentially affect the latency measurements. +- `T: 0 ( 2965) P:80 I:1000 C: 241486`: + - T: 0: The number of the CPU core the test was run on (CPU 0 in this case). + - (2965): The PID (Process ID) of the cyclictest process. + - P:80: The priority of the cyclictest thread. + - I:1000: The number of iterations (loops) the test ran for (1000 in this case). + - C: 241486: The number of cycles per second that the test has aimed for. * Min: 3 Act: 4 Avg: 3 Max: 18: These are the key latency statistics in microseconds (us): * Min: The minimum latency observed during the test (3 us). * Act: The actual average latency (4 us). From ad0fd417f631a4326ec44d8c1dccaaabb3b67106 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Tue, 10 Sep 2024 18:59:01 +0100 Subject: [PATCH 08/45] Update docs/src/snap/howto/epa.md Co-authored-by: Louise K. Schmidtgen --- docs/src/snap/howto/epa.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 82d02544a..df14dc792 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -11,7 +11,7 @@ SR-IOV/DPDK. ## What you'll need - An Ubuntu Pro subscription (required for real-time kernel) -- Ubuntu instances **or** a MAAS environment to run {product} on +- Ubuntu instances **or** a MAAS environment to run {{product}} on ## Prepare the Environment From 69676269cf682adf96953ebd23e9495c36aa190f Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Tue, 10 Sep 2024 19:51:37 +0100 Subject: [PATCH 09/45] Update docs/src/snap/howto/epa.md Co-authored-by: Louise K. Schmidtgen --- docs/src/snap/howto/epa.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index df14dc792..64265c08d 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -205,7 +205,7 @@ virtual functions. ``` -* Now let’s enable DPDK, first by cloning the DPDK repo, and then placing the script that will bind the VFs to the VFIO-PCI driver in the location that will run automatically each time the system boots up, so the VFIO (Virtual Function I/O) bindings are applied consistently: +Now let’s enable DPDK, first by cloning the DPDK repo, and then placing the script that will bind the VFs to the VFIO-PCI driver in the location that will run automatically each time the system boots up, so the VFIO (Virtual Function I/O) bindings are applied consistently: ``` git clone https://github.com/DPDK/dpdk.git /home/ubuntu/dpdk From 936bd09bbd8cfe54e60fa5794b0ecec3d601ad59 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Tue, 10 Sep 2024 19:52:02 +0100 Subject: [PATCH 10/45] Update docs/src/snap/howto/epa.md Co-authored-by: Louise K. 
Schmidtgen --- docs/src/snap/howto/epa.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 64265c08d..363c3789e 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -1,6 +1,6 @@ # How to set up Enhanced Platform Awareness -This section explains how to set up the EPA features in a {{product}} cluster. +This section explains how to set up the Enhanced Platform Awareness (EPA) features in a {{product}} cluster. The content starts with the setup of the environment (including steps for using [MAAS][MAAS]). Then the setup of {{product}}, including the Multus & SR-IOV/DPDK From b21af46dda1b4f0ceaa7ca5763acb99f54345f1c Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Fri, 13 Sep 2024 14:43:03 +0100 Subject: [PATCH 11/45] review fixes --- docs/src/snap/howto/epa.md | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 363c3789e..0e799877d 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -611,7 +611,7 @@ sudo k8s kubectl create -f ./dpdk-nad.yaml ## Testing It is important to verify that all of these enabled features are working as -expected before relying on them. This section deals with how to verify +expected before relying on them. This section confirms that everything is working as expected. ### Testing HugePages @@ -623,7 +623,8 @@ by checking the node's capacity and allocatable resources: sudo k8s kubectl get nodes ``` -``` +This should return the available nodes + ``` NAME STATUS ROLES AGE VERSION pc6b-rb4-n1 Ready control-plane,worker 22h v1.30.2 @@ -640,9 +641,9 @@ pc6b-rb4-n3 Ready worker 22h v1.30.2 hugepages-2Mi 0 (0%) 0 (0%) ``` -So we have 1000 huge pages of 1Gi size each and we have a worker node labelled -properly. Then you can create a Pod that explicitly requests one 1G Huge Page -in its resource limits: +So this example has 1000 huge pages of 1Gi size each and we have a worker node +labelled properly. Then you can create a Pod that explicitly requests one 1G +Huge Page in its resource limits: ``` cat < Date: Fri, 13 Sep 2024 15:40:34 +0100 Subject: [PATCH 12/45] formatting/comment fixes --- docs/src/snap/howto/epa.md | 104 +++++++++++++++++++++---------------- 1 file changed, 60 insertions(+), 44 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 0e799877d..1a69b9299 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -49,6 +49,8 @@ membind: 0 1 ### Enable the real-time kernel +The real-time kernel enablement requires an ubuntu pro subscription and some additional tools to be available. + ``` sudo pro attach sudo apt update && sudo apt install ubuntu-advantage-tools @@ -135,6 +137,7 @@ Once the reboot has taken place, ensure the HugePages configuration has been app ``` grep HugePages /proc/meminfo ``` + This should generate output indicating the number of pages allocated ``` @@ -148,19 +151,20 @@ HugePages_Surp: 0 Next, create a configuration file to configure the network interface to use SR-IOV (so it can create virtual functions afterwards) using Netplan. 
In the example below the file is created first, then the configuration is -applied and then the 128 virtual functions are available for use in the environment: +applied, making 128 virtual functions available for use in the environment: ``` cat < /etc/netplan/99-sriov_vfs.yaml - network: - ethernets: - enp152s0f1: - virtual-function-count: 128 + network: + ethernets: + enp152s0f1: + virtual-function-count: 128 EOF sudo chmod 0600 /etc/netplan/99-sriov_vfs.yaml sudo netplan apply ip link show enp152s0f1 ``` + The output of the last command should indicate the device is working and has generated the expected virtual functions. @@ -175,6 +179,7 @@ virtual functions. vf 126 link/ether 66:fe:ad:f2:d3:05 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off vf 127 link/ether ca:20:00:c6:83:dd brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off ``` + ```{dropdown} Explanation of steps * Breakdown of the content of the file /etc/netplan/99-sriov\_vfs.yaml : * path: /etc/netplan/99-sriov\_vfs.yaml: This specifies the location of the configuration file. The "99" prefix in the filename usually indicates that it will be processed last, potentially overriding other configurations. @@ -205,7 +210,10 @@ virtual functions. ``` -Now let’s enable DPDK, first by cloning the DPDK repo, and then placing the script that will bind the VFs to the VFIO-PCI driver in the location that will run automatically each time the system boots up, so the VFIO (Virtual Function I/O) bindings are applied consistently: +Now enable DPDK, first by cloning the DPDK repo, and then placing the script which +will bind the VFs to the VFIO-PCI driver in the location that will run +automatically each time the system boots up, so the VFIO +(Virtual Function I/O) bindings are applied consistently: ``` git clone https://github.com/DPDK/dpdk.git /home/ubuntu/dpdk @@ -218,6 +226,7 @@ cat < /var/lib/cloud/scripts/per-boot/dpdk_bind.sh fi sudo chmod 0755 /var/lib/cloud/scripts/per-boot/dpdk_bind.sh ``` + ```{dropdown} Explanation * Load VFIO Module (modprobe vfio-pci): If the DPDK directory exists, the script loads the VFIO-PCI kernel module. This module is necessary for the VFIO driver to function. * The script uses the dpdk-devbind.py tool (included with DPDK) to list the available network devices and their drivers. @@ -262,17 +271,19 @@ Network devices using DPDK-compatible driver With these preparation steps we have enabled the features of EPA: -* NUMA and CPU Pinning are available to the first 32 CPUs -* Real-Time Kernel is enabled -* HugePages are enabled and 1000 1G huge pages are available -* SRIOV is enabled in the enp152s0f1 interface, with 128 virtual function interfaces bound to the vfio-pci driver (that could also use the iavf driver) -* DPDK is enabled in all the 128 virtual function interfaces +- NUMA and CPU Pinning are available to the first 32 CPUs +- Real-Time Kernel is enabled +- HugePages are enabled and 1000 1G huge pages are available +- SRIOV is enabled in the enp152s0f1 interface, with 128 virtual + function interfaces bound to the vfio-pci driver (that could also use the iavf driver) +- DPDK is enabled in all the 128 virtual function interfaces ```` ````{group-tab} MAAS -To prepare a machine for CPU isolation, Hugepages, real-time kernel, SRIOV and DPDK we leverage cloud-init through MAAS. +To prepare a machine for CPU isolation, Hugepages, real-time kernel, +SRIOV and DPDK we leverage cloud-init through MAAS. ``` #cloud-config @@ -400,7 +411,7 @@ EPA capabilities. 1. 
[Install the snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/howto/install/snap/) - from the moonray track. The [beta + from the relevant track, currently `{{track}}`. The [beta channel](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/) is used at this point as the end configuration of the k8s snap is not finalised yet. @@ -413,12 +424,13 @@ sudo snap install k8s --classic --channel=1.30-moonray/beta the snap start with its default CNI (calico), with CoreDNS deployed and we also point k8s to the external etcd. -``` +```yaml cluster-config: network: enabled: true dns: - enabled: true local-storage: + enabled: true + local-storage: enabled: true extra-node-kubelet-args: --reserved-cpus: "0-31" @@ -497,45 +509,47 @@ the thin flavor) sudo k8s kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml ``` -Note: the memory limits for the multus pod spec in the DaemonSet should be +```{note} +The memory limits for the multus pod spec in the DaemonSet should be increased (i.e. to 500Mi instead 50Mi) to avoid OOM issues when deploying multiple workload pods in parallel. +``` #### SRIOV Network Device Plugin -* Create sriov-dp.yaml configMap +Create sriov-dp.yaml configMap: ``` cat < Date: Fri, 13 Sep 2024 15:53:59 +0100 Subject: [PATCH 13/45] fix headings --- docs/src/snap/howto/epa.md | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 1a69b9299..72f3c0bbf 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -630,7 +630,7 @@ It is important to verify that all of these enabled features are working as expected before relying on them. This section confirms that everything is working as expected. -### Testing HugePages +### Test HugePages Verify that HugePages are allocated on your Kubernetes nodes. You can do this by checking the node's capacity and allocatable resources: @@ -708,7 +708,7 @@ The output should reflect the HugePage request: ``` -## Test the real-time kernel +### Test the real-time kernel First, verify that real-time kernel is enabled in the worker node by checking if “PREEMPT RT” appears after running the `uname -a` command: @@ -796,7 +796,7 @@ T: 0 ( 2965) P:80 I:1000 C: 241486 Min: 3 Act: 4 Avg: 3 Max: 18 ``` -## Testing CPU Pinning and NUMA Topology Awareness in {{product}} +### Test CPU Pinning and NUMA First check if CPU Manager and NUMA Topology Manager is set up in the worker node: @@ -835,9 +835,13 @@ Now let’s label the node with information about the available CPU/NUMA nodes, ``` sudo k8s kubectl label node pc6b-rb4-n3 topology.kubernetes.io/zone=NUMA -### Expected Output ### +``` + +``` node/pc6b-rb4-n3 labeled +``` +``` $ cat < + +``` sudo k8s kubectl exec -ti cpu-pinning-test -- /bin/bash root@cpu-pinning-test:/# ps -ef UID PID PPID C STIME TTY TIME CMD @@ -914,7 +922,7 @@ Based on the output, the sleep infinity process (PID 1\) is indeed being pinned to specific CPU cores (0 and 32). This indicates that the CPU pinning is working correctly. 
-### Testing SR-IOV & DPDK +### Test SR-IOV & DPDK First check if SR-IOV Device Plugin pod is running and healthy in the cluster, if SR-IOV is allocatable in the worker node and the PCI IDs of the VFs @@ -1066,7 +1074,6 @@ sudo k8s kubectl describe pod sriov-test-pod - ## Further reading * [How to enable Real-time Ubuntu](https://canonical-ubuntu-pro-client.readthedocs-hosted.com/en/latest/howtoguides/enable\_realtime\_kernel/\#how-to-enable-real-time-ubuntu) @@ -1078,4 +1085,4 @@ sudo k8s kubectl describe pod sriov-test-pod -[MAAS]: maas.io \ No newline at end of file +[MAAS]: https://maas.io \ No newline at end of file From c6539cb3a386d24787f0aac2b1769d281bcb73df Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Fri, 13 Sep 2024 18:23:28 +0100 Subject: [PATCH 14/45] formatting --- docs/src/snap/howto/epa.md | 40 +++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 72f3c0bbf..945c309fd 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -452,7 +452,7 @@ After a few seconds you can query the API server with: sudo k8s kubectl get all -A ``` -### Second k8s node as worker +### Add second k8s node as worker 1. Install the k8s snap on the second node @@ -483,7 +483,7 @@ sudo k8s join-cluster --file configuration.yaml Date: Fri, 13 Sep 2024 18:29:19 +0100 Subject: [PATCH 15/45] formatting --- docs/src/snap/howto/epa.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 945c309fd..b9db9de2a 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -808,20 +808,22 @@ ps -ef | grep /snap/k8s/678/bin/kubelet root 9139 1 1 Jul17 ? 00:20:03 /snap/k8s/678/bin/kubelet --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/client-ca.crt --cluster-dns=10.152.183.97 --cluster-domain=cluster.local --container-runtime-endpoint=/var/snap/k8s/common/run/containerd.sock --containerd=/var/snap/k8s/common/run/containerd.sock --cpu-manager-policy=static --eviction-hard=memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi --fail-swap-on=false --kubeconfig=/etc/kubernetes/kubelet.conf --node-ip=10.18.2.153 --node-labels=node-role.kubernetes.io/worker=,k8sd.io/role=worker --read-only-port=0 --register-with-taints= --reserved-cpus=0-31 --root-dir=/var/lib/kubelet --serialize-image-pulls=false --tls-cert-file=/etc/kubernetes/pki/kubelet.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384 --tls-private-key-file=/etc/kubernetes/pki/kubelet.key --topology-manager-policy=best-effort ``` -Breakdown: +```{dropdown} Explanation of output * \--cpu-manager-policy=static : This flag within the Kubelet command line arguments explicitly tells us that the CPU Manager is active and using the static policy. Here's what this means: * CPU Manager: This is a component of Kubelet that manages how CPU resources are allocated to pods running on a node. * Static Policy: This policy is designed to provide stricter control over CPU allocation. 
With the static policy, you can request integer CPUs for your containers (e.g., 1, 2, etc.), and {{product}} will try to assign them to dedicated CPU cores on the node, providing a greater degree of isolation and predictability. * \--reserved-cpus=0-31: This line indicates that no CPUs are reserved for the Kubelet or system processes. This implies that all CPUs might be available for pod scheduling, depending on the cluster's overall resource allocation strategy. * \--topology-manager-policy=best-effort: This flag sets the topology manager policy to "best-effort." The topology manager helps optimise pod placement on nodes by considering factors like NUMA nodes, CPU cores, and devices. The "best-effort" policy tries to place pods optimally, but it doesn't enforce strict requirements. +``` You can also confirm the total number of NUMA CPUs available in the worker node: ``` lscpu +``` -### Expected Output ### +``` .... NUMA: NUMA node(s): 2 From 08814fb16988e7710f9a4f320ac67ba05455089a Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Fri, 13 Sep 2024 18:55:53 +0100 Subject: [PATCH 16/45] add version and track --- docs/tools/reuse/substitutions.yaml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/tools/reuse/substitutions.yaml b/docs/tools/reuse/substitutions.yaml index 41fd9e08e..c88e1f482 100644 --- a/docs/tools/reuse/substitutions.yaml +++ b/docs/tools/reuse/substitutions.yaml @@ -1,4 +1,6 @@ product: 'Canonical Kubernetes' +version: '1.31' +track: '1.31/edge' multi_line_example: |- *Multi-line* text that uses basic **markup**. From 5b725d4807c0216a09e1e9576ad095dd16520fe5 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Sun, 15 Sep 2024 18:27:28 +0100 Subject: [PATCH 17/45] fix moonray refs --- docs/src/_parts/install.md | 3 + docs/src/snap/howto/epa.md | 97 +++++++++++++++-------------- docs/tools/reuse/substitutions.yaml | 2 +- 3 files changed, 55 insertions(+), 47 deletions(-) create mode 100644 docs/src/_parts/install.md diff --git a/docs/src/_parts/install.md b/docs/src/_parts/install.md new file mode 100644 index 000000000..be6b47317 --- /dev/null +++ b/docs/src/_parts/install.md @@ -0,0 +1,3 @@ +``` +sudo snap install k8s --classic --channel=1.31/edge +``` \ No newline at end of file diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index b9db9de2a..62f47d4e1 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -370,7 +370,7 @@ write_files: # install the snap snap: commands: - 00: 'snap install k8s --classic --channel=1.30-moonray/beta' + 00: 'snap install k8s --classic --channel=1.31/beta' runcmd: # fetch dpdk driver binding script @@ -411,40 +411,43 @@ EPA capabilities. 1. [Install the snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/howto/install/snap/) - from the relevant track, currently `{{track}}`. The [beta - channel](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/) - is used at this point as the end configuration of the k8s snap is not - finalised yet. + from the relevant [channel][channel]. + ```{note} + A pre-release channel is required currently until there is a finalised release of {{product}}. + ``` -``` -sudo snap install k8s --classic --channel=1.30-moonray/beta -``` + For example: + + + ```{include} ../../_parts/install.md + ``` 2. Create a file called *configuration.yaml*. In this configuration file we let the snap start with its default CNI (calico), with CoreDNS deployed and we also point k8s to the external etcd. 
-```yaml -cluster-config: - network: - enabled: true - dns: - enabled: true - local-storage: - enabled: true -extra-node-kubelet-args: - --reserved-cpus: "0-31" - --cpu-manager-policy: "static" - --topology-manager-policy: "best-effort" -``` + ```yaml + cluster-config: + network: + enabled: true + dns: + enabled: true + local-storage: + enabled: true + extra-node-kubelet-args: + --reserved-cpus: "0-31" + --cpu-manager-policy: "static" + --topology-manager-policy: "best-effort" + ``` 3. Bootstrap {{product}} using the above configuration file. -``` -sudo k8s bootstrap --file configuration.yaml -``` + ``` + sudo k8s bootstrap --file configuration.yaml + ``` -#### Verify control plane node is up +#### Verify the control plane node is running After a few seconds you can query the API server with: @@ -452,35 +455,34 @@ After a few seconds you can query the API server with: sudo k8s kubectl get all -A ``` -### Add second k8s node as worker +### Add a second k8s node as a worker 1. Install the k8s snap on the second node -``` -sudo snap install k8s --classic --channel=1.30-moonray/beta -``` + ```{include} ../../_parts/install.md + ``` 2. On the control plane node generate a join token to be used for joining the second node -``` -sudo k8s get-join-token --worker -``` + ``` + sudo k8s get-join-token --worker + ``` 3. On the worker node create the configuration.yaml file -``` -extra-node-kubelet-args: - --reserved-cpus: "0-31" - --cpu-manager-policy: "static" - --topology-manager-policy: "best-effort" -``` + ``` + extra-node-kubelet-args: + --reserved-cpus: "0-31" + --cpu-manager-policy: "static" + --topology-manager-policy: "best-effort" + ``` 4. On the worker node use the token to join the cluster -``` -sudo k8s join-cluster --file configuration.yaml -``` + ``` + sudo k8s join-cluster --file configuration.yaml + ``` #### Verify the two node cluster is ready @@ -496,8 +498,8 @@ The output should list the connected nodes: ``` NAME STATUS ROLES AGE VERSION -pc6b-rb4-n1 Ready control-plane,worker 22h v1.30.2 -pc6b-rb4-n3 Ready worker 22h v1.30.2 +pc6b-rb4-n1 Ready control-plane,worker 22h v1.31.0 +pc6b-rb4-n3 Ready worker 22h v1.31.0 ``` ### Multus and SRIOV setup @@ -844,7 +846,7 @@ node/pc6b-rb4-n3 labeled ``` ``` -$ cat < -[MAAS]: https://maas.io \ No newline at end of file +[MAAS]: https://maas.io +[channel]: https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/ \ No newline at end of file diff --git a/docs/tools/reuse/substitutions.yaml b/docs/tools/reuse/substitutions.yaml index c88e1f482..51923e7f0 100644 --- a/docs/tools/reuse/substitutions.yaml +++ b/docs/tools/reuse/substitutions.yaml @@ -1,6 +1,6 @@ product: 'Canonical Kubernetes' version: '1.31' -track: '1.31/edge' +channel: '1.31/edge' multi_line_example: |- *Multi-line* text that uses basic **markup**. From 7ec4e8af248f0358d9464042c298c3df80f44413 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Sun, 15 Sep 2024 18:43:12 +0100 Subject: [PATCH 18/45] Update docs/src/snap/howto/epa.md Co-authored-by: Louise K. Schmidtgen --- docs/src/snap/howto/epa.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 62f47d4e1..ea9d2d417 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -928,7 +928,7 @@ working correctly. 
### Test SR-IOV & DPDK -First check if SR-IOV Device Plugin pod is running and healthy in the cluster, +Ensure that the SR-IOV Device Plugin pod is running and healthy in the cluster, if SR-IOV is allocatable in the worker node and the PCI IDs of the VFs available in the node (describing one of them to get further details): From 631911341586eb587284787923389daf32ef6dc1 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Sun, 15 Sep 2024 18:44:40 +0100 Subject: [PATCH 19/45] fix linter whinges --- docs/src/snap/howto/epa.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index ea9d2d417..f6998e6b8 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -1,6 +1,7 @@ # How to set up Enhanced Platform Awareness -This section explains how to set up the Enhanced Platform Awareness (EPA) features in a {{product}} cluster. +This section explains how to set up the Enhanced Platform Awareness (EPA) +features in a {{product}} cluster. The content starts with the setup of the environment (including steps for using [MAAS][MAAS]). Then the setup of {{product}}, including the Multus & SR-IOV/DPDK @@ -741,7 +742,8 @@ Create a pod that will run the cyclictest tool with specific options: - `-l 1000000`: Sets the number of test iterations to 1 million. - `-m`: Measures the maximum latency. -- `-p 80`: Sets the real-time scheduling priority to 80 (a high priority, typically used for real-time tasks). +- `-p 80`: Sets the real-time scheduling priority to 80 (a high priority, + typically used for real-time tasks). - `-t 1`: Specifies CPU core 1 to be used for the test. ``` @@ -800,7 +802,8 @@ T: 0 ( 2965) P:80 I:1000 C: 241486 Min: 3 Act: 4 Avg: 3 Max: 18 ### Test CPU Pinning and NUMA -First check if CPU Manager and NUMA Topology Manager is set up in the worker node: +First check if CPU Manager and NUMA Topology Manager is set up in the worker +node: ``` ps -ef | grep /snap/k8s/678/bin/kubelet @@ -834,7 +837,8 @@ NUMA: ... ``` -Now let’s label the node with information about the available CPU/NUMA nodes, and then create a pod selecting that label: +Now let’s label the node with information about the available CPU/NUMA nodes, +and then create a pod selecting that label: ``` sudo k8s kubectl label node pc6b-rb4-n3 topology.kubernetes.io/zone=NUMA @@ -996,7 +1000,8 @@ lspci -s 98:1f.2 -vv Kernel modules: iavf ``` -Now, create a test pod that will claim a network interface from the DPDK network: +Now, create a test pod that will claim a network interface from the DPDK +network: ``` cat < Date: Sun, 15 Sep 2024 18:47:04 +0100 Subject: [PATCH 20/45] fix note --- docs/src/snap/howto/epa.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index f6998e6b8..e74ae2ef8 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -388,11 +388,13 @@ power_state: mode: reboot ``` -Notes: - -* In the above, realtime kernel 6.8 is installed from a private ppa. It was recently backported from 24.04 to 22.04 and is still going through some validation stages. Once it is officially released, it will be installable via the Ubuntu Pro cli. - +```{note} +In the above file, the realtime kernel 6.8 is installed from a private PPA. +It was recently backported from 24.04 to 22.04 and is still going through +some validation stages. Once it is officially released, it will be +installable via the Ubuntu Pro cli. 
+``` ```` ````` From c20232964d71a2a16ef4eed78854f45b93c625c4 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Sun, 15 Sep 2024 18:56:49 +0100 Subject: [PATCH 21/45] snap reference --- docs/src/snap/howto/epa.md | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index e74ae2ef8..888752564 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -398,23 +398,17 @@ installable via the Ubuntu Pro cli. ```` ````` - - - - ## {{product}} setup -{{product}} is delivered as a -[snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/). +{{product}} is delivered as a [snap][]. This section explains how to set up a dual node {{product}} cluster for testing EPA capabilities. ### Control plane and worker node -1. [Install the - snap](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/howto/install/snap/) - from the relevant [channel][channel]. +1. [Install the snap][install-link] from the relevant [channel][channel]. + ```{note} A pre-release channel is required currently until there is a finalised release of {{product}}. ``` @@ -1099,4 +1093,6 @@ the correct PCI address: [MAAS]: https://maas.io -[channel]: https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/ \ No newline at end of file +[channel]: https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/ +[install-link]: /snap/howto/install/snap +[snap]: https://snapcraft.io/docs \ No newline at end of file From da1a0ab1d253cbfce24af73989760b56139079f2 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 16 Sep 2024 00:11:39 +0100 Subject: [PATCH 22/45] language and format fixes --- docs/src/snap/howto/epa.md | 47 ++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 20 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 888752564..0fef39fdc 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -501,8 +501,8 @@ pc6b-rb4-n3 Ready worker 22h v1.31.0 ### Multus and SRIOV setup -Get the thick plugin (in case of resource scarcity we can consider deploying -the thin flavor) +Apply the 'thick' Multus plugin (in case of resource scarcity we can consider +deploying the thin flavour) ``` sudo k8s kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml @@ -710,7 +710,7 @@ The output should reflect the HugePage request: ### Test the real-time kernel First, verify that real-time kernel is enabled in the worker node by checking -if “PREEMPT RT” appears after running the `uname -a` command: +the output from the `uname -a` command: ``` uname -a @@ -722,7 +722,7 @@ The output should show the “PREEMPT RT” identifier: Linux pc6b-rb4-n3 6.8.1-1004-realtime #4~22.04.1-Ubuntu SMP PREEMPT_RT Mon Jun 24 16:45:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux ``` -The test will use cyclictest, commonly used to assess the real-time performance +The test will use [cyclictest][], commonly used to assess the real-time performance of a system, especially when running a real-time kernel. It measures the time it takes for a thread to cycle between high and low priority states, giving you an indication of the system's responsiveness to real-time events. Lower @@ -811,19 +811,22 @@ root 9139 1 1 Jul17 ? 
00:20:03 /snap/k8s/678/bin/kubelet -- ```{dropdown} Explanation of output -* \--cpu-manager-policy=static : This flag within the Kubelet command line arguments explicitly tells us that the CPU Manager is active and using the static policy. Here's what this means: - * CPU Manager: This is a component of Kubelet that manages how CPU resources are allocated to pods running on a node. - * Static Policy: This policy is designed to provide stricter control over CPU allocation. With the static policy, you can request integer CPUs for your containers (e.g., 1, 2, etc.), and {{product}} will try to assign them to dedicated CPU cores on the node, providing a greater degree of isolation and predictability. -* \--reserved-cpus=0-31: This line indicates that no CPUs are reserved for the Kubelet or system processes. This implies that all CPUs might be available for pod scheduling, depending on the cluster's overall resource allocation strategy. -* \--topology-manager-policy=best-effort: This flag sets the topology manager policy to "best-effort." The topology manager helps optimise pod placement on nodes by considering factors like NUMA nodes, CPU cores, and devices. The "best-effort" policy tries to place pods optimally, but it doesn't enforce strict requirements. + - `--cpu-manager-policy=static` : This flag within the Kubelet command line arguments explicitly tells us that the CPU Manager is active and using the static policy. Here's what this means: + - `CPU Manager`: This is a component of Kubelet that manages how CPU resources are allocated to pods running on a node. + - `Static Policy`: This policy is designed to provide stricter control over CPU allocation. With the static policy, you can request integer CPUs for your containers (e.g., 1, 2, etc.), and {{product}} will try to assign them to dedicated CPU cores on the node, providing a greater degree of isolation and predictability. + - `--reserved-cpus=0-31`: This line indicates that no CPUs are reserved for the Kubelet or system processes. This implies that all CPUs might be available for pod scheduling, depending on the cluster's overall resource allocation strategy. + - `--topology-manager-policy=best-effort`: This flag sets the topology manager policy to "best-effort." The topology manager helps optimise pod placement on nodes by considering factors like NUMA nodes, CPU cores, and devices. The "best-effort" policy tries to place pods optimally, but it doesn't enforce strict requirements. ``` -You can also confirm the total number of NUMA CPUs available in the worker node: +You can also confirm the total number of NUMA CPUs available in the worker node. +Run the command: ``` lscpu ``` +The ouptut should include information on the CPUs like this example: + ``` .... NUMA: @@ -833,18 +836,21 @@ NUMA: ... 
``` -Now let’s label the node with information about the available CPU/NUMA nodes, -and then create a pod selecting that label: +Label the node with information about the available CPU/NUMA nodes: ``` sudo k8s kubectl label node pc6b-rb4-n3 topology.kubernetes.io/zone=NUMA ``` +The output should indicate the label has been applied: + ``` node/pc6b-rb4-n3 labeled ``` +Now create a pod applying that label: + ``` cat < @@ -1095,4 +1101,5 @@ the correct PCI address: [MAAS]: https://maas.io [channel]: https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/ [install-link]: /snap/howto/install/snap -[snap]: https://snapcraft.io/docs \ No newline at end of file +[snap]: https://snapcraft.io/docs +[cyclictest]: https://github.com/jlelli/rt-tests \ No newline at end of file From d319af7b2050cf2416f7d49a22fcc02d0994b27b Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 16 Sep 2024 11:48:22 +0100 Subject: [PATCH 23/45] added some explanations --- docs/src/snap/howto/epa.md | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 0fef39fdc..3c88c2402 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -722,11 +722,11 @@ The output should show the “PREEMPT RT” identifier: Linux pc6b-rb4-n3 6.8.1-1004-realtime #4~22.04.1-Ubuntu SMP PREEMPT_RT Mon Jun 24 16:45:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux ``` -The test will use [cyclictest][], commonly used to assess the real-time performance -of a system, especially when running a real-time kernel. It measures the time -it takes for a thread to cycle between high and low priority states, giving you -an indication of the system's responsiveness to real-time events. Lower -latencies typically indicate better real-time performance. +The test will use [cyclictest][], commonly used to assess the real-time +performance of a system, especially when running a real-time kernel. It +measures the time it takes for a thread to cycle between high and low priority +states, giving you an indication of the system's responsiveness to real-time +events. Lower latencies typically indicate better real-time performance. The output of cyclictest will provide statistics including: @@ -971,12 +971,14 @@ Allocatable: .... 
``` -The virtual functions should also appear on th +The virtual functions are created on the PCI bus, which can also be verified: ``` lspci | grep Virtual ``` +...should list the presence of the virtual functions: + ``` 98:11.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) 98:11.1 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) @@ -987,8 +989,15 @@ lspci | grep Virtual 99:00.7 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) ``` +Examine a specific VF from the list: + ``` lspci -s 98:1f.2 -vv +``` + +The output should confirm the correct kernel drivers in use: + +``` 98:1f.2 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02) Subsystem: Intel Corporation Ethernet Adaptive Virtual Function Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- @@ -1085,8 +1094,6 @@ the correct PCI address: ``` - - ## Further reading - [How to enable real-time Ubuntu](https://canonical-ubuntu-pro-client.readthedocs-hosted.com/en/latest/howtoguides/enable\_realtime\_kernel/\#how-to-enable-real-time-ubuntu) From a88848e2f79a09ff6cabe14c54782e74a9213f44 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 16 Sep 2024 12:07:28 +0100 Subject: [PATCH 24/45] explain taskset --- docs/src/snap/howto/epa.md | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 3c88c2402..c3a32e3b6 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -874,7 +874,7 @@ spec: EOF ``` -Dscribing the node and the pod will confirm that the pod is running +Describing the node and the pod will confirm that the pod is running on the intended node and that its CPU requests are being met. Running taskset inside the pod will identify the pod pinned to the process running inside the pod: @@ -893,10 +893,14 @@ sudo k8s kubectl describe node pc6b-rb4-n3 .... ``` +We can then describe the pod itself: + ``` sudo k8s kubectl describe pod cpu-pinning-test ``` +The output should confirm the limits and requests: + ``` ... Limits: @@ -908,16 +912,37 @@ sudo k8s kubectl describe pod cpu-pinning-test ... ``` - +To determine the CPUS in use are valid, open a shell on the pod: ``` sudo k8s kubectl exec -ti cpu-pinning-test -- /bin/bash +``` + +On this shell, confirm the running processes: + +``` root@cpu-pinning-test:/# ps -ef +``` + +which will list the running commands: + +``` UID PID PPID C STIME TTY TIME CMD root 1 0 0 08:51 ? 00:00:00 sleep infinity root 17 0 0 08:58 pts/0 00:00:00 /bin/bash root 25 17 0 08:58 pts/0 00:00:00 ps -ef +``` + +The first of these is the `sleep` command we instructed the pod to run. We then +use `taskset` in the pod: + +``` root@cpu-pinning-test:/# taskset -p 1 +``` + +This returns the current affinity mask: + +``` pid 1's current affinity mask: 1000000000000000100000000 ``` From a701ed18387c6902574f0f3ce090233625ffca1d Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 16 Sep 2024 12:13:57 +0100 Subject: [PATCH 25/45] more explanations --- docs/src/snap/howto/epa.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index c3a32e3b6..7fe669633 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -766,6 +766,8 @@ sudo k8s kubectl logs realtime-kernel-test -f ``` +This should produce output including: + ``` ... 
# /dev/cpu_dma_latency set to 0us @@ -805,6 +807,8 @@ node: ps -ef | grep /snap/k8s/678/bin/kubelet ``` +The process output will indicate the arguments used when running the kubelet: + ``` root 9139 1 1 Jul17 ? 00:20:03 /snap/k8s/678/bin/kubelet --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/client-ca.crt --cluster-dns=10.152.183.97 --cluster-domain=cluster.local --container-runtime-endpoint=/var/snap/k8s/common/run/containerd.sock --containerd=/var/snap/k8s/common/run/containerd.sock --cpu-manager-policy=static --eviction-hard=memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi --fail-swap-on=false --kubeconfig=/etc/kubernetes/kubelet.conf --node-ip=10.18.2.153 --node-labels=node-role.kubernetes.io/worker=,k8sd.io/role=worker --read-only-port=0 --register-with-taints= --reserved-cpus=0-31 --root-dir=/var/lib/kubelet --serialize-image-pulls=false --tls-cert-file=/etc/kubernetes/pki/kubelet.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384 --tls-private-key-file=/etc/kubernetes/pki/kubelet.key --topology-manager-policy=best-effort ``` From 70cfa7d0bf76177a42aaaba99be771faecb2688c Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Mon, 16 Sep 2024 17:55:35 +0100 Subject: [PATCH 26/45] add inline note about version --- docs/src/snap/howto/epa.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 7fe669633..466e592db 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -395,6 +395,11 @@ It was recently backported from 24.04 to 22.04 and is still going through some validation stages. Once it is officially released, it will be installable via the Ubuntu Pro cli. ``` + + + ```` ````` From d0eef6203b484ca4d1a8aa7206218346fbc447b3 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Wed, 18 Sep 2024 11:01:20 +0100 Subject: [PATCH 27/45] add link to explanation --- docs/src/snap/howto/epa.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index 466e592db..c3e92f49b 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -1,7 +1,7 @@ # How to set up Enhanced Platform Awareness This section explains how to set up the Enhanced Platform Awareness (EPA) -features in a {{product}} cluster. +features in a {{product}} cluster. Please see the [EPA explanation page][explain-epa] for details about how EPA applies to {{product}}. The content starts with the setup of the environment (including steps for using [MAAS][MAAS]). Then the setup of {{product}}, including the Multus & SR-IOV/DPDK @@ -1143,4 +1143,5 @@ the correct PCI address: [channel]: https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/channels/ [install-link]: /snap/howto/install/snap [snap]: https://snapcraft.io/docs -[cyclictest]: https://github.com/jlelli/rt-tests \ No newline at end of file +[cyclictest]: https://github.com/jlelli/rt-tests +[explain-epa]: /snap/explanation/epa \ No newline at end of file From ab7bfbac4321ec807205e56eedeed523aa170469 Mon Sep 17 00:00:00 2001 From: "Louise K. 
Schmidtgen" Date: Mon, 9 Sep 2024 14:49:32 +0200 Subject: [PATCH 28/45] Docs Ingress default tls secret extra moonray comment (#656) --- docs/moonray/howto/index.md | 2 +- .../howto/networking/default-ingress-mr.md | 117 ++++++++++++++++++ docs/moonray/howto/networking/index.md | 15 +++ docs/src/snap/howto/index.md | 1 - .../snap/howto/networking/default-ingress.md | 17 ++- docs/src/snap/howto/networking/index.md | 1 + 6 files changed, 145 insertions(+), 8 deletions(-) create mode 100644 docs/moonray/howto/networking/default-ingress-mr.md create mode 100644 docs/moonray/howto/networking/index.md diff --git a/docs/moonray/howto/index.md b/docs/moonray/howto/index.md index 041e13e6c..11c35f452 100644 --- a/docs/moonray/howto/index.md +++ b/docs/moonray/howto/index.md @@ -14,7 +14,7 @@ Overview :glob: :titlesonly: install - +networking/index ``` --- diff --git a/docs/moonray/howto/networking/default-ingress-mr.md b/docs/moonray/howto/networking/default-ingress-mr.md new file mode 100644 index 000000000..24f9b435a --- /dev/null +++ b/docs/moonray/howto/networking/default-ingress-mr.md @@ -0,0 +1,117 @@ +# How to use default Ingress + +{{product}} allows you to configure Ingress into your cluster. When +enabled, it tells your cluster how external HTTP and HTTPS traffic should be +routed to its services. + +## What you'll need + +This guide assumes the following: + +- You have root or sudo access to the machine +- You have a bootstrapped {{product}} cluster (see the [Getting + Started][getting-started-guide] guide). + +## Check Ingress status + +Find out whether Ingress is enabled or disabled with the following command: + +``` +sudo k8s status +``` + +Please ensure that Ingress is enabled on your cluster. + +## Enable Ingress + +To enable Ingress, run: + +``` +sudo k8s enable ingress +``` + +For more information on the command, execute: + +``` +sudo k8s help enable +``` + +## Configure Ingress + +Discover your configuration options by running: + +``` +sudo k8s get ingress +``` + +You should see three options: + + +- `default-tls-secret`: Name of the TLS (Transport Layer Security) Secret that + will be used as the default Ingress certificate. The + `TLSCertificateDelegation` is created in the `projectcontour-root` namespace. + When defining an Ingress object, specify this secret as the default + certificate by setting the `secretName` field under `spec.tls`. + For further information, see the + [TLS Certificate Delegation guide][tls-delegation] guide. +- `enable-proxy-protocol`: If set, proxy protocol will be enabled for the + Ingress. + +### TLS Secret + +You can create a TLS secret by following the official +[Kubernetes documentation][kubectl-create-secret-tls/]. +Please remember to use `sudo k8s kubectl` (See the [kubectl-guide]). + +Tell Ingress to use your new Ingress certificate: + +``` +sudo k8s set ingress.default-tls-secret= +``` + +Replace `` with the desired value for your Ingress +configuration. + +### Proxy Protocol + +Enabling the proxy protocol allows passing client connection information to the +backend service. + +Consult the official +[Kubernetes documentation on the proxy protocol][proxy-protocol]. + +Use the following command to enable the proxy protocol: + +``` +sudo k8s set ingress.enable-proxy-protocol= +``` + +Adjust the value of `` with your proxy protocol +requirements. + +## Disable Ingress + +You can `disable` the built-in ingress: + +``` {warning} Disabling Ingress may impact external access to services within + your cluster. 
+ Ensure that you have alternative configurations in place before disabling Ingress. +``` + +``` +sudo k8s disable ingress +``` + +For more information on this command, run: + +``` +sudo k8s help disable +``` + + + +[kubectl-create-secret-tls/]: https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/ +[proxy-protocol]: https://kubernetes.io/docs/reference/networking/service-protocols/#protocol-proxy-special +[getting-started-guide]: /snap/tutorial/getting-started +[kubectl-guide]: /snap/tutorial/kubectl +[tls-delegation]: https://projectcontour.io/docs/main/config/tls-delegation/ diff --git a/docs/moonray/howto/networking/index.md b/docs/moonray/howto/networking/index.md new file mode 100644 index 000000000..f2982ead8 --- /dev/null +++ b/docs/moonray/howto/networking/index.md @@ -0,0 +1,15 @@ +# Networking + +```{toctree} +:hidden: +Networking +``` + +Networking is a core part of a working Kubernetes cluster. These topics cover +how to configure and use key capabilities of {{product}}. + +```{toctree} +:titlesonly: + +default-ingress-mr.md +``` diff --git a/docs/src/snap/howto/index.md b/docs/src/snap/howto/index.md index c7bf9da3b..45c18485c 100644 --- a/docs/src/snap/howto/index.md +++ b/docs/src/snap/howto/index.md @@ -16,7 +16,6 @@ Overview install/index networking/index -networking/dualstack storage/index external-datastore proxy diff --git a/docs/src/snap/howto/networking/default-ingress.md b/docs/src/snap/howto/networking/default-ingress.md index ce9f792d0..90498d910 100644 --- a/docs/src/snap/howto/networking/default-ingress.md +++ b/docs/src/snap/howto/networking/default-ingress.md @@ -20,7 +20,7 @@ Find out whether Ingress is enabled or disabled with the following command: sudo k8s status ``` -The default state for the cluster is `ingress disabled`. +Please ensure that Ingress is enabled on your cluster. ## Enable Ingress @@ -46,6 +46,7 @@ sudo k8s get ingress You should see three options: +- `enabled`: If set to true, Ingress is enabled - `default-tls-secret`: Name of the TLS (Transport Layer Security) Secret in the kube-system namespace that will be used as the default Ingress certificate @@ -53,8 +54,9 @@ You should see three options: ### TLS Secret -You can create a TLS secret by following the official [Kubernetes documentation][kubectl-create-secret-tls/]. -Note: remember to use `sudo k8s kubectl` (See the [kubectl-guide]). +You can create a TLS secret by following the official +[Kubernetes documentation][kubectl-create-secret-tls/]. +Please remember to use `sudo k8s kubectl` (See the [kubectl-guide]). Tell Ingress to use your new Ingress certificate: @@ -62,14 +64,16 @@ Tell Ingress to use your new Ingress certificate: sudo k8s set ingress.default-tls-secret= ``` -Replace `` with the desired value for your Ingress configuration. +Replace `` with the desired value for your Ingress +configuration. ### Proxy Protocol Enabling the proxy protocol allows passing client connection information to the backend service. -Consult the official [Kubernetes documentation on the proxy protocol][proxy-protocol]. +Consult the official +[Kubernetes documentation on the proxy protocol][proxy-protocol]. Use the following command to enable the proxy protocol: @@ -77,7 +81,8 @@ Use the following command to enable the proxy protocol: sudo k8s set ingress.enable-proxy-protocol= ``` -Adjust the value of `` with your proxy protocol requirements. +Adjust the value of `` with your proxy protocol +requirements. 
## Disable Ingress diff --git a/docs/src/snap/howto/networking/index.md b/docs/src/snap/howto/networking/index.md index 2f62c6b3a..98d42bd55 100644 --- a/docs/src/snap/howto/networking/index.md +++ b/docs/src/snap/howto/networking/index.md @@ -15,4 +15,5 @@ how to configure and use key capabilities of {{product}}. /snap/howto/networking/default-network.md /snap/howto/networking/default-ingress.md /snap/howto/networking/default-loadbalancer.md +/snap/howto/networking/dualstack.md ``` From d97961de0a2e4972f4787d343cd51701b015b8dd Mon Sep 17 00:00:00 2001 From: Homayoon Alimohammadi Date: Mon, 9 Sep 2024 17:08:54 +0400 Subject: [PATCH 29/45] Add more sections to the CAPI docs (#655) * Add more sections to the CAPI docs * Address comments --- docs/src/capi/explanation/capi-ck8s.md | 42 ++++ docs/src/capi/explanation/capi-ck8s.svg | 4 + docs/src/capi/explanation/index.md | 1 + docs/src/capi/howto/custom-ck8s.md | 64 +++++++ docs/src/capi/howto/index.md | 3 + docs/src/capi/howto/migrate-management.md | 29 +++ docs/src/capi/howto/upgrade-providers.md | 53 +++++ docs/src/capi/reference/configs.md | 224 ++++++++++++++++++++++ docs/src/capi/reference/index.md | 1 + 9 files changed, 421 insertions(+) create mode 100644 docs/src/capi/explanation/capi-ck8s.md create mode 100644 docs/src/capi/explanation/capi-ck8s.svg create mode 100644 docs/src/capi/howto/custom-ck8s.md create mode 100644 docs/src/capi/howto/migrate-management.md create mode 100644 docs/src/capi/howto/upgrade-providers.md create mode 100644 docs/src/capi/reference/configs.md diff --git a/docs/src/capi/explanation/capi-ck8s.md b/docs/src/capi/explanation/capi-ck8s.md new file mode 100644 index 000000000..d75db76ac --- /dev/null +++ b/docs/src/capi/explanation/capi-ck8s.md @@ -0,0 +1,42 @@ +# Cluster API - {{product}} + +ClusterAPI (CAPI) is an open-source Kubernetes project that provides a declarative API for cluster creation, configuration, and management. It is designed to automate the creation and management of Kubernetes clusters in various environments, including on-premises data centers, public clouds, and edge devices. + +CAPI abstracts away the details of infrastructure provisioning, networking, and other low-level tasks, allowing users to define their desired cluster configuration using simple YAML manifests. This makes it easier to create and manage clusters in a repeatable and consistent manner, regardless of the underlying infrastructure. In this way a wide range of infrastructure providers has been made available, including but not limited to Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and OpenStack. + +CAPI also abstracts the provisioning and management of Kubernetes clusters allowing for a variety of Kubernetes distributions to be delivered in all of the supported infrastructure providers. {{product}} is one such Kubernetes distribution that seamlessly integrates with Cluster API. + +With {{product}} CAPI you can: +- provision a cluster with: + - Kubernetes version 1.31 onwards + - risk level of the track you want to follow (stable, candidate, beta, edge) + - deploy behind proxies +- upgrade clusters with no downtime: + - rolling upgrades for HA clusters and worker nodes + - in-place upgrades for non-HA control planes and worker nodes + +Please refer to the “Tutorial” section for concrete examples on CAPI deployments: + + +## CAPI architecture + +Being a cloud-native framework, CAPI implements all its components as controllers that run within a Kubernetes cluster. 
There is a separate controller, called a ‘provider’, for each supported infrastructure substrate. The infrastructure providers are responsible for provisioning physical or virtual nodes and setting up networking elements such as load balancers and virtual networks. In a similar way, each Kubernetes distribution that integrates with ClusterAPI is managed by two providers: the control plane provider and the bootstrap provider. The bootstrap provider is responsible for delivering and managing Kubernetes on the nodes, while the control plane provider handles the control plane’s specific lifecycle. + +The CAPI providers operate within a Kubernetes cluster known as the management cluster. The administrator is responsible for selecting the desired combination of infrastructure and Kubernetes distribution by instantiating the respective infrastructure, bootstrap, and control plane providers on the management cluster. + +The management cluster functions as the control plane for the ClusterAPI operator, which is responsible for provisioning and managing the infrastructure resources necessary for creating and managing additional Kubernetes clusters. It is important to note that the management cluster is not intended to support any other workload, as the workloads are expected to run on the provisioned clusters. As a result, the provisioned clusters are referred to as workload clusters. + +Typically, the management cluster runs in a separate environment from the clusters it manages, such as a public cloud or an on-premises data center. It serves as a centralized location for managing the configuration, policies, and security of multiple managed clusters. By leveraging the management cluster, users can easily create and manage a fleet of Kubernetes clusters in a consistent and repeatable manner. + +The {{product}} team maintains the two providers required for integrating with CAPI: + +- The Cluster API Bootstrap Provider {{product}} (**CABPCK**) responsible for provisioning the nodes in the cluster and preparing them to be joined to the Kubernetes control plane. When you use the CABPCK you define a Kubernetes Cluster object that describes the desired state of the new cluster and includes the number and type of nodes in the cluster, as well as any additional configuration settings. The Bootstrap Provider then creates the necessary resources in the Kubernetes API server to bring the cluster up to the desired state. Under the hood, the Bootstrap Provider uses cloud-init to configure the nodes in the cluster. This includes setting up SSH keys, configuring the network, and installing necessary software packages. + +- The Cluster API Control Plane Provider {{product}} (**CACPCK**) enables the creation and management of Kubernetes control planes using {{product}} as the underlying Kubernetes distribution. Its main tasks are to update the machine state and to generate the kubeconfig file used for accessing the cluster. The kubeconfig file is stored as a secret which the user can then retrieve using the `clusterctl` command. + +```{figure} ./capi-ck8s.svg + :width: 100% + :alt: Deployment of components + + Deployment of components +``` diff --git a/docs/src/capi/explanation/capi-ck8s.svg b/docs/src/capi/explanation/capi-ck8s.svg new file mode 100644 index 000000000..d7df80727 --- /dev/null +++ b/docs/src/capi/explanation/capi-ck8s.svg @@ -0,0 +1,4 @@ + + + +
Canonical Kubernetes Bootstrap Provider (CABPCK)
CAPI Machine with Canonical Kubernetes Config
CA
Join Token
kubeconfig
Canonical Kubernetes Control Plane Provider (CACPCK)
Infrastructure Provider
Control Plane
Worker Nodes
VM  #1
VM  #2
VM  #3
VM  #N-2
VM  #N-1
VM  #N
...
Provisioned (Workload) Cluster
User
Cluster EP
clusterctl get kubeconfig
Bootstrap (Management) Cluster
Bootstrap secret
Deliver cloud-init 
for nodes
User talks to cluster EP
Generate Secrets
- Join Token
- CA
diff --git a/docs/src/capi/explanation/index.md b/docs/src/capi/explanation/index.md index 61336858e..775dd26a3 100644 --- a/docs/src/capi/explanation/index.md +++ b/docs/src/capi/explanation/index.md @@ -15,6 +15,7 @@ Overview about security +capi-ck8s.md ``` diff --git a/docs/src/capi/howto/custom-ck8s.md b/docs/src/capi/howto/custom-ck8s.md new file mode 100644 index 000000000..d81191980 --- /dev/null +++ b/docs/src/capi/howto/custom-ck8s.md @@ -0,0 +1,64 @@ +# Install custom {{product}} on machines + +By default, the `version` field in the machine specifications will determine which {{product}} is downloaded from the `stable` rist level. While you can install different versions of the `stable` risk level by changing the `version` field, extra steps should be taken if you're willing to install a specific risk level. +This guide walks you through the process of installing custom {{product}} on workload cluster machines. + +## Prerequisites + +To follow this guide, you will need: + +- A Kubernetes management cluster with Cluster API and providers installed and configured. +- A generated cluster spec manifest + +Please refer to the [getting-started guide][getting-started] for further +details on the required setup. + +In this guide we call the generated cluster spec manifrst `cluster.yaml`. + +## Overwrite the existing `install.sh` script + +The installation of the {{product}} snap is done via running the `install.sh` script in the cloud-init. +While this file is automatically placed in every workload cluster machine which hard-coded content by {{product}} providers, you can overwrite this file to make sure your desired content is available in the script. + +As an example, let's overwrite the `install.sh` for our control plane nodes. Inside the `cluster.yaml`, add the new file content: +```yaml +apiVersion: controlplane.cluster.x-k8s.io/v1beta2 +kind: CK8sControlPlane +... +spec: + ... + spec: + files: + - content: | + #!/bin/bash -xe + snap install k8s --classic --channel=latest/edge + owner: root:root + path: /capi/scripts/install.sh + permissions: "0500" +``` + +Now the new control plane nodes that are created using this manifest will have the `latest/edge` {{product}} snap installed on them! + +## Use `preRunCommands` + +As mentioned above, the `install.sh` script is responsible for installing {{product}} snap on machines. `preRunCommands` are executed before `install.sh`. You can also add an install command to the `preRunCommands` in order to install your desired {{product}} version. + +```{note} +Installing the {{product}} snap via the `preRunCommands`, does not prevent the `install.sh` script from running. Instead, the installation process in the `install.sh` will fail with a message indicating that `k8s` is already installed. +This is not considered a standard way and overwriting the `install.sh` script is recommended. +``` + +Edit the `cluster.yaml` to add the installation command: +```yaml +apiVersion: controlplane.cluster.x-k8s.io/v1beta2 +kind: CK8sControlPlane +... +spec: + ... 
+ spec: + preRunCommands: + - snap install k8s --classic --channel=latest/edge +``` + + +[getting-started]: ../tutorial/getting-started.md diff --git a/docs/src/capi/howto/index.md b/docs/src/capi/howto/index.md index f013e6b35..375a5025a 100644 --- a/docs/src/capi/howto/index.md +++ b/docs/src/capi/howto/index.md @@ -16,6 +16,9 @@ Overview external-etcd rollout-upgrades +upgrade-providers +migrate-management +custom-ck8s ``` --- diff --git a/docs/src/capi/howto/migrate-management.md b/docs/src/capi/howto/migrate-management.md new file mode 100644 index 000000000..11a1474f3 --- /dev/null +++ b/docs/src/capi/howto/migrate-management.md @@ -0,0 +1,29 @@ +# Migrate the managment cluster + +Management cluster migration is a really powerful operation in the cluster’s lifecycle as it allows admins +to move the management cluster in a more reliable substrate or perform maintenance tasks without disruptions. +In this guide we will walk through the migration of a management cluster. + +## Prerequisites + +In the [Cluster provisioning with CAPI and {{product}} tutorial] we showed how to provision a workloads cluster. Here, we start from the point where the workloads cluster is available and we will migrate the management cluster to the one cluster we just provisioned. + +## Install the same set of providers to the provisioned cluster + +Before migrating a cluster, we must make sure that both the target and source management clusters run the same version of providers (infrastructure, bootstrap, control plane). To do so, `clusterctl init` should be called against the target cluster: + +``` +clusterctl get kubeconfig > targetconfig +clusterctl init --kubeconfig=$PWD/targetconfig --bootstrap ck8s --control-plane ck8s --infrastructure +``` + +## Move the cluster + +Simply call: + +``` +clusterctl move --to-kubeconfig=$PWD/targetconfig +``` + + +[Cluster provisioning with CAPI and {{product}} tutorial]: ../tutorial/getting-started.md diff --git a/docs/src/capi/howto/upgrade-providers.md b/docs/src/capi/howto/upgrade-providers.md new file mode 100644 index 000000000..5188c2413 --- /dev/null +++ b/docs/src/capi/howto/upgrade-providers.md @@ -0,0 +1,53 @@ +# Upgrading the providers of a management cluster + +In this guide we will go through the process of upgrading providers of a management cluster. + +## Prerequisites + +We assume we already have a management cluster and the infrastructure provider configured as described in the [Cluster provisioning with CAPI and {{product}} tutorial]. The selected infrastructure provider is AWS. We have not yet called `clusterctl init` to initialise the cluster. + +## Initialise the cluster + +To demonstrate the steps of upgrading the management cluster, we will begin by initialising a desired version of the {{product}} CAPI providers. 
+ +To set the version of the providers to be installed we use the following notation: + +``` +clusterctl init --bootstrap ck8s:v0.1.2 --control-plane ck8s:v0.1.2 --infrastructure +``` + +## Check for updates + +With `clusterctl` we can check if there are any new versions of the running providers: + +``` +clusterctl upgrade plan +``` + +The output shows the existing version of each provider as well as the version that we can upgrade into: + +```text +NAME NAMESPACE TYPE CURRENT VERSION NEXT VERSION +bootstrap-ck8s cabpck-system BootstrapProvider v0.1.2 v0.2.0 +control-plane-ck8s cacpck-system ControlPlaneProvider v0.1.2 v0.2.0 +cluster-api capi-system CoreProvider v1.8.1 Already up to date +infrastructure-aws capa-system InfrastructureProvider v2.6.1 Already up to date +``` + +## Trigger providers upgrade + +To apply the upgrade plan recommended by `clusterctl upgrade plan`, simply: + +``` +clusterctl upgrade apply --contract v1beta1 +``` + +To upgrade each provider one by one, issue: + +``` +clusterctl upgrade apply --bootstrap cabpck-system/ck8s:v0.2.0 +clusterctl upgrade apply --control-plane cacpck-system/ck8s:v0.2.0 +``` + + +[Cluster provisioning with CAPI and {{product}} tutorial]: ../tutorial/getting-started.md diff --git a/docs/src/capi/reference/configs.md b/docs/src/capi/reference/configs.md new file mode 100644 index 000000000..60ce9bebe --- /dev/null +++ b/docs/src/capi/reference/configs.md @@ -0,0 +1,224 @@ +# Providers Configurations + +{{product}} bootstrap and control plane providers (CABPCK and CACPCK) can be configured to aid the cluster admin in reaching the desired state for the workload cluster. In this section we will go through different configurations that each one of these providers expose. + +## Common Configurations + +The following configurations are available for both bootstrap and control plane providers. + +### `version` +**Type:** `string` + +**Required:** yes + +`version` is used to specify the {{product}} version installed on the nodes. + +```{note} +The {{product}} providers will install the latest patch in the `stable` risk level by default, e.g. `1.30/stable`. Patch versions specified in this configuration will be ignored. + +To install a specific track or risk level, see [Install custom {{product}} on machines] guide. +``` + +**Example Usage:** +```yaml +spec: + version: 1.30 +``` + +### `files` +**Type:** `struct` + +**Required:** no + +`files` can be used to add new files to the machines or overwrite existing files. + +**Fields:** + +| Name | Type | Description | Default | +|------|------|-------------|---------| +| `path` | `string` | Where the file should be created | `""` | +| `content` | `string` | Content of the created file | `""` | +| `permissions` | `string` | Permissions of the file to create, e.g. "0600" | `""` | +| `owner` | `string` | Owner of the file to create, e.g. "root:root" | `""` | + +**Example Usage:** +```yaml +spec: + files: + path: "/path/to/my-file" + content: | + #!/bin/bash -xe + echo "hello from my-file + permissions: "0500" + owner: root:root +``` + +### `bootCommands` +**Type:** `[]string` + +**Required:** no + +`bootCommands` specifies extra commands to run in cloud-init early in the boot process. + +**Example Usage:** +```yaml +spec: + bootCommands: + - echo "first-command" + - echo "second-command" +``` + +### `preRunCommands` +**Type:** `[]string` + +**Required:** no + +`preRunCommands` specifies extra commands to run in cloud-init before k8s-snap setup runs. 
+ +```{note} +`preRunCommands` can also be used to install custom {{product}} versions on machines. See [Install custom {{product}} on machines] guide for more info. +``` + +**Example Usage:** +```yaml +spec: + preRunCommands: + - echo "first-command" + - echo "second-command" +``` + +### `postRunCommands` +**Type:** `[]string` + +**Required:** no + +`postRunCommands` specifies extra commands to run in cloud-init after k8s-snap setup runs. + +**Example Usage:** +```yaml +spec: + postRunCommands: + - echo "first-command" + - echo "second-command" +``` + +### `airGapped` +**Type:** `bool` + +**Required:** no + +`airGapped` is used to signal that we are deploying to an airgap environment. In this case, the provider will not attempt to install k8s-snap on the machine. The user is expected to install k8s-snap manually with [`preRunCommands`](#preRunCommands), or provide an image with k8s-snap pre-installed. + +**Example Usage:** +```yaml +spec: + airGapped: true +``` + +### `initConfig` +**Type:** `struct` + +**Required:** no + +`initConfig` is configuration for the initializing the cluster features + +**Fields:** + +| Name | Type | Description | Default | +|------|------|-------------|---------| +| `annotations` | `map[string]string` | Are used to configure the behaviour of the built-in features. | `nil` | +| `enableDefaultDNS` | `bool` | Specifies whether to enable the default DNS configuration. | `true` | +| `enableDefaultLocalStorage` | `bool` | Specifies whether to enable the default local storage. | `true` | +| `enableDefaultMetricsServer` | `bool` | Specifies whether to enable the default metrics server. | `true` | +| `enableDefaultNetwork` | `bool` | Specifies whether to enable the default CNI. | `true` | + + +**Example Usage:** +```yaml +spec: + initConfig: + annotations: + annotationKey: "annotationValue" + enableDefaultDNS: false + enableDefaultLocalStorage: true + enableDefaultMetricsServer: false + enableDefaultNetwork: true +``` + +### `nodeName` +**Type:** `string` + +**Required:** no + +`nodeName` is the name to use for the kubelet of this node. It is needed for clouds where the cloud-provider has specific pre-requisites about the node names. It is typically set in Jinja template form, e.g. `"{{ ds.meta_data.local_hostname }}"`. + +**Example Usage:** +```yaml +spec: + nodeName: "{{ ds.meta_data.local_hostname }}" +``` + +## Control plane provider (CACPCK) + +The following configurations are only available for the control plane provider. + +### `replicas` +**Type:** `int32` + +**Required:** no + +`replicas` is the number of desired machines. Defaults to 1. When stacked etcd is used only odd numbers are permitted, as per [etcd best practice]. + +**Example Usage:** +```yaml +spec: + replicas: 2 +``` + +### `controlPlane` +**Type:** `struct` + +**Required:** no + +`controlPlane` is configuration for control plane nodes. + +**Fields:** + +| Name | Type | Description | Default | +|------|------|-------------|---------| +| `extraSANs` | `[]string` | A list of SANs to include in the server certificates. | `[]` | +| `cloudProvider` | `string` | The cloud-provider configuration option to set. | `""` | +| `nodeTaints` | `[]string` | Taints to add to the control plane kubelet nodes. | `[]` | +| `datastoreType` | `string` | The type of datastore to use for the control plane. | `""` | +| `datastoreServersSecretRef` | `struct{name:str, key:str}` | A reference to a secret containing the datastore servers. | `{}` | +| `k8sDqlitePort` | `int` | The port to use for k8s-dqlite. 
If unset, 2379 (etcd) will be used. | `2379` | +| `microclusterAddress` | `string` | The address (or CIDR) to use for microcluster. If unset, the default node interface is chosen. | `""` | +| `microclusterPort` | `int` | The port to use for microcluster. If unset, ":2380" (etcd peer) will be used. | `":2380"` | +| `extraKubeAPIServerArgs` | `map[string]string` | Extra arguments to add to kube-apiserver. | `map[]` | + +**Example Usage:** +```yaml +spec: + controlPlane: + extraSANs: + - extra.san + cloudProvider: external + nodeTaints: + - myTaint + datastoreType: k8s-dqlite + datastoreServersSecretRef: + name: sfName + key: sfKey + k8sDqlitePort: 2379 + microclusterAddress: my.address + microclusterPort: ":2380" + extraKubeAPIServerArgs: + argKey: argVal +``` + + +[Install custom {{product}} on machines]: ../howto/custom-ck8s.md +[etcd best practices]: https://etcd.io/docs/v3.5/faq/#why-an-odd-number-of-cluster-members + + + diff --git a/docs/src/capi/reference/index.md b/docs/src/capi/reference/index.md index b98291faf..1712305ec 100644 --- a/docs/src/capi/reference/index.md +++ b/docs/src/capi/reference/index.md @@ -12,6 +12,7 @@ Overview :titlesonly: releases community +configs ``` From 2273e4a2629cc3525419f5283ea6d49d2a06fa67 Mon Sep 17 00:00:00 2001 From: Benjamin Schimke Date: Tue, 10 Sep 2024 13:33:26 +0200 Subject: [PATCH 30/45] Add annotations docs (#652) --------- Co-authored-by: Nick Veitch --- docs/src/capi/reference/annotations.md | 15 +++++++++++++++ docs/src/capi/reference/index.md | 1 + docs/src/snap/reference/annotations.md | 13 +++++++++++++ docs/src/snap/reference/index.md | 1 + 4 files changed, 30 insertions(+) create mode 100644 docs/src/capi/reference/annotations.md create mode 100644 docs/src/snap/reference/annotations.md diff --git a/docs/src/capi/reference/annotations.md b/docs/src/capi/reference/annotations.md new file mode 100644 index 000000000..8f9e87fa9 --- /dev/null +++ b/docs/src/capi/reference/annotations.md @@ -0,0 +1,15 @@ +# Annotations + +Like annotations for other Kubernetes objects, CAPI annotations are key-value +pairs that can be used to reflect additional metadata for CAPI resources. + +## Machine + +The following annotations can be set on CAPI `Machine` resources. + +| Name | Description | Values | Set by user | +|-----------------------------------------------|------------------------------------------------------|------------------------------|-------------| +| `v1beta2.k8sd.io/in-place-upgrade-to` | Trigger a Kubernetes version upgrade on that machine | snap version e.g.:
- `localPath=/full/path/to/k8s.snap`
- `revision=123`
- `channel=latest/edge` | yes | +| `v1beta2.k8sd.io/in-place-upgrade-status` | The status of the version upgrade | in-progress\|done\|failed | no | +| `v1beta2.k8sd.io/in-place-upgrade-release` | The current version on the machine | snap version e.g.:
- `localPath=/full/path/to/k8s.snap`
- `revision=123`
- `channel=latest/edge` | no | +| `v1beta2.k8sd.io/in-place-upgrade-change-id` | The ID of the currently running upgrade | ID string | no | diff --git a/docs/src/capi/reference/index.md b/docs/src/capi/reference/index.md index 1712305ec..cd239300c 100644 --- a/docs/src/capi/reference/index.md +++ b/docs/src/capi/reference/index.md @@ -11,6 +11,7 @@ Overview ```{toctree} :titlesonly: releases +annotations community configs diff --git a/docs/src/snap/reference/annotations.md b/docs/src/snap/reference/annotations.md new file mode 100644 index 000000000..b5e4404d8 --- /dev/null +++ b/docs/src/snap/reference/annotations.md @@ -0,0 +1,13 @@ +# Annotations + +This page outlines the annotations that can be configured during cluster +[bootstrap]. To do this, set the cluster-config/annotations parameter in +the bootstrap configuration. + +| Name | Description | Values | +|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------| +| `k8sd/v1alpha/lifecycle/skip-cleanup-kubernetes-node-on-remove` | If set, only microcluster and file cleanup are performed. This is helpful when an external controller (e.g., CAPI) manages the Kubernetes node lifecycle. By default, k8sd will remove the Kubernetes node when it is removed from the cluster. | "true"\|"false" | + + + +[bootstrap]: /snap/reference/bootstrap-config-reference diff --git a/docs/src/snap/reference/index.md b/docs/src/snap/reference/index.md index bb2a5735a..f1720e760 100644 --- a/docs/src/snap/reference/index.md +++ b/docs/src/snap/reference/index.md @@ -13,6 +13,7 @@ Overview releases commands +annotations certificates bootstrap-config-reference proxy From 494f4f80abb5361dc387ae519d8c8e172a596169 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Berkay=20Tekin=20=C3=96z?= Date: Wed, 11 Sep 2024 08:55:54 +0300 Subject: [PATCH 31/45] Change default cgroup driver to systemd (#661) --- src/k8s/pkg/k8sd/setup/containerd.go | 3 +++ src/k8s/pkg/k8sd/setup/kubelet.go | 1 + src/k8s/pkg/k8sd/setup/kubelet_test.go | 6 ++++++ 3 files changed, 10 insertions(+) diff --git a/src/k8s/pkg/k8sd/setup/containerd.go b/src/k8s/pkg/k8sd/setup/containerd.go index b756de190..dbf1545ec 100644 --- a/src/k8s/pkg/k8sd/setup/containerd.go +++ b/src/k8s/pkg/k8sd/setup/containerd.go @@ -68,6 +68,9 @@ func defaultContainerdConfig( "runtimes": map[string]any{ "runc": map[string]any{ "runtime_type": "io.containerd.runc.v2", + "options": map[string]any{ + "SystemdCgroup": true, + }, }, }, }, diff --git a/src/k8s/pkg/k8sd/setup/kubelet.go b/src/k8s/pkg/k8sd/setup/kubelet.go index cdfa82f97..ff4cd9e0f 100644 --- a/src/k8s/pkg/k8sd/setup/kubelet.go +++ b/src/k8s/pkg/k8sd/setup/kubelet.go @@ -54,6 +54,7 @@ func kubelet(snap snap.Snap, hostname string, nodeIP net.IP, clusterDNS string, "--client-ca-file": filepath.Join(snap.KubernetesPKIDir(), "client-ca.crt"), "--container-runtime-endpoint": filepath.Join(snap.ContainerdSocketDir(), "containerd.sock"), "--containerd": filepath.Join(snap.ContainerdSocketDir(), "containerd.sock"), + "--cgroup-driver": "systemd", "--eviction-hard": "'memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi'", "--fail-swap-on": "false", "--kubeconfig": filepath.Join(snap.KubernetesConfigDir(), "kubelet.conf"), diff --git a/src/k8s/pkg/k8sd/setup/kubelet_test.go 
b/src/k8s/pkg/k8sd/setup/kubelet_test.go index b8e1656bc..99129e3d5 100644 --- a/src/k8s/pkg/k8sd/setup/kubelet_test.go +++ b/src/k8s/pkg/k8sd/setup/kubelet_test.go @@ -59,6 +59,7 @@ func TestKubelet(t *testing.T) { {key: "--client-ca-file", expectedVal: filepath.Join(s.Mock.KubernetesPKIDir, "client-ca.crt")}, {key: "--container-runtime-endpoint", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, {key: "--containerd", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, + {key: "--cgroup-driver", expectedVal: "systemd"}, {key: "--eviction-hard", expectedVal: "'memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi'"}, {key: "--fail-swap-on", expectedVal: "false"}, {key: "--hostname-override", expectedVal: "dev"}, @@ -116,6 +117,7 @@ func TestKubelet(t *testing.T) { {key: "--client-ca-file", expectedVal: filepath.Join(s.Mock.KubernetesPKIDir, "client-ca.crt")}, {key: "--container-runtime-endpoint", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, {key: "--containerd", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, + {key: "--cgroup-driver", expectedVal: "systemd"}, {key: "--eviction-hard", expectedVal: "'memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi'"}, {key: "--fail-swap-on", expectedVal: "false"}, {key: "--hostname-override", expectedVal: "dev"}, @@ -173,6 +175,7 @@ func TestKubelet(t *testing.T) { {key: "--client-ca-file", expectedVal: filepath.Join(s.Mock.KubernetesPKIDir, "client-ca.crt")}, {key: "--container-runtime-endpoint", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, {key: "--containerd", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, + {key: "--cgroup-driver", expectedVal: "systemd"}, {key: "--eviction-hard", expectedVal: "'memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi'"}, {key: "--fail-swap-on", expectedVal: "false"}, {key: "--hostname-override", expectedVal: "dev"}, @@ -221,6 +224,7 @@ func TestKubelet(t *testing.T) { {key: "--client-ca-file", expectedVal: filepath.Join(s.Mock.KubernetesPKIDir, "client-ca.crt")}, {key: "--container-runtime-endpoint", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, {key: "--containerd", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, + {key: "--cgroup-driver", expectedVal: "systemd"}, {key: "--eviction-hard", expectedVal: "'memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi'"}, {key: "--fail-swap-on", expectedVal: "false"}, {key: "--hostname-override", expectedVal: "dev"}, @@ -278,6 +282,7 @@ func TestKubelet(t *testing.T) { {key: "--client-ca-file", expectedVal: filepath.Join(s.Mock.KubernetesPKIDir, "client-ca.crt")}, {key: "--container-runtime-endpoint", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, {key: "--containerd", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, + {key: "--cgroup-driver", expectedVal: "systemd"}, {key: "--eviction-hard", expectedVal: "'memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi'"}, {key: "--fail-swap-on", expectedVal: "false"}, {key: "--hostname-override", expectedVal: "dev"}, @@ -334,6 +339,7 @@ func TestKubelet(t *testing.T) { {key: "--client-ca-file", expectedVal: filepath.Join(s.Mock.KubernetesPKIDir, "client-ca.crt")}, {key: "--container-runtime-endpoint", expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, {key: "--containerd", 
expectedVal: filepath.Join(s.Mock.ContainerdSocketDir, "containerd.sock")}, + {key: "--cgroup-driver", expectedVal: "systemd"}, {key: "--eviction-hard", expectedVal: "'memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi'"}, {key: "--fail-swap-on", expectedVal: "false"}, {key: "--hostname-override", expectedVal: "dev"}, From 5966752a7421bd7d322c9226279008e66de82a54 Mon Sep 17 00:00:00 2001 From: Adam Dyess Date: Thu, 12 Sep 2024 13:45:30 -0500 Subject: [PATCH 32/45] Introduce tests to ensure branches and lp-recipes exist (#663) --- .github/workflows/integration.yaml | 16 ++++ tests/branch_management/.copyright.tmpl | 1 + tests/branch_management/requirements-dev.txt | 5 + tests/branch_management/requirements-test.txt | 7 ++ tests/branch_management/tests/conftest.py | 27 ++++++ .../branch_management/tests/test_branches.py | 93 +++++++++++++++++++ tests/branch_management/tox.ini | 52 +++++++++++ 7 files changed, 201 insertions(+) create mode 100644 tests/branch_management/.copyright.tmpl create mode 100644 tests/branch_management/requirements-dev.txt create mode 100644 tests/branch_management/requirements-test.txt create mode 100644 tests/branch_management/tests/conftest.py create mode 100644 tests/branch_management/tests/test_branches.py create mode 100644 tests/branch_management/tox.ini diff --git a/.github/workflows/integration.yaml b/.github/workflows/integration.yaml index 8f5b0eb40..6a58c1194 100644 --- a/.github/workflows/integration.yaml +++ b/.github/workflows/integration.yaml @@ -45,6 +45,22 @@ jobs: name: k8s.snap path: k8s.snap + test-branches: + name: Test Branch Management + runs-on: ubuntu-20.04 + steps: + - name: Check out code + uses: actions/checkout@v4 + - name: Setup Python + uses: actions/setup-python@v5 + with: + python-version: '3.8' + - name: Install tox + run: pip install tox + - name: Run branch_management tests + run: | + tox -c tests/branch_management -e integration + test-integration: name: Test ${{ matrix.os }} strategy: diff --git a/tests/branch_management/.copyright.tmpl b/tests/branch_management/.copyright.tmpl new file mode 100644 index 000000000..ecbed6c7a --- /dev/null +++ b/tests/branch_management/.copyright.tmpl @@ -0,0 +1 @@ +Copyright ${years} ${owner}. diff --git a/tests/branch_management/requirements-dev.txt b/tests/branch_management/requirements-dev.txt new file mode 100644 index 000000000..a66721ae0 --- /dev/null +++ b/tests/branch_management/requirements-dev.txt @@ -0,0 +1,5 @@ +black==24.3.0 +codespell==2.2.4 +flake8==6.0.0 +isort==5.12.0 +licenseheaders==0.8.8 diff --git a/tests/branch_management/requirements-test.txt b/tests/branch_management/requirements-test.txt new file mode 100644 index 000000000..b702086cc --- /dev/null +++ b/tests/branch_management/requirements-test.txt @@ -0,0 +1,7 @@ +coverage[toml]==7.2.5 +pytest==7.3.1 +PyYAML==6.0.1 +tenacity==8.2.3 +pylint==3.2.5 +requests==2.32.3 +semver==3.0.2 \ No newline at end of file diff --git a/tests/branch_management/tests/conftest.py b/tests/branch_management/tests/conftest.py new file mode 100644 index 000000000..f745828fb --- /dev/null +++ b/tests/branch_management/tests/conftest.py @@ -0,0 +1,27 @@ +# +# Copyright 2024 Canonical, Ltd. 
+# +from pathlib import Path + +import pytest +import requests +import semver + + +@pytest.fixture +def upstream_release() -> semver.VersionInfo: + """Return the latest stable k8s in the release series""" + release_url = "https://dl.k8s.io/release/stable.txt" + r = requests.get(release_url) + r.raise_for_status() + return semver.Version.parse(r.content.decode().lstrip("v")) + + +@pytest.fixture +def current_release() -> semver.VersionInfo: + """Return the current branch k8s version""" + ver_file = ( + Path(__file__).parent / "../../../build-scripts/components/kubernetes/version" + ) + version = ver_file.read_text().strip() + return semver.Version.parse(version.lstrip("v")) diff --git a/tests/branch_management/tests/test_branches.py b/tests/branch_management/tests/test_branches.py new file mode 100644 index 000000000..426fde599 --- /dev/null +++ b/tests/branch_management/tests/test_branches.py @@ -0,0 +1,93 @@ +# +# Copyright 2024 Canonical, Ltd. +# +from pathlib import Path +from subprocess import check_output + +import requests + + +def _get_max_minor(major): + """Get the latest minor release of the provided major. + For example if you use 1 as major you will get back X where X gives you latest 1.X release. + """ + minor = 0 + while _upstream_release_exists(major, minor): + minor += 1 + return minor - 1 + + +def _upstream_release_exists(major, minor): + """Return true if the major.minor release exists""" + release_url = "https://dl.k8s.io/release/stable-{}.{}.txt".format(major, minor) + r = requests.get(release_url) + return r.status_code == 200 + + +def _confirm_branch_exists(branch): + cmd = f"git ls-remote --heads https://github.com/canonical/k8s-snap.git/ {branch}" + output = check_output(cmd.split()).decode("utf-8") + assert branch in output, f"Branch {branch} does not exist" + + +def _branch_flavours(branch: str = None): + patch_dir = Path("build-scripts/patches") + branch = "HEAD" if branch is None else branch + cmd = f"git ls-tree --full-tree -r --name-only {branch} {patch_dir}" + output = check_output(cmd.split()).decode("utf-8") + patches = set( + Path(f).relative_to(patch_dir).parents[0] for f in output.splitlines() + ) + return [p.name for p in patches] + + +def _confirm_recipe(track, flavour): + recipe = f"https://launchpad.net/~containers/k8s/+snap/k8s-snap-{track}-{flavour}" + r = requests.get(recipe) + return r.status_code == 200 + + +def test_branches(upstream_release): + """Ensures git branches exist for prior releases. + + We need to make sure the LP builders pointing to the main github branch are only pushing + to the latest and current k8s edge snap tracks. An indication that this is not enforced is + that we do not have a branch for the k8s release for the previous stable release. Let me + clarify with an example. + + Assuming upstream stable k8s release is v1.12.x, there has to be a 1.11 github branch used + by the respective LP builders for building the v1.11.y. 
+ """ + if upstream_release.minor != 0: + major = upstream_release.major + minor = upstream_release.minor - 1 + else: + major = int(upstream_release.major) - 1 + minor = _get_max_minor(major) + + prior_branch = f"release-{major}.{minor}" + print(f"Current stable is {upstream_release}") + print(f"Checking {prior_branch} branch exists") + _confirm_branch_exists(prior_branch) + flavours = _branch_flavours(prior_branch) + for flavour in flavours: + prior_branch = f"autoupdate/{prior_branch}-{flavour}" + print(f"Checking {prior_branch} branch exists") + _confirm_branch_exists(prior_branch) + + +def test_launchpad_recipe(current_release): + """Ensures the current recipes are available. + + We should ensure that a launchpad recipe exists for this release to be build with + """ + track = f"{current_release.major}.{current_release.minor}" + print(f"Checking {track} recipe exists") + flavours = ["classic"] + _branch_flavours() + recipe_exists = {flavour: _confirm_recipe(track, flavour) for flavour in flavours} + if missing_recipes := [ + flavour for flavour, exists in recipe_exists.items() if not exists + ]: + assert ( + not missing_recipes + ), f"LP Recipes do not exist for {track} {missing_recipes}" diff --git a/tests/branch_management/tox.ini b/tests/branch_management/tox.ini new file mode 100644 index 000000000..371ad51e4 --- /dev/null +++ b/tests/branch_management/tox.ini @@ -0,0 +1,52 @@ +[tox] +no_package = True +skip_missing_interpreters = True +env_list = format, lint, integration +min_version = 4.0.0 + +[testenv] +set_env = + PYTHONBREAKPOINT=pdb.set_trace + PY_COLORS=1 +pass_env = + PYTHONPATH + +[testenv:format] +description = Apply coding style standards to code +deps = -r {tox_root}/requirements-dev.txt +commands = + licenseheaders -t {tox_root}/.copyright.tmpl -cy -o 'Canonical, Ltd' -d {tox_root}/tests + isort {tox_root}/tests --profile=black + black {tox_root}/tests + +[testenv:lint] +description = Check code against coding style standards +deps = -r {tox_root}/requirements-dev.txt +commands = + codespell {tox_root}/tests + flake8 {tox_root}/tests + licenseheaders -t {tox_root}/.copyright.tmpl -cy -o 'Canonical, Ltd' -d {tox_root}/tests --dry + isort {tox_root}/tests --profile=black --check + black {tox_root}/tests --check --diff + +[testenv:test] +description = Run integration tests +deps = + -r {tox_root}/requirements-test.txt +commands = + pytest -v \ + --maxfail 1 \ + --tb native \ + --log-cli-level DEBUG \ + --disable-warnings \ + {posargs} \ + {tox_root}/tests +pass_env = + TEST_* + +[flake8] +max-line-length = 120 +select = E,W,F,C,N +ignore = W503 +exclude = venv,.git,.tox,.tox_env,.venv,build,dist,*.egg_info +show-source = true From 1838222c126ccc3f20320ede7b18e36cd30d3307 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 13 Sep 2024 11:02:12 -0500 Subject: [PATCH 33/45] [main] Update component versions (#660) Co-authored-by: neoaggelos <1888650+neoaggelos@users.noreply.github.com> --- build-scripts/components/containerd/version | 2 +- build-scripts/components/runc/version | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/build-scripts/components/containerd/version b/build-scripts/components/containerd/version index a2ead008e..201df8aec 100644 --- a/build-scripts/components/containerd/version +++ b/build-scripts/components/containerd/version @@ -1 +1 @@ -v1.6.35 +v1.6.36 diff --git a/build-scripts/components/runc/version b/build-scripts/components/runc/version index a829bcbe4..6a99dbb7f 
100644 --- a/build-scripts/components/runc/version +++ b/build-scripts/components/runc/version @@ -1 +1 @@ -v1.1.13 +v1.1.14 From 0c30c02584702ad69fa6345774252a3366e265b3 Mon Sep 17 00:00:00 2001 From: Adam Dyess Date: Fri, 13 Sep 2024 18:35:25 -0500 Subject: [PATCH 34/45] Auto-update components in release-1.31 branch (#668) --- .github/workflows/update-components.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/update-components.yaml b/.github/workflows/update-components.yaml index e46bd55df..7f9e43745 100644 --- a/.github/workflows/update-components.yaml +++ b/.github/workflows/update-components.yaml @@ -21,6 +21,7 @@ jobs: # Keep main branch up to date - main # Supported stable release branches + - release-1.31 - release-1.30 steps: From 44aa199b8998a788af5ba296e974a267d2369234 Mon Sep 17 00:00:00 2001 From: Kevin W Monroe Date: Sat, 14 Sep 2024 10:27:40 -0500 Subject: [PATCH 35/45] use lxd 5.21/stable snap (#670) --- .github/workflows/nightly-test.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/nightly-test.yaml b/.github/workflows/nightly-test.yaml index 915d6c3f3..cd661474f 100644 --- a/.github/workflows/nightly-test.yaml +++ b/.github/workflows/nightly-test.yaml @@ -31,7 +31,7 @@ jobs: pip3 install tox==4.13 - name: Install lxd run: | - sudo snap refresh lxd --channel 5.19/stable + sudo snap refresh lxd --channel 5.21/stable sudo lxd init --auto sudo usermod --append --groups lxd $USER sg lxd -c 'lxc version' From cf1daad482798b1774be07dc1dfae7243215613d Mon Sep 17 00:00:00 2001 From: Homayoon Alimohammadi Date: Mon, 16 Sep 2024 13:31:22 +0400 Subject: [PATCH 36/45] Add unit tests for local storage (#665) --- src/k8s/pkg/k8sd/features/localpv/chart.go | 8 +- src/k8s/pkg/k8sd/features/localpv/localpv.go | 14 +- .../pkg/k8sd/features/localpv/localpv_test.go | 155 ++++++++++++++++++ src/k8s/pkg/k8sd/features/localpv/register.go | 2 +- 4 files changed, 167 insertions(+), 12 deletions(-) create mode 100644 src/k8s/pkg/k8sd/features/localpv/localpv_test.go diff --git a/src/k8s/pkg/k8sd/features/localpv/chart.go b/src/k8s/pkg/k8sd/features/localpv/chart.go index 5a655fc57..8dd2248af 100644 --- a/src/k8s/pkg/k8sd/features/localpv/chart.go +++ b/src/k8s/pkg/k8sd/features/localpv/chart.go @@ -7,8 +7,8 @@ import ( ) var ( - // chart represents manifests to deploy Rawfile LocalPV CSI. - chart = helm.InstallableChart{ + // Chart represents manifests to deploy Rawfile LocalPV CSI. + Chart = helm.InstallableChart{ Name: "ck-storage", Namespace: "kube-system", ManifestPath: filepath.Join("charts", "rawfile-csi-0.9.0.tgz"), @@ -16,8 +16,8 @@ var ( // imageRepo is the repository to use for Rawfile LocalPV CSI. imageRepo = "ghcr.io/canonical/rawfile-localpv" - // imageTag is the image tag to use for Rawfile LocalPV CSI. - imageTag = "0.8.0-ck4" + // ImageTag is the image tag to use for Rawfile LocalPV CSI. + ImageTag = "0.8.0-ck4" // csiNodeDriverImage is the image to use for the CSI node driver. 
csiNodeDriverImage = "ghcr.io/canonical/k8s-snap/sig-storage/csi-node-driver-registrar:v2.10.1" diff --git a/src/k8s/pkg/k8sd/features/localpv/localpv.go b/src/k8s/pkg/k8sd/features/localpv/localpv.go index bd812b443..8555ff088 100644 --- a/src/k8s/pkg/k8sd/features/localpv/localpv.go +++ b/src/k8s/pkg/k8sd/features/localpv/localpv.go @@ -38,13 +38,13 @@ func ApplyLocalStorage(ctx context.Context, snap snap.Snap, cfg types.LocalStora "csiDriverArgs": []string{"--args", "rawfile", "csi-driver", "--disable-metrics"}, "image": map[string]any{ "repository": imageRepo, - "tag": imageTag, + "tag": ImageTag, }, }, "node": map[string]any{ "image": map[string]any{ "repository": imageRepo, - "tag": imageTag, + "tag": ImageTag, }, "storage": map[string]any{ "path": cfg.GetLocalPath(), @@ -58,19 +58,19 @@ func ApplyLocalStorage(ctx context.Context, snap snap.Snap, cfg types.LocalStora }, } - if _, err := m.Apply(ctx, chart, helm.StatePresentOrDeleted(cfg.GetEnabled()), values); err != nil { + if _, err := m.Apply(ctx, Chart, helm.StatePresentOrDeleted(cfg.GetEnabled()), values); err != nil { if cfg.GetEnabled() { err = fmt.Errorf("failed to install rawfile-csi helm package: %w", err) return types.FeatureStatus{ Enabled: false, - Version: imageTag, + Version: ImageTag, Message: fmt.Sprintf(deployFailedMsgTmpl, err), }, err } else { err = fmt.Errorf("failed to delete rawfile-csi helm package: %w", err) return types.FeatureStatus{ Enabled: false, - Version: imageTag, + Version: ImageTag, Message: fmt.Sprintf(deleteFailedMsgTmpl, err), }, err } @@ -79,13 +79,13 @@ func ApplyLocalStorage(ctx context.Context, snap snap.Snap, cfg types.LocalStora if cfg.GetEnabled() { return types.FeatureStatus{ Enabled: true, - Version: imageTag, + Version: ImageTag, Message: fmt.Sprintf(enabledMsg, cfg.GetLocalPath()), }, nil } else { return types.FeatureStatus{ Enabled: false, - Version: imageTag, + Version: ImageTag, Message: disabledMsg, }, nil } diff --git a/src/k8s/pkg/k8sd/features/localpv/localpv_test.go b/src/k8s/pkg/k8sd/features/localpv/localpv_test.go new file mode 100644 index 000000000..783422cbd --- /dev/null +++ b/src/k8s/pkg/k8sd/features/localpv/localpv_test.go @@ -0,0 +1,155 @@ +package localpv_test + +import ( + "context" + "errors" + "testing" + + . 
"github.com/onsi/gomega" + "k8s.io/utils/ptr" + + "github.com/canonical/k8s/pkg/client/helm" + helmmock "github.com/canonical/k8s/pkg/client/helm/mock" + "github.com/canonical/k8s/pkg/k8sd/features/localpv" + "github.com/canonical/k8s/pkg/k8sd/types" + snapmock "github.com/canonical/k8s/pkg/snap/mock" +) + +func TestDisabled(t *testing.T) { + t.Run("HelmApplyFails", func(t *testing.T) { + g := NewWithT(t) + + applyErr := errors.New("failed to apply") + helmM := &helmmock.Mock{ + ApplyErr: applyErr, + } + snapM := &snapmock.Snap{ + Mock: snapmock.Mock{ + HelmClient: helmM, + }, + } + cfg := types.LocalStorage{ + Enabled: ptr.To(false), + Default: ptr.To(true), + ReclaimPolicy: ptr.To("reclaim-policy"), + LocalPath: ptr.To("local-path"), + } + + status, err := localpv.ApplyLocalStorage(context.Background(), snapM, cfg, nil) + + g.Expect(err).To(MatchError(applyErr)) + g.Expect(status.Enabled).To(BeFalse()) + g.Expect(status.Message).To(ContainSubstring(applyErr.Error())) + g.Expect(status.Version).To(Equal(localpv.ImageTag)) + g.Expect(helmM.ApplyCalledWith).To(HaveLen(1)) + + callArgs := helmM.ApplyCalledWith[0] + g.Expect(callArgs.Chart).To(Equal(localpv.Chart)) + g.Expect(callArgs.State).To(Equal(helm.StateDeleted)) + + validateValues(g, callArgs.Values, cfg) + }) + t.Run("Success", func(t *testing.T) { + g := NewWithT(t) + + helmM := &helmmock.Mock{} + snapM := &snapmock.Snap{ + Mock: snapmock.Mock{ + HelmClient: helmM, + }, + } + cfg := types.LocalStorage{ + Enabled: ptr.To(false), + Default: ptr.To(true), + ReclaimPolicy: ptr.To("reclaim-policy"), + LocalPath: ptr.To("local-path"), + } + + status, err := localpv.ApplyLocalStorage(context.Background(), snapM, cfg, nil) + + g.Expect(err).ToNot(HaveOccurred()) + g.Expect(status.Enabled).To(BeFalse()) + g.Expect(status.Version).To(Equal(localpv.ImageTag)) + g.Expect(helmM.ApplyCalledWith).To(HaveLen(1)) + + callArgs := helmM.ApplyCalledWith[0] + g.Expect(callArgs.Chart).To(Equal(localpv.Chart)) + g.Expect(callArgs.State).To(Equal(helm.StateDeleted)) + + validateValues(g, callArgs.Values, cfg) + }) +} + +func TestEnabled(t *testing.T) { + t.Run("HelmApplyFails", func(t *testing.T) { + g := NewWithT(t) + + applyErr := errors.New("failed to apply") + helmM := &helmmock.Mock{ + ApplyErr: applyErr, + } + snapM := &snapmock.Snap{ + Mock: snapmock.Mock{ + HelmClient: helmM, + }, + } + cfg := types.LocalStorage{ + Enabled: ptr.To(true), + Default: ptr.To(true), + ReclaimPolicy: ptr.To("reclaim-policy"), + LocalPath: ptr.To("local-path"), + } + + status, err := localpv.ApplyLocalStorage(context.Background(), snapM, cfg, nil) + + g.Expect(err).To(MatchError(applyErr)) + g.Expect(status.Enabled).To(BeFalse()) + g.Expect(status.Message).To(ContainSubstring(applyErr.Error())) + g.Expect(status.Version).To(Equal(localpv.ImageTag)) + g.Expect(helmM.ApplyCalledWith).To(HaveLen(1)) + + callArgs := helmM.ApplyCalledWith[0] + g.Expect(callArgs.Chart).To(Equal(localpv.Chart)) + g.Expect(callArgs.State).To(Equal(helm.StatePresent)) + + validateValues(g, callArgs.Values, cfg) + }) + t.Run("Success", func(t *testing.T) { + g := NewWithT(t) + + helmM := &helmmock.Mock{} + snapM := &snapmock.Snap{ + Mock: snapmock.Mock{ + HelmClient: helmM, + }, + } + cfg := types.LocalStorage{ + Enabled: ptr.To(true), + Default: ptr.To(true), + ReclaimPolicy: ptr.To("reclaim-policy"), + LocalPath: ptr.To("local-path"), + } + + status, err := localpv.ApplyLocalStorage(context.Background(), snapM, cfg, nil) + + g.Expect(err).ToNot(HaveOccurred()) + 
g.Expect(status.Enabled).To(BeTrue()) + g.Expect(status.Version).To(Equal(localpv.ImageTag)) + g.Expect(helmM.ApplyCalledWith).To(HaveLen(1)) + + callArgs := helmM.ApplyCalledWith[0] + g.Expect(callArgs.Chart).To(Equal(localpv.Chart)) + g.Expect(callArgs.State).To(Equal(helm.StatePresent)) + + validateValues(g, callArgs.Values, cfg) + }) +} + +func validateValues(g Gomega, values map[string]any, cfg types.LocalStorage) { + sc := values["storageClass"].(map[string]any) + g.Expect(sc["isDefault"]).To(Equal(cfg.GetDefault())) + g.Expect(sc["reclaimPolicy"]).To(Equal(cfg.GetReclaimPolicy())) + + storage := values["node"].(map[string]any)["storage"].(map[string]any) + g.Expect(storage["path"]).To(Equal(cfg.GetLocalPath())) +} diff --git a/src/k8s/pkg/k8sd/features/localpv/register.go b/src/k8s/pkg/k8sd/features/localpv/register.go index b9f5f644b..084f6a40b 100644 --- a/src/k8s/pkg/k8sd/features/localpv/register.go +++ b/src/k8s/pkg/k8sd/features/localpv/register.go @@ -9,7 +9,7 @@ import ( func init() { images.Register( // Rawfile LocalPV CSI driver images - fmt.Sprintf("%s:%s", imageRepo, imageTag), + fmt.Sprintf("%s:%s", imageRepo, ImageTag), // CSI images csiNodeDriverImage, csiProvisionerImage, From 5d689100b2db6440e072e8d98327d5810ce7de5d Mon Sep 17 00:00:00 2001 From: Lucian Petrut Date: Mon, 16 Sep 2024 13:56:30 +0300 Subject: [PATCH 37/45] k8sd cluster-recover: add non-interactive mode (#662) At the moment, the "k8sd cluster-recover" displays interactive prompts and text editors that assist the user in updating the dqlite configuration. We need to be able to run the command non-interactively in order to automate the cluster recovery procedure. This change adds a "--non-interactive" flag. If set, we'll no longer show confirmation prompts and we'll assume that the configuration files have already been updated, proceeding with the dqlite recovery. --- docs/src/snap/howto/restore-quorum.md | 74 +++++--- src/k8s/cmd/k8sd/k8sd_cluster_recover.go | 232 +++++++++++++---------- 2 files changed, 183 insertions(+), 123 deletions(-) diff --git a/docs/src/snap/howto/restore-quorum.md b/docs/src/snap/howto/restore-quorum.md index aeb15b721..99a8c4e8b 100755 --- a/docs/src/snap/howto/restore-quorum.md +++ b/docs/src/snap/howto/restore-quorum.md @@ -1,9 +1,9 @@ # Recovering a Cluster After Quorum Loss Highly available {{product}} clusters can survive losing one or more -nodes. [Dqlite], the default datastore, implements a [Raft] based protocol where -an elected leader holds the definitive copy of the database, which is then -replicated on two or more secondary nodes. +nodes. [Dqlite], the default datastore, implements a [Raft] based protocol +where an elected leader holds the definitive copy of the database, which is +then replicated on two or more secondary nodes. When the a majority of the nodes are lost, the cluster becomes unavailable. If at least one database node survived, the cluster can be recovered using the @@ -64,8 +64,8 @@ sudo snap stop k8s ## Recover the Database -Choose one of the remaining alive cluster nodes that has the most recent version -of the Raft log. +Choose one of the remaining alive cluster nodes that has the most recent +version of the Raft log. Update the ``cluster.yaml`` files, changing the role of the lost nodes to "spare" (2). Additionally, double check the addresses and IDs specified in @@ -73,7 +73,8 @@ Update the ``cluster.yaml`` files, changing the role of the lost nodes to files were moved across nodes. 
The following command guides us through the recovery process, prompting a text -editor with informative inline comments for each of the dqlite configuration files. +editor with informative inline comments for each of the dqlite configuration +files. ``` sudo /snap/k8s/current/bin/k8sd cluster-recover \ @@ -82,29 +83,40 @@ sudo /snap/k8s/current/bin/k8sd cluster-recover \ --log-level 0 ``` -Please adjust the log level for additional debug messages by increasing its value. -The command creates database backups before making any changes. +Please adjust the log level for additional debug messages by increasing its +value. The command creates database backups before making any changes. -The above command will reconfigure the Raft members and create recovery tarballs -that are used to restore the lost nodes, once the Dqlite configuration is updated. +The above command will reconfigure the Raft members and create recovery +tarballs that are used to restore the lost nodes, once the Dqlite +configuration is updated. ```{note} -By default, the command will recover both Dqlite databases. If one of the databases -needs to be skipped, use the ``--skip-k8sd`` or ``--skip-k8s-dqlite`` flags. -This can be useful when using an external Etcd database. +By default, the command will recover both Dqlite databases. If one of the +databases needs to be skipped, use the ``--skip-k8sd`` or ``--skip-k8s-dqlite`` +flags. This can be useful when using an external Etcd database. ``` -Once the "cluster-recover" command completes, restart the k8s services on the node: +```{note} +Non-interactive mode can be requested using the ``--non-interactive`` flag. +In this case, no interactive prompts or text editors will be displayed and +the command will assume that the configuration files have already been updated. + +This allows automating the recovery procedure. +``` + +Once the "cluster-recover" command completes, restart the k8s services on the +node: ``` sudo snap start k8s ``` -Ensure that the services started successfully by using ``sudo snap services k8s``. -Use ``k8s status --wait-ready`` to wait for the cluster to become ready. +Ensure that the services started successfully by using +``sudo snap services k8s``. Use ``k8s status --wait-ready`` to wait for the +cluster to become ready. -You may notice that we have not returned to an HA cluster yet: ``high availability: no``. -This is expected as we need to recover +You may notice that we have not returned to an HA cluster yet: +``high availability: no``. This is expected as we need to recover ## Recover the remaining nodes @@ -113,28 +125,34 @@ nodes. For k8sd, copy ``recovery_db.tar.gz`` to ``/var/snap/k8s/common/var/lib/k8sd/state/recovery_db.tar.gz``. When the k8sd -service starts, it will load the archive and perform the necessary recovery steps. +service starts, it will load the archive and perform the necessary recovery +steps. The k8s-dqlite archive needs to be extracted manually. 
First, create a backup of the current k8s-dqlite state directory: ``` -sudo mv /var/snap/k8s/common/var/lib/k8s-dqlite /var/snap/k8s/common/var/lib/k8s-dqlite.bkp +sudo mv /var/snap/k8s/common/var/lib/k8s-dqlite \ + /var/snap/k8s/common/var/lib/k8s-dqlite.bkp ``` Then, extract the backup archive: ``` sudo mkdir /var/snap/k8s/common/var/lib/k8s-dqlite -sudo tar xf recovery-k8s-dqlite-$timestamp-post-recovery.tar.gz -C /var/snap/k8s/common/var/lib/k8s-dqlite +sudo tar xf recovery-k8s-dqlite-$timestamp-post-recovery.tar.gz \ + -C /var/snap/k8s/common/var/lib/k8s-dqlite ``` Node specific files need to be copied back to the k8s-dqlite state dir: ``` -sudo cp /var/snap/k8s/common/var/lib/k8s-dqlite.bkp/cluster.crt /var/snap/k8s/common/var/lib/k8s-dqlite -sudo cp /var/snap/k8s/common/var/lib/k8s-dqlite.bkp/cluster.key /var/snap/k8s/common/var/lib/k8s-dqlite -sudo cp /var/snap/k8s/common/var/lib/k8s-dqlite.bkp/info.yaml /var/snap/k8s/common/var/lib/k8s-dqlite +sudo cp /var/snap/k8s/common/var/lib/k8s-dqlite.bkp/cluster.crt \ + /var/snap/k8s/common/var/lib/k8s-dqlite +sudo cp /var/snap/k8s/common/var/lib/k8s-dqlite.bkp/cluster.key \ + /var/snap/k8s/common/var/lib/k8s-dqlite +sudo cp /var/snap/k8s/common/var/lib/k8s-dqlite.bkp/info.yaml \ + /var/snap/k8s/common/var/lib/k8s-dqlite ``` Once these steps are completed, restart the k8s services: @@ -143,13 +161,15 @@ Once these steps are completed, restart the k8s services: sudo snap start k8s ``` -Repeat these steps for all remaining nodes. Once a quorum is achieved, the cluster -will be reported as "highly available": +Repeat these steps for all remaining nodes. Once a quorum is achieved, +the cluster will be reported as "highly available": ``` $ sudo k8s status cluster status: ready -control plane nodes: 10.80.130.168:6400 (voter), 10.80.130.167:6400 (voter), 10.80.130.164:6400 (voter) +control plane nodes: 10.80.130.168:6400 (voter), + 10.80.130.167:6400 (voter), + 10.80.130.164:6400 (voter) high availability: yes datastore: k8s-dqlite network: enabled diff --git a/src/k8s/cmd/k8sd/k8sd_cluster_recover.go b/src/k8s/cmd/k8sd/k8sd_cluster_recover.go index 766acc49f..204dbbabd 100755 --- a/src/k8s/cmd/k8sd/k8sd_cluster_recover.go +++ b/src/k8s/cmd/k8sd/k8sd_cluster_recover.go @@ -28,7 +28,7 @@ import ( "github.com/canonical/k8s/pkg/utils" ) -const recoveryConfirmation = `You should only run this command if: +const preRecoveryMessage = `You should only run this command if: - A quorum of cluster members is permanently lost - You are *absolutely* sure all k8s daemons are stopped (sudo snap stop k8s) - This instance has the most up to date database @@ -36,8 +36,17 @@ const recoveryConfirmation = `You should only run this command if: Note that before applying any changes, a database backup is created at: * k8sd (microcluster): /var/snap/k8s/common/var/lib/k8sd/state/db_backup..tar.gz * k8s-dqlite: /var/snap/k8s/common/recovery-k8s-dqlite--pre-recovery.tar.gz +` + +const recoveryConfirmation = "Do you want to proceed? (yes/no): " + +const nonInteractiveMessage = `Non-interactive mode requested. -Do you want to proceed? (yes/no): ` +The command will assume that the dqlite configuration files have already been +modified with the updated cluster member roles and addresses. + +Initiating the dqlite database recovery. +` const clusterK8sdYamlRecoveryComment = `# Member roles can be modified. Unrecoverable nodes should be given the role "spare". 
# @@ -75,6 +84,7 @@ const yamlHelperCommentFooter = "# ------- everything below will be written ---- var clusterRecoverOpts struct { K8sDqliteStateDir string + NonInteractive bool SkipK8sd bool SkipK8sDqlite bool } @@ -145,6 +155,8 @@ func newClusterRecoverCmd() *cobra.Command { cmd.Flags().StringVar(&clusterRecoverOpts.K8sDqliteStateDir, "k8s-dqlite-state-dir", "", "k8s-dqlite datastore location") + cmd.Flags().BoolVar(&clusterRecoverOpts.NonInteractive, "non-interactive", + false, "disable interactive prompts, assume that the configs have been updated") cmd.Flags().BoolVar(&clusterRecoverOpts.SkipK8sd, "skip-k8sd", false, "skip k8sd recovery") cmd.Flags().BoolVar(&clusterRecoverOpts.SkipK8sDqlite, "skip-k8s-dqlite", @@ -171,8 +183,8 @@ func recoveryCmdPrechecks(ctx context.Context) error { log.V(1).Info("Running prechecks.") - if !termios.IsTerminal(unix.Stdin) { - return fmt.Errorf("this command is meant to be run in an interactive terminal") + if !termios.IsTerminal(unix.Stdin) && !clusterRecoverOpts.NonInteractive { + return fmt.Errorf("interactive mode requested in a non-interactive terminal") } if clusterRecoverOpts.K8sDqliteStateDir == "" { @@ -182,21 +194,31 @@ func recoveryCmdPrechecks(ctx context.Context) error { return fmt.Errorf("k8sd state dir not specified") } - reader := bufio.NewReader(os.Stdin) - fmt.Print(recoveryConfirmation) + fmt.Print(preRecoveryMessage) + fmt.Print("\n") - input, err := reader.ReadString('\n') - if err != nil { - return fmt.Errorf("couldn't read user input, error: %w", err) - } - input = strings.TrimSuffix(input, "\n") + if clusterRecoverOpts.NonInteractive { + fmt.Print(nonInteractiveMessage) + fmt.Print("\n") + } else { + reader := bufio.NewReader(os.Stdin) + fmt.Print(recoveryConfirmation) + + input, err := reader.ReadString('\n') + if err != nil { + return fmt.Errorf("couldn't read user input, error: %w", err) + } + input = strings.TrimSuffix(input, "\n") - if strings.ToLower(input) != "yes" { - return fmt.Errorf("cluster edit aborted; no changes made") + if strings.ToLower(input) != "yes" { + return fmt.Errorf("cluster edit aborted; no changes made") + } + + fmt.Print("\n") } if !clusterRecoverOpts.SkipK8sDqlite { - if err = ensureK8sDqliteMembersStopped(ctx); err != nil { + if err := ensureK8sDqliteMembersStopped(ctx); err != nil { return err } } @@ -376,59 +398,64 @@ func recoverK8sd() (string, error) { clusterYamlPath := path.Join(m.FileSystem.DatabaseDir, "cluster.yaml") clusterYamlCommentHeader := fmt.Sprintf("# K8sd cluster configuration\n# (based on the trust store and %s)\n", clusterYamlPath) - clusterYamlContent, err := yamlEditorGuide( - "", - false, - slices.Concat( - []byte(clusterYamlCommentHeader), - []byte("#\n"), - []byte(clusterK8sdYamlRecoveryComment), - []byte(yamlHelperCommentFooter), - []byte("\n"), - oldMembersYaml, - ), - false, - ) - if err != nil { - return "", fmt.Errorf("interactive text editor failed, error: %w", err) - } + clusterYamlContent := oldMembersYaml + if !clusterRecoverOpts.NonInteractive { + // Interactive mode requested (default). + // Assist the user in configuring dqlite. 
+ clusterYamlContent, err = yamlEditorGuide( + "", + false, + slices.Concat( + []byte(clusterYamlCommentHeader), + []byte("#\n"), + []byte(clusterK8sdYamlRecoveryComment), + []byte(yamlHelperCommentFooter), + []byte("\n"), + oldMembersYaml, + ), + false, + ) + if err != nil { + return "", fmt.Errorf("interactive text editor failed, error: %w", err) + } - infoYamlPath := path.Join(m.FileSystem.DatabaseDir, "info.yaml") - infoYamlCommentHeader := fmt.Sprintf("# K8sd info.yaml\n# (%s)\n", infoYamlPath) - _, err = yamlEditorGuide( - infoYamlPath, - true, - slices.Concat( - []byte(infoYamlCommentHeader), - []byte("#\n"), - []byte(infoYamlRecoveryComment), - utils.YamlCommentLines(clusterYamlContent), - []byte("\n"), - []byte(yamlHelperCommentFooter), - ), - true, - ) - if err != nil { - return "", fmt.Errorf("interactive text editor failed, error: %w", err) - } + infoYamlPath := path.Join(m.FileSystem.DatabaseDir, "info.yaml") + infoYamlCommentHeader := fmt.Sprintf("# K8sd info.yaml\n# (%s)\n", infoYamlPath) + _, err = yamlEditorGuide( + infoYamlPath, + true, + slices.Concat( + []byte(infoYamlCommentHeader), + []byte("#\n"), + []byte(infoYamlRecoveryComment), + utils.YamlCommentLines(clusterYamlContent), + []byte("\n"), + []byte(yamlHelperCommentFooter), + ), + true, + ) + if err != nil { + return "", fmt.Errorf("interactive text editor failed, error: %w", err) + } - daemonYamlPath := path.Join(m.FileSystem.StateDir, "daemon.yaml") - daemonYamlCommentHeader := fmt.Sprintf("# K8sd daemon.yaml\n# (%s)\n", daemonYamlPath) - _, err = yamlEditorGuide( - daemonYamlPath, - true, - slices.Concat( - []byte(daemonYamlCommentHeader), - []byte("#\n"), - []byte(daemonYamlRecoveryComment), - utils.YamlCommentLines(clusterYamlContent), - []byte("\n"), - []byte(yamlHelperCommentFooter), - ), - true, - ) - if err != nil { - return "", fmt.Errorf("interactive text editor failed, error: %w", err) + daemonYamlPath := path.Join(m.FileSystem.StateDir, "daemon.yaml") + daemonYamlCommentHeader := fmt.Sprintf("# K8sd daemon.yaml\n# (%s)\n", daemonYamlPath) + _, err = yamlEditorGuide( + daemonYamlPath, + true, + slices.Concat( + []byte(daemonYamlCommentHeader), + []byte("#\n"), + []byte(daemonYamlRecoveryComment), + utils.YamlCommentLines(clusterYamlContent), + []byte("\n"), + []byte(yamlHelperCommentFooter), + ), + true, + ) + if err != nil { + return "", fmt.Errorf("interactive text editor failed, error: %w", err) + } } newMembers := []cluster.DqliteMember{} @@ -465,40 +492,53 @@ func recoverK8sd() (string, error) { func recoverK8sDqlite() (string, string, error) { k8sDqliteStateDir := clusterRecoverOpts.K8sDqliteStateDir + var err error + clusterYamlContent := []byte{} clusterYamlPath := path.Join(k8sDqliteStateDir, "cluster.yaml") clusterYamlCommentHeader := fmt.Sprintf("# k8s-dqlite cluster configuration\n# (%s)\n", clusterYamlPath) - clusterYamlContent, err := yamlEditorGuide( - clusterYamlPath, - true, - slices.Concat( - []byte(clusterYamlCommentHeader), - []byte("#\n"), - []byte(clusterK8sDqliteRecoveryComment), - []byte(yamlHelperCommentFooter), - ), - true, - ) - if err != nil { - return "", "", fmt.Errorf("interactive text editor failed, error: %w", err) - } - infoYamlPath := path.Join(k8sDqliteStateDir, "info.yaml") - infoYamlCommentHeader := fmt.Sprintf("# k8s-dqlite info.yaml\n# (%s)\n", infoYamlPath) - _, err = yamlEditorGuide( - infoYamlPath, - true, - slices.Concat( - []byte(infoYamlCommentHeader), - []byte("#\n"), - []byte(infoYamlRecoveryComment), - utils.YamlCommentLines(clusterYamlContent), - 
[]byte("\n"), - []byte(yamlHelperCommentFooter), - ), - true, - ) - if err != nil { - return "", "", fmt.Errorf("interactive text editor failed, error: %w", err) + if clusterRecoverOpts.NonInteractive { + clusterYamlContent, err = os.ReadFile(clusterYamlPath) + if err != nil { + return "", "", fmt.Errorf( + "could not read k8s-dqlite cluster.yaml, error: %w", err) + } + } else { + // Interactive mode requested (default). + // Assist the user in configuring dqlite. + clusterYamlContent, err = yamlEditorGuide( + clusterYamlPath, + true, + slices.Concat( + []byte(clusterYamlCommentHeader), + []byte("#\n"), + []byte(clusterK8sDqliteRecoveryComment), + []byte(yamlHelperCommentFooter), + ), + true, + ) + if err != nil { + return "", "", fmt.Errorf("interactive text editor failed, error: %w", err) + } + + infoYamlPath := path.Join(k8sDqliteStateDir, "info.yaml") + infoYamlCommentHeader := fmt.Sprintf("# k8s-dqlite info.yaml\n# (%s)\n", infoYamlPath) + _, err = yamlEditorGuide( + infoYamlPath, + true, + slices.Concat( + []byte(infoYamlCommentHeader), + []byte("#\n"), + []byte(infoYamlRecoveryComment), + utils.YamlCommentLines(clusterYamlContent), + []byte("\n"), + []byte(yamlHelperCommentFooter), + ), + true, + ) + if err != nil { + return "", "", fmt.Errorf("interactive text editor failed, error: %w", err) + } } newMembers := []dqlite.NodeInfo{} From f899f5c1ddcd920b17ab87f76d6eb538b96c1342 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 16 Sep 2024 12:52:44 -0500 Subject: [PATCH 38/45] [main] Update component versions (#674) Co-authored-by: addyess <10090033+addyess@users.noreply.github.com> --- build-scripts/components/cni/version | 2 +- build-scripts/components/kubernetes/version | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/build-scripts/components/cni/version b/build-scripts/components/cni/version index 2e7bd9108..53b5bbb12 100644 --- a/build-scripts/components/cni/version +++ b/build-scripts/components/cni/version @@ -1 +1 @@ -v1.5.0 +v1.5.1 diff --git a/build-scripts/components/kubernetes/version b/build-scripts/components/kubernetes/version index d3aa76971..085dad940 100644 --- a/build-scripts/components/kubernetes/version +++ b/build-scripts/components/kubernetes/version @@ -1 +1 @@ -v1.31.0 +v1.31.1 From bccb61677456342999fbba39f4d642c4cd6e968d Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Tue, 17 Sep 2024 15:38:12 +0100 Subject: [PATCH 39/45] Add epa explanation docs (#595) --------- Co-authored-by: Yanisa Haley Scherber --- docs/src/snap/explanation/epa.md | 544 +++++++++++++++++++++++++++++ docs/src/snap/explanation/index.md | 1 + 2 files changed, 545 insertions(+) create mode 100644 docs/src/snap/explanation/epa.md diff --git a/docs/src/snap/explanation/epa.md b/docs/src/snap/explanation/epa.md new file mode 100644 index 000000000..a0884df55 --- /dev/null +++ b/docs/src/snap/explanation/epa.md @@ -0,0 +1,544 @@ +# Enhanced Platform Awareness + +Enhanced Platform Awareness (EPA) is a methodology and a set of enhancements +across various layers of the orchestration stack. + +EPA focuses on discovering, scheduling and isolating server hardware +capabilities. This document provides a detailed guide of how EPA applies to +{{product}}, which centre around the following technologies: + +- **HugePage support**: In GA from Kubernetes v1.14, this feature enables the + discovery, scheduling and allocation of HugePages as a first-class + resource. 
+- **Real-time kernel**: Ensures that high-priority tasks are run within a + predictable time frame, providing the low latency and high determinism + essential for time-sensitive applications. +- **CPU pinning** (CPU Manager for Kubernetes (CMK)): In GA from Kubernetes + v1.26, provides mechanisms for CPU pinning and isolation of containerised + workloads. +- **NUMA topology awareness**: Ensures that CPU and memory allocation are + aligned according to the NUMA architecture, reducing memory latency and + increasing performance for memory-intensive applications. +- **Single Root I/O Virtualization (SR-IOV)**: Enhances networking by enabling + virtualisation of a single physical network device into multiple virtual + devices. +- **DPDK (Data Plane Development Kit)**: A set of libraries and drivers for + fast packet processing, designed to run in user space, optimising network + performance and reducing latency. + +This document provides relevant links to detailed instructions for setting up +and installing these technologies. It is designed for developers +and architects who wish to integrate these new technologies into their +{{product}}-based networking solutions. + +## HugePages + +HugePages are a feature in the Linux kernel which enables the allocation of +larger memory pages. This reduces the overhead of managing large amounts of +memory and can improve performance for applications that require significant +memory access. + +### Key features + +- **Larger memory pages**: HugePages provide larger memory pages (e.g., 2MB or + 1GB) compared to the standard 4KB pages, reducing the number of pages the + system must manage. +- **Reduced overhead**: By using fewer, larger pages, the system reduces the + overhead associated with page table entries, leading to improved memory + management efficiency. +- **Improved TLB performance**: The Translation Lookaside Buffer (TLB) stores + recent translations of virtual memory to physical memory addresses. Using + HugePages increases TLB hit rates, reducing the frequency of memory + translation lookups. +- **Enhanced application performance**: Applications that access large amounts + of memory can benefit from HugePages by experiencing lower latency and + higher throughput due to reduced page faults and better memory access + patterns. +- **Support for high-performance workloads**: Ideal for high-performance + computing (HPC) applications, databases and other memory-intensive + workloads that demand efficient and fast memory access. +- **Native Kubernetes integration**: Starting from Kubernetes v1.14, HugePages + are supported as a native, first-class resource, enabling their + discovery, scheduling and allocation within Kubernetes environments. + +### Application to Kubernetes + +The architecture for HugePages on Kubernetes integrates the management and +allocation of large memory pages into the Kubernetes orchestration system. Here +are the key architectural components and their roles: + +- **Node configuration**: Each Kubernetes node must be configured to reserve + HugePages. This involves setting the number of HugePages in the node's + kernel boot parameters. +- **Kubelet configuration**: The `kubelet` on each node must be configured to + recognise and manage HugePages. This is typically done through the `kubelet` + configuration file, specifying the size and number of HugePages. +- **Pod specification**: HugePages are requested and allocated at the pod + level through resource requests and limits in the pod specification. 
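+
+  As an illustrative sketch only (the name, image and sizes below are
+  placeholders, not values taken from this guide), a Guaranteed QoS pod
+  requesting 1G HugePages alongside CPU and memory might look like this:
+
+  ```yaml
+  apiVersion: v1
+  kind: Pod
+  metadata:
+    name: hugepages-demo              # hypothetical example name
+  spec:
+    containers:
+    - name: app
+      image: ubuntu:22.04             # placeholder image
+      command: ["sleep", "infinity"]
+      volumeMounts:
+      - mountPath: /hugepages-1Gi
+        name: hugepage-1gi
+      resources:
+        requests:
+          cpu: "2"
+          memory: 1Gi
+          hugepages-1Gi: 2Gi          # HugePages requested as a first-class resource
+        limits:
+          cpu: "2"
+          memory: 1Gi
+          hugepages-1Gi: 2Gi          # requests and limits must match for HugePages
+    volumes:
+    - name: hugepage-1gi
+      emptyDir:
+        medium: HugePages-1Gi         # backing medium for the 1G page size
+  ```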
Pods + can request specific sizes of HugePages (e.g., 2MB or 1GB). +- **Scheduler awareness**: The Kubernetes scheduler is aware of HugePages as a + resource and schedules pods onto nodes that have sufficient HugePages + available. This ensures that pods with HugePages requirements are placed + appropriately. Scheduler configurations and policies can be adjusted to + optimise HugePages allocation and utilisation. +- **Node Feature Discovery (NFD)**: Node Feature Discovery can be used to + label nodes with their HugePages capabilities. This enables scheduling + decisions to be based on the available HugePages resources. +- **Resource quotas and limits**: Kubernetes enables the definition of resource + quotas and limits to control the allocation of HugePages across namespaces. + This helps in managing and isolating resource usage effectively. +- **Monitoring and metrics**: Kubernetes provides tools and integrations + (e.g., Prometheus, Grafana) to monitor and visualise HugePages usage across + the cluster. This helps in tracking resource utilisation and performance. + Metrics can include HugePages allocation, usage and availability on each + node, aiding in capacity planning and optimization. + +## Real-time kernel + +A real-time kernel ensures that high-priority tasks are run within a +predictable timeframe, crucial for applications requiring low latency and high +determinism. Note that this can also impede applications which were not +designed with these considerations. + +### Key features + +- **Predictable task execution**: A real-time kernel ensures that + high-priority tasks are run within a predictable and bounded timeframe, + reducing the variability in task execution time. +- **Low latency**: The kernel is optimised to minimise the time it takes to + respond to high-priority tasks, which is crucial for applications that + require immediate processing. +- **Priority-based scheduling**: Tasks are scheduled based on their priority + levels, with real-time tasks being given precedence over other types of + tasks to ensure they are processed promptly. +- **Deterministic behaviour**: The kernel guarantees deterministic behaviour, + meaning the same task will have the same response time every time it is + run, essential for time-sensitive applications. +- **Pre-emption:** The real-time kernel supports preemptive multitasking, + allowing high-priority tasks to interrupt lower-priority tasks to ensure + critical tasks are run without delay. +- **Resource reservation**: System resources (such as CPU and memory) can be + reserved by the kernel for real-time tasks, ensuring that these resources + are available when needed. +- **Enhanced interrupt handling**: Interrupt handling is optimised to ensure + minimal latency and jitter, which is critical for maintaining the + performance of real-time applications. +- **Real-time scheduling policies**: The kernel includes specific scheduling + policies (e.g., SCHED\_FIFO, SCHED\_RR) designed to manage real-time tasks + effectively and ensure they meet their deadlines. + +These features make a real-time kernel ideal for applications requiring precise +timing and high reliability. + +### Application to Kubernetes + +The architecture for integrating a real-time kernel into Kubernetes involves +several components and configurations to ensure that high-priority, low-latency +tasks can be managed effectively within a Kubernetes environment. 
Here are the +key architectural components and their roles: + +- **Real-time kernel installation**: Each Kubernetes node must run a real-time + kernel. This involves installing a real-time kernel package and configuring + the system to use it. +- **Kernel boot parameters**: The kernel boot parameters must be configured to + optimise for real-time performance. This includes isolating CPU cores and + configuring other kernel parameters for real-time behaviour. +- **Kubelet configuration**: The `kubelet` on each node must be configured to + recognise and manage real-time workloads. This can involve setting specific + `kubelet` flags and configurations. +- **Pod specification**: Real-time workloads are specified at the pod level + through resource requests and limits. Pods can request dedicated CPU cores + and other resources to ensure they meet real-time requirements. +- **CPU Manager**: Kubernetes’ CPU Manager is a critical component for + real-time workloads. It enables the static allocation of CPUs to + containers, ensuring that specific CPU cores are dedicated to particular + workloads. +- **Scheduler awareness**: The Kubernetes scheduler must be aware of real-time + requirements and prioritise scheduling pods onto nodes with available + real-time resources. +- **Priority and preemption**: Kubernetes supports priority and preemption to + ensure that critical real-time pods are scheduled and run as needed. This + involves defining pod priorities and enabling preemption to ensure + high-priority pods can displace lower-priority ones if necessary. +- **Resource quotas and limits**: Kubernetes can define resource quotas + and limits to control the allocation of resources for real-time workloads + across namespaces. This helps manage and isolate resource usage effectively. +- **Monitoring and metrics**: Monitoring tools such as Prometheus and Grafana + can be used to track the performance and resource utilisation of real-time + workloads. Metrics include CPU usage, latency and task scheduling times, + which help in optimising and troubleshooting real-time applications. +- **Security and isolation**: Security contexts and isolation mechanisms + ensure that real-time workloads are protected and run in a controlled + environment. This includes setting privileged containers and configuring + namespaces. + +## CPU pinning + +CPU pinning enables specific CPU cores to be dedicated to a particular process +or container, ensuring that the process runs on the same CPU core(s) every +time, which reduces context switching and cache invalidation. + +### Key features + +- **Dedicated CPU Cores**: CPU pinning allocates specific CPU cores to a + process or container, ensuring consistent and predictable CPU usage. +- **Reduced context switching**: By running a process or container on the same + CPU core(s), CPU pinning minimises the overhead associated with context + switching, leading to better performance. +- **Improved cache utilisation**: When a process runs on a dedicated CPU core, + it can take full advantage of the CPU cache, reducing the need to fetch data + from main memory and improving overall performance. +- **Enhanced application performance**: Applications that require low latency + and high performance benefit from CPU pinning as it ensures they have + dedicated processing power without interference from other processes. 
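+
+  A minimal sketch of such a workload, assuming the node's kubelet runs the
+  static CPU Manager policy: a Guaranteed QoS pod whose whole-number CPU
+  request makes it eligible for exclusive cores (name and image are
+  placeholders):
+
+  ```yaml
+  apiVersion: v1
+  kind: Pod
+  metadata:
+    name: pinned-app                  # hypothetical example name
+  spec:
+    containers:
+    - name: app
+      image: ubuntu:22.04             # placeholder image
+      command: ["sleep", "infinity"]
+      resources:
+        requests:
+          cpu: "4"                    # integer CPU count, equal to the limit
+          memory: 2Gi
+        limits:
+          cpu: "4"                    # Guaranteed QoS + whole CPUs -> exclusive cores
+          memory: 2Gi
+  ```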
+- **Consistent performance**: CPU pinning ensures that a process or container + receives consistent CPU performance, which is crucial for real-time and + performance-sensitive applications. +- **Isolation of workloads**: CPU pinning isolates workloads on specific CPU + cores, preventing them from being affected by other workloads running on + different cores. This is especially useful in multi-tenant environments. +- **Improved predictability**: By eliminating the variability introduced by + sharing CPU cores, CPU pinning provides more predictable performance + characteristics for critical applications. +- **Integration with Kubernetes**: Kubernetes supports CPU pinning through the + CPU Manager (in GA since v1.26), which allows for the static allocation of + CPUs to containers. This ensures that containers with high CPU demands have + the necessary resources. + +### Application to Kubernetes + +The architecture for CPU pinning in Kubernetes involves several components and +configurations to ensure that specific CPU cores can be dedicated to particular +processes or containers, thereby enhancing performance and predictability. Here +are the key architectural components and their roles: + +- **Kubelet configuration**: The `kubelet` on each node must be configured to + enable CPU pinning. This involves setting specific `kubelet` flags to + activate the CPU Manager. +- **CPU manager**: Kubernetes’ CPU Manager is a critical component for CPU + pinning. It allows for the static allocation of CPUs to containers, ensuring + that specific CPU cores are dedicated to particular workloads. The CPU + Manager can be configured to either static or none. Static policy enables + exclusive CPU core allocation to Guaranteed QoS (Quality of Service) pods. +- **Pod specification**: Pods must be specified to request dedicated CPU + resources. This is done through resource requests and limits in the pod + specification. +- **Scheduler awareness**: The Kubernetes scheduler must be aware of the CPU + pinning requirements. It schedules pods onto nodes with available CPU + resources as requested by the pod specification. The scheduler ensures that + pods with specific CPU pinning requests are placed on nodes with sufficient + free dedicated CPUs. +- **NUMA Topology Awareness**: For optimal performance, CPU pinning should be + aligned with NUMA (Non-Uniform Memory Access) topology. This ensures that + memory accesses are local to the CPU, reducing latency. Kubernetes can be + configured to be NUMA-aware, using the Topology Manager to align CPU + and memory allocation with NUMA nodes. +- **Node Feature Discovery (NFD)**: Node Feature Discovery can be used to + label nodes with their CPU capabilities, including the availability of + isolated and reserved CPU cores. +- **Resource quotas and limits**: Kubernetes can define resource quotas + and limits to control the allocation of CPU resources across namespaces. + This helps in managing and isolating resource usage effectively. +- **Monitoring and metrics**: Monitoring tools such as Prometheus and Grafana + can be used to track the performance and resource utilisation of CPU-pinned + workloads. Metrics include CPU usage, core allocation and task scheduling + times, which help in optimising and troubleshooting performance-sensitive + applications. +- **Isolation and security**: Security contexts and isolation mechanisms + ensure that CPU-pinned workloads are protected and run in a controlled + environment. 
This includes setting privileged containers and configuring + namespaces to avoid resource contention. +- **Performance Tuning**: Additional performance tuning can be achieved by + isolating CPU cores at the OS level and configuring kernel parameters to + minimise interference from other processes. This includes setting CPU + isolation and `nohz_full` parameters (reduces the number of scheduling-clock + interrupts, improving energy efficiency and [reducing OS jitter][no_hz]). + +## NUMA topology awareness + +NUMA (Non-Uniform Memory Access) topology awareness ensures that the CPU and +memory allocation are aligned according to the NUMA architecture, which can +reduce memory latency and increase performance for memory-intensive +applications. + +The Kubernetes Memory Manager enables the feature of guaranteed memory (and +HugePages) allocation for pods in the Guaranteed QoS (Quality of Service) +class. + +The Memory Manager employs hint generation protocol to yield the most suitable +NUMA affinity for a pod. The Memory Manager feeds the central manager (Topology +Manager) with these affinity hints. Based on both the hints and Topology +Manager policy, the pod is rejected or admitted to the node. + +Moreover, the Memory Manager ensures that the memory which a pod requests is +allocated from a minimum number of NUMA nodes. + +### Key features + +- **Aligned CPU and memory allocation**: NUMA topology awareness ensures that + CPUs and memory are allocated in alignment with the NUMA architecture, + minimising cross-node memory access latency. +- **Reduced memory latency**: By ensuring that memory is accessed from the + same NUMA node as the CPU, NUMA topology awareness reduces memory latency, + leading to improved performance for memory-intensive applications. +- **Increased performance**: Applications benefit from increased performance + due to optimised memory access patterns, which is especially critical for + high-performance computing and data-intensive tasks. +- **Kubernetes Memory Manager**: The Kubernetes Memory Manager supports + guaranteed memory allocation for pods in the Guaranteed QoS (Quality of + Service) class, ensuring predictable performance. +- **Hint generation protocol**: The Memory Manager uses a hint generation + protocol to determine the most suitable NUMA affinity for a pod, helping to + optimise resource allocation based on NUMA topology. +- **Integration with Topology Manager**: The Memory Manager provides NUMA + affinity hints to the Topology Manager. The Topology Manager then decides + whether to admit or reject the pod based on these hints and the configured + policy. +- **Optimised resource allocation**: The Memory Manager ensures that the + memory requested by a pod is allocated from the minimum number of NUMA + nodes, thereby optimising resource usage and performance. +- **Enhanced scheduling decisions**: The Kubernetes scheduler, in conjunction + with the Topology Manager, makes informed decisions about pod placement to + ensure optimal NUMA alignment, improving overall cluster efficiency. +- **Support for HugePages**: The Memory Manager also supports the allocation + of HugePages, ensuring that large memory pages are allocated in a NUMA-aware + manner, further enhancing performance for applications that require large + memory pages. +- **Improved application predictability**: By aligning CPU and memory + allocation with NUMA topology, applications experience more predictable + performance characteristics, crucial for real-time and latency-sensitive + workloads. 
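+
+  As a concrete illustration of how the Memory Manager and Topology Manager
+  described above are typically enabled on a node, here is a sketch of an
+  upstream `KubeletConfiguration` fragment. The policy names are upstream
+  Kubernetes values, the reserved CPU and memory figures are placeholders, and
+  how such settings are surfaced by a particular distribution may differ:
+
+  ```yaml
+  apiVersion: kubelet.config.k8s.io/v1beta1
+  kind: KubeletConfiguration
+  cpuManagerPolicy: static            # exclusive CPUs for Guaranteed QoS pods
+  memoryManagerPolicy: Static         # NUMA-aware memory and HugePages allocation
+  topologyManagerPolicy: single-numa-node
+  topologyManagerScope: pod           # align all containers of a pod together
+  reservedSystemCPUs: "0-3"           # placeholder: cores kept for system daemons
+  reservedMemory:                     # must match kube/system reserved + eviction threshold
+  - numaNode: 0
+    limits:
+      memory: 1Gi
+  ```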
+- **Policy-Based Management**: NUMA topology awareness can be managed through + policies so that administrators can configure how resources should be + allocated based on the NUMA architecture, providing flexibility and control. + +### Application to Kubernetes + +The architecture for NUMA topology awareness in Kubernetes involves several +components and configurations to ensure that CPU and memory allocations are +optimised according to the NUMA architecture. This setup reduces memory latency +and enhances performance for memory intensive applications. Here are the key +architectural components and their roles: + +- **Node configuration**: Each Kubernetes node must have NUMA-aware hardware. + The system's NUMA topology can be inspected using tools such as `lscpu` or + `numactl`. +- **Kubelet configuration**: The `kubelet` on each node must be configured to + enable NUMA topology awareness. This involves setting specific `kubelet` + flags to activate the Topology Manager. +- **Topology Manager**: The Topology Manager is a critical component that + coordinates resource allocation based on NUMA topology. It receives NUMA + affinity hints from other managers (e.g., CPU Manager, Device Manager) and + makes informed scheduling decisions. +- **Memory Manager**: The Kubernetes Memory Manager is responsible for + managing memory allocation, including HugePages, in a NUMA-aware manner. It + ensures that memory is allocated from the minimum number of NUMA nodes + required. The Memory Manager uses a hint generation protocol to provide NUMA + affinity hints to the Topology Manager. +- **Pod specification**: Pods can be specified to request NUMA-aware resource + allocation through resource requests and limits, ensuring that they get + allocated in alignment with the NUMA topology. +- **Scheduler awareness**: The Kubernetes scheduler works in conjunction with + the Topology Manager to place pods on nodes that meet their NUMA affinity + requirements. The scheduler considers NUMA topology during the scheduling + process to optimise performance. +- **Node Feature Discovery (NFD)**: Node Feature Discovery can be used to + label nodes with their NUMA capabilities, providing the scheduler with + information to make more informed placement decisions. +- **Resource quotas and limits**: Kubernetes allows defining resource quotas + and limits to control the allocation of NUMA-aware resources across + namespaces. This helps in managing and isolating resource usage effectively. +- **Monitoring and metrics**: Monitoring tools such as Prometheus and Grafana + can be used to track the performance and resource utilisation of NUMA-aware + workloads. Metrics include CPU and memory usage per NUMA node, helping in + optimising and troubleshooting performance-sensitive applications. +- **Isolation and security**: Security contexts and isolation mechanisms + ensure that NUMA-aware workloads are protected and run in a controlled + environment. This includes setting privileged containers and configuring + namespaces to avoid resource contention. +- **Performance tuning**: Additional performance tuning can be achieved by + configuring kernel parameters and using tools like `numactl` to bind + processes to specific NUMA nodes. + +## SR-IOV (Single Root I/O Virtualization) + +SR-IOV enables a single physical network device to appear as multiple separate +virtual devices. This can be beneficial for network-intensive applications that +require direct access to the network hardware. 
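+
+For orientation before the detailed feature list below, this is a minimal
+sketch of how a pod requests a virtual function once an SR-IOV device plugin
+advertises VFs as an extended resource. The resource name
+(`intel.com/intel_sriov_netdevice`) is a placeholder that depends entirely on
+the device plugin configuration, and the CNI wiring that exposes the VF as a
+network interface inside the pod is omitted here:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: sriov-demo                    # hypothetical example name
+spec:
+  containers:
+  - name: app
+    image: ubuntu:22.04               # placeholder image
+    command: ["sleep", "infinity"]
+    resources:
+      requests:
+        intel.com/intel_sriov_netdevice: "1"  # VF advertised by the device plugin
+      limits:
+        intel.com/intel_sriov_netdevice: "1"
+```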
+ +### Key features + +- **Multiple Virtual Functions (VFs)**: SR-IOV enables a single physical + network device to be partitioned into multiple virtual functions (VFs), each + of which can be assigned to a virtual machine or container as a separate + network interface. +- **Direct hardware access**: By providing direct access to the physical + network device, SR-IOV bypasses the software-based network stack, reducing + overhead and improving network performance and latency. +- **Improved network throughput**: Applications can achieve higher network + throughput as SR-IOV enables high-speed data transfer directly + between the network device and the application. +- **Reduced CPU utilisation**: Offloading network processing to the hardware + reduces the CPU load on the host system, freeing up CPU resources for other + tasks and improving overall system performance. +- **Isolation and security**: Each virtual function (VF) is isolated from + others, providing security and stability. This isolation ensures that issues + in one VF do not affect other VFs or the physical function (PF). +- **Dynamic resource allocation**: SR-IOV supports dynamic allocation of + virtual functions, enabling resources to be adjusted based on application + demands without requiring changes to the physical hardware setup. +- **Enhanced virtualisation support**: SR-IOV is particularly beneficial in + virtualised environments, enabling better network performance for virtual + machines and containers by providing them with dedicated network interfaces. +- **Kubernetes integration**: Kubernetes supports SR-IOV through the use of + network device plugins, enabling the automatic discovery, allocation, + and management of virtual functions. +- **Compatibility with Network Functions Virtualization (NFV)**: SR-IOV is + widely used in NFV deployments to meet the high-performance networking + requirements of virtual network functions (VNFs), such as firewalls, + routers and load balancers. +- **Reduced network latency**: As network packets can bypass the + hypervisor's virtual switch, SR-IOV significantly reduces network latency, + making it ideal for latency-sensitive applications. + +### Application to Kubernetes + +The architecture for SR-IOV (Single Root I/O Virtualization) in Kubernetes +involves several components and configurations to ensure that virtual functions +(VFs) from a single physical network device can be managed and allocated +efficiently. This setup enhances network performance and provides direct access +to network hardware for applications requiring high throughput and low latency. +Here are the key architectural components and their roles: + +- **Node configuration**: Each Kubernetes node with SR-IOV capable hardware + must have the SR-IOV drivers and tools installed. This includes the SR-IOV + network device plugin and associated drivers. +- **SR-IOV enabled network interface**: The physical network interface card + (NIC) must be configured to support SR-IOV. This involves enabling SR-IOV in + the system BIOS and configuring the NIC to create virtual functions (VFs). +- **SR-IOV network device plugin**: The SR-IOV network device plugin is + deployed as a DaemonSet in Kubernetes. It discovers SR-IOV capable network + interfaces and manages the allocation of virtual functions (VFs) to pods. +- **Device Plugin Configuration**: The SR-IOV device plugin requires a + configuration file that specifies the network devices and the number of + virtual functions (VFs) to be managed. 
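+
+  As an illustrative sketch only (the selector values are hardware-specific
+  placeholders and the exact schema depends on the device plugin release in
+  use), such a configuration is commonly delivered as a ConfigMap read by the
+  SR-IOV network device plugin:
+
+  ```yaml
+  apiVersion: v1
+  kind: ConfigMap
+  metadata:
+    name: sriovdp-config              # conventional name; adjust to the deployment
+    namespace: kube-system
+  data:
+    config.json: |
+      {
+        "resourceList": [
+          {
+            "resourceName": "intel_sriov_netdevice",
+            "selectors": {
+              "vendors": ["8086"],
+              "devices": ["154c"],
+              "drivers": ["iavf"]
+            }
+          }
+        ]
+      }
+  ```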
+- **Pod specification**: Pods can request SR-IOV virtual functions by + specifying resource requests and limits in the pod specification. The SR-IOV + device plugin allocates the requested VFs to the pod. +- **Scheduler awareness**: The Kubernetes scheduler must be aware of the + SR-IOV resources available on each node. The device plugin advertises the + available VFs as extended resources, which the scheduler uses to place pods + accordingly. Scheduler configuration ensures pods with SR-IOV requests are + scheduled on nodes with available VFs. +- **Resource quotas and limits**: Kubernetes enables the definition of + resource quotas and limits to control the allocation of SR-IOV resources + across namespaces. This helps manage and isolate resource usage effectively. +- **Monitoring and metrics**: Monitoring tools such as Prometheus and Grafana + can be used to track the performance and resource utilisation of + SR-IOV-enabled workloads. Metrics include VF allocation, network throughput, + and latency, helping optimise and troubleshoot performance-sensitive + applications. +- **Isolation and security**: SR-IOV provides isolation between VFs, ensuring + that each VF operates independently and securely. This isolation is critical + for multi-tenant environments where different workloads share the same + physical network device. +- **Dynamic resource allocation**: SR-IOV supports dynamic allocation and + deallocation of VFs, enabling Kubernetes to adjust resources based on + application demands without requiring changes to the physical hardware + setup. + +## DPDK (Data Plane Development Kit) + +The Data Plane Development Kit (DPDK) is a set of libraries and drivers for +fast packet processing. It is designed to run in user space, so that +applications can achieve high-speed packet processing by bypassing the kernel. +DPDK is used to optimise network performance and reduce latency, making it +ideal for applications that require high-throughput and low-latency networking, +such as telecommunications, cloud data centres and network functions +virtualisation (NFV). + +### Key features + +- **High performance**: DPDK can process millions of packets per second per + core, using multi-core CPUs to scale performance. +- **User-space processing**: By running in user space, DPDK avoids the + overhead of kernel context switches and uses HugePages for better + memory performance. +- **Poll Mode Drivers (PMD)**: DPDK uses PMDs that poll for packets instead of + relying on interrupts, which reduces latency. + +### DPDK architecture + +The main goal of the DPDK is to provide a simple, complete framework for fast +packet processing in data plane applications. Anyone can use the code to +understand some of the techniques employed, to build upon for prototyping or to +add their own protocol stacks. + +The framework creates a set of libraries for specific environments through the +creation of an Environment Abstraction Layer (EAL), which may be specific to a +mode of the Intel® architecture (32-bit or 64-bit), user space +compilers or a specific platform. These environments are created through the +use of Meson files (needed by Meson, the software tool for automating the +building of software that DPDK uses) and configuration files. Once the EAL +library is created, the user may link with the library to create their own +applications. Other libraries, outside of EAL, including the Hash, Longest +Prefix Match (LPM) and rings libraries are also provided. 
Sample applications +are provided to help show the user how to use various features of the DPDK. + +The DPDK implements a run-to-completion model for packet processing, where all +resources must be allocated prior to calling data plane applications, running +as execution units on logical processing cores. The model does not support a +scheduler and all devices are accessed by polling. The primary reason for not +using interrupts is the performance overhead imposed by interrupt processing. + +In addition to the run-to-completion model, a pipeline model may also be used +by passing packets or messages between cores via the rings. This enables work +to be performed in stages and is a potentially more efficient use of code on +cores. This is suitable for scenarios where each pipeline must be mapped to a +specific application thread or when multiple pipelines must be mapped to the +same thread. + +### Application to Kubernetes + +The architecture for integrating the Data Plane Development Kit (DPDK) into +Kubernetes involves several components and configurations to ensure high-speed +packet processing and low-latency networking. DPDK enables applications to +bypass the kernel network stack, providing direct access to network hardware +and significantly enhancing network performance. Here are the key architectural +components and their roles: + +- **Node configuration**: Each Kubernetes node must have the DPDK libraries + and drivers installed. This includes setting up HugePages and binding + network interfaces to DPDK-compatible drivers. +- **HugePages configuration**: DPDK requires HugePages for efficient memory + management. Configure the system to reserve HugePages. +- **Network interface binding**: Network interfaces must be bound to + DPDK-compatible drivers (e.g., vfio-pci) to be used by DPDK applications. +- **DPDK application container**: Create a Docker container image with the + DPDK application and necessary libraries. Ensure that the container runs + with appropriate privileges and mounts HugePages. +- **Pod specification**: Deploy the DPDK application in Kubernetes by + specifying the necessary resources, including CPU pinning and HugePages, in + the pod specification. +- **CPU pinning**: For optimal performance, DPDK applications should use + dedicated CPU cores. Configure CPU pinning in the pod specification. +- **SR-IOV for network interfaces**: Combine DPDK with SR-IOV to provide + high-performance network interfaces. Allocate SR-IOV virtual functions (VFs) + to DPDK pods. +- **Scheduler awareness**: The Kubernetes scheduler must be aware of the + resources required by DPDK applications, including HugePages and CPU + pinning, to place pods appropriately on nodes with sufficient resources. +- **Monitoring and metrics**: Use monitoring tools like Prometheus and Grafana + to track the performance of DPDK applications, including network throughput, + latency and CPU usage. +- **Resource quotas and limits**: Define resource quotas and limits to control + the allocation of resources for DPDK applications across namespaces, + ensuring fair resource distribution and preventing resource contention. +- **Isolation and security**: Ensure that DPDK applications run in isolated + and secure environments. Use security contexts to provide the necessary + privileges while maintaining security best practices. 
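
Putting the pod-level items above together (HugePages, whole CPUs for pinning
and an SR-IOV virtual function bound to a DPDK-compatible driver), a DPDK
workload's pod specification often ends up looking roughly like the sketch
below. This is a minimal illustration only: the image name and the
`intel.com/intel_sriov_dpdk` resource name are placeholders that depend on how
the cluster's SR-IOV device plugin is configured, and `dpdk-net1` stands for a
NetworkAttachmentDefinition created by the administrator.

```
# Hedged sketch of a DPDK pod combining HugePages, pinned CPUs and an SR-IOV
# VF. Resource, image and network names are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: dpdk-net1
spec:
  containers:
  - name: dpdk-app
    image: my-dpdk-app:latest          # placeholder image with the DPDK application
    command: ["sleep", "infinity"]
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]              # DPDK applications typically need to lock memory
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
    resources:
      requests:
        cpu: "4"
        memory: 1Gi
        hugepages-1Gi: 2Gi
        intel.com/intel_sriov_dpdk: "1"
      limits:
        cpu: "4"
        memory: 1Gi
        hugepages-1Gi: 2Gi
        intel.com/intel_sriov_dpdk: "1"
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages
```

Keeping requests equal to limits and using whole CPUs places the pod in the
Guaranteed QoS class, which is what allows the static CPU Manager, the Memory
Manager and the Topology Manager to pin the allocated cores and keep them
aligned with the NUMA node that provides the HugePages and the VF.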
+ + + + +[no_hz]: https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt diff --git a/docs/src/snap/explanation/index.md b/docs/src/snap/explanation/index.md index 3feb4fb83..2f8380ea1 100644 --- a/docs/src/snap/explanation/index.md +++ b/docs/src/snap/explanation/index.md @@ -16,6 +16,7 @@ certificates channels clustering ingress +epa /snap/explanation/security ``` From ef3ea74a4e011af9efb91093811260809e01e7e1 Mon Sep 17 00:00:00 2001 From: Adam Dyess Date: Tue, 17 Sep 2024 12:12:39 -0500 Subject: [PATCH 40/45] Update the issue template for creating release branches (#677) --- .../ISSUE_TEMPLATE/create_release_branch.md | 57 ++++++------------- 1 file changed, 18 insertions(+), 39 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/create_release_branch.md b/.github/ISSUE_TEMPLATE/create_release_branch.md index 9c5a8268a..c460a90ed 100644 --- a/.github/ISSUE_TEMPLATE/create_release_branch.md +++ b/.github/ISSUE_TEMPLATE/create_release_branch.md @@ -13,16 +13,16 @@ Make sure to follow the steps below and ensure all actions are completed and sig - **K8s version**: 1.xx -- **Owner**: +- **Owner**: `who plans to do the work` -- **Reviewer**: +- **Reviewer**: `who plans to review the work` -- **PR**: -- +- **PR**: https://github.com/canonical/k8s-snap/pull/`` + -- **PR**: +- **PR**: https://github.com/canonical/k8s-snap/pull/`` #### Actions @@ -53,7 +53,7 @@ The steps are to be followed in-order, each task must be completed by the person - [ ] **Owner**: Create `release-1.xx` branch from latest `master` in k8s-dqlite - `git clone git@github.com:canonical/k8s-dqlite.git ~/tmp/release-1.xx` - `pushd ~/tmp/release-1.xx` - - `git switch main` + - `git switch master` - `git pull` - `git checkout -b release-1.xx` - `git push origin release-1.xx` @@ -89,7 +89,7 @@ The steps are to be followed in-order, each task must be completed by the person - [ ] **Owner**: Create `release-1.xx` branch from latest `main` in rawfile-localpv - `git clone git@github.com:canonical/rawfile-localpv.git ~/tmp/release-1.xx` - `pushd ~/tmp/release-1.xx` - - `git switch main` + - `git switch rockcraft` - `git pull` - `git checkout -b release-1.xx` - `git push origin release-1.xx` @@ -98,7 +98,6 @@ The steps are to be followed in-order, each task must be completed by the person - [ ] **Reviewer**: Ensure `release-1.xx` branch is based on latest changes on `main` at the time of the release cut. - [ ] **Owner**: Create PR to initialize `release-1.xx` branch: - [ ] Update `KUBERNETES_RELEASE_MARKER` to `stable-1.xx` in [/build-scripts/hack/update-component-versions.py][] - - [ ] Update `master` to `release-1.xx` in [/build-scripts/components/k8s-dqlite/version][] - [ ] Update `"main"` to `"release-1.xx"` in [/build-scripts/hack/generate-sbom.py][] - [ ] `git commit -m 'Release 1.xx'` - [ ] Create PR against `release-1.xx` with the changes and request review from **Reviewer**. Make sure to update the issue `Information` section with a link to the PR. @@ -107,43 +106,22 @@ The steps are to be followed in-order, each task must be completed by the person - [ ] Add `release-1.xx` in [.github/workflows/update-components.yaml][] - [ ] Remove unsupported releases from the list (if applicable, consult with **Reviewer**) - [ ] Create PR against `main` with the changes and request review from **Reviewer**. Make sure to update the issue information with a link to the PR. -- [ ] **Reviewer**: On merge, confirm [Auto-update strict branch] action runs to completion and that the `autoupdate/release-1.xx-strict` branch is created. 
-- [ ] **Owner**: Create launchpad builders for `release-1.xx` - - [ ] Go to [lp:k8s][] and do **Import now** to pick up all latest changes. - - [ ] Under **Branches**, select `release-1.xx`, then **Create snap package** - - [ ] Set **Snap recipe name** to `k8s-snap-1.xx` - - [ ] Set **Owner** to `Canonical Kubernetes (containers)` - - [ ] Set **The project that this Snap is associated with** to `k8s` - - [ ] Set **Series** to Infer from snapcraft.yaml - - [ ] Set **Processors** to `AMD x86-64 (amd64)` and `ARM ARMv8 (arm64)` - - [ ] Enable **Automatically build when branch changes** - - [ ] Enable **Automatically upload to store** - - [ ] Set **Registered store name** to `k8s` - - [ ] In **Store Channels**, set **Track** to `1.xx-classic` and **Risk** to `edge`. Leave **Branch** empty - - [ ] Click **Create snap package** at the bottom of the page. -- [ ] **Owner**: Create launchpad builders for `release-1.xx-strict` - - [ ] Return to [lp:k8s][]. - - [ ] Under **Branches**, select `autoupdate/release-1.xx-strict`, then **Create snap package** - - [ ] Set **Snap recipe name** to `k8s-snap-1.xx-strict` - - [ ] Set **Owner** to `Canonical Kubernetes (containers)` - - [ ] Set **The project that this Snap is associated with** to `k8s` - - [ ] Set **Series** to Infer from snapcraft.yaml - - [ ] Set **Processors** to `AMD x86-64 (amd64)` and `ARM ARMv8 (arm64)` - - [ ] Enable **Automatically build when branch changes** - - [ ] Enable **Automatically upload to store** - - [ ] Set **Registered store name** to `k8s` - - [ ] In **Store Channels**, set **Track** to `1.xx` and **Risk** to `edge`. Leave **Branch** empty - - [ ] Click **Create snap package** at the bottom of the page. +- [ ] **Reviewer**: On merge, confirm [Auto-update strict branch] action runs to completion and that the `autoupdate/release-1.xx-*` flavor branches are created. + - [ ] autoupdate/release-1.xx-strict + - [ ] autoupdate/release-1.xx-moonray +- [ ] **Owner**: Create launchpad builders for `release-1.xx` and flavors + - [ ] Run the [Confirm Snap Builds][] Action - [ ] **Reviewer**: Ensure snap recipes are created in [lp:k8s/+snaps][] - - look for `k8s-snap-1.xx` - - look for `k8s-snap-1.xx-strict` + - [ ] look for `k8s-snap-1.xx-classic` + - [ ] look for `k8s-snap-1.xx-strict` + - [ ] look for `k8s-snap-1.xx-moonray` + - [ ] make sure each is "Authorized for Store Upload" #### After release - [ ] **Owner** follows up with the **Reviewer** and team about things to improve around the process. 
- [ ] **Owner**: After a few weeks of stable CI, update default track to `1.xx/stable` via - On the snap [releases page][], select `Track` > `1.xx` -- [ ] **Reviewer**: Ensure snap recipes are created in [lp:k8s/+snaps][] @@ -161,6 +139,7 @@ The steps are to be followed in-order, each task must be completed by the person [.github/workflows/update-components.yaml]: ../workflows/update-components.yaml [/build-scripts/components/hack/update-component-versions.py]: ../../build-scripts/components/hack/update-component-versions.py [/build-scripts/components/k8s-dqlite/version]: ../../build-scripts/components/k8s-dqlite/version -[/build-scripts/hack/generate-sbom.py]: ../..//build-scripts/hack/generate-sbom.py +[/build-scripts/hack/generate-sbom.py]: ../../build-scripts/hack/generate-sbom.py [lp:k8s]: https://code.launchpad.net/~cdk8s/k8s/+git/k8s-snap [lp:k8s/+snaps]: https://launchpad.net/k8s/+snaps +[Confirm Snap Builds]: https://github.com/canonical/canonical-kubernetes-release-ci/actions/workflows/create-release-branch.yaml From f1d1254de2299ea5bef139087ee411bb4f2eb826 Mon Sep 17 00:00:00 2001 From: Adam Dyess Date: Tue, 17 Sep 2024 12:50:21 -0500 Subject: [PATCH 41/45] Automerge every 4 hours any labeled PR with passing tests (#675) * Automerge every 4-hours any PR with passing tests labeled with 'automerge' * Make sure the bot can approve the PRs too * Update Bot information only if git email currently unset * consistently use private key secret to setup ssh git-remote * Rename secret to BOT_SSH_KEY * Reimagine auto-merge scripts as python --- .../workflows/auto-merge-successful-prs.yaml | 29 ++++++++++ .github/workflows/update-branches.yaml | 2 +- .github/workflows/update-components.yaml | 5 +- .../hack/auto-merge-successful-pr.py | 55 +++++++++++++++++++ build-scripts/patches/moonray/apply | 9 ++- build-scripts/patches/strict/apply | 7 ++- 6 files changed, 99 insertions(+), 8 deletions(-) create mode 100644 .github/workflows/auto-merge-successful-prs.yaml create mode 100755 build-scripts/hack/auto-merge-successful-pr.py diff --git a/.github/workflows/auto-merge-successful-prs.yaml b/.github/workflows/auto-merge-successful-prs.yaml new file mode 100644 index 000000000..e7f4fc096 --- /dev/null +++ b/.github/workflows/auto-merge-successful-prs.yaml @@ -0,0 +1,29 @@ +name: Auto-merge Successful PRs + +on: + workflow_dispatch: + schedule: + - cron: "0 */4 * * *" # Every 4 hours + +permissions: + contents: read + +jobs: + merge-successful-prs: + runs-on: ubuntu-latest + + steps: + - name: Harden Runner + uses: step-security/harden-runner@v2 + with: + egress-policy: audit + - name: Checking out repo + uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: '3.12' + - name: Auto-merge pull requests if all status checks pass + env: + GH_TOKEN: ${{ secrets.BOT_TOKEN }} + run: | + build-scripts/hack/auto-merge-successful-pr.py diff --git a/.github/workflows/update-branches.yaml b/.github/workflows/update-branches.yaml index 356bbce5b..b6ed8f38e 100644 --- a/.github/workflows/update-branches.yaml +++ b/.github/workflows/update-branches.yaml @@ -41,7 +41,7 @@ jobs: - name: Sync ${{ github.ref }} to ${{ steps.determine.outputs.branch }} uses: actions/checkout@v4 with: - ssh-key: ${{ secrets.DEPLOY_KEY_TO_UPDATE_STRICT_BRANCH }} + ssh-key: ${{ secrets.BOT_SSH_KEY }} - name: Apply ${{ matrix.patch }} patch run: | git checkout -b ${{ steps.determine.outputs.branch }} diff --git a/.github/workflows/update-components.yaml b/.github/workflows/update-components.yaml index 
7f9e43745..23aa952a4 100644 --- a/.github/workflows/update-components.yaml +++ b/.github/workflows/update-components.yaml @@ -33,7 +33,7 @@ jobs: uses: actions/checkout@v4 with: ref: ${{ matrix.branch }} - ssh-key: ${{ secrets.DEPLOY_KEY_TO_UPDATE_STRICT_BRANCH }} + ssh-key: ${{ secrets.BOT_SSH_KEY }} - name: Setup Python uses: actions/setup-python@v5 @@ -51,10 +51,11 @@ jobs: - name: Create pull request uses: peter-evans/create-pull-request@v6 with: - git-token: ${{ secrets.DEPLOY_KEY_TO_UPDATE_STRICT_BRANCH }} commit-message: "[${{ matrix.branch }}] Update component versions" title: "[${{ matrix.branch }}] Update component versions" body: "[${{ matrix.branch }}] Update component versions" branch: "autoupdate/sync/${{ matrix.branch }}" + labels: | + automerge delete-branch: true base: ${{ matrix.branch }} diff --git a/build-scripts/hack/auto-merge-successful-pr.py b/build-scripts/hack/auto-merge-successful-pr.py new file mode 100755 index 000000000..ea6c98e6b --- /dev/null +++ b/build-scripts/hack/auto-merge-successful-pr.py @@ -0,0 +1,55 @@ +#!/bin/env python3 + +import shlex +import subprocess +import json + +LABEL = "automerge" +APPROVE_MSG = "All status checks passed for PR #{}." + + +def sh(cmd: str) -> str: + """Run a shell command and return its output.""" + _pipe = subprocess.PIPE + result = subprocess.run(shlex.split(cmd), stdout=_pipe, stderr=_pipe, text=True) + if result.returncode != 0: + raise Exception(f"Error running command: {cmd}\nError: {result.stderr}") + return result.stdout.strip() + + +def get_pull_requests() -> list: + """Fetch open pull requests matching some label.""" + prs_json = sh("gh pr list --state open --json number,labels") + prs = json.loads(prs_json) + return [pr for pr in prs if any(label["name"] == LABEL for label in pr["labels"])] + + +def check_pr_passed(pr_number) -> bool: + """Check if all status checks passed for the given PR.""" + checks_json = sh(f"gh pr checks {pr_number} --json bucket") + checks = json.loads(checks_json) + return all(check["bucket"] == "pass" for check in checks) + + +def approve_and_merge_pr(pr_number) -> None: + """Approve and merge the PR.""" + print(APPROVE_MSG.format(pr_number) + "Proceeding with merge...") + sh(f'gh pr review {pr_number} --comment -b "{APPROVE_MSG.format(pr_number)}"') + sh(f"gh pr merge {pr_number} --admin --squash") + + +def process_pull_requests(): + """Process the PRs and merge if checks have passed.""" + prs = get_pull_requests() + + for pr in prs: + pr_number: int = pr["number"] + + if check_pr_passed(pr_number): + approve_and_merge_pr(pr_number) + else: + print(f"Status checks have not passed for PR #{pr_number}. 
Skipping merge.") + + +if __name__ == "__main__": + process_pull_requests() diff --git a/build-scripts/patches/moonray/apply b/build-scripts/patches/moonray/apply index 1233dae42..32a2f8510 100755 --- a/build-scripts/patches/moonray/apply +++ b/build-scripts/patches/moonray/apply @@ -2,9 +2,12 @@ DIR="$(realpath "$(dirname "${0}")")" -# Configure git author -git config user.email k8s-bot@canonical.com -git config user.name k8s-bot +# Configure git author if unset +git_email=$(git config --default "" user.email) +if [ -z "${git_email}" ]; then + git config user.email k8s-team-ci@canonical.com + git config user.name 'k8s-team-ci (CDK Bot)' +fi # Remove unrelated tests rm "${DIR}/../../../tests/integration/tests/test_cilium_e2e.py" diff --git a/build-scripts/patches/strict/apply b/build-scripts/patches/strict/apply index 1729742e2..3f6f7de14 100755 --- a/build-scripts/patches/strict/apply +++ b/build-scripts/patches/strict/apply @@ -3,8 +3,11 @@ DIR="$(realpath "$(dirname "${0}")")" # Configure git author -git config user.email k8s-bot@canonical.com -git config user.name k8s-bot +git_email=$(git config --default "" user.email) +if [ -z "${git_email}" ]; then + git config user.email k8s-team-ci@canonical.com + git config user.name 'k8s-team-ci (CDK Bot)' +fi # Apply strict patch git am "${DIR}/0001-Strict-patch.patch" From f55f6f8022aa47b49e703f440d6a6d8f1886f579 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Maciek=20Go=C5=82aszewski?= Date: Tue, 17 Sep 2024 21:20:58 +0200 Subject: [PATCH 42/45] Warnings that k8s service may not work (#657) Warnings that k8s service may not work (#657) KU-1475 --- src/k8s/cmd/k8s/hooks.go | 35 ++++++++++++++++++ src/k8s/cmd/k8s/k8s_bootstrap.go | 2 +- src/k8s/cmd/k8s/k8s_join_cluster.go | 2 +- src/k8s/cmd/util/hooks.go | 56 +++++++++++++++++++++++++++++ 4 files changed, 93 insertions(+), 2 deletions(-) create mode 100644 src/k8s/cmd/util/hooks.go diff --git a/src/k8s/cmd/k8s/hooks.go b/src/k8s/cmd/k8s/hooks.go index 481d63df6..6b97bd16d 100644 --- a/src/k8s/cmd/k8s/hooks.go +++ b/src/k8s/cmd/k8s/hooks.go @@ -2,6 +2,7 @@ package k8s import ( cmdutil "github.com/canonical/k8s/cmd/util" + "github.com/spf13/cobra" ) @@ -34,3 +35,37 @@ func hookInitializeFormatter(env cmdutil.ExecutionEnvironment, format *string) f } } } + +// hookCheckLXD verifies the ownership of directories needed for Kubernetes to function. +// If a potential issue is detected, it displays a warning to the user. 
+func hookCheckLXD() func(*cobra.Command, []string) { + return func(cmd *cobra.Command, args []string) { + // pathsOwnershipCheck paths to validate root is the owner + var pathsOwnershipCheck = []string{"/sys", "/proc", "/dev/kmsg"} + inLXD, err := cmdutil.InLXDContainer() + if err != nil { + cmd.PrintErrf("Failed to check if running inside LXD container: %s", err.Error()) + return + } + if inLXD { + var errMsgs []string + for _, pathToCheck := range pathsOwnershipCheck { + if err = cmdutil.ValidateRootOwnership(pathToCheck); err != nil { + errMsgs = append(errMsgs, err.Error()) + } + } + if len(errMsgs) > 0 { + if debug, _ := cmd.Flags().GetBool("debug"); debug { + cmd.PrintErrln("Warning: When validating required resources potential issues found:") + for _, errMsg := range errMsgs { + cmd.PrintErrln("\t", errMsg) + } + } + cmd.PrintErrln("The lxc profile for Canonical Kubernetes might be missing.") + cmd.PrintErrln("For running k8s inside LXD container refer to " + + "https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/howto/install/lxd/") + } + } + return + } +} diff --git a/src/k8s/cmd/k8s/k8s_bootstrap.go b/src/k8s/cmd/k8s/k8s_bootstrap.go index b4243d824..5510ed34e 100644 --- a/src/k8s/cmd/k8s/k8s_bootstrap.go +++ b/src/k8s/cmd/k8s/k8s_bootstrap.go @@ -45,7 +45,7 @@ func newBootstrapCmd(env cmdutil.ExecutionEnvironment) *cobra.Command { Use: "bootstrap", Short: "Bootstrap a new Kubernetes cluster", Long: "Generate certificates, configure service arguments and start the Kubernetes services.", - PreRun: chainPreRunHooks(hookRequireRoot(env), hookInitializeFormatter(env, &opts.outputFormat)), + PreRun: chainPreRunHooks(hookRequireRoot(env), hookInitializeFormatter(env, &opts.outputFormat), hookCheckLXD()), Run: func(cmd *cobra.Command, args []string) { if opts.interactive && opts.configFile != "" { cmd.PrintErrln("Error: --interactive and --file flags cannot be set at the same time.") diff --git a/src/k8s/cmd/k8s/k8s_join_cluster.go b/src/k8s/cmd/k8s/k8s_join_cluster.go index 7507fedcb..4cd5bfe6d 100644 --- a/src/k8s/cmd/k8s/k8s_join_cluster.go +++ b/src/k8s/cmd/k8s/k8s_join_cluster.go @@ -32,7 +32,7 @@ func newJoinClusterCmd(env cmdutil.ExecutionEnvironment) *cobra.Command { cmd := &cobra.Command{ Use: "join-cluster ", Short: "Join a cluster using the provided token", - PreRun: chainPreRunHooks(hookRequireRoot(env), hookInitializeFormatter(env, &opts.outputFormat)), + PreRun: chainPreRunHooks(hookRequireRoot(env), hookInitializeFormatter(env, &opts.outputFormat), hookCheckLXD()), Args: cmdutil.ExactArgs(env, 1), Run: func(cmd *cobra.Command, args []string) { token := args[0] diff --git a/src/k8s/cmd/util/hooks.go b/src/k8s/cmd/util/hooks.go new file mode 100644 index 000000000..a02dc64c3 --- /dev/null +++ b/src/k8s/cmd/util/hooks.go @@ -0,0 +1,56 @@ +package cmdutil + +import ( + "fmt" + "os" + "strings" + "syscall" +) + +// getFileOwnerAndGroup retrieves the UID and GID of a file. +func getFileOwnerAndGroup(filePath string) (uid, gid uint32, err error) { + // Get file info using os.Stat + fileInfo, err := os.Stat(filePath) + if err != nil { + return 0, 0, fmt.Errorf("error getting file info: %w", err) + } + // Convert the fileInfo.Sys() to syscall.Stat_t to access UID and GID + stat, ok := fileInfo.Sys().(*syscall.Stat_t) + if !ok { + return 0, 0, fmt.Errorf("failed to cast to syscall.Stat_t") + } + // Return the UID and GID + return stat.Uid, stat.Gid, nil +} + +// ValidateRootOwnership checks if the specified path is owned by the root user and root group. 
+func ValidateRootOwnership(path string) (err error) { + UID, GID, err := getFileOwnerAndGroup(path) + if err != nil { + return err + } + if UID != 0 { + return fmt.Errorf("owner of %s is user with UID %d expected 0", path, UID) + } + if GID != 0 { + return fmt.Errorf("owner of %s is group with GID %d expected 0", path, GID) + } + return nil +} + +// InLXDContainer checks if k8s runs in a lxd container. +func InLXDContainer() (isLXD bool, err error) { + initialProcessEnvironmentVariables := "/proc/1/environ" + content, err := os.ReadFile(initialProcessEnvironmentVariables) + if err != nil { + // if the permission to file is missing we still want to display info about lxd + if os.IsPermission(err) { + return true, fmt.Errorf("cannnot access %s to check if runing in LXD container: %w", initialProcessEnvironmentVariables, err) + } + return false, fmt.Errorf("cannnot read %s to check if runing in LXD container: %w", initialProcessEnvironmentVariables, err) + } + if strings.Contains(string(content), "container=lxc") { + return true, nil + } + return false, nil +} From cbcbcfc5f02471dd8ee40767c1a2c911b2e52e30 Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Wed, 18 Sep 2024 11:16:43 +0100 Subject: [PATCH 43/45] link epa pages, fix nav --- docs/src/snap/explanation/epa.md | 8 +++++--- docs/src/snap/howto/index.md | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/src/snap/explanation/epa.md b/docs/src/snap/explanation/epa.md index a0884df55..b4bcaccc1 100644 --- a/docs/src/snap/explanation/epa.md +++ b/docs/src/snap/explanation/epa.md @@ -27,9 +27,10 @@ capabilities. This document provides a detailed guide of how EPA applies to performance and reducing latency. This document provides relevant links to detailed instructions for setting up -and installing these technologies. It is designed for developers -and architects who wish to integrate these new technologies into their -{{product}}-based networking solutions. +and installing these technologies. It is designed for developers and architects +who wish to integrate these new technologies into their {{product}}-based +networking solutions. The separate [how to guide][howto-epa] for EPA includes the +necessary steps to implement these features on {{product}}. ## HugePages @@ -542,3 +543,4 @@ components and their roles: [no_hz]: https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt +[howto-epa]: /snap/howto/epa diff --git a/docs/src/snap/howto/index.md b/docs/src/snap/howto/index.md index 45c18485c..3ae545030 100644 --- a/docs/src/snap/howto/index.md +++ b/docs/src/snap/howto/index.md @@ -19,10 +19,10 @@ networking/index storage/index external-datastore proxy -epa backup-restore refresh-certs restore-quorum +epa contribute support ``` From 40937463329479b1b2c9b761fc24dcb9d67f8aac Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Wed, 18 Sep 2024 11:21:37 +0100 Subject: [PATCH 44/45] address review comments --- docs/src/snap/howto/epa.md | 46 +++++++++++++++++++------------------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index c3e92f49b..ed960a6ff 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -415,7 +415,7 @@ EPA capabilities. 1. [Install the snap][install-link] from the relevant [channel][channel]. ```{note} - A pre-release channel is required currently until there is a finalised release of {{product}}. + A pre-release channel is required currently until there is a stable release of {{product}}. 
``` For example: @@ -1102,28 +1102,28 @@ the correct PCI address: ``` ... - k8s.v1.cni.cncf.io/network-status: - [{ - "name": "k8s-pod-network", - "ips": [ - "10.1.17.141" - ], - "default": true, - "dns": {} - },{ - "name": "default/dpdk-net1", - "interface": "net1", - "mac": "26:e4:aa:f4:ce:ba", - "dns": {}, - "device-info": { - "type": "pci", - "version": "1.1.0", - "pci": { - "pci-address": "0000:98:1f.2" - } - } - }] - k8s.v1.cni.cncf.io/networks: dpdk-net1 + k8s.v1.cni.cncf.io/network-status: + [{ + "name": "k8s-pod-network", + "ips": [ + "10.1.17.141" + ], + "default": true, + "dns": {} + },{ + "name": "default/dpdk-net1", + "interface": "net1", + "mac": "26:e4:aa:f4:ce:ba", + "dns": {}, + "device-info": { + "type": "pci", + "version": "1.1.0", + "pci": { + "pci-address": "0000:98:1f.2" + } + } + }] + k8s.v1.cni.cncf.io/networks: dpdk-net1 ... ``` From 7554a0fb4f173518c8a0fc30fb488f729281616f Mon Sep 17 00:00:00 2001 From: Nick Veitch Date: Wed, 18 Sep 2024 11:25:28 +0100 Subject: [PATCH 45/45] lint --- docs/src/snap/howto/epa.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/src/snap/howto/epa.md b/docs/src/snap/howto/epa.md index ed960a6ff..50e6b4783 100644 --- a/docs/src/snap/howto/epa.md +++ b/docs/src/snap/howto/epa.md @@ -1,7 +1,8 @@ # How to set up Enhanced Platform Awareness This section explains how to set up the Enhanced Platform Awareness (EPA) -features in a {{product}} cluster. Please see the [EPA explanation page][explain-epa] for details about how EPA applies to {{product}}. +features in a {{product}} cluster. Please see the [EPA explanation +page][explain-epa] for details about how EPA applies to {{product}}. The content starts with the setup of the environment (including steps for using [MAAS][MAAS]). Then the setup of {{product}}, including the Multus & SR-IOV/DPDK