a2geek/bosh-lxd-cpi-release

BOSH LXD CPI

Would love PRs if people are still playing with BOSH in 2024!

This is a BOSH CPI implementation that supports LXD and Incus. The CPI needs BIOS boot capability for VMs, which generally means a more recent release of LXD or Incus.

Requirements

LXD 5.21: this release supports BIOS boot for VMs, which all the BOSH stemcells use. Without this feature, VMs must boot via UEFI, so VMs are not an option for BOSH. Incus support is in place for the 6.0 LTS release, and likely for some of the pre-releases.

Note that the (initial) BOSH deployment can be run from Linux, a Mac, and presumably from Linux on WSL. Once a BOSH director is running, the BOSH CLI can be used from pretty much anywhere.

The current development environments are Ubuntu 22.04:

  • LXD (currently 5.21/stable) has been installed via a Snap and this guide was generally followed
  • Incus (currently 6.0.0-1ubuntu0.1) has been installed via apt and the documentation was followed for Ubuntu.

Documentation

Current State

All CPI calls have been implemented, including snapshots and IaaS-native disk resizing.

LXD adjustments

  • There is a nightly scheduled process that looks for unused configuration disks (vol-c-<uuid> format). There are cases (at least with ZFS) where the configuration disk doesn't get detached from the VM. The nightly process simply scans the list of detached configuration disks, tries to delete them, and reports success or error in the log. See cleanup.
  • Throttling. This is not so much an LXD issue as a single-host conundrum. There is a server that runs and maintains a hash map of "transaction" reservations. Once the CPI-level transaction completes, the reservation is released. Additionally, reservations time out after a certain amount of time. By default, throttling is disabled. (See Tuning for more details. Code is at throttle.)
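
The nightly cleanup described above can be pictured roughly as follows. This is a minimal Python sketch, not the CPI's actual code: the `Volume` tuple, the `cleanup_config_disks` function, the injected `delete` callable, and the exact UUID pattern are all assumptions made for illustration, and no real LXD or Incus API is called.

```python
import logging
import re
from typing import Callable, Iterable, List, NamedTuple, Tuple

log = logging.getLogger("cleanup-sketch")

# Assumption: configuration disks are named "vol-c-" plus a canonical UUID.
CONFIG_DISK = re.compile(
    r"^vol-c-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


class Volume(NamedTuple):
    """Hypothetical view of a storage volume."""
    name: str
    used_by: Tuple[str, ...]  # instances still attached; empty if detached


def cleanup_config_disks(
    volumes: Iterable[Volume], delete: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Try to delete every detached configuration disk.

    `delete` is a hypothetical callable standing in for whatever removes
    a volume from the storage pool. Returns (deleted, failed) name lists.
    """
    deleted: List[str] = []
    failed: List[str] = []
    for vol in volumes:
        # Skip anything that is not a config disk or is still attached.
        if not CONFIG_DISK.match(vol.name) or vol.used_by:
            continue
        try:
            delete(vol.name)
        except Exception as err:  # report the error and keep scanning
            log.error("could not delete %s: %s", vol.name, err)
            failed.append(vol.name)
        else:
            log.info("deleted %s", vol.name)
            deleted.append(vol.name)
    return deleted, failed
```

Failures are logged rather than raised, so one stuck disk does not stop the rest of the scan, and the next nightly run simply tries it again.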

Tuning

Beyond the general tuning of a VM size (# of CPUs, amount of memory, or sizing of disks), the following is available for tuning the CPI. Note that the host development environment is a single server with a Xeon CPU, 128GB+ RAM, and SSDs (no spinning disks). SSDs are likely very important for I/O activities.

  • Throttling CPI activity. This is a timed-gateway type of throttle. See the spec for the actual definition. Here is a sample of overrides. (Note that path is shown with its default value and can likely just be left off.)

    # This is for rendering within the VM once stood up
    - type: replace
      path: /instance_groups/name=bosh/properties/lxd_cpi/throttle_config?
      value:
        enabled: true
        path: "/var/vcap/sys/run/lxd_cpi/throttle.sock"
        limit: 4
        hold: "2m"

    Note that the original reason this was required was that, for every VM that LXD launched, the QCOW2 format source image was converted to a RAW format image. This seems to have been resolved. The cause was simply that the root disk was specifying the disk size, which apparently triggers LXD to copy the contents of the source image instead of doing overlay-type magic.
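
To make the "timed gateway" idea concrete, here is a minimal in-process Python sketch of a reservation map with a limit and a hold time. It is an illustration under assumptions, not the CPI's implementation: the real throttle is a server reached over the socket at path, and the `Throttle` class, its method names, and the injectable `clock` are hypothetical. Only `limit` and `hold` correspond to the settings in the sample above.

```python
import threading
import time
import uuid
from typing import Callable, Dict, Optional


class Throttle:
    """Hypothetical gateway: at most `limit` live reservations at once."""

    def __init__(
        self,
        limit: int,
        hold_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.hold_seconds = hold_seconds
        self._clock = clock
        self._lock = threading.Lock()
        # Maps transaction id -> time at which the reservation expires.
        self._reservations: Dict[str, float] = {}

    def _expire(self) -> None:
        # Drop reservations whose hold time has passed (caller holds lock).
        now = self._clock()
        stale = [tx for tx, end in self._reservations.items() if end <= now]
        for tx in stale:
            del self._reservations[tx]

    def acquire(self) -> Optional[str]:
        """Return a transaction id, or None if the gateway is full."""
        with self._lock:
            self._expire()
            if len(self._reservations) >= self.limit:
                return None
            tx = str(uuid.uuid4())
            self._reservations[tx] = self._clock() + self.hold_seconds
            return tx

    def release(self, tx: str) -> None:
        """Release a reservation; unknown or expired ids are ignored."""
        with self._lock:
            self._reservations.pop(tx, None)
```

With the sample settings this would be constructed as `Throttle(limit=4, hold_seconds=120)`: a fifth concurrent caller gets `None` until a transaction is released or an abandoned reservation ages out after two minutes.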