
A Solution to Fixing Containers when lxcfs Crashes #583

Open
deleriux opened this issue Jan 18, 2023 · 6 comments
Labels: Feature (New feature, not a bug)

deleriux commented Jan 18, 2023

Hello all,

I briefly mentioned last week that I had a solution to the "Transport endpoint is not connected" errors in containers when lxcfs crashes, one that avoids having to restart every affected container.

I've uploaded the code I have, as-is:
https://github.com/deleriux/lxcfs-reattach

I've tested it on Ubuntu 22 and Ubuntu 16 (with an updated kernel).

The way this works is by utilizing the system calls introduced into the kernel post-5.2 that split the mount operation into multiple steps, see:

https://lwn.net/Articles/759499/

You can leverage this step-by-step approach to open the source path while in the host's mount namespace, then switch to the container's mount namespace and attach it at the target path there.

The algorithm is basically as follows:

  1. Enter the mount namespace lxcfs is running in.
  2. Locate the particular bind mount you are interested in fixing, e.g. /var/lib/lxcfs/proc/meminfo.
  3. Call open_tree() on the path to obtain a mount fd representing this mount point.
  4. Enter the container's mount namespace (you've now snatched the mount fd out of the host's namespace!).
  5. Call umount2() on the container's path to /proc/meminfo.
  6. Call move_mount() against /proc/meminfo to reattach this mountpoint to the container's VFS.

The code for this part is kept in https://github.com/deleriux/lxcfs-reattach/blob/main/container.c#L145.
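
For anyone skimming, here is a condensed, illustrative sketch of those six steps. This is not the code from container.c: the namespace fds are assumed to have been opened from /proc/\<pid\>/ns/mnt beforehand, error handling is minimal, and the fallback syscall numbers are the x86_64 ones (guarded for headers that lack them).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_open_tree           /* x86_64 syscall numbers */
#define SYS_open_tree 428
#define SYS_move_mount 429
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif

/* host_ns and ctr_ns are fds opened from /proc/<pid>/ns/mnt. */
static int rebind_one(int host_ns, int ctr_ns,
                      const char *src,  /* e.g. /var/lib/lxcfs/proc/meminfo */
                      const char *dst)  /* e.g. /proc/meminfo */
{
    /* Steps 1-3: enter the lxcfs mount namespace and detach a clone
     * of the bind mount as a floating mount fd. */
    if (setns(host_ns, CLONE_NEWNS) < 0)
        return -1;
    int mfd = syscall(SYS_open_tree, AT_FDCWD, src, OPEN_TREE_CLONE);
    if (mfd < 0)
        return -1;

    /* Step 4: switch into the container's mount namespace; the mount
     * fd we snatched survives the namespace change. */
    if (setns(ctr_ns, CLONE_NEWNS) < 0)
        goto fail;

    /* Step 5: lazily drop the dead mount (its FUSE connection is gone). */
    umount2(dst, MNT_DETACH);

    /* Step 6: graft the snatched mount onto the container's VFS. */
    if (syscall(SYS_move_mount, mfd, "", AT_FDCWD, dst,
                MOVE_MOUNT_F_EMPTY_PATH) < 0)
        goto fail;

    close(mfd);
    return 0;

fail:
    close(mfd);
    return -1;
}
```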

The remaining code is mostly dedicated to heuristics for finding containers to fix and mountpoints to monitor.
I'm pretty sure it's littered with stupid bugs, but it works.

The process supports a monitor mode that uses epoll() against every discovered /proc/\<pid\>/mounts file to watch mounts come and go. If a qualifying mountpoint is unmounted and then remounted (such as when lxcfs gets restarted), the process detects it and issues a request to test, then rebind, mountpoints that no longer work.
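
A rough sketch of that watch loop, assuming the usual procfs behaviour where a change to a mounts file is signalled as EPOLLERR|EPOLLPRI rather than EPOLLIN (the pid and path here are placeholders, not the tool's real discovery logic):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void)
{
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    /* One such fd per discovered container; pid 1234 is hypothetical. */
    int mfd = open("/proc/1234/mounts", O_RDONLY);
    if (epfd < 0 || mfd < 0)
        return 1;

    struct epoll_event ev = {
        .events = EPOLLERR | EPOLLPRI,  /* how mount-table churn shows up */
        .data.fd = mfd,
    };
    epoll_ctl(epfd, EPOLL_CTL_ADD, mfd, &ev);

    struct epoll_event out;
    while (epoll_wait(epfd, &out, 1, -1) > 0) {
        /* A mount came or went in that namespace: re-read the mounts
         * file and test whether the lxcfs entries still answer reads. */
        printf("mount table changed (fd %d)\n", out.data.fd);
    }
    return 0;
}
```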

If lxcfs crashes and is not restarted, it can't help there, but as soon as a new instance comes up it should rebind the mountpoints pretty quickly.

My code doesn't (and can't) distinguish which lxcfs process to use when rebinding mountpoints; it merely selects the 'best/first' working one and runs with it. This is particularly noticeable with LXD installed as a snap, which tends to run its own lxcfs alongside the system's lxcfs, which can also be running.
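
For what it's worth, the "working" test can be as simple as probing a known lxcfs path and seeing whether reads still succeed; a crashed lxcfs leaves its mounts failing with ENOTCONN ("Transport endpoint is not connected"). A hedged sketch, with a placeholder helper name:

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true if the lxcfs file at `path` still answers reads. */
static bool lxcfs_mount_alive(const char *path)
{
    char buf[64];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;  /* typically errno == ENOTCONN for a dead mount */
    ssize_t n = read(fd, buf, sizeof(buf));
    close(fd);
    return n >= 0;
}
```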

I'm not suggesting this is the best or only solution to this problem (or that my code, in its current form, is suitable for this project), but the algorithm for fixing running containers is pretty straightforward and tends to work flawlessly without being too disruptive.

mihalicyn (Member)

Hi @deleriux

That's a good idea. As I said before, we are currently working on an internal lxcfs mechanism to recover from crashes, but this is a good solution for cases where rebooting all containers is problematic.

stgraber (Member)

We'll need to be very, very careful when doing something like this, as root in the container can mess with the mount namespace.
So we may be tricked into traversing symlinks, get locked up by hitting an intentionally broken FUSE mount, ...

That's the reason we never invested too much effort into injecting LXCFS mounts into an existing instance.
Even prior to the new mount API, we had a workaround using mount propagation to add/remove mounts from containers, but that still had the same security concerns attached to it.

I certainly feel a lot better about the current plan from @mihalicyn to allow recovering from a lxcfs crash by re-attaching to the existing FUSE mounts.

stgraber added the Feature (New feature, not a bug) label on Sep 29, 2023
stgraber (Member)

@mihalicyn we can re-use this one to track the FUSE re-attach work

zhoushuke

> @mihalicyn we can re-use this one to track the FUSE re-attach work

@stgraber so, would this fix be added to the next release?

mihalicyn (Member)

> @stgraber so, would this fix be added to the next release?

I would say it won't be addressed in the next release; we need to make some changes in the Linux kernel as part of this work. But it will definitely be implemented in LXCFS.

Do you have any issues with LXCFS right now?

zhoushuke

@mihalicyn any update?
