Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support enable-host-diskstats option. #601

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

peppaJoeng
Copy link

new interface

--enable-host-diskstats

Experimental procedure

  • Disk condition after partition:
$ lsblk
sdd                8:48   0  100G  0 disk 
├─sdd1             8:49   0   50G  0 part 
├─sdd2             8:50   0    1K  0 part 
└─sdd5             8:53   0   20G  0 part
  • start lxcfs
$ lxcfs --enable-host-diskstats /var/lib/lxc/lxcfs/ &
  • Start the container
$ docker run -tid -m 256m --device=/dev/sdd1:/dev/test -v /var/lib/lxc/lxcfs/proc/meminfo:/proc/meminfo:rw -v /var/lib/lxc/lxcfs/proc/diskstats:/proc/diskstats:rw --name test ubuntu bash
  • View container's /proc/diskstats
$ docker exec -it test ls /dev/test
/dev/test
$ docker exec -it test cat /proc/diskstats
8       49 test 249 0 17928 24 75 1115 535153 152 0 169 180 51 0 104857608 3
$ cat /proc/diskstats 
   8       0 sda 23881 2843 2239558 980807 62612 124673 23769904 27578300 0 1989128 29020938 0 0 0 0 11198 461830
   8      32 sdc 1329 0 20890 1088 5376 71021 5013944 101385 0 52829 168482 21 0 41943048 1 277 66007
   8      48 sdd 1383 0 75670 143 104 1115 536836 643 0 813 1279 51 0 104857608 3 5 488
   8      49 sdd1 249 0 17928 24 75 1115 535153 152 0 169 180 51 0 104857608 3 0 0
   8      50 sdd2 10 0 68 0 0 0 0 0 0 10 0 0 0 0 0 0 0
   8      53 sdd5 312 0 20032 29 22 0 1667 2 0 82 32 0 0 0 0 0 0
  • delete container
docker rm -f `docker ps -aq`

After enabling the enable-host-diskstats option, when the user accesses the /proc/diskstats file, the host /proc/diskstats data will be used instead of reading data from the cgroup. This option is suitable for scenarios where the container exclusively uses block/character devices.

fix #599

After enabling the enable-host-diskstats option, when the user accesses the /proc/diskstats file, the host /proc/diskstats data will be used instead of reading data from the cgroup.
This option is suitable for scenarios where the container exclusively uses block/character devices.

Signed-off-by: peppaJoeng <noblei@163.com>
@mihalicyn
Copy link
Member

Hi @peppaJoeng!

I'll review your PR on this week. Sorry about delay.

@peppaJoeng
Copy link
Author

Hi, @mihalicyn
is there any progress?

lxcfs_error("Error opening dir /dev: %s\n", strerror(errno));
goto child_out;
}
while ((ptr = readdir(dir)) != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you are iterating over the /dev inside the container mount namespace to make a list of block devices which are used by the container, right?

What do you think about using /proc/<pid>/mountinfo file instead to get all used block devices? Speaking more precisely, this file allows you to get all block devices which are mounted inside the container mount namespace.

Another limitation here is that in the container there can be a few different mount namespaces.

Copy link
Author

@peppaJoeng peppaJoeng Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Thank you for your suggestion. @mihalicyn

For the first question, I did this by going into the container's namespace and traversing the /dev directory underneath it.

I had thought about using /proc/PID/mountinfo on the host side (If I understand correctly) instead of going into the container namespace to achieve this functionality. But I have encountered some problems.

  1. I cannot get the unmounted block device from this file; in fact, the block device mounted through --device cannot be obtained through the above method.
[root@localhost ~]# docker run -tid --name test --device /dev/sdb:/dev/xxx  rnd-dockerhub.huawei.com/official/ubuntu-arm64 bash
c82db9506ffaff30ce480852d3133cd4528a013d427372c77d91621edfb75809
[root@localhost ~]# docker inspect --format '{{.State.Pid}}'  c82db9506ffaff30ce480852d313
3325697
[root@localhost ~]# cat /proc/3325697/mountinfo | grep sdb
[root@localhost ~]# cat /proc/3325697/mountinfo | grep xxx
[root@localhost ~]# cat /proc/3325697/mountinfo | grep /dev
520 518 0:59 / /dev rw,nosuid - tmpfs tmpfs rw,seclabel,size=65536k,mode=755
521 520 0:60 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=666
530 523 0:37 /docker/c82db9506ffaff30ce480852d3133cd4528a013d427372c77d91621edfb75809 /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,seclabel,devices
540 520 0:57 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw,seclabel
541 518 253:0 /var/lib/docker/containers/c82db9506ffaff30ce480852d3133cd4528a013d427372c77d91621edfb75809/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/mapper/euleros-root rw,seclabel
542 518 253:0 /var/lib/docker/containers/c82db9506ffaff30ce480852d3133cd4528a013d427372c77d91621edfb75809/hostname /etc/hostname rw,relatime - ext4 /dev/mapper/euleros-root rw,seclabel
543 518 253:0 /var/lib/docker/containers/c82db9506ffaff30ce480852d3133cd4528a013d427372c77d91621edfb75809/hosts /etc/hosts rw,relatime - ext4 /dev/mapper/euleros-root rw,seclabel
544 520 0:56 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,seclabel,size=65536k
464 520 0:60 /0 /dev/console rw,nosuid,noexec,relatime - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=666

As you can see, I didn't get this device. However, I can get it in the following way:

[root@localhost ~]# docker exec -it test bash -c "ls /dev/xxx"
/dev/xxx
  1. An important reason why I enter the /dev device to read the device name is that the device names in the container and the host are different.

Hope I understand you correctly. Welcome to exchange your ideas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot get the unmounted block device from this file; in fact, the block device mounted through --device cannot be obtained through the above method.

yes, you can only get devices which are mounted inside the container using this method. --device option in Docker does not mount device, but just (I guess) performs a bind-mount of it inside the container.

@botieking98
Copy link

Can anybody take some time to review this PR?

@notlate
Copy link

notlate commented Jan 15, 2024

Hi, @mihalicyn ,could you take a moment to review this code?

Copy link
Member

@mihalicyn mihalicyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this feature in general, especially taking into account that it's a optional thing and if it's useful for some workloads I think we can take it after some rework and polishing.

return read_file_fuse("/proc/diskstats", buf, size, d);
}
if (opts && opts->use_host_diskstats) {
lock_mutex(&container_dev_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this mutex?

lxcfs_error("Error setting mnt ns: %s\n", strerror(errno));
goto child_out;
}
dir = opendir("/dev");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can avoid fork() and stuff, by doing something like opendir("/proc/<pid>/root/dev"). In this case you don't need to have an extra process, do setns and stuff, but just read the directory.

lxcfs_error("Error opening dir /dev: %s\n", strerror(errno));
goto child_out;
}
while ((ptr = readdir(dir)) != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot get the unmounted block device from this file; in fact, the block device mounted through --device cannot be obtained through the above method.

yes, you can only get devices which are mounted inside the container using this method. --device option in Docker does not mount device, but just (I guess) performs a bind-mount of it inside the container.

@mihalicyn
Copy link
Member

Hi @peppaJoeng

Sorry for long delay with review (please, feel free to ping us next time).

Please rebase this PR if it's still actual. I have left some review comments.

Kind regards,
Alex

@stgraber stgraber added the Incomplete Waiting on more information from reporter label Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Incomplete Waiting on more information from reporter
Development

Successfully merging this pull request may close these issues.

/proc/diskstats information is inaccurate
5 participants