linuxc: create transient unit cgroup on systemd enabled distribution via dbus

fixes #89
criyle committed Nov 9, 2023
1 parent 2f24785 commit 4de8c98
Showing 6 changed files with 162 additions and 71 deletions.
6 changes: 4 additions & 2 deletions README.cn.md
@@ -420,7 +420,7 @@ interface WSResult {

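- The default number of concurrently running tasks equals the number of CPUs; use `-parallelism` to change it
- Files are stored in memory by default; use `-dir` to specify a local directory as the file store
- The default cgroup prefix is `executor_server`; use `-cgroup-prefix` to change it
- The default cgroup prefix is `gojudge`; use `-cgroup-prefix` to change it
- By default there is no restriction on copying files from disk; use `-src-prefix` to restrict the directory prefixes allowed for copyIn, separated by commas `,` (absolute paths required) (example: `/bin,/usr`)
- The default check interval for time and memory usage is 100 milliseconds (`100ms`); use `-time-limit-checker-interval` to change it
- The default maximum output limit is `256MiB`; use `-output-limit` to change it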
@@ -502,7 +502,9 @@ interface WSResult {

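`executorserver` now supports cgroup v2, since more and more Linux distributions enable cgroup v2 by default instead of v1 (e.g. Ubuntu 21.10+, Fedora 31+). However, on kernels older than 5.19, the cgroup v2 memory controller lacks `memory.max_usage_in_bytes`, so memory usage is accounted with the `maxrss` metric instead. This metric reads slightly higher than under cgroup v1, which is most noticeable for programs that use little memory. On kernels 5.19 or later, `memory.peak` is used.

Also, if this program runs inside a container, the processes in the container are moved into the `/init` cgroup v2 controller to enable cgroup v2 nesting support.
Also, if this program runs inside a container, the processes in the container are moved into the `/api` cgroup v2 controller to enable cgroup v2 nesting support.

When running on a distribution that uses `systemd` as `init`, `executorserver` uses `dbus` to ask `systemd` to create a transient `scope` that serves as the `cgroup` root.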

#### CentOS 7

13 changes: 8 additions & 5 deletions README.md
@@ -333,8 +333,7 @@ Please use PostMan or similar tools to send request to `http://localhost:5050/run
}
},
"copyOut": ["stdout", "stderr"],
"copyOutCached": ["a.cc", "a"],
"copyOutDir": "1"
"copyOutCached": ["a.cc", "a"]
}]
}
```
@@ -602,7 +601,7 @@ Sandbox:

- The default concurrency equals the number of CPUs; it can be specified with the `-parallelism` flag.
- The default file store is in memory; a local cache directory can be specified with the `-dir` flag.
- The default CGroup prefix is `executor_server`; it can be specified with the `-cgroup-prefix` flag.
- The default CGroup prefix is `gojudge`; it can be specified with the `-cgroup-prefix` flag.
- `-src-prefix` restricts `src` copyIn paths to a comma-separated list of prefixes (must be absolute paths) (example: `/bin,/usr`)
- `-time-limit-checker-interval` specifies the time limit checker interval (default 100ms) (valid value: \[1ms, 1s\])
- `-output-limit` specifies the POSIX rlimit size limit for output (default 256MiB)
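
The constraints listed for `-src-prefix` (comma-separated, absolute paths only) and `-time-limit-checker-interval` (between 1ms and 1s) can be illustrated with a minimal sketch. The helper names `parseSrcPrefix` and `parseInterval` are hypothetical and are not taken from the go-judge source; the actual validation in the project may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"time"
)

// parseSrcPrefix is a hypothetical helper: it splits a comma-separated
// -src-prefix value and rejects any entry that is not an absolute path.
func parseSrcPrefix(value string) ([]string, error) {
	var prefixes []string
	for _, p := range strings.Split(value, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue // ignore empty entries such as a trailing comma
		}
		if !path.IsAbs(p) {
			return nil, fmt.Errorf("src prefix %q is not an absolute path", p)
		}
		prefixes = append(prefixes, path.Clean(p))
	}
	return prefixes, nil
}

// parseInterval is a hypothetical helper: it parses a duration string and
// enforces the documented valid range [1ms, 1s].
func parseInterval(value string) (time.Duration, error) {
	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, err
	}
	if d < time.Millisecond || d > time.Second {
		return 0, fmt.Errorf("interval %v is outside [1ms, 1s]", d)
	}
	return d, nil
}
```

Called from your own `main`, `parseSrcPrefix("/bin,/usr")` yields the two prefixes, while a relative entry such as `bin` or an interval such as `2s` is rejected with an error.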
@@ -632,7 +631,9 @@ Environment variable will be override by command line arguments if they both pre

Build it yourself with `docker build -t executorserver -f Dockerfile.exec .`

The `executorserver` needs root privileges to create `cgroup`. Either create the sub-directories `/sys/fs/cgroup/cpuacct/executor_server`, `/sys/fs/cgroup/memory/executor_server`, `/sys/fs/cgroup/pids/executor_server` and make them readable by the execution user, or use `sudo` to run it.
For cgroup v1, the `executorserver` needs root privileges to create `cgroup`. Either create the sub-directories `/sys/fs/cgroup/cpuacct/executor_server`, `/sys/fs/cgroup/memory/executor_server`, `/sys/fs/cgroup/pids/executor_server` and make them readable by the execution user, or use `sudo` to run it.

For cgroup v2, systemd dbus will be used to create a transient scope for cgroup integration.
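
For the cgroup v1 case, the three sub-directories named above follow a `<root>/<controller>/<prefix>` pattern. The sketch below only illustrates that layout; `v1CgroupDirs` and `ensureDirs` are hypothetical helpers, not go-judge functions, and adjusting ownership or permissions for the execution user is left out.

```go
package main

import (
	"os"
	"path"
)

// v1CgroupDirs is a hypothetical helper: it lists the per-controller
// directories for the cpuacct, memory and pids controllers under root.
func v1CgroupDirs(root, prefix string) []string {
	controllers := []string{"cpuacct", "memory", "pids"}
	dirs := make([]string, 0, len(controllers))
	for _, c := range controllers {
		dirs = append(dirs, path.Join(root, c, prefix))
	}
	return dirs
}

// ensureDirs creates every directory in dirs. On a real cgroup v1 mount this
// typically requires root privileges.
func ensureDirs(dirs []string) error {
	for _, d := range dirs {
		if err := os.MkdirAll(d, 0o755); err != nil {
			return err
		}
	}
	return nil
}
```

For example, `v1CgroupDirs("/sys/fs/cgroup", "executor_server")` returns the three paths listed in the paragraph above, which could then be passed to `ensureDirs` when running as root.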

#### Build Shared object

@@ -725,7 +726,9 @@ If a bind mount is specifying a target within the previous mounted one, please e

The `executorserver` now supports cgroup v2 when running as root, since more Linux distributions are enabling cgroup v2 by default (e.g. Ubuntu 21.10+, Fedora 31+). However, for kernel < 5.19, the `memory` controller lacks `memory.max_usage_in_bytes`, so memory usage is accounted by the `maxrss` returned by the `wait4` syscall. Thus, the reported memory usage appears higher than under cgroup v1. For kernel >= 5.19, `memory.peak` is used.
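
The fallback described above can be sketched as follows. This is an illustration only, not the go-judge implementation: `peakMemory` is a hypothetical helper that reads `memory.peak` from a cgroup v2 directory and, when the file is missing (as on kernels older than 5.19), falls back to a `maxrss` value, which Linux reports in KiB.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// peakMemory is a hypothetical helper returning peak memory usage in bytes.
// It prefers memory.peak in cgroupDir and falls back to maxRSSKiB (the
// ru_maxrss value from wait4, in KiB on Linux) when that file does not exist.
func peakMemory(cgroupDir string, maxRSSKiB int64) (uint64, error) {
	data, err := os.ReadFile(filepath.Join(cgroupDir, "memory.peak"))
	if errors.Is(err, os.ErrNotExist) {
		if maxRSSKiB < 0 {
			return 0, errors.New("negative maxrss")
		}
		return uint64(maxRSSKiB) * 1024, nil
	}
	if err != nil {
		return 0, err
	}
	// memory.peak holds a single decimal byte count followed by a newline.
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}
```

Because `maxrss` covers the resident set of the process rather than the charge counted by the memory controller, the two numbers are not expected to match exactly.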

When running in containers, the `executorserver` will migrate all processes into the `/init` hierarchy to enable nesting support.
When running in containers, the `executorserver` will migrate all processes into the `/api` hierarchy to enable nesting support.

When running in Linux distributions powered by `systemd`, the `executorserver` will contact `systemd` via `dbus` to create a transient scope as cgroup root.
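
Once `systemd` has placed the process into a transient scope, the resulting cgroup v2 path is visible in `/proc/self/cgroup` as a line of the form `0::/<path>`. The sketch below shows one way to extract it; `v2CgroupPath` is a hypothetical helper and is not claimed to match how `go-sandbox` implements `GetCurrentCgroupPrefix`.

```go
package main

import (
	"errors"
	"strings"
)

// v2CgroupPath is a hypothetical helper: given the content of
// /proc/self/cgroup, it returns the path of the cgroup v2 entry, which is
// the line starting with "0::".
func v2CgroupPath(procSelfCgroup string) (string, error) {
	for _, line := range strings.Split(procSelfCgroup, "\n") {
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::"), nil
		}
	}
	return "", errors.New("no cgroup v2 entry found")
}
```

With an illustrative input such as `0::/system.slice/gojudge.scope`, the function returns `/system.slice/gojudge.scope`; input that contains only cgroup v1 lines (for example `12:memory:/foo`) produces an error.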

#### CentOS 7

2 changes: 1 addition & 1 deletion cmd/executorserver/config/config.go
@@ -19,7 +19,7 @@ type Config struct {
	MountConf string `flagUsage:"specifies mount configuration file" default:"mount.yaml"`
	SeccompConf string `flagUsage:"specifies seccomp filter" default:"seccomp.yaml"`
	Parallelism int `flagUsage:"control the # of concurrency execution (default equal to number of cpu)"`
	CgroupPrefix string `flagUsage:"control cgroup prefix" default:"executor_server"`
	CgroupPrefix string `flagUsage:"control cgroup prefix" default:"gojudge"`
	ContainerCredStart int `flagUsage:"control the start uid&gid for container (0 uses unprivileged root)" default:"0"`

// file store
120 changes: 99 additions & 21 deletions env/env_linux.go
@@ -1,17 +1,20 @@
package env

import (
	"context"
	"fmt"
	"os"
	"sync/atomic"
	"syscall"

	"github.com/coreos/go-systemd/v22/dbus"
	"github.com/criyle/go-judge/env/linuxcontainer"
	"github.com/criyle/go-judge/env/pool"
	"github.com/criyle/go-sandbox/container"
	"github.com/criyle/go-sandbox/pkg/cgroup"
	"github.com/criyle/go-sandbox/pkg/forkexec"
	"github.com/criyle/go-sandbox/pkg/mount"
	ddbus "github.com/godbus/dbus/v5"
	"golang.org/x/sys/unix"
)

@@ -111,35 +114,17 @@ func NewBuilder(c Config) (pool.EnvBuilder, map[string]any, error) {
		ContainerUID: cUID,
		ContainerGID: cGID,
	}
	t := cgroup.DetectType()
	if t == cgroup.CgroupTypeV2 {
		c.Info("Enable cgroup v2 nesting support")
		if err := cgroup.EnableV2Nesting(); err != nil {
			c.Warn("Enable cgroup v2 failed", err)
		}
	}
	cgb := cgroup.NewBuilder(c.CgroupPrefix).WithType(t).WithCPUAcct().WithMemory().WithPids().WithCPUSet()
	if c.EnableCPURate {
		cgb = cgb.WithCPU()
	}
	cgb, err = cgb.FilterByEnv()
	cgb, err := newCgroup(c)
	if err != nil {
		c.Error("Failed to create cgroup ", c.CgroupPrefix, " ", err)
		return nil, nil, err
	}
	c.Info("Test created cgroup builder with: ", cgb)
	if cg, err := cgb.Random(""); err != nil {
		c.Warn("Tested created cgroup with error: ", err)
		c.Warn("Failed back to rlimit / rusage mode")
		cgb = nil
	} else {
		cg.Destroy()
	}

	var cgroupPool linuxcontainer.CgroupPool
	if cgb != nil {
		cgroupPool = linuxcontainer.NewFakeCgroupPool(cgb, c.CPUCfsPeriod)
	}
	cgroupType := int(t)
	cgroupType := int(cgroup.DetectedCgroupType)
	if cgb == nil {
		cgroupType = 0
	}
@@ -163,6 +148,99 @@ func NewBuilder(c Config) (pool.EnvBuilder, map[string]any, error) {
}, nil
}

func newCgroup(c Config) (cgroup.Cgroup, error) {
	prefix := c.CgroupPrefix
	t := cgroup.DetectedCgroupType
	ct, err := cgroup.GetAvailableController()
	if err != nil {
		c.Error("Failed to get available controllers ", err)
		return nil, err
	}
	if t == cgroup.CgroupTypeV2 {
		// Check if running on a systemd enabled system
		c.Info("Running with cgroup v2, connecting systemd dbus to create cgroup")
		var conn *dbus.Conn
		if os.Getuid() == 0 {
			conn, err = dbus.NewSystemConnectionContext(context.TODO())
		} else {
			conn, err = dbus.NewUserConnectionContext(context.TODO())
		}
		if err != nil {
			c.Info("Connecting to systemd dbus failed: ", err)
			c.Info("Assuming running in container, enable cgroup v2 nesting support and take control of the whole cgroupfs")
			prefix = ""
		} else {
			defer conn.Close()

			scopeName := c.CgroupPrefix + ".scope"
			c.Info("Connected to systemd bus, attempting to create transient unit: ", scopeName)

			properties := []dbus.Property{
				dbus.PropDescription("go judge - a high performance sandbox service based on container technologies"),
				dbus.PropWants(scopeName),
				dbus.PropPids(uint32(os.Getpid())),
				newSystemdProperty("Delegate", true),
			}
			ch := make(chan string, 1)
			if _, err := conn.StartTransientUnitContext(context.TODO(), scopeName, "replace", properties, ch); err != nil {
				c.Error("Failed to start transient unit ", err)
				return nil, err
			}
			// Wait for the job result; anything other than "done" is a failure
			s := <-ch
			if s != "done" {
				c.Error("Starting transient unit returns ", s)
				// err is nil on this path, so return an explicit error
				return nil, fmt.Errorf("starting transient unit returned %q", s)
			}
			scopeName, err := cgroup.GetCurrentCgroupPrefix()
			if err != nil {
				return nil, err
			}
			c.Info("Current cgroup is ", scopeName)
			prefix = scopeName
			ct, err = cgroup.GetAvailableControllerWithPrefix(prefix)
			if err != nil {
				return nil, err
			}
		}
	}
	cgb, err := cgroup.New(prefix, ct)
	if err != nil {
		if os.Getuid() == 0 {
			c.Error("Failed to create cgroup ", prefix, " ", err)
			return nil, err
		}
		c.Warn("Not running as root and have no permission on cgroup, falling back to rlimit / rusage mode")
		return nil, nil
	}
	// Create api and migrate current process into it
	c.Info("Creating nesting api cgroup ", cgb)
	if _, err = cgb.Nest("api"); err != nil {
		// Only allow to fall back to rlimit mode when not running with root
		if os.Getuid() != 0 {
			c.Warn("Creating api cgroup with error: ", err)
			c.Warn("As running in non-root mode, falling back to rlimit / rusage mode")
			cgb.Destroy()
			return nil, nil
		}
		// As root, the error is ignored here and setup continues
	}

	c.Info("Creating containers cgroup")
	cg, err := cgb.New("containers")
	if err != nil {
		c.Warn("Creating containers cgroup with error: ", err)
		c.Warn("Falling back to rlimit / rusage mode")
		return nil, nil
	}
	return cg, nil
}

// newSystemdProperty wraps an arbitrary value as a systemd unit property
func newSystemdProperty(name string, units interface{}) dbus.Property {
	return dbus.Property{
		Name:  name,
		Value: ddbus.MakeVariant(units),
	}
}

type credGen struct {
	cur uint32
}
30 changes: 16 additions & 14 deletions go.mod
@@ -3,47 +3,49 @@ module github.com/criyle/go-judge
go 1.21

require (
github.com/coreos/go-systemd/v22 v22.5.0
github.com/creack/pty v1.1.20
github.com/criyle/go-sandbox v0.9.17
github.com/criyle/go-sandbox v0.10.0
github.com/elastic/go-seccomp-bpf v1.3.0
github.com/elastic/go-ucfg v0.8.6
github.com/gin-contrib/zap v0.2.0
github.com/gin-gonic/gin v1.9.1
github.com/godbus/dbus/v5 v5.1.0
github.com/golang/protobuf v1.5.3
github.com/gorilla/websocket v1.5.0
github.com/gorilla/websocket v1.5.1
github.com/grpc-ecosystem/go-grpc-middleware v1.4.0
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7
github.com/prometheus/client_golang v1.17.0
github.com/zsais/go-gin-prometheus v0.1.0
go.uber.org/zap v1.26.0
golang.org/x/crypto v0.14.0
golang.org/x/net v0.17.0
golang.org/x/sync v0.4.0
golang.org/x/sys v0.13.0
golang.org/x/crypto v0.15.0
golang.org/x/net v0.18.0
golang.org/x/sync v0.5.0
golang.org/x/sys v0.14.0
google.golang.org/grpc v1.59.0
google.golang.org/protobuf v1.31.0
gopkg.in/yaml.v2 v2.4.0
)

require (
cloud.google.com/go/compute/metadata v0.2.3 // indirect
cloud.google.com/go/compute v1.23.2 // indirect
github.com/BurntSushi/toml v1.3.2 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/bytedance/sonic v1.10.2 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/chenzhuoyu/base64x v0.0.0-20230717121745-296ad89f973d // indirect
github.com/chenzhuoyu/iasm v0.9.0 // indirect
github.com/chenzhuoyu/iasm v0.9.1 // indirect
github.com/fatih/camelcase v1.0.0 // indirect
github.com/fatih/structs v1.1.0 // indirect
github.com/gabriel-vasile/mimetype v1.4.3 // indirect
github.com/gin-contrib/sse v0.1.0 // indirect
github.com/go-playground/locales v0.14.1 // indirect
github.com/go-playground/universal-translator v0.18.1 // indirect
github.com/go-playground/validator/v10 v10.15.5 // indirect
github.com/go-playground/validator/v10 v10.16.0 // indirect
github.com/goccy/go-json v0.10.2 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/klauspost/cpuid/v2 v2.2.5 // indirect
github.com/klauspost/cpuid/v2 v2.2.6 // indirect
github.com/leodido/go-urn v1.2.4 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/matttproud/golang_protobuf_extensions/v2 v2.0.0 // indirect
@@ -57,10 +59,10 @@ require (
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
github.com/ugorji/go/codec v1.2.11 // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/arch v0.5.0 // indirect
golang.org/x/term v0.13.0 // indirect
golang.org/x/text v0.13.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20231016165738-49dd2c1f3d0b // indirect
golang.org/x/arch v0.6.0 // indirect
golang.org/x/term v0.14.0 // indirect
golang.org/x/text v0.14.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20231106174013-bbf56f31fb17 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
