
oarsub -t deploy -I does not move the user to its slice #128

Open
lnussbaum opened this issue Feb 21, 2017 · 13 comments

@lnussbaum

After oarsub -t deploy -I, I would expect my shell to be in my user slice, according to systemd-cgls.

But this is not the case: my shell (or the script I run inside my job) is located in the cgroup named "oar-node.service".

This is probably because the login mechanism used by OAR does not go through PAM.

As a result, it is not possible to enforce per-slice limits, for example.

@npf
Contributor

npf commented Feb 23, 2017

With genepi-21 configured as an OAR frontend (and deploy frontend) + pam_systemd added to common-session, I get:

pneyron@genepi-21:~$ cat /proc/self/cgroup 
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-10106.slice/session-35.scope
pneyron@genepi-21:~$ oarsub -I -t deploy
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: /home/pneyron/.ssh/id_rsa
OAR_JOB_ID=16
Interactive mode : waiting...
Starting...

Connect to OAR job 16 via the node genepi-21
['/usr/bin/ssh','-p','6667','-x','-t','genepi-21','bash -c \'echo $PPID >> /var/lib/oar/oarsub_connections_16 && TTY=$(tty) && test -e $TTY && oardodo chown pneyron:oar $TTY && oardodo chmod 660 $TTY\' && OARDO_BECOME_USER=pneyron oardodo bash --noprofile --norc -c \'if [ "a$TERM" == "a" ] || [ "x$TERM" == "xunknown" ];then    export TERM=xterm;fi;export OAR_FILE_NODES="/var/lib/oar/16";export OAR_JOBID=16;export OAR_ARRAYID=16;export OAR_ARRAYINDEX=1;export OAR_USER="pneyron";export OAR_WORKDIR="/home/pneyron";export OAR_RESOURCE_PROPERTIES_FILE="/var/lib/oar/16_resources";export OAR_NODEFILE=$OAR_FILE_NODES;export OAR_O_WORKDIR=$OAR_WORKDIR;export OAR_NODE_FILE=$OAR_FILE_NODES;export OAR_RESOURCE_FILE=$OAR_FILE_NODES;export OAR_WORKING_DIRECTORY=$OAR_WORKDIR;export OAR_JOB_ID=$OAR_JOBID;export OAR_ARRAY_ID=$OAR_ARRAYID;export OAR_ARRAY_INDEX=$OAR_ARRAYINDEX;export OAR_JOB_NAME="";export OAR_PROJECT_NAME="default";export OAR_JOB_WALLTIME="2:0:0";export OAR_JOB_WALLTIME_SECONDS=7200;export SHELL="/bin/bash";export SUDO_COMMAND=OAR;SHLVL=1;if ( cd "$OAR_WORKING_DIRECTORY" &> /dev/null );then    cd "$OAR_WORKING_DIRECTORY";else    exit 2;fi;(exec -a -${SHELL##*/} $SHELL);exit 0\'']
pneyron@genepi-21:~$ cat /proc/self/cgroup 
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-116.slice/session-54.scope
pneyron@genepi-21:~$ 

-> seems ok... but it does not behave the same on the real Grid'5000 frontend, I don't know why.

To debug, I dumped the command executed to connect to the deploy frontend:

root@genepi-21:/usr/lib/oar# diff -Naur oarsub*
--- oarsub	2017-02-23 21:23:34.985737879 +0100
+++ oarsub.orig	2017-02-23 21:14:43.063299950 +0100
@@ -217,7 +217,6 @@
         #UID=EUID
         $< = $>;
         print("Connect to OAR job $job_id via the node $host_to_connect_via_ssh\n");
-print Dumper(\@cmd)."\n";
         system({$cmd[0]} @cmd);
         my $exit_value = $? >> 8;
         if ($exit_value == 2){

@npf
Contributor

npf commented Feb 23, 2017

Also works ok with oarsub -C [jobid]:

pneyron@genepi-21:~$ cat /proc/self/cgroup 
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-10106.slice/session-50.scope
pneyron@genepi-21:~$ oarsub -C 17
Connect to OAR job 17 via the node genepi-21
['/usr/bin/ssh','-p','6667','-x','-t','genepi-21','bash -c \'echo $PPID >> /var/lib/oar/oarsub_connections_17 && TTY=$(tty) && test -e $TTY && oardodo chown pneyron:oar $TTY && oardodo chmod 660 $TTY\' && OARDO_BECOME_USER=pneyron oardodo bash --noprofile --norc -c \'if [ "a$TERM" == "a" ] || [ "x$TERM" == "xunknown" ];then    export TERM=xterm;fi;export OAR_FILE_NODES="/var/lib/oar/17";export OAR_JOBID=17;export OAR_ARRAYID=17;export OAR_ARRAYINDEX=1;export OAR_USER="pneyron";export OAR_WORKDIR="/home/pneyron";export OAR_RESOURCE_PROPERTIES_FILE="/var/lib/oar/17_resources";export OAR_NODEFILE=$OAR_FILE_NODES;export OAR_O_WORKDIR=$OAR_WORKDIR;export OAR_NODE_FILE=$OAR_FILE_NODES;export OAR_RESOURCE_FILE=$OAR_FILE_NODES;export OAR_WORKING_DIRECTORY=$OAR_WORKDIR;export OAR_JOB_ID=$OAR_JOBID;export OAR_ARRAY_ID=$OAR_ARRAYID;export OAR_ARRAY_INDEX=$OAR_ARRAYINDEX;export OAR_JOB_NAME="";export OAR_PROJECT_NAME="default";export OAR_JOB_WALLTIME="2:0:0";export OAR_JOB_WALLTIME_SECONDS=7200;export SHELL="/bin/bash";export SUDO_COMMAND=OAR;SHLVL=1;if ( cd "$OAR_WORKING_DIRECTORY" &> /dev/null );then    cd "$OAR_WORKING_DIRECTORY";else    exit 2;fi;(exec -a -${SHELL##*/} $SHELL);exit 0\'']
pneyron@genepi-21:~$ cat /proc/self/cgroup 
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-116.slice/session-58.scope
pneyron@genepi-21:~$ 

@npf
Contributor

npf commented Feb 23, 2017

My test environment is:

  • jessie-x64-std
  • oar-server & oar-user & postgresql
  • very basic setup in oar.conf
  • pam_systemd added to /etc/pam.d/common-session, just like on the Grid'5000 frontends

@npf
Contributor

npf commented Feb 23, 2017

Same with a non-interactive job:

pneyron@genepi-21:~$ oarsub -t deploy "hostname ; cat /proc/self/cgroup"
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: /home/pneyron/.ssh/id_rsa
OAR_JOB_ID=18
pneyron@genepi-21:~$ cat OAR.18.stdout
genepi-21.grenoble.grid5000.fr
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-116.slice/session-65.scope
pneyron@genepi-21:~$

@lnussbaum
Author

So what is the difference between your setup (where indeed the result looks fine) and the Grid'5000 frontends?
Maybe a PAM config problem?

@npf
Contributor

npf commented Feb 24, 2017

My guess:

  • Debian might be more recent in the jessie-std env than on the frontend?
  • we indeed also need to check the actual differences between the 2 PAM configs: in the jessie-std env, I just added:
session optional        pam_systemd.so

@lnussbaum
Author

The frontends were built by upgrading wheezy systems, so yes, it's possible that something was left behind. But I did not see anything obvious...

@npf
Contributor

npf commented Feb 24, 2017

Closing the issue here, follow-up on g5k's bugzilla: https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=6560.

@npf npf closed this as completed Feb 24, 2017
@npf
Contributor

npf commented Feb 24, 2017

user-116 is actually the oar user (uid 116) -> not good in fact.
On Grid'5000 however, the slice is

1:name=systemd:/system.slice/oar-node.service

which is under the system slice, not a user slice like the one I see in my tests -> weird

@npf npf reopened this Feb 24, 2017
@npf
Contributor

npf commented Feb 24, 2017

Using systemd-run --scope + su, I kind of manage to get what we would like to get:

oar@genepi-2:~$ id oar
uid=116(oar) gid=126(oar) groups=27(sudo),126(oar)
oar@genepi-2:~$ cat /proc/self/cgroup 
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-116.slice/session-64.scope
oar@genepi-2:~$ sudo systemd-run --scope su pneyron -c "cat /proc/self/cgroup"
Running as unit run-13218.scope.
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-10106.slice/session-c36.scope
oar@genepi-2:~$ id pneyron
uid=10106(pneyron) gid=8000(users)...

But it does not work without "systemd-run --scope":

oar@genepi-2:~$ sudo su pneyron -c "cat /proc/self/cgroup"
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-116.slice/session-64.scope

It also does not work without su in the loop, e.g.:

oar@genepi-2:~$ sudo systemd-run --scope --setenv OARDO_BECOME_USER=pneyron /usr/lib/oar/oardodo/oardodo bash -c "id ; cat /proc/self/cgroup"
Running as unit run-13388.scope.
uid=10106(pneyron) gid=8000(users) groups=8000(users),27(sudo),9004(user),9005(account-manager),9006(site-manager),9016(ciment),9023(devel),9024(ml-users),9027(digitalis),9998(ct),15000(grenoble),15001(grenoble-staff)
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/system.slice/run-13388.scope

or

oar@genepi-2:~$ sudo systemd-run --scope --uid 10106 --gid 8000 bash -c "id ; cat /proc/self/cgroup"
Running as unit run-13496.scope.
uid=10106(pneyron) gid=8000(users) groups=8000(users),0(root)
8:perf_event:/
7:blkio:/
6:net_cls,net_prio:/
5:freezer:/
4:devices:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/system.slice/run-13496.scope

NB: also note group 0 in the latter output.

=> Seems like oardodo should maybe:

  • call systemd-run --scope → C function call? (see the sketch below)
  • create a (pam) session? (journalctl -f shows no pam activity with oardodo, while it does with su)
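
For the first point, here is a minimal sketch (illustrative only, not existing OAR code; the function name and the --slice choice are assumptions) of oardodo simply re-exec'ing the user command through systemd-run, so that it lands in a transient scope under the target user's slice. For a true "C function call", systemd-run is itself only a client for the StartTransientUnit D-Bus call on org.freedesktop.systemd1.Manager, which could be issued directly with sd-bus at the cost of more code.

#include <stdio.h>
#include <unistd.h>

/* Illustrative sketch: exec <cmd> inside a transient scope placed under
 * user-<uid>.slice (needs root, which oardodo already has). Without --slice,
 * the scope ends up under system.slice, as seen in the outputs above. */
static void exec_in_user_scope(unsigned uid, char *const cmd[])
{
    char slice[64];
    snprintf(slice, sizeof(slice), "--slice=user-%u.slice", uid);

    /* build: systemd-run --scope --slice=user-<uid>.slice <cmd...> */
    char *argv[64];
    int i = 0, j = 0;
    argv[i++] = "systemd-run";
    argv[i++] = "--scope";
    argv[i++] = slice;
    while (cmd[j] != NULL && i < 63)
        argv[i++] = cmd[j++];
    argv[i] = NULL;

    execvp("systemd-run", argv);   /* only returns on failure */
    perror("execvp systemd-run");
}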

@vdanjean
Contributor

vdanjean commented Feb 24, 2017

A way to get a user slice with only oardodo (but in a 'run' scope, not a 'session' one):

$ /usr/lib/oar/oardodo/oardodo systemd-run --scope --slice user-1002 --setenv OARDO_BECOME_USER=cbardel  /usr/lib/oar/oardodo/oardodo bash -c "id ; cat /proc/self/cgroup "
Running scope as unit: run-rb894e831d93346aabd644bb0c8cb1204.scope
uid=1002(cbardel) gid=1002(cbardel) groups=1002(cbardel),111(netdev)
10:freezer:/
9:blkio:/user.slice
8:cpu,cpuacct:/user.slice
7:perf_event:/
6:net_cls,net_prio:/
5:cpuset:/
4:memory:/user.slice
3:devices:/user.slice
2:pids:/user.slice/user-1002.slice/run-rb894e831d93346aabd644bb0c8cb1204.scope
1:name=systemd:/user.slice/user-1002.slice/run-rb894e831d93346aabd644bb0c8cb1204.scope

This does not go through PAM however.

@npf
Contributor

npf commented Feb 24, 2017

Next steps could be:

  • see how to make oardodo talk to PAM (but as an option, since we may not want that on the nodes, just on frontends)
  • see how to have systemd-run --scope called in the middle?

@lnussbaum
Author

lnussbaum commented Feb 27, 2017

Regarding talking to pam, since oardodo is in C, good starting points are:
https://linux.die.net/man/3/pam
and https://linux.die.net/man/3/pam_open_session

It would probably be better to just talk to PAM, and not call systemd-run manually.
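
For reference, a rough (untested) sketch of what that could look like in oardodo, following those man pages; the "oar-session" PAM service name and the privilege-dropping details are assumptions, not existing OAR code:

#include <security/pam_appl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Session-only use: no prompt is expected, so refuse any conversation. */
static int null_conv(int n, const struct pam_message **msg,
                     struct pam_response **resp, void *data)
{
    (void)n; (void)msg; (void)resp; (void)data;
    return PAM_CONV_ERR;
}

/* Open a PAM session for "user", run cmd inside it, then close the session.
 * With pam_systemd in the service's session stack, the forked child ends up
 * in user-<uid>.slice/session-<id>.scope. Error handling is trimmed; a
 * fuller version would also call pam_setcred() around the session. */
static int run_in_pam_session(const char *user, char *const cmd[])
{
    static struct pam_conv conv = { null_conv, NULL };
    pam_handle_t *pamh = NULL;
    int status = -1;

    if (pam_start("oar-session", user, &conv, &pamh) != PAM_SUCCESS)
        return 1;                                    /* hypothetical service name */

    if (pam_open_session(pamh, 0) == PAM_SUCCESS) {  /* pam_systemd acts here */
        pid_t pid = fork();
        if (pid == 0) {
            /* child: drop privileges to the target user (initgroups/setgid/
             * setuid, as oardodo already does), then exec the job command */
            execvp(cmd[0], cmd);
            _exit(127);
        }
        waitpid(pid, &status, 0);
        pam_close_session(pamh, 0);
    }
    pam_end(pamh, PAM_SUCCESS);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}

This would additionally need an /etc/pam.d/oar-session stack (e.g. including common-session so that pam_systemd runs) and linking oardodo with -lpam; keeping it optional, as suggested above, would leave the compute nodes unaffected.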
