ECE 2.13+ and 3.x does not bootstrap on SLES #155

Open
obierlaire opened this issue Apr 28, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@obierlaire
Contributor

Starting with 2.13 (including 3.0 and above), ECE does not bootstrap on SLES 12 or 15 with Docker 19 or 20:

Details

Bootstrap logs:

- Starting local runner {}
- Started local runner {}
- Waiting for runner container node {}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Errors have caused Elastic Cloud Enterprise installation to fail - Please check logs 
  Node type - initial
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the Docker logs of the runner container:

ok: run: docker-socket-proxy: (pid 30) 2s
Traceback (most recent call last):
  File "/elastic_cloud_apps/runner/write_config.py", line 10, in <module>
    with open('runner.conf', 'w') as dest:
PermissionError: [Errno 13] Permission denied: 'runner.conf'

I noticed that the ece user is correctly defined in passwd and group, and that elastic does belong to the ece group, so this failure should not happen.

elastic:x:1000:1000::/home/elastic:/bin/false
ece:x:199:199::/home/ece:/bin/bash

ece:x:199:elastic
elastic:x:1000:

Here is the directory that holds runner.conf:

$ ls -lah /elastic_cloud_apps/runner
total 16K
drwxrwxr-x 1 199     199       65 Apr 28 14:36 .

On Ubuntu, the ece user is shown as the owner of /elastic_cloud_apps/runner, but on SLES only its uid 199 is shown.
In the bootstrapper Docker container, the owner is displayed as ece rather than as the uid.

Also, the following command does not work:

$ setuser ece whoami
setuser: user ece not found

This does not make sense, as the ece user is correctly defined in /etc/passwd.
Again, everything works on Ubuntu, and on SLES from inside the bootstrapper container.

My guess is that Docker has issues mapping uid/gid between the host and the container. Indeed, the user and group ece do not exist on the host, so elastic does not belong to the ece group on the host.
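To narrow this down, it may help to compare what the passwd file says with what the C library lookup returns for the same name. The following is a minimal Python sketch of that comparison, not part of ECE; the helper names (parse_passwd, nss_uid, compare) are made up for illustration, and it assumes Python 3 is available in the container.

```python
import pwd


def parse_passwd(text):
    """Map user name -> uid from passwd-formatted text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # A valid entry has at least name, password and a numeric uid.
        if len(fields) >= 3 and fields[2].isdigit():
            users[fields[0]] = int(fields[2])
    return users


def nss_uid(name):
    """Return the uid the system lookup resolves for name, or None if it fails."""
    try:
        return pwd.getpwnam(name).pw_uid
    except KeyError:
        return None


def compare(name, passwd_text):
    """Return (uid found in the file text, uid found by the system lookup)."""
    return parse_passwd(passwd_text).get(name), nss_uid(name)
```

Calling compare("ece", open("/etc/passwd").read()) inside the runner container would be expected to show a uid from the file but None from the lookup if the symptom is the same one that setuser hits.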

Workaround

On the host, create a user and a group named ece, both with id 199 (uid and gid), and add the elastic user to the ece group.
Then run the ECE installer, and it should work.
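Before running the installer, the host state described in this workaround can be checked. Here is a small Python sketch that inspects passwd and group file contents for the three conditions above; the function names are hypothetical and the check only reads text, it does not create anything.

```python
def parse_colon_file(text, min_fields):
    """Split passwd/group style text into field lists, skipping short lines."""
    rows = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        if len(fields) >= min_fields:
            rows.append(fields)
    return rows


def workaround_problems(passwd_text, group_text, uid=199, gid=199):
    """Return a list of conditions from the workaround that are not met yet."""
    users = {f[0]: f for f in parse_colon_file(passwd_text, 4)}
    groups = {f[0]: f for f in parse_colon_file(group_text, 4)}
    problems = []

    ece_user = users.get("ece")
    if ece_user is None or ece_user[2] != str(uid):
        problems.append(f"user ece with uid {uid} is missing")

    ece_group = groups.get("ece")
    if ece_group is None or ece_group[2] != str(gid):
        problems.append(f"group ece with gid {gid} is missing")
    else:
        # elastic counts as a member if listed, or if ece is its primary group.
        members = [m for m in ece_group[3].split(",") if m]
        elastic = users.get("elastic")
        primary = elastic is not None and elastic[3] == str(gid)
        if "elastic" not in members and not primary:
            problems.append("user elastic is not in group ece")
    return problems
```

An empty list from workaround_problems(open("/etc/passwd").read(), open("/etc/group").read()) on the host would mean the three conditions are in place.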

@obierlaire obierlaire added the bug Something isn't working label Apr 28, 2022
@obierlaire
Contributor Author

Since 2.13, with https://github.com/elastic/cloud/pull/82702, we mount /run into the runner container.

Also, even if we work around the uid/gid problem (see the description), I noticed that the runner cannot talk to ZooKeeper and so is still not detected as running. If you log into the runner container, hosts are no longer resolved:

$ ping containerhost
ping: containerhost: Name or service not known

Yet containerhost is correctly defined in /etc/hosts:

[root@831b834c4693 /]# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.31.3.6	containerhost
172.17.0.1	831b834c4693

I noticed that the bootstrapper has mounted /run into the container since 2.13, and if I start the runner container manually without mounting /run, hosts resolve correctly.
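The same mismatch can be checked for host names by comparing the entries in /etc/hosts with what the resolver returns. A rough Python sketch, with hypothetical helper names and assuming Python 3 inside the container:

```python
import socket


def parse_hosts(text):
    """Map each host name or alias -> IP address from hosts-formatted text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address seen for a name, like a top-down read.
            table.setdefault(name, ip)
    return table


def resolve(name):
    """Return the IPv4 address the system resolver gives for name, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None
```

With /run mounted, parse_hosts(open("/etc/hosts").read()).get("containerhost") would be expected to return the address from the file while resolve("containerhost") returns None; without the mount, both should agree.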

@obierlaire
Contributor Author

A workaround that seems to fix both the uid/gid problem and the /etc/hosts problem is disabling or uninstalling nscd: #156
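To tell whether a container is exposed to the host's nscd, one option is to look for the nscd socket under the mounted /run. This Python sketch assumes the socket lives at nscd/socket below /run, which is the usual glibc location but is not confirmed anywhere in this thread; the function names are made up for illustration.

```python
import os


def nscd_socket_path(run_dir="/run"):
    """Build the assumed nscd socket path below run_dir."""
    return os.path.join(run_dir, "nscd", "socket")


def nscd_socket_visible(run_dir="/run"):
    """Return True if something exists at the assumed nscd socket path."""
    return os.path.exists(nscd_socket_path(run_dir))
```

If nscd_socket_visible() returns True inside the runner container, user and host lookups there may be going to the host's nscd instead of the container's own /etc/passwd and /etc/hosts, which would be consistent with both symptoms above.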
