Databrowser api #64
Because the services of MariaDB, DatabrowserAPI and Apache httpd will be deployed
on docker container images, docker needs to be available on the target servers.
Since version *v2309.0.0* of the deployment the containers are set up
using `doker-compose`. Hence `docker-compose` (or `podman-compose`) has to be
Suggested change: using `docker-compose`. Hence `docker-compose` (or `podman-compose`) has to be
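The requirement above (either `docker-compose` or `podman-compose` on the target server) can be checked before anything else runs. A minimal sketch, assuming the check is done in Python on the target host and that either executable on `PATH` is acceptable; `find_compose_command` is a hypothetical helper, not part of the deployment code:

```python
import shutil
from typing import Iterable, Optional


def find_compose_command(
    candidates: Iterable[str] = ("docker-compose", "podman-compose"),
) -> Optional[str]:
    """Return the path of the first compose tool found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    tool = find_compose_command()
    if tool is None:
        raise SystemExit("Neither docker-compose nor podman-compose was found on PATH.")
    print(f"Using {tool}")
```

Checking in the order given means `docker-compose` is preferred when both are installed.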
:maxdepth: 0
:titlesonly:

v2309.0.0
Is this the correct version?
Shouldn't it be something like v2402.0.0?
LGTM (2 typos; many things I do not understand, but I could not have contributed to them meaningfully in the short term anyway).
One question: shouldn't the version number be v2402.0.0?
General comment: I tried to load an old TOML file (without the databrowser bits). It loaded, and everything was filled in accordingly except the databrowser bits, which was to be expected. I filled them in and tried to re-save the TOML, but it didn't work:
I am confused why it doesn't try to integrate my changes into a new TOML. Trying to execute this config doesn't work either.
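Integrating an old file into the new layout is essentially a recursive merge: keep every section the new defaults define, and let the values from the old file win wherever they exist. A minimal sketch under that assumption; `merge_config` and the section and key names are hypothetical, and the PR's own `_update_config` may work differently:

```python
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], old: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay an old configuration on top of the current defaults.

    Keys that exist only in ``defaults`` (such as newly added sections) are kept,
    keys from ``old`` win over the defaults, and nested tables are merged
    recursively instead of being replaced as a whole. Neither input is mutated.
    """
    merged = dict(defaults)
    for key, value in old.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical example: the defaults contain a new "databrowser" table,
# while the old file only knows about "core".
defaults = {"core": {"admins": [], "project_name": ""}, "databrowser": {"host": ""}}
old = {"core": {"admins": ["alice"], "project_name": "freva"}}
config = merge_config(defaults, old)
```

Reading both sides with `tomllib.load` (Python 3.11+) and writing `config` back with a TOML writer such as `tomli-w` would complete the round trip.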
I noticed that as well and I fixed that. It took me a while to figure out what was going on. The problem is that the
After I edited the file manually, the deployment process started, but I again got the message that the file couldn't be saved because
No idea where this comes from or whether it has to do with the fact that I was using a pre-existing config.
After I tried to redeploy everything and delete all the existing data (not manually on the machine, but expecting it to happen automagically during deployment), the deployment crashed (a very lengthy error message, which I will not post here yet, as I am still investigating), but all containers were up and running. I noticed that the redis and httpd containers were still running from a previous deployment a week ago. This may be a coincidence, since the job crashed anyway, but I wanted to mention it because it felt like those two containers just weren't touched at all.
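Untouched containers are consistent with how `docker-compose up` behaves: it only recreates a service's container when the service's configuration or image changed after the container was created, so redis and httpd containers whose definitions did not change can keep running from an earlier deployment. If a redeployment should always start fresh containers, `up` accepts `--force-recreate`. A minimal sketch of building such a call, where `compose_up_cmd`, `compose_file` and `project` are hypothetical names rather than the deployment's actual variables (whether `podman-compose` honours the same flag should be checked with its `up --help`):

```python
import subprocess
from typing import List


def compose_up_cmd(
    compose_file: str,
    project: str,
    force_recreate: bool = True,
    compose: str = "docker-compose",
) -> List[str]:
    """Build a detached `up` command for one compose project."""
    cmd = [compose, "-f", compose_file, "-p", project, "up", "-d"]
    if force_recreate:
        # Recreate containers even if their configuration and image are unchanged.
        cmd.append("--force-recreate")
    return cmd


def compose_up(compose_file: str, project: str) -> None:
    """Run the command; check=True raises CalledProcessError on failure."""
    subprocess.run(compose_up_cmd(compose_file, project), check=True)
```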
OK, so it seems that the stuff going on in
I give up for now; we should probably discuss this next week, as I was not able to deploy freva successfully and I do not really understand why.
After fixing all the obvious things (the outdated config file and the redis issue), I let it run, which resulted in the following error message:
fatal: [vaskebjorn2.cloud.dkrz.de]: FAILED! => {"changed": true, "cmd": "$PYTHON3 deploy.py /opt/frevacond -s --arch Linux-x86_64 --packages freva", "delta": "0:33:16.020783", "end": "2024-02-29 16:56:22.008307", "msg": "non-zero return code", "rc": 1, "start": "2024-02-29 16:23:05.987524", "stderr": "Traceback (most recent call last):
File \"/tmp/evaluation_system/deploy.py\", line 457, in <module>
Inst.create_conda()
File \"/tmp/evaluation_system/deploy.py\", line 274, in create_conda
self.run_cmd(cmd)
File \"/tmp/evaluation_system/deploy.py\", line 241, in run_cmd
raise CalledProcessError(res.returncode, cmd)
subprocess.CalledProcessError: Command '/tmp/conda6ljuo2te/env/bin/conda create -c conda-forge -q -p /opt/frevacond python freva conda pip -y' returned non-zero exit status 1.", "stderr_lines": ["Traceback (most recent call last):", " File \"/tmp/evaluation_system/deploy.py\", line 457, in <module>", " Inst.create_conda()", " File \"/tmp/evaluation_system/deploy.py\", line 274, in create_conda", " self.run_cmd(cmd)", " File \"/tmp/evaluation_system/deploy.py\", line 241, in run_cmd", " raise CalledProcessError(res.returncode, cmd)", "subprocess.CalledProcessError: Command '/tmp/conda6ljuo2te/env/bin/conda create -c conda-forge -q -p /opt/frevacond python freva conda pip -y' returned non-zero exit status 1."], "stdout": "
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 700, in _update_chunk_length
self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 441, in _error_catcher
yield
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 767, in read_chunked
self._update_chunk_length()
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 704, in _update_chunk_length
raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/requests/models.py\", line 760, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 575, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 796, in read_chunked
self._original_response.close()
File \"/tmp/conda6ljuo2te/env/lib/python3.9/contextlib.py\", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 458, in _error_catcher
raise ProtocolError(\"Connection broken: %r\" % e, e)
urllib3.exceptions.ProtocolError: (\"Connection broken: InvalidChunkLength(got length b'', 0 bytes read)\", InvalidChunkLength(got length b'', 0 bytes read))
There is more to it, but I think this is the relevant bit. Those fatal error messages are extremely poorly formatted, and I always have to copy them into a separate text editor and add newlines to make them somewhat readable. Is it possible to improve the readability of the error messages?
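The unreadable part is the JSON result Ansible prints after `FAILED! =>`, whose `stderr` and `stdout` fields carry escaped `\n` sequences. Decoding that JSON restores the line breaks. A minimal sketch, assuming the whole result sits on one console line as Ansible emits it; `format_ansible_failure` is a hypothetical helper. Recent ansible-core versions can also render results as YAML through the default callback's `result_format` option, which may be enough on its own; check the documentation of the installed version.

```python
import json


def format_ansible_failure(line: str) -> str:
    """Make an Ansible 'fatal: [...]: FAILED! => {...}' line human-readable."""
    # Ansible prints the task result as a JSON object after the first '=> '.
    _, sep, payload = line.partition("=> ")
    if not sep:
        raise ValueError("no Ansible result payload found")
    result = json.loads(payload)
    parts = [f"{key}: {result[key]}" for key in ("msg", "cmd", "rc") if key in result]
    for key in ("stderr", "stdout"):
        text = result.get(key)
        if text:
            # json.loads has already turned the escaped '\n' into real newlines.
            parts.append(f"--- {key} ---\n{text}")
    return "\n".join(parts)


if __name__ == "__main__":
    import sys

    # Usage: ansible-playbook ... 2>&1 | python format_failures.py
    for raw in sys.stdin:
        if "FAILED! =>" in raw:
            print(format_ansible_failure(raw))
```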
Anyway, the error happened during TASK [Deploying evaluation_system], apparently while installing a (temporary?) conda environment. Next, I deselected the block "3. Install a new conda environment" in the core deployment and tried again.
This time it crashed with:
TASK [Cloning the evluation_system reposiotry] ****************************************************************************************
fatal: [vaskebjorn2.cloud.dkrz.de]: FAILED! => {"msg": "Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user (rc: 1, err: chmod: invalid mode: ‘A+user:\"root\"\\n:rx:allow’\nTry 'chmod --help' for more information.\n}). For information on working around this, see https://docs.ansible.com/ansible-core/2.14/user_guide/become.html#risks-of-becoming-an-unprivileged-user"}
It seems that I wasn't allowed to change the permissions of a folder? Unfortunately, it does not say where.
And now I give up and we have to discuss this in person or so...
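According to the Ansible page linked in the error ("Risks of becoming an unprivileged user"), the temporary files are the module files Ansible copies into a temp directory on the target. To make them readable for the `become_user`, Ansible first tries POSIX ACLs via `setfacl` and only then falls back to other mechanisms; the failing `chmod A+user:...` call looks like one of those fallbacks, using Solaris-style ACL syntax that GNU `chmod` rejects. A missing `setfacl` (the `acl` package on most Linux distributions) is therefore a likely but unconfirmed cause; the same page also lists pipelining as a workaround. A quick check to run on the target host, assuming Python is available there; `acl_tools_available` is a hypothetical helper:

```python
import shutil


def acl_tools_available() -> bool:
    """Return True if the POSIX ACL tool `setfacl` is on PATH."""
    return shutil.which("setfacl") is not None


if __name__ == "__main__":
    if acl_tools_available():
        print("setfacl found; the permission error probably has another cause.")
    else:
        print("setfacl missing; installing the 'acl' package may fix the become error.")
```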
@@ -148,13 +162,49 @@ def _convert_dict(
inp_dict[key] = get_current_file_dir(cfd, value)


def _update_config(
How does this get triggered? I saw that it could more or less load my config, but it didn't update anything. I had to do it by hand.
- "/opt/freva/{{project_name}}/web_service/freva_web.conf:/usr/local/apache2/conf/httpd.conf:z"
- "/opt/freva/{{project_name}}/web_service/server-cert.crt:/etc/ssl/certs/server-cert.crt:z"
- "/opt/freva/{{project_name}}/web_service/server-key.key:/etc/ssl/private/server-key.key:z"
- "/opt/freva/{{project_name}}/web_service/cacert.pem:/etc/ssl/certs/cacert.pem:z"
I think we have to get rid of the cacert. According to Gerald, it is bad practice...
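The volume entries in the hunk above use Docker's short `host_path:container_path:options` syntax; the trailing `z` option tells Docker that the bind-mounted content is shared among containers and should be relabeled for SELinux accordingly. A small sketch that fills in the Jinja placeholder and splits such an entry; `parse_volume`, `Volume` and the project name `freva` are hypothetical, and the naive split assumes POSIX paths without colons:

```python
from typing import NamedTuple, Tuple


class Volume(NamedTuple):
    host: str
    container: str
    options: Tuple[str, ...]


def parse_volume(spec: str, project_name: str) -> Volume:
    """Split a short-syntax volume entry into host path, container path and options."""
    # Minimal stand-in for the Jinja rendering Ansible performs on the template.
    rendered = spec.replace("{{project_name}}", project_name)
    host, container, *rest = rendered.split(":")
    options = tuple(rest[0].split(",")) if rest else ()
    return Volume(host, container, options)
```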
I know I said that it worked for some reason I didn't understand last time, but I was wrong (probably I had changed something locally, no idea):
l.253:
_webserver_items["REDIS_HOST"] = f"{self.project_name}-redis"
The web container cannot find the redis container, so it doesn't work. Is the compose setup still not working properly? Everywhere else, when you connect to another container, you use the hostname of the VM rather than the container name as here. Why?
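On a user-defined Docker network, which `docker-compose` creates per project by default, Docker's embedded DNS resolves service and container names for every container attached to that network; the default bridge network has no such name resolution. So `f"{self.project_name}-redis"` can only resolve if the web and redis containers share a network, which would not be the case if, for example, they were started from separate compose projects. A quick check to run inside the web container, assuming Python is available there; `can_resolve` is a hypothetical helper, and 6379 is only Redis's default port:

```python
import socket


def can_resolve(host: str, port: int = 6379) -> bool:
    """Return True if `host` resolves from where this code runs."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return False
    return True
```

Calling it with the redis container's name from inside the web container distinguishes a DNS/network problem from a problem with Redis itself.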
That's very interesting, the error! I'll have a look; next week I'll order a "virgin" VM and see. About the error messages: that is Ansible. I am not aware of any real alternatives, but I can check. There is ansible-runner, which produces JSON output so that the stdout of tasks can be parsed; I figured that might be interesting.
OK, there are still issues. I have fixed some of them in a different branch. What I would suggest is the following: I will merge this branch into the current branch and let you know when you can take a look at it. Since the ultimate goal is to create a version of the deployment that deals with versions of the microservices, we can also do the following: I keep changing things and you review the outcome. But beware: those changes might be big, as in a refactoring of the current code. What do you think?
Do it. I am beyond the point where I can follow all of the code changes, but I can definitely try things out and give you my opinion, and I will also occasionally comment on code (though not as my main focus). As soon as you are really back, we should probably do another deployment session with the new colleagues to hear their opinion.
I will merge this branch into another branch.
This should be the PR for setting up docker-compose. Sorry, I had forgotten about this PR for a few months, since September last year.
I have just merged this branch with the main branch and hope it still works.
Have fun with this. There is still more coming!