Databrowser api #64

Closed
wants to merge 24 commits into from

Conversation

antarcticrainforest
Member

This should be the PR for setting up docker-compose. Sorry, I had forgotten about this PR for a few months, since September last year.

I have just merged this branch with the main branch and hope it still works.

Have fun with this. There is still more coming!

Because the services of MariaDB, DatabrowserAPI and Apache httpd will be deployed
on docker container images, docker needs to be available on the target servers.
Since version *v2309.0.0* of the deployment the containers are set up
using `doker-compose`. Hence `docker-compose` (or `podman-compose`) has to be
Contributor

using docker-compose. Hence docker-compose (or podman-compose) has to be

:maxdepth: 0
:titlesonly:

v2309.0.0
Contributor

Is this the correct version?
Shouldn't it be something like v2402.0.0?

Contributor

@eelucio eelucio left a comment

LGTM (2 typos, and many things that I do not understand, but to which I could not meaningfully contribute in the short term anyway).
One question: shouldn't the version number be v2402.0.0?

@Karinon
Contributor

Karinon commented Feb 29, 2024

General comment: I tried to load an old TOML file (without the databrowser bits). It could be loaded and everything was filled out accordingly except the databrowser bits, which was to be expected. I filled them out and tried to re-save the TOML, but it didn't work:

 Couldn't save config: 
'Key "databrowser" does not exist.'

I am confused why it doesn't try to integrate my changes into a new toml. Trying to execute this config doesn't work either.

  File "/home/afast/workspace/freva-deployment/src/freva_deployment/deploy.py", line 513, in create_playbooks
    getattr(self, f"_prep_{step}")()
  File "/home/afast/workspace/freva-deployment/src/freva_deployment/deploy.py", line 149, in _prep_db
    self._prep_web(False)
  File "/home/afast/workspace/freva-deployment/src/freva_deployment/deploy.py", line 231, in _prep_web
    f'{self.cfg["databrowser"]["hosts"]}:'
KeyError: 'databrowser'
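
The traceback above is a plain `KeyError` raised when a section is indexed that an older config file does not contain. A minimal sketch of one way to guard against this, assuming the config is a nested dict with one top-level section per service; `fill_missing_sections` and the default values are hypothetical and not part of freva-deployment:

```python
import copy


def fill_missing_sections(cfg, defaults):
    """Return a copy of cfg in which every top-level section that is
    missing has been taken from defaults. Existing sections are kept."""
    merged = copy.deepcopy(cfg)
    for section, values in defaults.items():
        if section not in merged:
            merged[section] = copy.deepcopy(values)
    return merged


# Hypothetical old config without a "databrowser" section.
old_cfg = {"web": {"hosts": "www.example.org"}}
defaults = {"web": {"hosts": ""}, "databrowser": {"hosts": ""}}

cfg = fill_missing_sections(old_cfg, defaults)
print(sorted(cfg))  # the merged config now contains both sections
```

With such a step in front of it, a lookup like `cfg["databrowser"]["hosts"]` would return an empty default instead of raising.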

@antarcticrainforest
Member Author

I noticed that as well and I fixed it. It took me a while to figure out what was going on. The problem is that the `copy` method of the built-in `dict` class doesn't do a deep copy. I think I fixed that in another branch I was working on. I'll push those changes here as well.
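
For reference, the difference between `dict.copy()` and `copy.deepcopy()` can be sketched as follows. The config layout is made up for illustration and is not the actual freva-deployment template:

```python
import copy

# Hypothetical nested config, only for illustration.
template = {"databrowser": {"hosts": "", "config": {"port": 7777}}}

shallow = template.copy()        # new outer dict, nested dicts are shared
deep = copy.deepcopy(template)   # nested dicts are copied as well

shallow["databrowser"]["hosts"] = "example.org"  # also changes template
deep["databrowser"]["config"]["port"] = 8080     # leaves template untouched

print(template["databrowser"]["hosts"])
print(template["databrowser"]["config"]["port"])
```

Writing through the shallow copy modifies the template itself, because both outer dicts point to the same inner `databrowser` dict. The deep copy is fully independent.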

@Karinon
Contributor

Karinon commented Mar 1, 2024

After I edited the file manually, the deployment process started, but I again got the message that the file couldn't be saved because

 Couldn't save config: 
'Key "solr" does not exist.'

No idea where this comes from and whether it has to do with the fact that I was using a pre-existing config.

@Karinon
Contributor

Karinon commented Mar 1, 2024

After I tried to redeploy everything and delete all the existing data (not manually on the machine, but just expecting it to happen automagically during deployment), the deployment crashed (with a very lengthy error message which I will not post here yet, as I am still investigating), but all containers were up and running. I noticed that the redis and the httpd containers were still running from a previous deployment process a week ago.

This can be a coincidence since the job crashed anyway, but I just wanted to mention it here, because it felt like those two containers just weren't touched at all.

@Karinon
Contributor

Karinon commented Mar 1, 2024

 Couldn't save config: 
'Key "solr" does not exist.'

OK, so it seems that the code in main_window.py reloads a cached JSON config. For some reason this JSON config has both the databrowser block and the solr block. The _save_config_to_file routine iterates over all the steps in self.config (the cached JSON file) and tries to open the corresponding items inside config_tmpl (the inventory file?). This obviously doesn't work, as the inventory file has no solr block. Hence, saving does not work.
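
The failure mode described above can be sketched in isolation: if the save step only writes sections that the template actually knows, a stale `solr` block in the cache is reported instead of raising a `KeyError`. This is a minimal sketch under that assumption; `sections_to_save` is a hypothetical helper and not the real `_save_config_to_file` routine:

```python
def sections_to_save(cached_cfg, template):
    """Split the cached sections into those present in the template
    and those that are stale (unknown to the template)."""
    known = {key: val for key, val in cached_cfg.items() if key in template}
    stale = sorted(key for key in cached_cfg if key not in template)
    return known, stale


# Hypothetical cached config that still carries an old "solr" block.
cached = {"databrowser": {"hosts": "a"}, "solr": {"hosts": "b"}}
template = {"databrowser": {}, "web": {}}

known, stale = sections_to_save(cached, template)
print("saving:", sorted(known), "ignoring stale sections:", stale)
```

Whether stale sections should be dropped silently, reported, or migrated is a design choice that this sketch leaves open.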

Contributor

@Karinon Karinon left a comment

I give up for now and we should probably discuss this stuff next week, as I was not able to deploy freva successfully and I do not really understand why.

After I fixed all the obvious things (outdated config-file and the redis-stuff) I let it run which resulted in the following error message:

fatal: [vaskebjorn2.cloud.dkrz.de]: FAILED! => {"changed": true, "cmd": "$PYTHON3 deploy.py /opt/frevacond -s --arch Linux-x86_64 --packages freva", "delta": "0:33:16.020783", "end": "2024-02-29 16:56:22.008307", "msg": "non-zero return code", "rc": 1, "start": "2024-02-29 16:23:05.987524", "stderr": "Traceback (most recent call last):
  File \"/tmp/evaluation_system/deploy.py\", line 457, in <module>
    Inst.create_conda()
  File \"/tmp/evaluation_system/deploy.py\", line 274, in create_conda
    self.run_cmd(cmd)
  File \"/tmp/evaluation_system/deploy.py\", line 241, in run_cmd
    raise CalledProcessError(res.returncode, cmd)
subprocess.CalledProcessError: Command '/tmp/conda6ljuo2te/env/bin/conda create -c conda-forge -q -p /opt/frevacond python freva conda pip -y' returned non-zero exit status 1.", "stderr_lines": ["Traceback (most recent call last):", "  File \"/tmp/evaluation_system/deploy.py\", line 457, in <module>", "    Inst.create_conda()", "  File \"/tmp/evaluation_system/deploy.py\", line 274, in create_conda", "    self.run_cmd(cmd)", "  File \"/tmp/evaluation_system/deploy.py\", line 241, in run_cmd", "    raise CalledProcessError(res.returncode, cmd)", "subprocess.CalledProcessError: Command '/tmp/conda6ljuo2te/env/bin/conda create -c conda-forge -q -p /opt/frevacond python freva conda pip -y' returned non-zero exit status 1."], "stdout": "

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 700, in _update_chunk_length
        self.chunk_left = int(line, 16)
    ValueError: invalid literal for int() with base 16: b''
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 441, in _error_catcher
        yield
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 767, in read_chunked
        self._update_chunk_length()
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 704, in _update_chunk_length
        raise InvalidChunkLength(self, line)
    urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/requests/models.py\", line 760, in generate
        for chunk in self.raw.stream(chunk_size, decode_content=True):
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 575, in stream
        for line in self.read_chunked(amt, decode_content=decode_content):
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 796, in read_chunked
        self._original_response.close()
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/contextlib.py\", line 137, in __exit__
        self.gen.throw(typ, value, traceback)
      File \"/tmp/conda6ljuo2te/env/lib/python3.9/site-packages/urllib3/response.py\", line 458, in _error_catcher
        raise ProtocolError(\"Connection broken: %r\" % e, e)
    urllib3.exceptions.ProtocolError: (\"Connection broken: InvalidChunkLength(got length b'', 0 bytes read)\", InvalidChunkLength(got length b'', 0 bytes read))

There is more to it, but I think this is the relevant bit. Those fatal error messages are extremely poorly formatted, and I always have to copy them into a separate text editor and add some newlines to make them somewhat readable. Is it possible to improve the readability of the error messages?
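
Until the output itself improves, a small helper can do the reformatting that is currently done by hand. This is a minimal sketch, assuming the failure line has the shape `fatal: [host]: FAILED! => {json}` as in the message above; `format_ansible_failure` and the sample line are made up for illustration:

```python
import json


def format_ansible_failure(line):
    """Turn 'fatal: [host]: FAILED! => {...}' into readable text.
    Lines that do not match this shape are returned unchanged."""
    head, sep, payload = line.partition(" => ")
    if not sep:
        return line
    try:
        # strict=False tolerates raw newlines inside JSON strings.
        result = json.loads(payload, strict=False)
    except json.JSONDecodeError:
        return line
    if not isinstance(result, dict):
        return line
    parts = [head]
    for key in ("msg", "cmd", "rc", "stderr", "stdout"):
        if key in result and result[key] != "":
            parts.append(f"--- {key} ---\n{result[key]}")
    return "\n".join(parts)


sample = (
    'fatal: [host.example.org]: FAILED! => '
    '{"msg": "non-zero return code", "rc": 1, '
    '"stderr": "Traceback:\\n  boom"}'
)
print(format_ansible_failure(sample))
```

The helper skips `stderr_lines` on purpose, since it only repeats `stderr` line by line.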

Anyway, the error happened during TASK [Deploying evaluation_system] and apparently during the installation of a (temporary?) conda-environment. Next, I de-selected the block 3. Install a new conda environment during the core-deployment and tried it again.

This time it crashed with:

TASK [Cloning the evluation_system reposiotry] ****************************************************************************************
fatal: [vaskebjorn2.cloud.dkrz.de]: FAILED! => {"msg": "Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user (rc: 1, err: chmod: invalid mode: ‘A+user:\"root\"\\n:rx:allow’\nTry 'chmod --help' for more information.\n}). For information on working around this, see https://docs.ansible.com/ansible-core/2.14/user_guide/become.html#risks-of-becoming-an-unprivileged-user"}

It seems that I wasn't allowed to change the permissions of a folder? Unfortunately it does not say where.

And now I give up and we have to discuss this in person or so...

@@ -148,13 +162,49 @@ def _convert_dict(
inp_dict[key] = get_current_file_dir(cfd, value)


def _update_config(
Contributor

How does this get triggered? I saw that it more or less could load my config, but I didn't update anything. I had to do it by hand

- "/opt/freva/{{project_name}}/web_service/freva_web.conf:/usr/local/apache2/conf/httpd.conf:z"
- "/opt/freva/{{project_name}}/web_service/server-cert.crt:/etc/ssl/certs/server-cert.crt:z"
- "/opt/freva/{{project_name}}/web_service/server-key.key:/etc/ssl/private/server-key.key:z"
- "/opt/freva/{{project_name}}/web_service/cacert.pem:/etc/ssl/certs/cacert.pem:z"
Contributor

I think we have to get rid of the cacert. According to Gerald it is a bad practice...

Contributor

I know I said that it worked for some reason I didn't understand last time, but I was wrong (probably I had changed something locally, no idea):

l.253:

_webserver_items["REDIS_HOST"] = f"{self.project_name}-redis"

The redis-container cannot be found by the web and it doesn't work. So is the compose-stuff still not running properly? Everywhere else when you try to connect to another container you are using the hostname of the VM and not the container-name like here. Why?
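
As background for the question above: containers attached to the same user-defined Docker network (Compose creates one by default) can reach each other by service or container name, whereas a container outside that network cannot resolve the name. Whether that is the cause here is an assumption. A quick check that can be run from inside the web container is sketched below; `can_resolve` is a hypothetical helper, and the host name to test would be whatever `REDIS_HOST` is set to:

```python
import socket


def can_resolve(hostname):
    """Return True if the host name resolves from where this code runs."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True


# "localhost" is used here only so the example runs anywhere; inside the
# web container one would pass the redis container name instead.
print(can_resolve("localhost"))
```

If the name does not resolve from inside the web container, the two containers are most likely not on the same network.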

@antarcticrainforest
Member Author

That's a very interesting error! I'll have a look; next week I'll order a "virgin" VM and see. About the error messages: that is Ansible. I am not aware of any real alternatives, but I can try to check. There is ansible-runner, which produces JSON output, so that the stdout from tasks can be parsed. I figured that might be something interesting.

@antarcticrainforest
Member Author

OK, there are still issues. I have fixed some of them in a different branch. What I would suggest is the following: I will merge this branch into the current branch and let you know when you can take a look at it.

Since the ultimate goal is to create a version of the deployment that deals with versions of the microservices, we can also do the following: I keep changing things and you review the outcome. But beware, those changes might be big, as in a refactoring of the current code.

What do you think?

@Karinon
Contributor

Karinon commented Mar 5, 2024

Do it. I am beyond the point where I can follow all of the code changes but I can definitely try stuff out and give you my opinion and will also occasionally comment on code (but not necessarily as a main focus). As soon as you are really back we should probably do another deployment-session with the new colleagues to hear their opinion on stuff.

@antarcticrainforest
Member Author

I will merge this branch into another branch.

@antarcticrainforest antarcticrainforest deleted the databrowser-api branch March 8, 2024 13:13
3 participants