Dockerize indicator runners #1968

Open · melange396 opened this issue Jun 1, 2024 · 3 comments
Labels: chore, devops, future-solution, release
Comments

melange396 (Contributor) commented Jun 1, 2024

Following up on #1967: for consistency and reproducibility, we should "Dockerize" (or something equivalent/similar) our indicator runtime environments. Though it will not be perfect, it will help us run the same code on different machines without having to worry about subtle differences in configuration or dependency versions.

Our current installations essentially run on "bare hardware" (not even inside venvs, AFAICT), where different jobs may each expect a particular setup but in practice are limited by each other's constraints. This will be a kind of paradigm shift, in that our deployment processes and job scheduling/triggering will have to change.

Somewhat related to cmu-delphi/delphi-epidata#1389.

dshemetov (Contributor) commented Jun 2, 2024

Good idea! I started thinking a little about this (using some tips from this blog); here's something for the hhs indicator to get us started (I chose it because I don't think it requires extra credentials). The lint command worked and the indicator started, but I didn't see it through to the end:

# covidcast-indicators/hhs_hosp/Dockerfile
FROM python:3.8.19-slim-bookworm

RUN mkdir /usr/src/app
WORKDIR /usr/src/app

# Update and install in one layer so a cached apt-get update can't go stale
RUN apt-get update && apt-get install -y make git

COPY . /usr/src/app

RUN make install

# covidcast-indicators/hhs_hosp/.dockerignore
# Ignore bulky directories we bind-mount
cache
receiving
# Ignore local virtual environment
env
# EOF

# Commands to be run in the covidcast-indicators/hhs_hosp directory
docker build -f Dockerfile . -t delphi_hhs
docker run -it delphi_hhs make lint
docker run \
  -it \
  --mount type=bind,source="$(pwd)"/cache,target=/usr/src/app/cache \
  --mount type=bind,source="$(pwd)"/receiving,target=/usr/src/app/receiving \
  -e DELPHI_EPIDATA_KEY="$DELPHI_EPIDATA_KEY" \
  delphi_hhs env/bin/python -m delphi_hhs

Next steps might be something like:

  • make an analogue for an indicator that's pulling current data (hhs is not, for now)
  • get it going on staging and compare outputs (the two main ones being the cache and receiving directories; one possible snag is that those directories are specified in each indicator's params.json file, and on prod I think they point to a directory outside the repo, which will complicate the bind-mount recipe above; see the sketch after this list)
  • make sure logging in the container hooks into our logging infra correctly
  • figure out other things that need to match and get them to match (maybe deploy repo type of stuff?)
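
On the params.json snag above, here is a minimal sketch of one possible workaround, assuming a container-specific params file and hypothetical prod paths under /common/covidcast/ (none of these names are settled): bind-mount the params file over the repo copy, along with the real output directories, so the paths inside the container stay consistent.

# Hypothetical: params.docker.json points cache/receiving at the container paths
docker run \
  --mount type=bind,source=/common/covidcast/params.docker.json,target=/usr/src/app/params.json \
  --mount type=bind,source=/common/covidcast/cache,target=/usr/src/app/cache \
  --mount type=bind,source=/common/covidcast/receiving,target=/usr/src/app/receiving \
  -e DELPHI_EPIDATA_KEY \
  delphi_hhs env/bin/python -m delphi_hhs

(Passing -e DELPHI_EPIDATA_KEY with no value forwards the variable from the host environment, which keeps the key out of shell history and process listings.)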

dsweber2 (Contributor) commented Jun 4, 2024

So, I have a maybe-dumb patch of an idea that could temporarily make sure the indicators have up-to-date environments:

A cron job that backs up the venv folder, runs make clean; make install on staging for each indicator, makes sure that works, and then after 1-3 days does the same on prod (enough time to cancel if it broke on staging). Run this once a month or so.
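
A rough sketch of what that job might look like, assuming a hypothetical runtime layout (~indicators/runtime/<indicator>/) and that each indicator's Makefile has clean/install/test targets; the staging-then-prod delay would live in the scheduler rather than the script:

#!/bin/sh
# refresh_envs.sh (hypothetical): rebuild each indicator's venv in place,
# keeping a backup to roll back to if the rebuild or its checks fail.
set -eu
for dir in ~indicators/runtime/*/; do
  cd "$dir"
  cp -a env env.bak                 # back up the current venv
  if make clean && make install && make test; then
    rm -rf env.bak                  # rebuild succeeded; drop the backup
  else
    rm -rf env && mv env.bak env    # restore the old venv
    echo "env rebuild failed in $dir" >&2
  fi
done
# crontab entry, e.g. the first of each month at 03:00:
#   0 3 1 * * /path/to/refresh_envs.sh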

melange396 (Author) commented

I was mistaken: we do actually make use of venvs for the indicators. I thought it was necessary to execute the activate script to properly set up the environment, which we do not do in our scheduled job runs; in fact activation is not required, and the way we invoke indicator jobs (through the venv's own interpreter) does take advantage of their respective virtual environments.
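
To illustrate (reusing the delphi_hhs module from the example above, with a hypothetical install path): invoking the venv's interpreter directly is enough, because a venv's python locates its own site-packages from its position on disk; the activate script mostly just adjusts PATH and the shell prompt.

# No "source env/bin/activate" needed; the venv's interpreter is self-locating:
~indicators/runtime/hhs_hosp/env/bin/python -m delphi_hhs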

Using Jenkins (on a separate machine), we "build" the environments, tar them up, and then unpack those directory trees onto the prod and staging machines. However, such environments are not intended to be moved, even to a different directory on the same machine. Perhaps it is good that we do not "activate" the environments, because the activate script bakes in path information from the build machine:

$ less ~indicators/runtime/nchs_mortality/env/bin/activate | grep nchs_mortality
VIRTUAL_ENV="/mnt/data/jenkins/workspace/covidcast-indicators_prod/nchs_mortality/env"

This approach has a "build once, then distribute" paradigm similar to Docker's, but it unfortunately has these problems (and I am surprised we haven't been bitten by them (yet?)).
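
One way around the baked-in paths, sketched here as a suggestion rather than our current setup: run the build step on the machine that will host the venv, so that activate and the script shebangs record the correct location. The exact install command depends on the indicator's Makefile; pip install -e . below is just illustrative.

# Hypothetical: create the venv in its final location instead of shipping a tarball
cd ~indicators/runtime/nchs_mortality
python3.8 -m venv env                 # paths baked into env/ now match this machine
env/bin/pip install --upgrade pip
env/bin/pip install -e .              # install the indicator package into the venv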

After consulting with @korlaxxalrok, he thinks (but don't quote me on this!) that Jenkins can be made to build the virtual environments on the prod/staging servers themselves, or that Jenkins could build Docker images in a similar way instead. He also suggested that we could get GitHub Actions to do it, but voiced concerns about secrets being leaked from there (unless we are careful to use methods to mask variables in the logs).
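
For reference on the masking concern: values from the GitHub Actions secrets context are redacted from logs automatically, and derived values can be masked explicitly with the add-mask workflow command inside any shell step; DELPHI_EPIDATA_KEY below is just an illustrative variable.

# Inside a GitHub Actions run: step; from this point on, the value is
# redacted wherever it would appear in the job log.
echo "::add-mask::${DELPHI_EPIDATA_KEY}"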
