Passing additional build arguments to Dockerfile.torchx #749

Open
anjali-chadha opened this issue Aug 2, 2023 · 4 comments

Comments

@anjali-chadha

❓ Questions and Help

Question

Use case:
My team uses torchx to submit jobs to remote schedulers such as AWS Batch. While building the docker image, we want to use a private PyPI repository to install the Python dependencies.

It seems that Dockerfile.torchx does not accept any build arguments besides Image and Workspace (reference). We need to pass additional ones, such as a pip index-url, so that the image build points to our private PyPI repository.

Does the torchx team have any recommendations on how to pass additional build args while building the docker image?

@schmidt-ai
Contributor

I have the same use case. In general, it would be useful to expose all the arguments to the docker build command. For instance, specifying a target in the case of a multi-stage Dockerfile.

@kiukchung
Collaborator

Short Answer

You can achieve this today by building your own base image using pure docker (where you can pass any arguments to your liking to docker build) and using TorchX's docker workspace to "patch" the local changes to your project. Basically:

  1. Create a Dockerfile and build your project's base image as you want. Let's call this my_project_base_img.
  2. Use my_project_base_img:tag as the base image when running components (e.g. torchx run dist.ddp --image my_project_base_img:tag ...)
  3. If your project only contains python sources, then you can skip Dockerfile.torchx altogether (see below)
  4. If your project requires a specific install (e.g. you have c-extensions and have made changes to it locally) you can additionally write a Dockerfile.torchx as:
    ARG IMAGE
    FROM $IMAGE
    COPY . .
    # assumes that c-extensions are built and installed with setuptools
    RUN pip install -e .
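
The first two steps above can be sketched as a small command builder. This is only an illustration: the helper names are hypothetical, the index URL is a placeholder, and it assumes the base image's Dockerfile declares `ARG PIP_INDEX_URL` so that pip sees the value during `RUN` steps. Note that build arguments are visible in the image history, so credentials should not be passed this way.

```python
import shlex
from typing import List, Mapping, Optional, Sequence


def docker_build_cmd(
    tag: str,
    context: str = ".",
    build_args: Optional[Mapping[str, str]] = None,
    target: Optional[str] = None,
    platform: Optional[str] = None,
) -> List[str]:
    """Compose a plain `docker build` command for the base image (step 1)."""
    cmd = ["docker", "build", "--tag", tag]
    for key, value in (build_args or {}).items():
        cmd += ["--build-arg", f"{key}={value}"]
    if target:
        cmd += ["--target", target]  # stage of a multi-stage Dockerfile
    if platform:
        cmd += ["--platform", platform]
    cmd.append(context)
    return cmd


def torchx_run_cmd(
    image: str, component: str = "dist.ddp", extra: Sequence[str] = ()
) -> List[str]:
    """Compose the `torchx run` command that uses the base image (step 2)."""
    return ["torchx", "run", component, "--image", image, *extra]


if __name__ == "__main__":
    build = docker_build_cmd(
        "my_project_base_img:tag",
        # Placeholder URL; replace with the private index.
        build_args={"PIP_INDEX_URL": "https://pypi.example.com/simple"},
    )
    print(shlex.join(build))
    print(shlex.join(torchx_run_cmd("my_project_base_img:tag")))
```

The lists can be passed directly to `subprocess.run(cmd, check=True)`; scheduler and component arguments would go in `extra`.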
    

Longer Explanation

The thinking behind TorchX's docker workspace was to automate the building of ephemeral docker images (based on a base image) where the changes to the user workspace (e.g. the local directory of your project that contains your pytorch scripts) are "patched" into the base image. It wasn't really intended to fully build the base image.

Therefore, originally we didn't have a concept of Dockerfile.torchx and by default simply copied the current-working-directory onto the base image (see source):

# IMAGE is the --image argument you pass to the component (e.g. torchx run dist.ddp --image=foobar)
ARG IMAGE
FROM $IMAGE

# just copies your current working directory into the base image
COPY . .

While this approach works well for most pure python source projects, we added Dockerfile.torchx for:

  1. Projects that require an install (e.g. the likes of pip install -e ., versus just copying source *.py files).
  2. Users who want to ignore certain files in the ephemeral build (e.g. test binaries or artifacts that change often locally and would otherwise trigger an expensive docker build even when it is not needed).
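
The fallback described above can be pictured roughly as follows. This is a simplified sketch, not the actual TorchX implementation: `resolve_dockerfile` is a hypothetical helper, and the default text just mirrors the snippet quoted in this comment.

```python
from pathlib import Path

# Default behaviour described in the thread: copy the working directory
# onto the base image passed in through the IMAGE build argument.
DEFAULT_DOCKERFILE = """\
ARG IMAGE
FROM $IMAGE

COPY . .
"""


def resolve_dockerfile(workspace: Path) -> str:
    """Return the contents of Dockerfile.torchx if present, else the default."""
    custom = workspace / "Dockerfile.torchx"
    if custom.is_file():
        return custom.read_text()
    return DEFAULT_DOCKERFILE
```

A pure-Python project without a `Dockerfile.torchx` therefore gets the plain copy, while a project that needs an install step overrides it.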

cc @d4l3k, who originally designed and implemented the docker workspace. Thoughts?

@d4l3k
Member

d4l3k commented Oct 4, 2023

@anjali-chadha does the index-url change depending on where it's being built from? I'm curious why you don't add the index to the Dockerfile. Another option could be to save it to a file that's read during the build:

FROM ...

ADD index-url.txt .

# $(cat ...) also works in /bin/sh, the default shell for RUN; $(<file) is bash-only
RUN pip install --index-url "$(cat index-url.txt)" ...

@kiukchung we do now have workspace-specific options, so it wouldn't be an issue to expose additional docker args such as the multi-stage target. Any objections to adding more optional fields to the DockerWorkspaceMixin?

https://github.com/pytorch/torchx/blob/main/torchx/workspace/docker_workspace.py#L87-L94
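
To make the proposal concrete, here is one way such optional fields could be mapped onto build keyword arguments. Everything in it is hypothetical: `DockerBuildOptions` and `build_kwargs` are not part of TorchX. The only assumption about existing APIs is that the Docker SDK for Python's `images.build` accepts `buildargs`, `target`, and `platform`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DockerBuildOptions:
    """Hypothetical optional build settings; not an existing TorchX class."""

    target: Optional[str] = None      # stage of a multi-stage Dockerfile
    platform: Optional[str] = None    # e.g. "linux/amd64"
    build_args: Dict[str, str] = field(default_factory=dict)


def build_kwargs(image: str, opts: DockerBuildOptions) -> Dict[str, Any]:
    """Translate the options into keyword arguments for an image build call."""
    # IMAGE is merged last so user-supplied args cannot override the base image.
    kwargs: Dict[str, Any] = {"buildargs": {**opts.build_args, "IMAGE": image}}
    if opts.target is not None:
        kwargs["target"] = opts.target
    if opts.platform is not None:
        kwargs["platform"] = opts.platform
    return kwargs
```

Unset options are left out of the result, so a build with no extra options would behave as it does today.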

@schmidt-ai do you have a full list of options you would like to control? Anything other than the target?

@schmidt-ai
Contributor

--platform, potentially? Ideally could we just pass through kwargs to the build command? @kiukchung's workaround works for me though.
