S3 bucket script #624

Open
wants to merge 9 commits into
base: master

Conversation

benjaminking
Collaborator

@benjaminking benjaminking commented Jan 9, 2025

This PR modifies the previous S3 bucket script to support automatic mounting of the S3 bucket in a dev container for both WSL2-based Ubuntu systems and standalone Ubuntu systems. It splits the logic of the previous script into two portions:

  • the part that does not depend on any SILNLP repo files (this logic has been moved into the Dockerfile) and
  • the part that uses files in the SILNLP repo (this remains in the s3_bucket_setup.sh script, which is run as a post-install command)

The contents of the S3 bucket should now be accessible at /silnlp upon creation of the dev container. The major change required to allow this to run on standalone Ubuntu systems is to run the Docker container with a different AppArmor profile. For simplicity, I configured the container to run in "unconfined" mode, but if more fine-grained security is desired (or if ClearML requires it), we should be able to create a custom AppArmor profile that we distribute in the SILNLP repo.
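For anyone who wants to try the same setting outside the dev container, here is a minimal sketch of the equivalent `docker run` option. Only `--security-opt apparmor=...` reflects what this PR describes; the helper name `build_run_cmd` and the image name passed to it are hypothetical, and in devcontainer.json the same option would presumably be passed through `runArgs`.

```shell
#!/usr/bin/env bash
# Sketch only: builds the command line as text instead of running Docker.
# build_run_cmd and the image name passed to it are hypothetical.
build_run_cmd() {
  local image="$1"
  local profile="${2:-unconfined}"   # "unconfined" is the mode used in this PR
  # --security-opt apparmor=<profile> selects the AppArmor profile for the container.
  printf 'docker run --rm -it --security-opt apparmor=%s %s bash\n' "$profile" "$image"
}
```

Calling `build_run_cmd my-image` prints the command for unconfined mode. Passing a profile name as the second argument shows how a custom profile could be swapped in later.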

This has been tested on both WSL2-based Ubuntu (in Windows 11) and standalone Linux Mint, an Ubuntu-based distro.


Collaborator

@mshannon-sil mshannon-sil left a comment

I tested on my Ubuntu machine, and it's also working there. I'll check with Bryan from TechOps to see if he has any security concerns with running the container like this.

Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions (waiting on @benjaminking)


.devcontainer/Dockerfile line 26 at r1 (raw file):

    curl \
    unzip
RUN apt-get update

I'm not sure it's necessary to call apt-get update again, since it's already called right before this, unless the package lists need to be refreshed again after certain packages have been installed.


.devcontainer/Dockerfile line 28 at r1 (raw file):

RUN apt-get update
RUN apt-get install --no-install-recommends -y \
    less \

I don't see these two packages used anywhere. Are they just being added as helpful developer tools? I think they'd be useful, especially since I use nano quite a bit. I think it'd be cleaner to move them into the larger apt-get install section directly above.


.devcontainer/Dockerfile line 54 at r1 (raw file):

CMD ["bash"]
# Set up the S3 bucket
RUN apt update

Is this separated out because it needs to use apt instead of apt-get, or could it be added to the longer apt-get install earlier in the Dockerfile?

@benjaminking
Collaborator Author

I updated the Dockerfile based on your comments. Firstly, less and nano were included by mistake -- I had those in my Dockerfile for convenience. (Strangely, to get apt-get to recognize them, you have to call apt-get update a second time.) I've removed them for now.

Also, for reasons I don't understand, if you want to install fuse3 and rclone on WSL2, they have to be in a separate apt-get command from the other installs. I've moved them up to the same section as the other apt-get installs. I also switched to using the bundled version of rclone instead of downloading it from its website, and that doesn't seem to cause any problems.

Collaborator

@mshannon-sil mshannon-sil left a comment

Reviewed 1 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)


.devcontainer/Dockerfile line 48 at r2 (raw file):

ENV SIL_NLP_CACHE_PROJECT_DIR=/root/.cache/silnlp/projects
# Set up the S3 bucket
RUN mkdir -p /silnlp

To be consistent with the readme instructions, which mount at ~/S, it's best to mount it at /root/S. I double-checked in my Docker container, and ~ resolves to the /root folder there.
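
A quick way to confirm the mount from inside the container might look like the sketch below. The helper name `check_mount` is hypothetical, and it assumes the `mountpoint` utility (from util-linux) is available in the image. If the utility is missing, the sketch reports "not mounted".

```shell
#!/usr/bin/env bash
# Sketch only: check_mount is a hypothetical helper, not part of this PR.
# Reports whether a directory is currently a mount point (default: ~/S).
check_mount() {
  local dir="${1:-$HOME/S}"
  # mountpoint -q exits 0 only when the directory is a mount point.
  if mountpoint -q "$dir"; then
    echo "mounted: $dir"
  else
    echo "not mounted: $dir"
    return 1
  fi
}
```

Running `check_mount` with no argument checks `$HOME/S`, which is /root/S when the container runs as root.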

@benjaminking
Collaborator Author

I've just pushed a commit that changes the mount point to ~/S, adds an AppArmor profile, and updates devcontainer.json to use the new AppArmor profile.

To register the new AppArmor profile, you need to run sudo apparmor_parser -r -W docker-apparmor. This should only need to be done once for each machine.

I believe that Docker will ignore the AppArmor profile on systems that don't support AppArmor (or don't have it installed), so there shouldn't be any impact to non-Linux users. The AppArmor profile is also the same as the default Docker profile, except that it allows the container to perform mounts.
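
As a rough sketch of the one-time registration step, the snippet below runs the `apparmor_parser` command from above only when the kernel reports AppArmor as enabled. The function names and the `AA_ENABLED_FILE` override are hypothetical (the override exists only so the check can be tried on a machine without AppArmor), and reading `/sys/module/apparmor/parameters/enabled` is one common way to detect AppArmor, not something this PR relies on.

```shell
#!/usr/bin/env bash
# Sketch only: function names and AA_ENABLED_FILE are hypothetical.

# Succeeds when the kernel reports AppArmor as enabled ("Y").
apparmor_enabled() {
  local flag_file="${AA_ENABLED_FILE:-/sys/module/apparmor/parameters/enabled}"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "Y" ]
}

# Registers the profile file (default: docker-apparmor) if AppArmor is enabled.
register_profile() {
  local profile_file="${1:-docker-apparmor}"
  if ! apparmor_enabled; then
    echo "AppArmor not enabled; skipping profile registration"
    return 0
  fi
  # -r replaces an already loaded profile; -W writes the compiled profile to the cache.
  sudo apparmor_parser -r -W "$profile_file"
}
```

Calling `register_profile` from the directory that contains the profile file would perform the registration. On systems without AppArmor it prints a message and does nothing.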

Collaborator

@ddaspit ddaspit left a comment

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)

Collaborator

@ddaspit ddaspit left a comment

LGTM, but I will let someone who has a better understanding of these things approve it.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)

3 participants