Commit 7f238ce

Update to wiki docs [skip ci]
1 parent 1fb07f1 commit 7f238ce

1 file changed

+7
-314
lines changed

README.md

Lines changed: 7 additions & 314 deletions
@@ -1,27 +1,17 @@
11
[![Docker Build](https://github.com/ai-dock/pytorch/actions/workflows/docker-build.yml/badge.svg)](https://github.com/ai-dock/pytorch/actions/workflows/docker-build.yml)
22

3-
# Pytorch
3+
# AI-Dock + PyTorch
44

5-
Run python in a container with pytorch pre-installed.
5+
Run python in a cloud-first AI-Dock container with PyTorch pre-installed.
66

7-
## About Pytorch
7+
This image provides a great starting point for Python development when used standalone, but it's also a solid foundation to build upon.
88

9-
PyTorch is an open-source deep learning framework developed by Facebook's AI Research lab (FAIR). It provides a flexible and dynamic computational graph, allowing developers to build and train neural networks. Unlike some other frameworks, PyTorch enables defining and modifying network architectures on-the-fly, making experimentation and debugging easier.
9+
## Documentation
1010

11-
One of PyTorch's essential features is automatic differentiation, which is crucial for training neural networks using gradient-based optimization algorithms like backpropagation. Additionally, PyTorch supports GPU acceleration, enabling faster computation during model training.
11+
All AI-Dock containers share a common base which is designed to make running on cloud services such as [vast.ai](https://link.ai-dock.org/vast.ai) and [runpod.io](https://link.ai-dock.org/template) as straightforward and user friendly as possible.
1212

13-
The framework has gained popularity among researchers and developers due to its ease of use and extensive community support. With a large and active community, users can find abundant resources, tutorials, and libraries to support their deep learning projects.
13+
Common features and options are documented in the [base wiki](https://github.com/ai-dock/base-image/wiki) but any additional features unique to this image will be detailed below.
1414

15-
## Pre-built Images
16-
17-
Docker images are built automatically through a GitHub Actions workflow and hosted at the GitHub Container Registry.
18-
19-
An incremental build process is used to avoid needing a huge cache - The following images are used to provide functionality:
20-
21-
- [nvidia/cuda](https://github.com/NVIDIA/nvidia-docker) / [ubuntu](https://github.com/docker-library/docs/tree/master/ubuntu) ↴
22-
- [ai-dock/base-image](https://github.com/ai-dock/base-image) ↴
23-
- [ai-dock/python](https://github.com/ai-dock/python) ↴
24-
- ai-dock/pytorch
2515

2616
#### Version Tags
2717

@@ -46,296 +36,12 @@ Tags follow these patterns:
4636

4737
Browse [here](https://github.com/ai-dock/pytorch/pkgs/container/pytorch) for an image suitable for your target environment.
4838

49-
You can also self-build from source by editing `.env` and running `docker compose build`.
50-
5139
Supported Python versions: `3.12`, `3.11`, `3.10`
5240

5341
Supported Pytorch versions: `2.2.1` `2.1.2`
5442

5543
Supported Platforms: `NVIDIA CUDA`, `AMD ROCm`, `CPU`
5644

57-
## Building Images
58-
59-
You can self-build from source by editing `docker-compose.yaml` or `.env` and running `docker compose build`.
60-
61-
It is a good idea to leave the source tree alone and copy any edits you would like to make into `build/COPY_ROOT_EXTRA/...`. The structure within this directory will be overlaid on `/` at the end of the build process.
62-
63-
As this overlaying happens after the main build, it is easy to add extra files such as ML models and datasets to your images. You will also be able to rebuild quickly if your file overrides are made here.
64-
65-
Any directories and files that you add into `opt/storage` will be made available in the running container at `$WORKSPACE/storage`.
66-
67-
This directory is monitored by `inotifywait`. Any items appearing in this directory will be automatically linked to the application directories as defined in `/opt/ai-dock/storage_monitor/etc/mappings.sh`. This is particularly useful if you need to run several applications that each need to make use of the stored files.
68-
69-
70-
## Run Locally
71-
72-
A 'feature-complete' `docker-compose.yaml` file is included for your convenience. All features of the image are included - Simply edit the environment variables in `.env`, save and then type `docker compose up`.
73-
74-
If you prefer to use the standard `docker run` syntax, the command to pass is `init.sh`.
75-
76-
## Run in the Cloud
77-
78-
This image should be compatible with any GPU cloud platform. You simply need to pass environment variables at runtime.
79-
80-
>[!NOTE]
81-
>Please raise an issue on this repository if your provider cannot run the image.
82-
83-
__Container Cloud__
84-
85-
Container providers don't give you access to the docker host but are quick and easy to set up. They are often inexpensive when compared to a full VM or bare metal solution.
86-
87-
All images built for ai-dock are tested for compatibility with both [vast.ai](https://link.ai-dock.org/template-vast-pytorch) and [runpod.io](https://link.ai-dock.org/template-runpod-pytorch).
88-
89-
See a list of pre-configured templates [here](#pre-configured-templates)
90-
91-
>[!WARNING]
92-
>Container cloud providers may offer both 'community' and 'secure' versions of their cloud. If your use case involves storing sensitive information (e.g. API keys, auth tokens) then you should always choose the secure option.
93-
94-
__VM Cloud__
95-
96-
Running docker images on a virtual machine/bare metal server is much like running locally.
97-
98-
You'll need to:
99-
- Configure your server
100-
- Set up docker
101-
- Clone this repository
102-
- Edit `.env` and `docker-compose.yaml`
103-
- Run `docker compose up`
104-
105-
Find a list of compatible VM providers [here](#compatible-vm-providers).
106-
107-
### Connecting to Your Instance
108-
109-
All services listen for connections at [`0.0.0.0`](https://en.m.wikipedia.org/wiki/0.0.0.0). This gives you some flexibility in how you interact with your instance:
110-
111-
_**Expose the Ports**_
112-
113-
This is fine if you are working locally but can be **dangerous for remote connections** where data is passed in plaintext between your machine and the container over http.
114-
115-
_**SSH Tunnel**_
116-
117-
You will only need to expose port `22` (SSH) which can then be used with port forwarding to allow **secure** connections to your services.
118-
119-
If you are unfamiliar with port forwarding then you should read the guides [here](https://link.ai-dock.org/guide-ssh-tunnel-do-a) and [here](https://link.ai-dock.org/guide-ssh-tunnel-do-b).
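
As a rough sketch of the SSH tunnel approach, the helper below only assembles a standard OpenSSH `ssh -L` port-forwarding command. The function name, the host address, the mapped SSH port and the `root` login are assumptions for illustration and are not taken from this repository; port `1111` is used because it is the Service Portal port.

```shell
# Hypothetical helper: print an ssh command that forwards a local port to the
# same port inside the container over an encrypted SSH connection.
# Arguments: <host> <ssh_port_on_host> <service_port>
tunnel_cmd() {
    host="$1"; ssh_port="$2"; service_port="$3"
    # -N: do not run a remote command, -L: local port forwarding
    printf 'ssh -N -p %s -L %s:localhost:%s root@%s\n' \
        "$ssh_port" "$service_port" "$service_port" "$host"
}

# Example with placeholder values; run the printed command on your own machine.
tunnel_cmd "203.0.113.10" 2222 1111
```

With a tunnel like this open, the forwarded service should be reachable at `http://localhost:1111` on the local machine, so only the SSH port needs to be exposed.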
120-
121-
_**Cloudflare Tunnel**_
122-
123-
You can use the included `cloudflared` service to make secure connections without having to expose any ports to the public internet. See more below.
124-
125-
## Environment Variables
126-
127-
| Variable | Description |
128-
| ------------------------ | ----------- |
129-
| `CF_TUNNEL_TOKEN` | Cloudflare zero trust tunnel token - See [documentation](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/). |
130-
| `CF_QUICK_TUNNELS` | Create ephemeral Cloudflare tunnels for web services (default `false`) |
131-
| `DIRECT_ADDRESS` | IP/hostname for service portal direct links (default `localhost`) |
132-
| `DIRECT_ADDRESS_GET_WAN` | Use the internet facing interface for direct links (default `false`) |
133-
| `GPU_COUNT` | Limit the number of available GPUs |
134-
| `PROVISIONING_SCRIPT` | URL of a remote script to execute on init. See [note](#provisioning-script). |
135-
| `RCLONE_*` | Rclone configuration - See [rclone documentation](https://rclone.org/docs/#config-file) |
136-
| `SKIP_ACL` | Set `true` to skip modifying workspace ACL |
137-
| `SSH_PORT_LOCAL` | Set a non-standard port for SSH (default `22`) |
138-
| `SSH_PUBKEY` | Your public key for SSH |
139-
| `USER_NAME` | System account username (default `user`)|
140-
| `USER_PASSWORD` | System account password (default `password`)|
141-
| `WEB_ENABLE_AUTH` | Enable password protection for web services (default `true`) |
142-
| `WEB_USER` | Username for web services (default `user`) |
143-
| `WEB_PASSWORD` | Password for web services (default `auto generated`) |
144-
| `WORKSPACE` | A volume path. Defaults to `/workspace/` |
145-
| `WORKSPACE_SYNC` | Move mamba environments and services to workspace if mounted (default `false`) |
146-
147-
Environment variables can be specified by using any of the standard methods (`docker-compose.yaml`, `docker run -e...`). Additionally, environment variables can also be passed as parameters of `init.sh`.
148-
149-
Passing environment variables to init.sh is usually unnecessary, but is useful for some cloud environments where the full `docker run` command cannot be specified.
150-
151-
Example usage: `docker run -e STANDARD_VAR1="this value" -e STANDARD_VAR2="that value" <image> init.sh EXTRA_VAR="other value"`
152-
153-
## Security
154-
155-
All ai-dock containers are interactive and will not drop root privileges. You should ensure that your docker daemon runs as an unprivileged user.
156-
157-
### System
158-
159-
A system user will be created at startup. The UID will be either 1000 or will match the UID of the `$WORKSPACE` bind mount.
160-
161-
The user will share the root user's ssh public key.
162-
163-
Some processes may start in the user context for convenience only.
164-
165-
### Web Services
166-
167-
By default, all exposed web services are protected by a single login form at `:1111/login`.
168-
169-
The default username is `user` and the password is auto generated unless you have passed a value in the environment variable `WEB_PASSWORD`. To find the auto-generated password and related tokens you should type `env | grep WEB_` from inside the container.
170-
171-
You can set your credentials by passing environment variables as shown above.
172-
173-
If you are running the image locally on a trusted network, you may disable authentication by setting the environment variable `WEB_ENABLE_AUTH=false`.
174-
175-
If you need to connect programmatically to the web services you can authenticate using either `Bearer $WEB_TOKEN` or `Basic $WEB_PASSWORD_B64`.
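
A minimal sketch of bearer-token access, assuming `WEB_TOKEN` is set in the environment (it can be found with `env | grep WEB_` inside the container) and that you supply the URL of a service behind the login. The function names and the example URL are hypothetical.

```shell
# Hypothetical helper: build the Authorization header for bearer-token access.
bearer_header() {
    printf 'Authorization: Bearer %s' "$1"
}

# Hypothetical wrapper around curl; pass the URL of the service you exposed.
# --fail makes curl return a non-zero exit code on HTTP errors such as 401.
fetch_with_token() {
    url="$1"
    curl --fail --silent -H "$(bearer_header "$WEB_TOKEN")" "$url"
}

# Usage (not executed here): fetch_with_token "http://localhost:1111/"
```

Over plain HTTP the token travels unencrypted, so this is best combined with an SSH or Cloudflare tunnel.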
176-
177-
The security measures included aim to be as secure as basic authentication, i.e. not secure without HTTPS. Please use the provided cloudflare connections wherever possible.
178-
179-
>[!NOTE]
180-
>You can use `set-web-credentials.sh <username> <password>` to change the username and password in a running container.
181-
182-
## Provisioning script
183-
184-
It can be useful to perform certain actions when starting a container, such as creating directories and downloading files.
185-
186-
You can use the environment variable `PROVISIONING_SCRIPT` to specify the URL of a script you'd like to run.
187-
188-
The URL must point to a plain text file - GitHub Gists/Pastebin (raw) are suitable options.
189-
190-
If you are running locally you may instead opt to mount a script at `/opt/ai-dock/bin/provisioning.sh`.
191-
192-
>[!NOTE]
193-
>If configured, `sshd`, `caddy`, `cloudflared`, `serviceportal`, `storagemonitor` & `logtail` will be launched before provisioning; Any other processes will launch after.
194-
195-
>[!WARNING]
196-
>Only use scripts that you trust and which cannot be changed without your consent.
197-
198-
## Software Management
199-
200-
A small software collection is installed by apt-get to provide basic utility.
201-
202-
All other software is installed into its own environment by `micromamba`, which is a drop-in replacement for conda/mamba. Read more about it [here](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html).
203-
204-
Micromamba environments are particularly useful where several software packages are required but their dependencies conflict.
205-
206-
### Installed Micromamba Environments
207-
208-
| Environment | Packages |
209-
| -------------- | ----------------------------------------- |
210-
| `base` | micromamba's base environment |
211-
| `python_[ver]` | `python` |
212-
213-
If you are extending this image or running an interactive session where additional software is required, you should almost certainly create a new environment first. See below for guidance.
214-
215-
### Useful Micromamba Commands
216-
217-
| Command | Function |
218-
| -------------------------------------| --------------------- |
219-
| `micromamba env list` | List available environments |
220-
| `micromamba activate [name]` | Activate the named environment |
221-
| `micromamba deactivate` | Close the active environment |
222-
| `micromamba run -n [name] [command]` | Run a command in the named environment without activating |
223-
224-
All ai-dock images create micromamba environments using the `--always-softlink` flag which can save disk space where multiple environments are available.
225-
226-
To create an additional micromamba environment, eg for python, you can use the following:
227-
228-
`micromamba --always-softlink create -y -c conda-forge -c defaults -n [name] python=3.10`
229-
230-
## Volumes
231-
232-
Data inside docker containers is ephemeral - You'll lose all of it when the container is destroyed.
233-
234-
You may opt to mount a data volume at `/workspace` - This is a directory that ai-dock images will look for to make downloaded data available outside of the container for persistence.
235-
236-
This is usually of importance where large files are downloaded at runtime or if you need a space to save your work. This is the ideal location to store any code you are working on.
237-
238-
You can define an alternative path for the workspace directory by passing the environment variable `WORKSPACE=/my/alternative/path/` and mounting your volume there. This feature will generally assist where cloud providers enforce their own mountpoint location for persistent storage.
239-
240-
The provided docker-compose.yaml will mount the local directory `./workspace` at `/workspace`.
241-
242-
As docker containers generally run as the root user, new files created in /workspace will be owned by uid 0(root).
243-
244-
To ensure that the files remain accessible to the local user that owns the directory, the docker entrypoint will set a default ACL on the directory by executing the command `setfacl -d -m u:${WORKSPACE_UID}:rwx /workspace`.
245-
246-
## Running Services
247-
248-
This image will spawn multiple processes upon starting a container because some of our remote environments do not support more than one container per instance.
249-
250-
All processes are managed by [supervisord](https://supervisord.readthedocs.io/en/latest/) and will restart upon failure until you either manually stop them or terminate the container.
251-
252-
>[!NOTE]
253-
>*Some of the included services would not normally be found **inside** of a container. They are, however, necessary here as some cloud providers give no access to the host; Containers are deployed as if they were a virtual machine.*
254-
255-
### Caddy
256-
257-
This is a simple webserver acting as a reverse proxy.
258-
259-
Caddy is used to enable basic authentication for all sensitive web services.
260-
261-
To make changes to the caddy configuration inside a running container you should edit `/opt/caddy/share/base_config` followed by `supervisorctl restart caddy`.
262-
263-
### Service Portal
264-
265-
This is a simple list of links to the web services available inside the container.
266-
267-
The service will bind to port `1111`.
268-
269-
For each service, you will find a direct link and, if you have set `CF_QUICK_TUNNELS=true`, a link to the service via a fast and secure Cloudflare tunnel.
270-
271-
A simple web-based log viewer and process manager are included for convenience.
272-
273-
### Cloudflared
274-
275-
The Cloudflare tunnel daemon will start if you have provided a token with the `CF_TUNNEL_TOKEN` environment variable.
276-
277-
This service allows you to connect to your local services via https without exposing any ports.
278-
279-
You can also create a private network to enable remote connections to the container at its local address (`172.x.x.x`) if your local machine is running a Cloudflare WARP client.
280-
281-
If you do not wish to provide a tunnel token, you could enable `CF_QUICK_TUNNELS` which will create a throwaway tunnel for your web services.
282-
283-
Full documentation for Cloudflare tunnels is [here](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/).
284-
285-
>[!NOTE]
286-
>_Cloudflared is included so that secure networking is available in all cloud environments._
287-
288-
>[!WARNING]
289-
>You should only provide tunnel tokens in secure cloud environments.
290-
291-
### SSHD
292-
293-
An SSH server will be started if at least one valid public key is found inside the running container in the file `/root/.ssh/authorized_keys`.
294-
295-
The server will bind to port `22` unless you specify the variable `SSH_PORT_LOCAL`.
296-
297-
There are several ways to get your keys to the container.
298-
299-
- If using docker compose, you can paste your key in the local file `config/authorized_keys` before starting the container.
300-
301-
- You can pass the environment variable `SSH_PUBKEY` with your public key as the value.
302-
303-
- Cloud providers often have a built-in method to transfer your key into the container
304-
305-
If you choose not to provide a public key then the SSH server will not be started.
306-
307-
To make use of this service you should map port `22` to a port of your choice on the host operating system.
308-
309-
See [this guide](https://link.ai-dock.org/guide-sshd-do) by DigitalOcean for an excellent introduction to working with SSH servers.
310-
311-
>[!NOTE]
312-
>_SSHD is included because the end-user should be able to know the version prior to deployment. Using a provider's add-on, if available, does not guarantee this._
313-
314-
### Syncthing
315-
316-
[Syncthing](https://syncthing.net/) is a peer-to-peer continuous file synchronization program which is very useful for efficiently transporting your work files from a local workstation to a remote container instance. As the files are sync'd in real-time there is no need for a separate download to retrieve the files.
317-
318-
### Logtail
319-
320-
This script follows and prints the log files for each of the above services to stdout. This allows you to follow the progress of all running services through docker's own logging system.
321-
322-
If you are logged into the container you can follow the logs by running `logtail.sh` in your shell.
323-
324-
### Storage Monitor
325-
326-
This service detects changes to files in `$WORKSPACE/storage` and creates symbolic links to the application directories defined in `/opt/ai-dock/storage_monitor/etc/mappings.sh`
327-
328-
## Open Ports
329-
330-
Some ports need to be exposed for the services to run or for certain features of the provided software to function
331-
332-
333-
| Open Port | Service / Description |
334-
| --------------------- | ------------------------- |
335-
| `22` | SSH server |
336-
| `1111` | Service Portal web UI |
337-
| `8384` | Syncthing UI |
338-
| `22999` | Syncthing TCP Transport |
33945

34046
## Pre-Configured Templates
34147

@@ -350,19 +56,6 @@ Some ports need to be exposed for the services to run or for certain features of
35056
>[!NOTE]
35157
>These templates are configured to use the `:latest` tag but you are free to change to any of the available Pytorch CUDA tags listed [here](https://github.com/ai-dock/pytorch/pkgs/container/pytorch)
35258
353-
## Compatible VM Providers
354-
355-
Images that do not require a GPU will run anywhere - Use an image tagged `:*-cpu-xx.xx`
356-
357-
Where a GPU is required you will need either `:*cuda*` or `:*rocm*` depending on the underlying hardware.
358-
359-
A curated list of VM providers currently offering GPU instances:
360-
361-
- [Akamai/Linode](https://link.ai-dock.org/linode.com)
362-
- [Amazon Web Services](https://link.ai-dock.org/aws.amazon.com)
363-
- [Google Compute Engine](https://link.ai-dock.org/cloud.google.com)
364-
- [Vultr](https://link.ai-dock.org/vultr.com)
365-
36659
---
36760

368-
_The author ([@robballantyne](https://github.com/robballantyne)) may be compensated if you sign up to services linked in this document. Testing multiple variants of GPU images in many different environments is both costly and time-consuming; This helps to offset costs_
61+
_The author ([@robballantyne](https://github.com/robballantyne)) may be compensated if you sign up to services linked in this document. Testing multiple variants of GPU images in many different environments is both costly and time-consuming; This along with [sponsorships](https://github.com/sponsors/ai-dock) helps to offset costs and further the development of the project_
