Improve document (#126)
**Description**
Improve the documentation by:
* Adding a Docker option to the installation page.
* Telling users how long compiling MSCCL takes.
tocean authored Nov 7, 2023
1 parent e3e6885 commit 7b346f4
Showing 2 changed files with 30 additions and 6 deletions.
32 changes: 28 additions & 4 deletions docs/getting-started/installation.mdx
@@ -18,23 +18,47 @@ Here're the system requirements for MS-AMP.
* CUDA version 11 or later (which can be checked by running `nvcc --version`).
* PyTorch version 1.14 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).

-We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). For example, to start PyTorch 2.1 container, run the following command:
+You can try MS-AMP in two ways: Using Docker or installing from source:
+
+* Using Docker is a convenient way to get started with MS-AMP. You can use the pre-built Docker image to quickly set up an environment for running MS-AMP.
+* On the other hand, installing from source gives you more control over the installation process and allows you to customize the installation to your needs.
+
+## Use Docker
+
+You can try the latest MS-AMP Docker container with the following commands:
+
+```bash
+sudo docker run -it -d --name=msampcu121 --privileged --net=host --ipc=host --gpus=all -v /:/hostroot ghcr.io/azure/msamp:main-cuda12.1 bash
+sudo docker exec -it msampcu121 bash
+```
+
+MS-AMP is pre-installed in Docker container and you can verify it by running:
+
+```bash
+python -c 'import msamp;print(msamp.__version__)'
+```
+
+We also provide stable Docker images [here](../user-tutorial/container-images.mdx).
+
+## Install from source
+
+We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) to avoid messing up local environment.
+For example, to start PyTorch 2.1 container, run the following command:

```bash
sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:23.04-py3 bash
sudo docker exec -it msamp bash
```

-## Install MS-AMP
-You can clone the source from GitHub.
+Then, you can clone the source from GitHub.

```bash
git clone https://github.com/Azure/MS-AMP.git
cd MS-AMP
git submodule update --init --recursive
```

-If you want to train model with multiple GPU, you need to install MSCCL to support FP8.
+If you want to train model with multiple GPU, you need to install MSCCL to support FP8. Please note that the compilation of MSCCL may take ~40 minutes on A100 nodes and ~7 minutes on H100 node.

```bash
cd third_party/msccl
4 changes: 2 additions & 2 deletions docs/getting-started/run-msamp.md
@@ -14,10 +14,10 @@ After installing MS-AMP, you can run several simple examples using MS-AMP. Pleas
python mnist.py --enable-msamp --opt-level=O2
```

-### 2. Run mnist using multi GPUS in single node
+### 2. Run mnist using multi GPUs in single node

```bash
-torchrun --nproc_per_node=$GPUS mnist_ddp.py --enable-msamp --opt-level=O2
+torchrun --nproc_per_node=8 mnist_ddp.py --enable-msamp --opt-level=O2
```

## CIFAR10
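This commit changes the multi-GPU MNIST command from `--nproc_per_node=$GPUS` to a literal `--nproc_per_node=8`, which matches an 8-GPU node. On a node with a different GPU count, the value can be derived instead. The following is a minimal sketch, assuming `nvidia-smi` and `torchrun` are on `PATH` and the script is run from the directory containing `mnist_ddp.py`; the `count_gpus` helper is hypothetical, not part of MS-AMP.

```bash
#!/usr/bin/env bash
# Sketch: run the multi-GPU MNIST example on every GPU of the current node
# instead of a hard-coded 8. Assumes nvidia-smi and torchrun are on PATH and
# mnist_ddp.py is in the current directory; count_gpus is a hypothetical helper.

# Count devices in `nvidia-smi -L` output read from stdin.
# Each physical GPU is listed as "GPU <index>: <name> (UUID: ...)";
# indented MIG instance lines do not match and are not counted.
count_gpus() {
  grep -c '^GPU [0-9]'
}

if command -v nvidia-smi >/dev/null 2>&1 && command -v torchrun >/dev/null 2>&1; then
  GPUS=$(nvidia-smi -L | count_gpus)
  torchrun --nproc_per_node="$GPUS" mnist_ddp.py --enable-msamp --opt-level=O2
else
  echo "nvidia-smi or torchrun not found; not launching" >&2
fi
```

Counting only lines that start with `GPU` keeps the process count equal to the number of physical devices even when MIG instances are listed underneath them.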

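The installation page in this commit lists two requirements, CUDA 11 or later and PyTorch 1.14 or later, each with a manual check command. The following is a minimal sketch that runs both checks and compares the versions automatically, assuming GNU coreutils `sort -V` and that `nvcc` and `python` are on `PATH`. The helper names `version_ge` and `numeric_version` are hypothetical, not part of MS-AMP.

```bash
#!/usr/bin/env bash
# Sketch: automate the two requirement checks from installation.mdx
# (CUDA 11 or later, PyTorch 1.14 or later). Assumes GNU `sort -V`;
# version_ge and numeric_version are hypothetical helpers, not MS-AMP APIs.

# Succeed if dotted version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Keep only the leading numeric part of a version, e.g. "2.1.0a0+32f93b1" -> "2.1.0".
numeric_version() {
  printf '%s\n' "$1" | sed 's/[^0-9.].*//'
}

# CUDA: parse "release X.Y" from `nvcc --version`.
if command -v nvcc >/dev/null 2>&1; then
  cuda=$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
  if [ -n "$cuda" ] && version_ge "$cuda" 11; then
    echo "CUDA $cuda: OK"
  else
    echo "CUDA ${cuda:-unknown}: 11 or later is required"
  fi
else
  echo "nvcc not found" >&2
fi

# PyTorch: same command as the installation page, then compare.
if torch=$(python -c "import torch; print(torch.__version__)" 2>/dev/null); then
  torch=$(numeric_version "$torch")
  if version_ge "$torch" 1.14; then
    echo "PyTorch $torch: OK"
  else
    echo "PyTorch $torch: 1.14 or later is required"
  fi
else
  echo "PyTorch not importable" >&2
fi
```

The comparison uses `sort -V` rather than plain string comparison so that, for example, 1.14 correctly ranks above 1.9.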