Web-based Execution on a Single GPU with Docker

  1. Launch an instance from the EC2 console:


  1. Choose an Amazon Machine Image (AMI)
    An AMI is a template that contains the software configuration (operating system, application server, and applications) required to launch your instance. For the GPU case, we use the Deep Learning Base AMI (Ubuntu 16.04) Version 40.0, found under "Community AMIs".
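If you prefer the AWS CLI to the console, the AMI ID can be looked up by name. This is a minimal sketch that assumes the AWS CLI is installed and configured on your local machine and that the image name starts with the exact string shown in the console; `find_dlami` is a hypothetical helper name, and the default region `us-west-2` is only an assumption (AMI IDs differ per region, so pass the region you launch in).

```shell
# Hypothetical helper: print the ID and name of every AMI whose name starts
# with the Deep Learning Base AMI string used in this guide.
# Assumes the AWS CLI is installed and configured with valid credentials.
find_dlami() {
  local region="${1:-us-west-2}"   # assumed default region; pass your own
  # No --owners filter, so all public images are searched (this can be slow).
  aws ec2 describe-images \
    --region "$region" \
    --filters "Name=name,Values=Deep Learning Base AMI (Ubuntu 16.04) Version 40.0*" \
    --query 'Images[*].[ImageId,Name]' \
    --output text
}
```

Usage: `find_dlami us-west-2`, then use the printed `ami-...` ID when launching.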


  1. Choose an Instance Type
    AWS provides various instance types for different purposes; see https://aws.amazon.com/ec2/instance-types/. For GPU applications, we recommend the p3.2xlarge instance, which has one NVIDIA V100 GPU.

  1. Configure the number of instances
    We use one instance for single-machine computation.


  1. (Optional) Add storage
    Add additional storage to your instance if needed.

  1. Configure the security group
    Allow inbound SSH (TCP port 22) so that you can connect to the instance later.
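The same SSH rule can be added from the AWS CLI. A minimal sketch, assuming the AWS CLI is configured and that you already have a security group ID; `allow_ssh_from_my_ip` is a hypothetical helper, and restricting access to your current public IP (looked up via `https://checkip.amazonaws.com`) is a design choice, not a requirement of this guide.

```shell
# Hypothetical helper: allow inbound SSH (TCP 22) to a security group,
# restricted to the public IPv4 address of the machine running this function.
# Assumes the AWS CLI is configured; the group ID is a placeholder you supply.
allow_ssh_from_my_ip() {
  local group_id="$1"
  local my_ip
  my_ip="$(curl -fsS https://checkip.amazonaws.com)" || return 1   # returns your public IP
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 22 \
    --cidr "${my_ip}/32"
}
```

Usage: `allow_ssh_from_my_ip sg-0123456789abcdef0` (placeholder ID). Limiting the rule to a single `/32` address is safer than opening port 22 to `0.0.0.0/0`, but you must update it if your IP changes.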


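  1. Review, create your SSH key pair, and launch
    Download the private key (.pem) file when you create the key pair; you need it to SSH into the instance.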
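A key pair can also be created from the AWS CLI instead of the console. This sketch assumes the AWS CLI is configured for the region you launch in; `create_key_pair` and the key name you pass are hypothetical.

```shell
# Hypothetical helper: create an EC2 key pair and save the private key locally
# as <key_name>.pem. Assumes the AWS CLI is configured for your launch region.
create_key_pair() {
  local key_name="$1"
  aws ec2 create-key-pair \
    --key-name "$key_name" \
    --query 'KeyMaterial' \
    --output text > "${key_name}.pem" &&
    chmod 400 "${key_name}.pem"   # ssh rejects private keys readable by other users
}
```

Usage: `create_key_pair my-gpu-key` writes `my-gpu-key.pem` to the current directory.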


  1. View your instance and wait until it finishes initializing
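The launch and the wait for initialization can be scripted as well. A minimal sketch, assuming the AWS CLI is configured, that the instance goes into a subnet that assigns public IPs (as a default VPC does), and that storage is left at the AMI's default; `launch_gpu_instance` is a hypothetical helper, and the AMI ID, key name, and security group ID are yours to supply.

```shell
# Hypothetical helper: launch one p3.2xlarge instance, wait until its instance
# status check reports "ok", then print the instance's public DNS name.
# The instance ID is printed to stderr so you can terminate it later.
launch_gpu_instance() {
  local ami_id="$1" key_name="$2" group_id="$3"
  local instance_id
  instance_id="$(aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type p3.2xlarge \
    --count 1 \
    --key-name "$key_name" \
    --security-group-ids "$group_id" \
    --query 'Instances[0].InstanceId' \
    --output text)" || return 1
  echo "Launched $instance_id" >&2
  # Polls periodically and gives up (non-zero exit) after a fixed number of attempts
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' \
    --output text
}
```

Usage: `host="$(launch_gpu_instance ami-0123456789abcdef0 my-gpu-key sg-0123456789abcdef0)"` (placeholder IDs).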


  1. SSH into your instance
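On the Ubuntu-based AMI the login user is `ubuntu` (the same user added to the `docker` group below). A small sketch; `ssh_instance` is a hypothetical helper, and the key file and host are whatever you created and launched above.

```shell
# Hypothetical helper: open an SSH session as the default "ubuntu" user.
# Arguments: path to the private key file, public DNS name or IP of the instance.
ssh_instance() {
  local key_file="$1" host="$2"
  ssh -i "$key_file" "ubuntu@${host}"
}
```

Usage: `ssh_instance my-gpu-key.pem ec2-198-51-100-1.us-west-2.compute.amazonaws.com` (placeholder host).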


  1. Copy bootstrap.sh to your instance and run the script:
sudo bash bootstrap.sh
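Copying and running the script can be done from your local machine in one go. This sketch assumes `bootstrap.sh` is in your current local directory; `run_bootstrap` is a hypothetical helper, and it makes no assumption about what `bootstrap.sh` itself contains.

```shell
# Hypothetical helper: copy bootstrap.sh into the ubuntu user's home directory
# on the instance, then run it there with sudo. Arguments: key file, host.
run_bootstrap() {
  local key_file="$1" host="$2"
  # An empty remote path after the colon means the remote home directory
  scp -i "$key_file" bootstrap.sh "ubuntu@${host}:" || return 1
  ssh -i "$key_file" "ubuntu@${host}" 'sudo bash bootstrap.sh'
}
```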
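  1. Install Docker
# Install Docker Engine with Docker's convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo service docker start
# Let the ubuntu user run docker without sudo (takes effect at the next login)
sudo usermod -a -G docker ubuntu
# Make the Docker socket usable in the current session without logging in again
# (this makes the socket writable by every user on the machine)
sudo chmod 666 /var/run/docker.sock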
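To confirm the installation before pulling the large training image, a quick check can help. This sketch uses Docker's official `hello-world` test image; `check_docker` is a hypothetical helper name.

```shell
# Quick check that the Docker daemon is running and the current user can reach it.
check_docker() {
  docker version || return 1          # fails if the client cannot talk to the daemon
  docker run --rm hello-world         # --rm removes the test container afterwards
}
```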

  1. Download the Docker image, or build your own from a Dockerfile.
docker pull starlyxxx/horovod-pytorch-cuda10.1-cudnn7
  • Or build an image from the Dockerfile in the current directory:
docker build -t <your-image-name> .

  1. Download the ML application and data from AWS S3.
  • For privacy, we store the application code and data on AWS S3. Install the AWS CLI and set your AWS credentials:
curl 'https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip' -o 'awscliv2.zip'
unzip awscliv2.zip
sudo ./aws/install
# Replace your-access-key and your-secret-key with your own credentials
aws configure set aws_access_key_id your-access-key
aws configure set aws_secret_access_key your-secret-key
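Before downloading, it can be worth checking that the configured keys are accepted. A minimal sketch; `check_aws_credentials` is a hypothetical helper around the standard `aws sts get-caller-identity` call, which prints the account ID and IAM ARN and fails if the keys are invalid.

```shell
# Hypothetical helper: verify that the credentials configured above are valid.
check_aws_credentials() {
  aws sts get-caller-identity --output table
}
```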
  • Download the application code and data:

    Download:

    aws s3 cp s3://kddworkshop/MultiGpus-Domain-Adaptation-main.zip ./
    aws s3 cp s3://kddworkshop/office31.tar.gz ./

    or

    wget https://kddworkshop.s3.us-west-2.amazonaws.com/MultiGpus-Domain-Adaptation-main.zip
    wget https://kddworkshop.s3.us-west-2.amazonaws.com/office31.tar.gz

    Extract the files:

    unzip MultiGpus-Domain-Adaptation-main.zip
    tar -xzvf office31.tar.gz

  1. Run a Docker container for the GPU application
  • Single GPU:
# Mount the code and the data from the host into /root inside the container
nvidia-docker run -it \
    -v /home/ubuntu/MultiGpus-Domain-Adaptation-main:/root/MultiGpus-Domain-Adaptation-main \
    -v /home/ubuntu/office31:/root/office31 \
    starlyxxx/horovod-pytorch-cuda10.1-cudnn7:latest /bin/bash
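To confirm that the container can see the GPU before starting a long run, you can run `nvidia-smi` in a throwaway container. This sketch assumes the image is built on an NVIDIA CUDA base image (so the NVIDIA runtime exposes `nvidia-smi` inside it); `check_gpu_in_container` is a hypothetical helper. On Docker 19.03+ with the NVIDIA Container Toolkit, `docker run --gpus all ...` is the equivalent of `nvidia-docker run`.

```shell
# Hypothetical helper: run nvidia-smi inside a temporary container from the
# training image to check that the GPU is visible. --rm deletes it afterwards.
check_gpu_in_container() {
  local image="${1:-starlyxxx/horovod-pytorch-cuda10.1-cudnn7:latest}"
  nvidia-docker run --rm "$image" nvidia-smi
}
```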

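  1. Run the ML application on the GPU (inside the container)
# The project directory is mounted here by the -v option above
cd /root/MultiGpus-Domain-Adaptation-main
# -np 1 starts one training process; -H localhost:1 gives this host one process slot
horovodrun --verbose -np 1 -H localhost:1 /usr/bin/python3.6 main.py --config DeepCoral/DeepCoral.yaml --data_dir ../office31 --src_domain webcam --tgt_domain amazon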
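For scripted runs, the container can also execute the training command directly instead of opening an interactive shell. This sketch reuses the mounts, image, and arguments from the steps above; it assumes `horovodrun` is on the image's `PATH` (as it is in the interactive session), and `run_training` and the log file name `train.log` are hypothetical.

```shell
# Hypothetical helper: run the single-GPU training job non-interactively and
# keep a copy of all output in train.log on the host.
run_training() {
  nvidia-docker run --rm \
    -v /home/ubuntu/MultiGpus-Domain-Adaptation-main:/root/MultiGpus-Domain-Adaptation-main \
    -v /home/ubuntu/office31:/root/office31 \
    -w /root/MultiGpus-Domain-Adaptation-main \
    starlyxxx/horovod-pytorch-cuda10.1-cudnn7:latest \
    horovodrun --verbose -np 1 -H localhost:1 /usr/bin/python3.6 main.py \
      --config DeepCoral/DeepCoral.yaml --data_dir ../office31 \
      --src_domain webcam --tgt_domain amazon \
    2>&1 | tee train.log
}
```

The `-w` option sets the container's working directory, so the relative `--data_dir ../office31` resolves to the mounted `/root/office31`.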

  1. Terminate the EC2 instance when you finish your experiments; a running instance keeps incurring charges.
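Termination can also be done from the AWS CLI. A minimal sketch, assuming the AWS CLI is configured for the instance's region; `terminate_instance` is a hypothetical helper. Termination is irreversible, and the root volume is typically deleted along with the instance, so copy off any results first.

```shell
# Hypothetical helper: terminate an instance and wait until it is fully terminated.
terminate_instance() {
  local instance_id="$1"
  aws ec2 terminate-instances --instance-ids "$instance_id" || return 1
  aws ec2 wait instance-terminated --instance-ids "$instance_id"
}
```

Usage: `terminate_instance i-0123456789abcdef0` (placeholder ID).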


Additional useful commands:

  • nvidia-smi shows the memory usage, GPU utilization, and temperature of NVIDIA GPUs.
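To monitor the GPU while training runs, `nvidia-smi` can print selected fields as CSV at a fixed interval. In this sketch, `watch_gpu` is a hypothetical helper and the 5-second default interval is an arbitrary choice; press Ctrl+C to stop.

```shell
# Hypothetical helper: print GPU memory, utilization, and temperature as CSV,
# repeating every N seconds (default 5).
watch_gpu() {
  nvidia-smi \
    --query-gpu=timestamp,name,memory.used,memory.total,utilization.gpu,temperature.gpu \
    --format=csv \
    -l "${1:-5}"
}
```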