Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.

ETOgaosion/bamboo

Affordable deep learning through resilient preemptible instances.

v0.1 - 01/20/22

Summary of Bamboo

Bamboo is a system for running large-scale DNNs with pipeline parallelism affordably, reliably, and efficiently on spot instances. It is built on top of DeepSpeed. To enable low-pause recovery from failures, it performs redundant computation in the pipeline, taking advantage of pipeline bubbles to do so.
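To give a feel for how much idle time pipeline bubbles can offer, here is a minimal sketch that is not taken from Bamboo's code. It assumes an idealized synchronous pipeline schedule in which every stage takes the same time per microbatch, so the idle (bubble) share of each iteration is (p - 1) / (m + p - 1) for p stages and m microbatches. The function name `bubble_fraction` is hypothetical.

```python
def bubble_fraction(num_stages, num_microbatches):
    """Idle share of one iteration in an idealized synchronous pipeline.

    Assumption: all stages cost the same per microbatch, so each stage is
    idle for (num_stages - 1) slots out of (num_microbatches + num_stages - 1).
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("both arguments must be positive integers")
    return (num_stages - 1) / (num_microbatches + num_stages - 1)


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from Bamboo.
    for stages, microbatches in [(4, 4), (4, 16), (8, 16)]:
        share = bubble_fraction(stages, microbatches)
        print(f"{stages} stages, {microbatches} microbatches: {share:.1%} idle")
```

Under this simplified model, deeper pipelines with few microbatches leave more idle time, which is the kind of slack the summary above refers to.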

etcdctl rm --dir --recursive /torchelastic

Setup

Ensure you have the following requirements:

  • Python 3.7
  • PyTorch 1.10.0
  • etcd 2.x (2.3.6 recommended; it cannot be run as a systemctl service, so open a tmux session and run it there)

Based on a bisection search, Bamboo appears to have been modified from DeepSpeed v0.5.2.
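The Python and PyTorch versions listed above can be checked before installing anything else. The following is a minimal sketch, not part of Bamboo: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes an exact match on Python 3.7 and PyTorch 1.10.0 is wanted. It does not check etcd.

```python
import re
import sys

# Versions taken from the requirements list above.
REQUIRED_PYTHON = (3, 7)
REQUIRED_TORCH = (1, 10, 0)


def parse_version(text):
    """Convert a string such as '1.10.0+cu113' into a tuple like (1, 10, 0)."""
    core = text.split("+", 1)[0]  # drop local build tags such as '+cu113'
    numbers = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_environment():
    """Return a list of mismatch messages (empty if everything matches)."""
    problems = []
    if tuple(sys.version_info[:2]) != REQUIRED_PYTHON:
        problems.append("Python %d.%d found, 3.7 expected" % sys.version_info[:2])
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed, 1.10.0 expected")
    else:
        if parse_version(torch.__version__) != REQUIRED_TORCH:
            problems.append("PyTorch %s found, 1.10.0 expected" % torch.__version__)
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the listed versions."]:
        print(line)
```

Run the script with the interpreter you intend to use for the virtual environment; each printed line names one mismatch.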

Building the documentation has the following additional requirements:

  • TeX Live
  • Biber

First, create the virtual environment:

python -m venv --system-site-packages venv
source venv/bin/activate
pip install -U pip
pip install -r requirements.txt

For the documentation you may want to create a ~/.latexmkrc file containing the following (this example uses Evince):

$pdf_previewer = 'start evince';

Running

Start all commands with the following:

python -m project_pactum

For the documentation, go to the directory of whichever document you want to build and run the following:

latexmk -pvc

This command recompiles the LaTeX file as many times as needed and opens it in your preferred PDF viewer. While you edit, keep the command running and the document will recompile automatically.
