KISTI UltraScaleAI EnvPipe

EnvPipe (Envelope + Pipeline Parallelism) is an energy-efficient DNN training framework that reduces energy consumption with minimal impact on performance. The project addresses the high energy demands and sustainability challenges of scaling large language models (LLMs). EnvPipe uses the slack time created by bubbles in pipeline parallelism: it schedules pipeline units strategically and adjusts the SM frequency dynamically, saving energy without affecting training accuracy or requiring changes to hyperparameters.
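
To make the slack-time idea concrete, here is a toy sketch. It is not EnvPipe's actual algorithm: it assumes, as a simplification, that a pipeline unit's execution time is inversely proportional to the SM frequency, and the function name and all numbers are hypothetical.

```python
def lowest_frequency_within_slack(duration_ms, slack_ms, max_freq_mhz, min_freq_mhz):
    """Return the lowest SM frequency (MHz) at which a pipeline unit still
    finishes within its original duration plus the available slack.

    Toy model (assumption): running at frequency f takes
    duration_ms * max_freq_mhz / f milliseconds.
    """
    if duration_ms <= 0 or slack_ms < 0:
        raise ValueError("duration must be positive and slack non-negative")
    # Solve duration * max_freq / f <= duration + slack for f.
    required = max_freq_mhz * duration_ms / (duration_ms + slack_ms)
    # Clamp to the range the hardware supports.
    return min(max_freq_mhz, max(min_freq_mhz, required))


if __name__ == "__main__":
    # A 10 ms unit followed by a 5 ms bubble (hypothetical values).
    print(lowest_frequency_within_slack(10, 5, max_freq_mhz=1800, min_freq_mhz=900))
```

Under this model a unit with no slack stays at the maximum frequency, while a unit followed by a bubble can run slower without delaying the iteration.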

Enhancements in This Version

This improved implementation of EnvPipe builds upon the original EnvPipe repository with the following updates:

- Llama Model Support: Enhanced compatibility with the Llama model family.
- DeepSpeed Upgrade: Updated for compatibility with the latest DeepSpeed library (v0.15.4).
- Huggingface Integration: Refactored code to seamlessly support Huggingface models, incorporating updates based on the Transpeeder repository for Llama model compatibility. If a Huggingface model can run with DeepSpeed's pipeline parallelism, it is compatible with EnvPipe.
- Code Refactoring: Improved code structure for better compatibility and maintainability.
- Improved P2P Communication: Redesigned the activation and gradient transfer mechanism to ensure deadlock-free execution aligned with EnvPipe's scheduling method. This removes the earlier reliance on an increased NCCL_BUFFSIZE to obtain non-blocking communication, a behavior that is not guaranteed.

Getting Started

Run the Docker Environment

To set up the environment:

```
scripts/run_docker.sh
```

Install DeepSpeed Library

Once inside the Docker container, install DeepSpeed using one of the following methods:

Editable mode (recommended for development):
```
pip install -e .
```
Normal mode (for production):
```
pip install .
```
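
After installing, it can be useful to confirm which DeepSpeed version is active in the container. The sketch below uses only the Python standard library; it assumes the package is registered under the name `deepspeed` and that its version string begins with the 0.15.4 named in this README. The helper names are illustrative.

```python
from importlib import metadata

EXPECTED_DEEPSPEED_VERSION = "0.15.4"  # version named in this README


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_deepspeed():
    """Describe the installed DeepSpeed version relative to the expected one."""
    version = installed_version("deepspeed")
    if version is None:
        return "deepspeed is not installed"
    if not version.startswith(EXPECTED_DEEPSPEED_VERSION):
        return f"deepspeed {version} found; this README targets {EXPECTED_DEEPSPEED_VERSION}"
    return f"deepspeed {version} is installed"


if __name__ == "__main__":
    print(check_deepspeed())
```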

Running Benchmarks

Navigate to the benchmarks directory and use the provided script to train a model with DeepSpeed and EnvPipe:

```
benchmarks/examples/train_llama_deepspeed.sh
```

Usage

To run the EnvPipe training script, use the following options:

```
Usage: ./train_llama_deepspeed.sh [options]

Options:
  --type TYPE                Set ENVPIPE_TYPE (baseline, uniform, envelope). Required.
  --scheduling SCHEDULING    Set ENVPIPE_SCHEDULING (1f1b, ours). Required.
  --reconfig RECONFIGURE     Set ENVPIPE_RECONFIGURE (default, greedy, balanced). Required.
  --gpus GPUS                Specify GPU numbers (comma-separated, e.g., 0,1,3). Required.
  -h, --help                 Show this help message.
```

| Parameter | Input | Explanation |
| --- | --- | --- |
| ENVPIPE_TYPE | baseline | Run all GPUs at the maximum SM frequency. |
| | uniform | Run all GPUs at the optimal SM frequency, which is the minimum point of the energy valley curve. |
| | envelope | Run the pipeline units inside the outer envelope at the optimal SM frequency. |
| ENVPIPE_SCHEDULING | 1f1b | 1F1B scheduling method. |
| | ours | EnvPipe's scheduling method. |
| ENVPIPE_RECONFIGURE | default | SM frequencies of pipeline units on the critical path are not reconfigured. |
| | greedy | SM frequencies of pipeline units on the critical path are greedily reconfigured, starting from the end of the critical path. |
| | balanced | SM frequencies of pipeline units on the critical path are balanced as much as possible. |
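
The "optimal SM frequency" used by `uniform` and `envelope` is the lowest point of the energy valley curve. As a minimal sketch of what that means, the snippet below picks that point from a set of profiled measurements. The function name and the measurements are hypothetical; EnvPipe's profiler may represent its results differently.

```python
def energy_valley_minimum(profile):
    """Return the (frequency_mhz, energy_joules) pair with the lowest energy.

    `profile` maps an SM frequency in MHz to the energy measured for one
    training iteration at that frequency.
    """
    if not profile:
        raise ValueError("profile must contain at least one measurement")
    best_freq = min(profile, key=profile.get)
    return best_freq, profile[best_freq]


# Hypothetical measurements shaped like a valley: energy falls as the
# frequency drops from the maximum, then rises again as iterations slow down.
example_profile = {
    900: 520.0,
    1080: 470.0,
    1260: 455.0,
    1440: 480.0,
    1620: 530.0,
    1800: 610.0,
}

if __name__ == "__main__":
    print(energy_valley_minimum(example_profile))
```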

Example Command

Here’s an example of how to run the script:

```
./train_llama_deepspeed.sh --type envelope --scheduling ours --reconfig balanced --gpus 0,1,3
```
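
When sweeping over several configurations, it can help to assemble the command programmatically and reject invalid option values early. The following is a sketch of one way to do that; `build_command` and `VALID_CHOICES` are illustrative names that are not part of the repository, and the accepted values are copied from the usage text above.

```python
import shlex

# Accepted values, as listed in the script's usage text.
VALID_CHOICES = {
    "type": ("baseline", "uniform", "envelope"),
    "scheduling": ("1f1b", "ours"),
    "reconfig": ("default", "greedy", "balanced"),
}


def build_command(type_, scheduling, reconfig, gpus, script="./train_llama_deepspeed.sh"):
    """Validate the options and return the argument list for the script."""
    chosen = {"type": type_, "scheduling": scheduling, "reconfig": reconfig}
    for flag, value in chosen.items():
        if value not in VALID_CHOICES[flag]:
            raise ValueError(f"--{flag} must be one of {VALID_CHOICES[flag]}, got {value!r}")
    if not gpus or any(g < 0 for g in gpus):
        raise ValueError("gpus must be a non-empty list of non-negative indices")
    command = [script]
    for flag, value in chosen.items():
        command += [f"--{flag}", value]
    command += ["--gpus", ",".join(str(g) for g in gpus)]
    return command


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to
    # subprocess.run(...) to actually launch training.
    print(shlex.join(build_command("envelope", "ours", "balanced", [0, 1, 3])))
```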

Add New GPU Architecture

EnvPipe supports the following GPU architectures by default: V100, RTX3090, A100, and A6000. To extend support to a new GPU architecture, you need to configure its supported clock frequencies, granularity parameters, and update specific parts of the codebase.

Steps to Add a New GPU Architecture

1. Check Supported Clock Frequencies

Determine the clock frequencies supported by your GPU architecture to ensure compatibility. Use the provided script:

```
python benchmarks/examples/scripts/get_supported_clock_frequencies.py
```

This script lists the available frequencies for your GPU. Profiling all frequencies can be time-consuming, so you will define a specific range with an appropriate granularity for efficiency.

2. Define the New GPU Architecture

Add the new GPU architecture and its associated clock frequency parameters to the configuration file.

Open deepspeed/runtime/constants.py.
Append your new GPU architecture with the following parameters:
- FILTER_MAX and FILTER_MIN: Define the range of SM frequencies for profiling. This restricts the profiling process to avoid testing unnecessary frequencies, reducing profiling time.
- GRANULARITY: Specifies the step size between consecutive SM frequencies for profiling.
- RECONFIGURE_GRANULARITY: Defines the minimum step size for frequency adjustment during runtime reconfiguration.

Example:

```
# Add your new GPU architecture name
ENVPIPE_GPU_NEWARCH = 'newarch'

# Define clock frequency parameters for the new GPU
NEWARCH_SM_FREQ_FILTER_MAX = 1800  # Maximum profiled SM frequency (MHz)
NEWARCH_SM_FREQ_FILTER_MIN = 900   # Minimum profiled SM frequency (MHz)
NEWARCH_SM_FREQ_GRANULARITY = 90   # Step size for profiling SM frequency (MHz)
NEWARCH_RECONFIGURE_GRANULARITY = 30  # Minimum step size for runtime reconfiguration (MHz)
```
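
To see how the filter range and granularity shrink the profiling workload, consider the sketch below. It is an illustration under an assumed selection rule (walk down from the maximum in steps of the granularity and take the closest supported frequency), which is not necessarily how EnvPipe's profiler chooses frequencies. The function name and the list of supported frequencies are hypothetical.

```python
def select_profile_frequencies(supported, filter_min, filter_max, granularity):
    """Pick the SM frequencies (MHz) to profile, from highest to lowest.

    Assumed rule: step down from filter_max by `granularity` and, for each
    target, take the closest supported frequency inside the filter range.
    """
    if filter_min > filter_max or granularity <= 0:
        raise ValueError("need filter_min <= filter_max and granularity > 0")
    in_range = [f for f in supported if filter_min <= f <= filter_max]
    selected = []
    if not in_range:
        return selected
    for target in range(filter_max, filter_min - 1, -granularity):
        closest = min(in_range, key=lambda f: abs(f - target))
        if closest not in selected:  # skip duplicates when steps are sparse
            selected.append(closest)
    return selected


# Hypothetical GPU that supports every 15 MHz step from 210 to 1980 MHz.
supported = list(range(210, 1981, 15))

if __name__ == "__main__":
    print(select_profile_frequencies(supported, 900, 1800, 90))
```

With the example constants above, this rule profiles 11 frequencies instead of all 61 supported frequencies in the 900 to 1800 MHz range.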

3. Update the Profiling Logic

Add the new GPU parameters to the profiling logic in EnvPipe/DeepSpeed/deepspeed/profiling/energy_profiler/profiler.py.

This ensures that profiling respects the frequency constraints and granularity defined for the new GPU architecture:

```
elif self.config["gpu"] == ENVPIPE_GPU_NEWARCH:
    sm_freq_filter_max = NEWARCH_SM_FREQ_FILTER_MAX
    sm_freq_filter_min = NEWARCH_SM_FREQ_FILTER_MIN
    sm_freq_granularity = NEWARCH_SM_FREQ_GRANULARITY
```

4. Update the Reconfiguration Logic

In EnvPipe/DeepSpeed/deepspeed/runtime/pipe/reconfiguration.py, incorporate the reconfiguration granularity for the new GPU to adjust SM frequencies dynamically during runtime:

```
elif self.config["gpu"] == ENVPIPE_GPU_NEWARCH:
    reconfigure_granularity = NEWARCH_RECONFIGURE_GRANULARITY
```
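
The reconfiguration granularity determines which frequencies a pipeline unit can be moved to at runtime. The sketch below only enumerates those candidate frequencies between two bounds; how EnvPipe chooses among them (greedy or balanced) is not shown. The function name and the values are hypothetical.

```python
def reconfiguration_candidates(start_mhz, floor_mhz, reconfigure_granularity):
    """Yield the frequencies reachable from start_mhz by stepping down in
    increments of reconfigure_granularity, without going below floor_mhz."""
    if reconfigure_granularity <= 0:
        raise ValueError("reconfigure_granularity must be positive")
    freq = start_mhz
    while freq >= floor_mhz:
        yield freq
        freq -= reconfigure_granularity


if __name__ == "__main__":
    # With a 30 MHz step, a unit at 1800 MHz can be lowered to these values
    # before reaching a hypothetical floor of 1710 MHz.
    print(list(reconfiguration_candidates(1800, 1710, 30)))
```

A smaller reconfiguration granularity gives finer control over each unit's frequency, at the cost of more candidate settings to consider.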

Explanation of Parameters

| Parameter | Description |
| --- | --- |
| `*_SM_FREQ_FILTER_MAX` | Maximum SM frequency (in MHz) to consider during profiling. |
| `*_SM_FREQ_FILTER_MIN` | Minimum SM frequency (in MHz) to consider during profiling. |
| `*_SM_FREQ_GRANULARITY` | Step size (in MHz) for SM frequency adjustments during profiling. |
| `*_RECONFIGURE_GRANULARITY` | Minimum step size (in MHz) for SM frequency reconfiguration during runtime adjustments. |
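
Before profiling a new architecture, a quick sanity check of the four values can catch simple mistakes. The checks below are a sketch of plausible constraints chosen for illustration; EnvPipe itself is not known to enforce them, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmFreqParams:
    """The four per-architecture values described in the table (all in MHz)."""
    filter_max: int
    filter_min: int
    granularity: int
    reconfigure_granularity: int

    def problems(self):
        """Return a list of human-readable issues; empty if none were found."""
        issues = []
        if self.filter_min <= 0:
            issues.append("FILTER_MIN must be positive")
        if self.filter_max <= self.filter_min:
            issues.append("FILTER_MAX must be greater than FILTER_MIN")
        if self.granularity <= 0:
            issues.append("GRANULARITY must be positive")
        elif self.granularity > self.filter_max - self.filter_min > 0:
            issues.append("GRANULARITY is larger than the filter range")
        if self.reconfigure_granularity <= 0:
            issues.append("RECONFIGURE_GRANULARITY must be positive")
        return issues


if __name__ == "__main__":
    # Values from the NEWARCH example in this README.
    print(SmFreqParams(1800, 900, 90, 30).problems())
```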

By following these steps, you can extend EnvPipe to support additional GPU architectures while maintaining efficient profiling and runtime reconfiguration.

Additional Information

For more details about DeepSpeed, refer to the original DeepSpeed README.
