bump up version, update documentation for CPU support
aarora79 committed Aug 27, 2024
1 parent 8ca2232 commit 941c4d6
Showing 10 changed files with 181 additions and 77 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -81,6 +81,10 @@ Llama3 is now available on SageMaker (read [blog post](https://aws.amazon.com/bl

## New in this release

## 2.0.5

1. Support for Intel CPU-based instances such as `c5.18xlarge` and `m5.16xlarge`.

## 2.0.4

1. Support for AMD CPU-based instances such as `m7a`.
@@ -89,9 +93,7 @@ Llama3 is now available on SageMaker (read [blog post](https://aws.amazon.com/bl

1. Support for an EFA directory for benchmarking on EC2.

## 2.0.2

1. Code cleanup, minor bug fixes and report improvements.



4 changes: 3 additions & 1 deletion create_manifest.py
@@ -85,7 +85,9 @@ def create_manifest_file(config_yml_dir):
# append them to the base list
all_manifest_files = config_yml_files + BASE_FILE_LIST

# write to manifest.txt
# sort so that diff between versions is easier to understand
all_manifest_files = sorted(all_manifest_files)
# and write to manifest.txt
written: int = Path(MANIFEST_FILE).write_text("\n".join(all_manifest_files))
print(f"written {written} bytes to {MANIFEST_FILE}")

2 changes: 1 addition & 1 deletion docs/analytics.md
@@ -13,7 +13,7 @@ _What is the minimum number of instances N, of most cost optimal instance type T

#### Summary for payload: payload_en_x-y

- The metrics below in the table are examples and do not represent any specific model or instance type. This table can be used to make analysis on the cost and instance maintenance perspective based on the use case. For example, `instance_type_1` costs $10 and requires 1 instance to host `model_1` until it can handle 100 requests per minute. As the requests scale to a 1,000 requests per minute, 5 instances are required and cost $50. As the requests scale to 10,000 requests per minute, the number of instances to maintain scale to 30, and the cost becomes $450 dollars.
- The metrics in the table below are examples and do not represent any specific model or instance type. The table can be used to analyze cost and instance-maintenance tradeoffs for a given use case. For example, `instance_type_1` costs 10 dollars and requires 1 instance to host `model_1` for up to 100 requests per minute. As requests scale to 1,000 per minute, 5 instances are required at a cost of 50 dollars. At 10,000 requests per minute, the number of instances to maintain grows to 30, and the cost becomes 450 dollars.

- On the other hand, `instance_type_2` is more costly, with a price of $499 for 10,000 requests per minute to host the same model, but only requires 22 instances to maintain, which is 8 fewer than when the model is hosted on `instance_type_1`.
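
To make the comparison concrete, the arithmetic can be scripted; a minimal sketch that uses only the illustrative numbers above to derive the effective per-instance cost at each request rate:

```{.bash}
# columns: requests_per_minute instances total_cost_in_dollars (figures from the example above)
while read -r rpm instances cost; do
  echo "${rpm} rpm: ${instances} instance(s), \$${cost} total, \$$(( cost / instances )) per instance"
done <<'EOF'
100 1 10
1000 5 50
10000 30 450
EOF
```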

89 changes: 88 additions & 1 deletion docs/benchmarking_on_ec2.md
@@ -63,7 +63,9 @@ command below. The config file for this example can be viewed [here](src/fmbench

1. All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.

## Benchmarking on an instance type with AMD processors
## Benchmarking on a CPU instance type with AMD processors

**_As of 2024-08-27 this has been tested on an `m7a.16xlarge` instance_**

1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. This command installs Anaconda on the instance, which is then used to create a new `conda` environment for `FMBench`. See instructions for downloading Anaconda [here](https://www.anaconda.com/download)

@@ -144,3 +146,88 @@ command below. The config file for this example can be viewed [here](src/fmbench
```

1. All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.


## Benchmarking on a CPU instance type with Intel processors

**_As of 2024-08-27 this has been tested on `c5.18xlarge` and `m5.16xlarge` instances_**

1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Docker, Git and Miniconda on the instance; Miniconda is then used to create a new `conda` environment for `FMBench`. See download instructions [here](https://www.anaconda.com/download)

```{.bash}
# Install Docker and Git using the YUM package manager
sudo yum install docker git -y
# Start the Docker service
sudo systemctl start docker
# Download the Miniconda installer for Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Run the Miniconda installer in batch mode (no manual intervention)
bash Miniconda3-latest-Linux-x86_64.sh -b
# Remove the installer script after installation
rm -f Miniconda3-latest-Linux-x86_64.sh
# Initialize conda for the current bash shell
eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
# Initialize conda so it is available in future shells
conda init
```
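
1. Optionally verify that `conda` is now available before proceeding (a quick sanity check; the reported version will vary):

```{.bash}
# confirm conda is on the PATH and working
conda --version
```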

1. Set up the `fmbench_python311` conda environment.

```{.bash}
# Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
conda create --name fmbench_python311 -y python=3.11 ipykernel
# Activate the newly created conda environment
source activate fmbench_python311
# Upgrade pip and install the fmbench package
pip install -U fmbench
```
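
1. Optionally confirm that the environment is active and the package installed (a quick sanity check; version numbers will differ):

```{.bash}
# should report Python 3.11.x
python --version
# should print the name and version of the installed fmbench package
pip show fmbench | head -2
```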

1. Build the `vllm` container for serving the model.

1. 👉 The `vllm` container we are building locally is going to be referenced in the `FMBench` config file.

1. The container being built is CPU-only (GPU support might be added in the future).

```{.bash}
# Clone the vLLM project repository from GitHub
git clone https://github.com/vllm-project/vllm.git
# Change the directory to the cloned vLLM project
cd vllm
# Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 12GB
sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=12g .
```
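
1. Optionally confirm that the image was built successfully (the build can take a while on CPU instances):

```{.bash}
# the vllm-cpu-env image should appear in the local image list
sudo docker images vllm-cpu-env
```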

1. Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the `FMBench` generated data in a different directory.

```{.bash}
curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
```
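
1. Optionally verify that the copy succeeded, assuming `/tmp` was used as the destination path:

```{.bash}
# the FMBench config files should now be present locally
ls /tmp/fmbench-read/configs
```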

1. To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below, replace `hf_yourtokenstring` with your Hugging Face token.

```{.bash}
echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
```
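
1. Since this file contains a credential, it is good practice (optional) to restrict its permissions:

```{.bash}
# make the token file readable and writable only by the current user
chmod 600 /tmp/fmbench-read/scripts/hf_token.txt
```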

1. Before running `FMBench`, add the current user to the docker group. Run the following commands so that Docker can be used without `sudo` each time.

```{.bash}
sudo usermod -a -G docker $USER
newgrp docker
```
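
1. Optionally verify that Docker now works without `sudo` (if this fails, log out and back in so the group change takes effect):

```{.bash}
# should list running containers without a permission error
docker ps
```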

1. Run `FMBench` with a packaged or a custom config file. **_This step will also deploy the model on the EC2 instance_**. The `--write-bucket` parameter value is just a placeholder and an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an EFA path instead of `/tmp` if using a shared path for storing config files and reports.

```{.bash}
fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
```

1. Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.

```{.bash}
tail -f fmbench.log
```

1. All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
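
1. Optionally, a quick way to locate the generated artifacts once the run finishes (paths assume `/tmp` was used as the `--tmp-dir`):

```{.bash}
# raw metrics and intermediate files written during the run
ls /tmp/fmbench-write
# final report and metrics copied to the local directory
ls -d results-*
```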
2 changes: 2 additions & 0 deletions docs/manifest.md
@@ -49,7 +49,9 @@ Here is a listing of the various configuration files available out-of-the-box wi
[│   └── llama3/70b/config-llama3-70b-instruct-p4d.yml](configs/llama3/70b/config-llama3-70b-instruct-p4d.yml)
**└── llama3/8b**
[├── llama3/8b/config-bedrock.yml](configs/llama3/8b/config-bedrock.yml)
[├── llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml](configs/llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml)
[├── llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml](configs/llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml)
[├── llama3/8b/config-ec2-llama3-8b-m5-16xlarge.yml](configs/llama3/8b/config-ec2-llama3-8b-m5-16xlarge.yml)
[├── llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml](configs/llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml)
[├── llama3/8b/config-ec2-llama3-8b.yml](configs/llama3/8b/config-ec2-llama3-8b.yml)
[├── llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml](configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml)
4 changes: 4 additions & 0 deletions docs/releases.md
@@ -1,5 +1,9 @@
# Releases

## 2.0.5

1. Support for Intel CPU-based instances such as `c5.18xlarge` and `m5.16xlarge`.

## 2.0.4

1. Support for AMD CPU-based instances such as `m7a`.
142 changes: 72 additions & 70 deletions manifest.txt
@@ -1,103 +1,105 @@
configs/byoe/config-model-byo-sagemaker-endpoint.yml
configs/bedrock/config-bedrock-titan-text-express.yml
configs/bedrock/config-bedrock-models-OpenOrca.yml
configs/bedrock/config-bedrock-haiku-sonnet-majority-voting.yml
configs/bedrock/config-bedrock-claude.yml
configs/bedrock/config-bedrock-llama3-streaming.yml
configs/bedrock/config-bedrock-llama3-1-8b-streaming.yml
configs/bedrock/config-bedrock-all-anthropic-models-longbench-data.yml
configs/bedrock/config-bedrock-anthropic-models-OpenOrca.yml
configs/bedrock/config-bedrock-claude.yml
configs/bedrock/config-bedrock-evals-only-conc-1.yml
configs/bedrock/config-bedrock-haiku-sonnet-majority-voting.yml
configs/bedrock/config-bedrock-llama3-1-70b-streaming.yml
configs/bedrock/config-bedrock-llama3-1-8b-streaming.yml
configs/bedrock/config-bedrock-llama3-1-no-streaming.yml
configs/bedrock/config-bedrock-all-anthropic-models-longbench-data.yml
configs/bedrock/config-bedrock-llama3-streaming.yml
configs/bedrock/config-bedrock-models-OpenOrca.yml
configs/bedrock/config-bedrock-titan-text-express.yml
configs/bedrock/config-bedrock.yml
configs/bedrock/config-bedrock-llama3-1-70b-streaming.yml
configs/gemma/config-gemma-2b-g5.yml
configs/phi/config-phi-3-g5.yml
configs/mistral/config-mistral-v3-inf2-48xl-deploy-ec2-tp24.yml
configs/mistral/config-mistral-instruct-v2-p5-lmi-dist.yml
configs/mistral/config-mistral-trn1-32xl-deploy-ec2-tp32.yml
configs/mistral/config-mistral-instruct-v1-p5-trtllm.yml
configs/mistral/config-mistral-instruct-p4d.yml
configs/mistral/config-mistral-instruct-AWQ-p4d.yml
configs/mistral/config-mistral-instruct-AWQ-p5-byo-ep.yml
configs/mistral/config-mistral-instruct-AWQ-p5.yml
configs/mistral/config-mistral-7b-tgi-g5.yml
configs/mistral/config-mistral-instruct-v2-p4d-lmi-dist.yml
configs/mistral/config-mistral-7b-eks-inf2.yml
configs/mistral/config-mistral-instruct-v2-p5-trtllm.yml
configs/mistral/config-mistral-instruct-v2-p4d-trtllm.yml
configs/bert/config-distilbert-base-uncased.yml
configs/llama2/13b/config-llama2-13b-inf2-g5.yml
configs/llama2/13b/config-llama2-13b-inf2-g5-p4d.yml
configs/llama2/13b/config-byo-rest-ep-llama2-13b.yml
configs/byoe/config-model-byo-sagemaker-endpoint.yml
configs/gemma/config-gemma-2b-g5.yml
configs/llama2/13b/config-bedrock-sagemaker-llama2.yml
configs/llama2/13b/config-byo-rest-ep-llama2-13b.yml
configs/llama2/13b/config-llama2-13b-inf2-g5-p4d.yml
configs/llama2/13b/config-llama2-13b-inf2-g5.yml
configs/llama2/70b/config-ec2-llama2-70b.yml
configs/llama2/70b/config-llama2-70b-g5-p4d-tgi.yml
configs/llama2/70b/config-llama2-70b-g5-p4d-trt.yml
configs/llama2/70b/config-llama2-70b-inf2-g5.yml
configs/llama2/7b/config-llama2-7b-byo-sagemaker-endpoint.yml
configs/llama2/7b/config-llama2-7b-inf2-g5.yml
configs/llama2/7b/config-llama2-7b-g4dn-g5-trt.yml
configs/llama2/7b/config-llama2-7b-g5-no-s3-quick.yml
configs/llama2/7b/config-llama2-7b-g5-quick.yml
configs/llama2/7b/config-llama2-7b-inf2-g5.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2.yml
configs/llama3.1/8b/client-config-ec2-llama3-1-8b.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2.yml
configs/llama3.1/8b/config-llama3.1-8b-g5.yml
configs/llama3.1/8b/server-config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3/70b/config-bedrock.yml
configs/llama3/70b/config-ec2-llama3-70b-instruct.yml
configs/llama3/70b/config-ec2-neuron-llama3-70b-inf2-48xl.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-48xl.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-p4d.yml
configs/llama3/70b/config-llama3-70b-instruct-p4d.yml
configs/llama3/8b/config-bedrock.yml
configs/llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml
configs/llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-ec2-llama3-8b-m5-16xlarge.yml
configs/llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml
configs/llama3/8b/config-ec2-llama3-8b.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-llama3-8b-eks-inf2.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-lmi-dist.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-vllm.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-12xl.yml
configs/llama3/8b/config-llama3-8b-g5-streaming.yml
configs/llama3/8b/config-llama3-8b-inf2-24xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-inf2-g5-byoe-w-openorca.yml
configs/llama3/8b/config-llama3-8b-inf2-g5.yml
configs/llama3/8b/config-llama3-8b-instruct-all.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-12xl-4-instances.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=16-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-12xl.yml
configs/llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-ec2-llama3-8b.yml
configs/llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-24xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-2xl.yml
configs/llama3/8b/config-llama3-8b-g5-streaming.yml
configs/llama3/8b/llama3-8b-inf2-48xl-byoe-g5-24xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-48xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-p4d.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-12xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-24xl.yml
configs/llama3/8b/config-llama3-8b-inf2-g5.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-llama3-8b-trn1.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-48xl.yml
configs/llama3/8b/llama3-8b-inf2-24xl-byoe-g5-12xl.yml
configs/llama3/8b/config-llama3-8b-inf2-24xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-24xl.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-lmi-dist.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-vllm.yml
configs/llama3/8b/config-llama3-8b-instruct-p5-djl-lmi-dist.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=16-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-trn1.yml
configs/llama3/8b/llama3-8b-inf2-24xl-byoe-g5-12xl.yml
configs/llama3/8b/llama3-8b-inf2-48xl-byoe-g5-24xl.yml
configs/llama3/8b/llama3-8b-trn1-32xl-byoe-g5-24xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-p4d.yml
configs/llama3/8b/config-llama3-8b-inf2-g5-byoe-w-openorca.yml
configs/llama3/8b/config-bedrock.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-48xl.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-p4d.yml
configs/llama3/70b/config-ec2-llama3-70b-instruct.yml
configs/llama3/70b/config-llama3-70b-instruct-p4d.yml
configs/llama3/70b/config-ec2-neuron-llama3-70b-inf2-48xl.yml
configs/llama3/70b/config-bedrock.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2.yml
configs/llama3.1/8b/server-config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/8b/config-llama3.1-8b-g5.yml
configs/llama3.1/8b/client-config-ec2-llama3-1-8b.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2-48xl-deploy-ec2.yml
prompt_template/.keep
tokenizer/.keep
configs/mistral/config-mistral-7b-eks-inf2.yml
configs/mistral/config-mistral-7b-tgi-g5.yml
configs/mistral/config-mistral-instruct-AWQ-p4d.yml
configs/mistral/config-mistral-instruct-AWQ-p5-byo-ep.yml
configs/mistral/config-mistral-instruct-AWQ-p5.yml
configs/mistral/config-mistral-instruct-p4d.yml
configs/mistral/config-mistral-instruct-v1-p5-trtllm.yml
configs/mistral/config-mistral-instruct-v2-p4d-lmi-dist.yml
configs/mistral/config-mistral-instruct-v2-p4d-trtllm.yml
configs/mistral/config-mistral-instruct-v2-p5-lmi-dist.yml
configs/mistral/config-mistral-instruct-v2-p5-trtllm.yml
configs/mistral/config-mistral-trn1-32xl-deploy-ec2-tp32.yml
configs/mistral/config-mistral-v3-inf2-48xl-deploy-ec2-tp24.yml
configs/phi/config-phi-3-g5.yml
llama2_tokenizer/.keep
llama3_tokenizer/.keep
llama3_1_tokenizer/.keep
llama3_tokenizer/.keep
mistral_tokenizer/.keep
phi_tokenizer/.keep
prompt_template/.keep
scripts/.keep
source_data/2wikimqa_e.jsonl
source_data/2wikimqa.jsonl
source_data/hotpotqa_e.jsonl
source_data/2wikimqa_e.jsonl
source_data/LICENSE.txt
source_data/THIRD_PARTY_LICENSES.txt
source_data/hotpotqa.jsonl
source_data/hotpotqa_e.jsonl
source_data/narrativeqa.jsonl
source_data/triviaqa_e.jsonl
source_data/triviaqa.jsonl
source_data/LICENSE.txt
source_data/THIRD_PARTY_LICENSES.txt
source_data/triviaqa_e.jsonl
tokenizer/.keep
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -80,13 +80,14 @@ nav:
- BYO REST predictor: byo_rest_predictor.md
- BYO dataset: byo_dataset.md
- Build FMBench: build.md
- Analytics: analytics.md

- Results:
- Report: results.md
- Website: internal_website.md

- Releases:
- Major release: 2.0.x.md
- Major release: announcement.md
- releases.md
- Resources:
- resources.md
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "fmbench"
version = "2.0.4"
version = "2.0.5"
description ="Benchmark performance of **any Foundation Model (FM)** deployed on **any AWS Generative AI service**, be it **Amazon SageMaker**, **Amazon Bedrock**, **Amazon EKS**, or **Amazon EC2**. The FMs can be deployed on these platforms directly through `FMBench`, or, if they are already deployed, benchmarked through the **Bring your own endpoint** mode supported by `FMBench`."
authors = ["Amit Arora <aroraai@amazon.com>", "Madhur prashant <Madhurpt@amazon.com>"]
readme = "README.md"
4 changes: 4 additions & 0 deletions release_history.md
@@ -1,3 +1,7 @@
## 2.0.2

1. Code cleanup, minor bug fixes and report improvements.

## 2.0.0

1. 🚨 Model evaluations done by a **Panel of LLM Evaluators[[1]](#1)** 🚨
