bump up version, update documentation for CPU support
aarora79 committed Aug 27, 2024
1 parent 8ca2232 commit 941c4d6
Showing 10 changed files with 181 additions and 77 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -81,6 +81,10 @@ Llama3 is now available on SageMaker (read [blog post](https://aws.amazon.com/bl

## New in this release

## 2.0.5

1. Support for Intel CPU-based instances such as `c5.18xlarge` and `m5.16xlarge`.

## 2.0.4

1. Support for AMD CPU-based instances such as `m7a`.
@@ -89,9 +93,7 @@ Llama3 is now available on SageMaker (read [blog post](https://aws.amazon.com/bl

1. Support for an EFA directory for benchmarking on EC2.

## 2.0.2

1. Code cleanup, minor bug fixes and report improvements.



4 changes: 3 additions & 1 deletion create_manifest.py
@@ -85,7 +85,9 @@ def create_manifest_file(config_yml_dir):
# append them to the base list
all_manifest_files = config_yml_files + BASE_FILE_LIST

# write to manifest.txt
# sort so that diff between versions is easier to understand
all_manifest_files = sorted(all_manifest_files)
# and write to manifest.txt
written: int = Path(MANIFEST_FILE).write_text("\n".join(all_manifest_files))
print(f"written {written} bytes to {MANIFEST_FILE}")

2 changes: 1 addition & 1 deletion docs/analytics.md
@@ -13,7 +13,7 @@ _What is the minimum number of instances N, of most cost optimal instance type T

#### Summary for payload: payload_en_x-y

- The metrics below in the table are examples and do not represent any specific model or instance type. This table can be used to make analysis on the cost and instance maintenance perspective based on the use case. For example, `instance_type_1` costs $10 and requires 1 instance to host `model_1` until it can handle 100 requests per minute. As the requests scale to a 1,000 requests per minute, 5 instances are required and cost $50. As the requests scale to 10,000 requests per minute, the number of instances to maintain scale to 30, and the cost becomes $450 dollars.
- The metrics in the table below are examples and do not represent any specific model or instance type. The table can be used to analyze cost and instance-maintenance tradeoffs for a given use case. For example, `instance_type_1` costs 10 dollars and requires 1 instance to host `model_1` for up to 100 requests per minute. As requests scale to 1,000 per minute, 5 instances are required at a cost of 50 dollars. At 10,000 requests per minute, the number of instances to maintain grows to 30, and the cost becomes 450 dollars.

- On the other hand, `instance_type_2` is more costly, with a price of $499 for 10,000 requests per minute to host the same model, but only requires 22 instances to maintain, which is 8 fewer than when the model is hosted on `instance_type_1`.
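
To make the comparison concrete, the arithmetic can be scripted; a minimal sketch that uses only the illustrative numbers above to derive the effective per-instance cost at each request rate:

```{.bash}
# columns: requests_per_minute instances total_cost_in_dollars (figures from the example above)
while read -r rpm instances cost; do
  echo "${rpm} rpm: ${instances} instance(s), \$${cost} total, \$$(( cost / instances )) per instance"
done <<'EOF'
100 1 10
1000 5 50
10000 30 450
EOF
```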

89 changes: 88 additions & 1 deletion docs/benchmarking_on_ec2.md
@@ -63,7 +63,9 @@ command below. The config file for this example can be viewed [here](src/fmbench

1. All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.

## Benchmarking on an instance type with AMD processors
## Benchmarking on a CPU instance type with AMD processors

**_As of 2024-08-27 this has been tested on an `m7a.16xlarge` instance_**

1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. This command installs Anaconda on the instance, which is then used to create a new `conda` environment for `FMBench`. See instructions for downloading Anaconda [here](https://www.anaconda.com/download)

@@ -144,3 +146,88 @@ command below. The config file for this example can be viewed [here](src/fmbench
```

1. All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.


## Benchmarking on a CPU instance type with Intel processors

**_As of 2024-08-27 this has been tested on `c5.18xlarge` and `m5.16xlarge` instances_**

1. Connect to your instance using any of the options in EC2 (SSH/EC2 Connect) and run the following in the EC2 terminal. These commands install Docker, Git and Miniconda on the instance; Miniconda is then used to create a new `conda` environment for `FMBench`. See download instructions [here](https://www.anaconda.com/download)

```{.bash}
# Install Docker and Git using the YUM package manager
sudo yum install docker git -y
# Start the Docker service
sudo systemctl start docker
# Download the Miniconda installer for Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Run the Miniconda installer in batch mode (no manual intervention)
bash Miniconda3-latest-Linux-x86_64.sh -b
# Remove the installer script after installation
rm -f Miniconda3-latest-Linux-x86_64.sh
# Initialize conda for the current bash shell
eval "$(/home/$USER/miniconda3/bin/conda shell.bash hook)"
# Initialize conda so it is available in future shells
conda init
```
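
1. Optionally verify that `conda` is now available before proceeding (a quick sanity check; the reported version will vary):

```{.bash}
# confirm conda is on the PATH and working
conda --version
```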

1. Set up the `fmbench_python311` conda environment.

```{.bash}
# Create a new conda environment named 'fmbench_python311' with Python 3.11 and ipykernel
conda create --name fmbench_python311 -y python=3.11 ipykernel
# Activate the newly created conda environment
source activate fmbench_python311
# Upgrade pip and install the fmbench package
pip install -U fmbench
```
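
1. Optionally confirm that the environment is active and the package installed (a quick sanity check; version numbers will differ):

```{.bash}
# should report Python 3.11.x
python --version
# should print the name and version of the installed fmbench package
pip show fmbench | head -2
```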

1. Build the `vllm` container for serving the model.

1. 👉 The `vllm` container we are building locally is going to be referenced in the `FMBench` config file.

1. The container being built is CPU-only (GPU support might be added in the future).

```{.bash}
# Clone the vLLM project repository from GitHub
git clone https://github.com/vllm-project/vllm.git
# Change the directory to the cloned vLLM project
cd vllm
# Build a Docker image using the provided Dockerfile for CPU, with a shared memory size of 12GB
sudo docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=12g .
```
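
1. Optionally confirm that the image was built successfully (the build can take a while on CPU instances):

```{.bash}
# the vllm-cpu-env image should appear in the local image list
sudo docker images vllm-cpu-env
```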

1. Create the local directory structure needed for `FMBench` and copy all publicly available dependencies from the AWS S3 bucket for `FMBench`. This is done by running the `copy_s3_content.sh` script available as part of the `FMBench` repo. Replace `/tmp` in the command below with a different path if you want to store the config files and the `FMBench` generated data in a different directory.

```{.bash}
curl -s https://raw.githubusercontent.com/aws-samples/foundation-model-benchmarking-tool/main/copy_s3_content.sh | sh -s -- /tmp
```
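
1. Optionally verify that the copy succeeded, assuming `/tmp` was used as the destination path:

```{.bash}
# the FMBench config files should now be present locally
ls /tmp/fmbench-read/configs
```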

1. To download the model files from Hugging Face, create a `hf_token.txt` file in the `/tmp/fmbench-read/scripts/` directory containing the Hugging Face token you would like to use. In the command below, replace `hf_yourtokenstring` with your Hugging Face token.

```{.bash}
echo hf_yourtokenstring > /tmp/fmbench-read/scripts/hf_token.txt
```
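
1. Since this file contains a credential, it is good practice (optional) to restrict its permissions:

```{.bash}
# make the token file readable and writable only by the current user
chmod 600 /tmp/fmbench-read/scripts/hf_token.txt
```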

1. Before running `FMBench`, add the current user to the docker group. Run the following commands so that Docker can be used without `sudo` each time.

```{.bash}
sudo usermod -a -G docker $USER
newgrp docker
```
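
1. Optionally verify that Docker now works without `sudo` (if this fails, log out and back in so the group change takes effect):

```{.bash}
# should list running containers without a permission error
docker ps
```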

1. Run `FMBench` with a packaged or a custom config file. **_This step will also deploy the model on the EC2 instance_**. The `--write-bucket` parameter value is just a placeholder and an actual S3 bucket is not required. You could set the `--tmp-dir` flag to an EFA path instead of `/tmp` if using a shared path for storing config files and reports.

```{.bash}
fmbench --config-file /tmp/fmbench-read/configs/llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml --local-mode yes --write-bucket placeholder --tmp-dir /tmp > fmbench.log 2>&1
```

1. Open a new terminal and do a `tail` on `fmbench.log` to see a live log of the run.

```{.bash}
tail -f fmbench.log
```

1. All metrics are stored in the `/tmp/fmbench-write` directory created automatically by the `fmbench` package. Once the run completes, all files are copied locally into a `results-*` folder as usual.
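
1. Optionally, a quick way to locate the generated artifacts once the run finishes (paths assume `/tmp` was used as the `--tmp-dir`):

```{.bash}
# raw metrics and intermediate files written during the run
ls /tmp/fmbench-write
# final report and metrics copied to the local directory
ls -d results-*
```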
2 changes: 2 additions & 0 deletions docs/manifest.md
@@ -49,7 +49,9 @@ Here is a listing of the various configuration files available out-of-the-box wi
[│   └── llama3/70b/config-llama3-70b-instruct-p4d.yml](configs/llama3/70b/config-llama3-70b-instruct-p4d.yml)
**└── llama3/8b**
[├── llama3/8b/config-bedrock.yml](configs/llama3/8b/config-bedrock.yml)
[├── llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml](configs/llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml)
[├── llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml](configs/llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml)
[├── llama3/8b/config-ec2-llama3-8b-m5-16xlarge.yml](configs/llama3/8b/config-ec2-llama3-8b-m5-16xlarge.yml)
[├── llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml](configs/llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml)
[├── llama3/8b/config-ec2-llama3-8b.yml](configs/llama3/8b/config-ec2-llama3-8b.yml)
[├── llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml](configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml)
4 changes: 4 additions & 0 deletions docs/releases.md
@@ -1,5 +1,9 @@
# Releases

## 2.0.5

1. Support for Intel CPU-based instances such as `c5.18xlarge` and `m5.16xlarge`.

## 2.0.4

1. Support for AMD CPU-based instances such as `m7a`.
142 changes: 72 additions & 70 deletions manifest.txt
@@ -1,103 +1,105 @@
configs/byoe/config-model-byo-sagemaker-endpoint.yml
configs/bedrock/config-bedrock-titan-text-express.yml
configs/bedrock/config-bedrock-models-OpenOrca.yml
configs/bedrock/config-bedrock-haiku-sonnet-majority-voting.yml
configs/bedrock/config-bedrock-claude.yml
configs/bedrock/config-bedrock-llama3-streaming.yml
configs/bedrock/config-bedrock-llama3-1-8b-streaming.yml
configs/bedrock/config-bedrock-all-anthropic-models-longbench-data.yml
configs/bedrock/config-bedrock-anthropic-models-OpenOrca.yml
configs/bedrock/config-bedrock-claude.yml
configs/bedrock/config-bedrock-evals-only-conc-1.yml
configs/bedrock/config-bedrock-haiku-sonnet-majority-voting.yml
configs/bedrock/config-bedrock-llama3-1-70b-streaming.yml
configs/bedrock/config-bedrock-llama3-1-8b-streaming.yml
configs/bedrock/config-bedrock-llama3-1-no-streaming.yml
configs/bedrock/config-bedrock-all-anthropic-models-longbench-data.yml
configs/bedrock/config-bedrock-llama3-streaming.yml
configs/bedrock/config-bedrock-models-OpenOrca.yml
configs/bedrock/config-bedrock-titan-text-express.yml
configs/bedrock/config-bedrock.yml
configs/bedrock/config-bedrock-llama3-1-70b-streaming.yml
configs/gemma/config-gemma-2b-g5.yml
configs/phi/config-phi-3-g5.yml
configs/mistral/config-mistral-v3-inf2-48xl-deploy-ec2-tp24.yml
configs/mistral/config-mistral-instruct-v2-p5-lmi-dist.yml
configs/mistral/config-mistral-trn1-32xl-deploy-ec2-tp32.yml
configs/mistral/config-mistral-instruct-v1-p5-trtllm.yml
configs/mistral/config-mistral-instruct-p4d.yml
configs/mistral/config-mistral-instruct-AWQ-p4d.yml
configs/mistral/config-mistral-instruct-AWQ-p5-byo-ep.yml
configs/mistral/config-mistral-instruct-AWQ-p5.yml
configs/mistral/config-mistral-7b-tgi-g5.yml
configs/mistral/config-mistral-instruct-v2-p4d-lmi-dist.yml
configs/mistral/config-mistral-7b-eks-inf2.yml
configs/mistral/config-mistral-instruct-v2-p5-trtllm.yml
configs/mistral/config-mistral-instruct-v2-p4d-trtllm.yml
configs/bert/config-distilbert-base-uncased.yml
configs/llama2/13b/config-llama2-13b-inf2-g5.yml
configs/llama2/13b/config-llama2-13b-inf2-g5-p4d.yml
configs/llama2/13b/config-byo-rest-ep-llama2-13b.yml
configs/byoe/config-model-byo-sagemaker-endpoint.yml
configs/gemma/config-gemma-2b-g5.yml
configs/llama2/13b/config-bedrock-sagemaker-llama2.yml
configs/llama2/13b/config-byo-rest-ep-llama2-13b.yml
configs/llama2/13b/config-llama2-13b-inf2-g5-p4d.yml
configs/llama2/13b/config-llama2-13b-inf2-g5.yml
configs/llama2/70b/config-ec2-llama2-70b.yml
configs/llama2/70b/config-llama2-70b-g5-p4d-tgi.yml
configs/llama2/70b/config-llama2-70b-g5-p4d-trt.yml
configs/llama2/70b/config-llama2-70b-inf2-g5.yml
configs/llama2/7b/config-llama2-7b-byo-sagemaker-endpoint.yml
configs/llama2/7b/config-llama2-7b-inf2-g5.yml
configs/llama2/7b/config-llama2-7b-g4dn-g5-trt.yml
configs/llama2/7b/config-llama2-7b-g5-no-s3-quick.yml
configs/llama2/7b/config-llama2-7b-g5-quick.yml
configs/llama2/7b/config-llama2-7b-inf2-g5.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2.yml
configs/llama3.1/8b/client-config-ec2-llama3-1-8b.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2.yml
configs/llama3.1/8b/config-llama3.1-8b-g5.yml
configs/llama3.1/8b/server-config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3/70b/config-bedrock.yml
configs/llama3/70b/config-ec2-llama3-70b-instruct.yml
configs/llama3/70b/config-ec2-neuron-llama3-70b-inf2-48xl.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-48xl.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-p4d.yml
configs/llama3/70b/config-llama3-70b-instruct-p4d.yml
configs/llama3/8b/config-bedrock.yml
configs/llama3/8b/config-ec2-llama3-8b-c5-18xlarge.yml
configs/llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-ec2-llama3-8b-m5-16xlarge.yml
configs/llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml
configs/llama3/8b/config-ec2-llama3-8b.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-llama3-8b-eks-inf2.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-lmi-dist.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-vllm.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-12xl.yml
configs/llama3/8b/config-llama3-8b-g5-streaming.yml
configs/llama3/8b/config-llama3-8b-inf2-24xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-inf2-g5-byoe-w-openorca.yml
configs/llama3/8b/config-llama3-8b-inf2-g5.yml
configs/llama3/8b/config-llama3-8b-instruct-all.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-12xl-4-instances.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=16-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-12xl.yml
configs/llama3/8b/config-ec2-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-ec2-llama3-8b.yml
configs/llama3/8b/config-ec2-llama3-8b-m7a-16xlarge.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-24xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-2xl.yml
configs/llama3/8b/config-llama3-8b-g5-streaming.yml
configs/llama3/8b/llama3-8b-inf2-48xl-byoe-g5-24xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-48xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-p4d.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-12xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-24xl.yml
configs/llama3/8b/config-llama3-8b-inf2-g5.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-48xl.yml
configs/llama3/8b/config-llama3-8b-trn1.yml
configs/llama3/8b/config-llama3-8b-instruct-g6-48xl.yml
configs/llama3/8b/llama3-8b-inf2-24xl-byoe-g5-12xl.yml
configs/llama3/8b/config-llama3-8b-inf2-24xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-24xl.yml
configs/llama3/8b/config-ec2-neuron-llama3-8b-inf2-24xl.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-lmi-dist.yml
configs/llama3/8b/config-llama3-8b-instruct-p4d-djl-vllm.yml
configs/llama3/8b/config-llama3-8b-instruct-p5-djl-lmi-dist.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=16-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-trn1-32xl-tp=8-bs=4-byoe.yml
configs/llama3/8b/config-llama3-8b-trn1.yml
configs/llama3/8b/llama3-8b-inf2-24xl-byoe-g5-12xl.yml
configs/llama3/8b/llama3-8b-inf2-48xl-byoe-g5-24xl.yml
configs/llama3/8b/llama3-8b-trn1-32xl-byoe-g5-24xl.yml
configs/llama3/8b/config-llama3-8b-instruct-g5-p4d.yml
configs/llama3/8b/config-llama3-8b-inf2-g5-byoe-w-openorca.yml
configs/llama3/8b/config-bedrock.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-48xl.yml
configs/llama3/70b/config-llama3-70b-instruct-g5-p4d.yml
configs/llama3/70b/config-ec2-llama3-70b-instruct.yml
configs/llama3/70b/config-llama3-70b-instruct-p4d.yml
configs/llama3/70b/config-ec2-neuron-llama3-70b-inf2-48xl.yml
configs/llama3/70b/config-bedrock.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/8b/config-ec2-llama3-1-8b-inf2.yml
configs/llama3.1/8b/server-config-ec2-llama3-1-8b-inf2-48xl-deploy-ec2.yml
configs/llama3.1/8b/config-llama3.1-8b-g5.yml
configs/llama3.1/8b/client-config-ec2-llama3-1-8b.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2.yml
configs/llama3.1/70b/config-ec2-llama3-1-70b-inf2-48xl-deploy-ec2.yml
prompt_template/.keep
tokenizer/.keep
configs/mistral/config-mistral-7b-eks-inf2.yml
configs/mistral/config-mistral-7b-tgi-g5.yml
configs/mistral/config-mistral-instruct-AWQ-p4d.yml
configs/mistral/config-mistral-instruct-AWQ-p5-byo-ep.yml
configs/mistral/config-mistral-instruct-AWQ-p5.yml
configs/mistral/config-mistral-instruct-p4d.yml
configs/mistral/config-mistral-instruct-v1-p5-trtllm.yml
configs/mistral/config-mistral-instruct-v2-p4d-lmi-dist.yml
configs/mistral/config-mistral-instruct-v2-p4d-trtllm.yml
configs/mistral/config-mistral-instruct-v2-p5-lmi-dist.yml
configs/mistral/config-mistral-instruct-v2-p5-trtllm.yml
configs/mistral/config-mistral-trn1-32xl-deploy-ec2-tp32.yml
configs/mistral/config-mistral-v3-inf2-48xl-deploy-ec2-tp24.yml
configs/phi/config-phi-3-g5.yml
llama2_tokenizer/.keep
llama3_tokenizer/.keep
llama3_1_tokenizer/.keep
llama3_tokenizer/.keep
mistral_tokenizer/.keep
phi_tokenizer/.keep
prompt_template/.keep
scripts/.keep
source_data/2wikimqa_e.jsonl
source_data/2wikimqa.jsonl
source_data/hotpotqa_e.jsonl
source_data/2wikimqa_e.jsonl
source_data/LICENSE.txt
source_data/THIRD_PARTY_LICENSES.txt
source_data/hotpotqa.jsonl
source_data/hotpotqa_e.jsonl
source_data/narrativeqa.jsonl
source_data/triviaqa_e.jsonl
source_data/triviaqa.jsonl
source_data/LICENSE.txt
source_data/THIRD_PARTY_LICENSES.txt
source_data/triviaqa_e.jsonl
tokenizer/.keep
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -80,13 +80,14 @@ nav:
- BYO REST predictor: byo_rest_predictor.md
- BYO dataset: byo_dataset.md
- Build FMBench: build.md
- Analytics: analytics.md

- Results:
- Report: results.md
- Website: internal_website.md

- Releases:
- Major release: 2.0.x.md
- Major release: announcement.md
- releases.md
- Resources:
- resources.md
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "fmbench"
version = "2.0.4"
version = "2.0.5"
description ="Benchmark performance of **any Foundation Model (FM)** deployed on **any AWS Generative AI service**, be it **Amazon SageMaker**, **Amazon Bedrock**, **Amazon EKS**, or **Amazon EC2**. The FMs can be deployed on these platforms directly through `FMBench`, or, if they are already deployed, benchmarked through the **Bring your own endpoint** mode supported by `FMBench`."
authors = ["Amit Arora <aroraai@amazon.com>", "Madhur prashant <Madhurpt@amazon.com>"]
readme = "README.md"
4 changes: 4 additions & 0 deletions release_history.md
@@ -1,3 +1,7 @@
## 2.0.2

1. Code cleanup, minor bug fixes and report improvements.

## 2.0.0

1. 🚨 Model evaluations done by a **Panel of LLM Evaluators[[1]](#1)** 🚨
