From fb86b5e5a04ce0393912e74b516971c1333ed754 Mon Sep 17 00:00:00 2001
From: Louie Tsai
Date: Sat, 8 Feb 2025 00:58:33 -0800
Subject: [PATCH] Add Deepseek model into validated model table and add required Gaudi cards for LLM microservice (#1267)

* Update README.md for Deepseek support and numbers of required gaudi cards

Signed-off-by: Tsai, Louie

* Update README.md

Signed-off-by: Tsai, Louie

---------

Signed-off-by: Tsai, Louie
---
 comps/llms/src/text-generation/README.md | 43 +++++++++++++++++-------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/comps/llms/src/text-generation/README.md b/comps/llms/src/text-generation/README.md
index 360c459dc..ba1a31df3 100644
--- a/comps/llms/src/text-generation/README.md
+++ b/comps/llms/src/text-generation/README.md
@@ -8,14 +8,31 @@ Overall, this microservice offers a streamlined way to integrate large language
 
 ## Validated LLM Models
 
-| Model                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi |
-| --------------------------- | --------- | -------- | ---------- |
-| [Intel/neural-chat-7b-v3-3] | ✓         | ✓        | ✓          |
-| [Llama-2-7b-chat-hf]        | ✓         | ✓        | ✓          |
-| [Llama-2-70b-chat-hf]       | ✓         | -        | ✓          |
-| [Meta-Llama-3-8B-Instruct]  | ✓         | ✓        | ✓          |
-| [Meta-Llama-3-70B-Instruct] | ✓         | -        | ✓          |
-| [Phi-3]                     | x         | Limit 4K | Limit 4K   |
+| Model                                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi |
+| ------------------------------------------- | --------- | -------- | ---------- |
+| [Intel/neural-chat-7b-v3-3]                 | ✓         | ✓        | ✓          |
+| [meta-llama/Llama-2-7b-chat-hf]             | ✓         | ✓        | ✓          |
+| [meta-llama/Llama-2-70b-chat-hf]            | ✓         | -        | ✓          |
+| [meta-llama/Meta-Llama-3-8B-Instruct]       | ✓         | ✓        | ✓          |
+| [meta-llama/Meta-Llama-3-70B-Instruct]      | ✓         | -        | ✓          |
+| [Phi-3]                                     | x         | Limit 4K | Limit 4K   |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | ✓         | -        | ✓          |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | ✓         | -        | ✓          |
+
+### System Requirements for LLM Models
+
+| Model                                       | Minimum number of Gaudi cards |
+| ------------------------------------------- | ----------------------------- |
+| [Intel/neural-chat-7b-v3-3]                 | 1                             |
+| [meta-llama/Llama-2-7b-chat-hf]             | 1                             |
+| [meta-llama/Llama-2-70b-chat-hf]            | 2                             |
+| [meta-llama/Meta-Llama-3-8B-Instruct]       | 1                             |
+| [meta-llama/Meta-Llama-3-70B-Instruct]      | 2                             |
+| [Phi-3]                                     | x                             |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | 8                             |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | 4                             |
+
+> NOTE: Detailed system requirements coming soon.
 
 ## Support integrations
 
@@ -166,9 +183,11 @@ curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
 [Intel/neural-chat-7b-v3-3]: https://huggingface.co/Intel/neural-chat-7b-v3-3
-[Llama-2-7b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
-[Llama-2-70b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
-[Meta-Llama-3-8B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
-[Meta-Llama-3-70B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
+[meta-llama/Llama-2-7b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
+[meta-llama/Llama-2-70b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
+[meta-llama/Meta-Llama-3-8B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
+[meta-llama/Meta-Llama-3-70B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
 [Phi-3]: https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3
 [HuggingFace]: https://huggingface.co/
+[deepseek-ai/DeepSeek-R1-Distill-Llama-70B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+[deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B