The following tables detail the models supported by LMDeploy's TurboMind engine and PyTorch engine across different platforms.
## TurboMind on CUDA Platform

| Model                 | Size        | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
| --------------------- | ----------- | ---- | --------- | ------- | ------- | ----- |
| Llama                 | 7B - 65B    | LLM  | Yes       | Yes     | Yes     | Yes   |
| Llama2                | 7B - 70B    | LLM  | Yes       | Yes     | Yes     | Yes   |
| Llama3                | 8B, 70B     | LLM  | Yes       | Yes     | Yes     | Yes   |
| Llama3.1              | 8B, 70B     | LLM  | Yes       | Yes     | Yes     | Yes   |
| InternLM              | 7B - 20B    | LLM  | Yes       | Yes     | Yes     | Yes   |
| InternLM2             | 7B - 20B    | LLM  | Yes       | Yes     | Yes     | Yes   |
| InternLM2.5           | 7B          | LLM  | Yes       | Yes     | Yes     | Yes   |
| InternLM-XComposer2   | 7B, 4khd-7B | MLLM | Yes       | Yes     | Yes     | Yes   |
| InternLM-XComposer2.5 | 7B          | MLLM | Yes       | Yes     | Yes     | Yes   |
| Qwen                  | 1.8B - 72B  | LLM  | Yes       | Yes     | Yes     | Yes   |
| Qwen1.5               | 1.8B - 110B | LLM  | Yes       | Yes     | Yes     | Yes   |
| Qwen2                 | 1.5B - 72B  | LLM  | Yes       | Yes     | Yes     | Yes   |
| Mistral               | 7B          | LLM  | Yes       | Yes     | Yes     | -     |
| Qwen-VL               | 7B          | MLLM | Yes       | Yes     | Yes     | Yes   |
| DeepSeek-VL           | 7B          | MLLM | Yes       | Yes     | Yes     | Yes   |
| Baichuan              | 7B          | LLM  | Yes       | Yes     | Yes     | Yes   |
| Baichuan2             | 7B          | LLM  | Yes       | Yes     | Yes     | Yes   |
| Code Llama            | 7B - 34B    | LLM  | Yes       | Yes     | Yes     | No    |
| YI                    | 6B - 34B    | LLM  | Yes       | Yes     | Yes     | Yes   |
| LLaVA(1.5,1.6)        | 7B - 34B    | MLLM | Yes       | Yes     | Yes     | Yes   |
| InternVL              | v1.1 - v1.5 | MLLM | Yes       | Yes     | Yes     | Yes   |
| InternVL2             | 2B - 76B    | MLLM | Yes       | Yes     | Yes     | Yes   |
| MiniCPM-Llama3-V-2_5  | -           | MLLM | Yes       | Yes     | Yes     | Yes   |
| MiniCPM-V-2_6         | -           | MLLM | Yes       | Yes     | Yes     | Yes   |
| MiniGeminiLlama       | 7B          | MLLM | Yes       | -       | -       | Yes   |
| GLM4                  | 9B          | LLM  | Yes       | Yes     | Yes     | Yes   |
| CodeGeeX4             | 9B          | LLM  | Yes       | Yes     | Yes     | -     |
"-" means not verified yet.
The TurboMind engine doesn't support window attention. Therefore, for models that use window attention, i.e. those with the `use_sliding_window` switch enabled in their configuration, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.
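The engine choice described above can be automated by inspecting the model's `config.json`. The sketch below is an illustration, not part of LMDeploy itself; the field names `use_sliding_window` and `sliding_window` follow common Hugging Face config conventions (e.g. Mistral, Qwen1.5/Qwen2), so verify them against your specific model's config.

```python
# Hypothetical helper: pick an LMDeploy engine from a model's config dict.
# Assumption: sliding-window models expose "sliding_window" and/or the
# "use_sliding_window" switch in config.json, as Mistral and Qwen-family
# configs commonly do.

def choose_engine(config: dict) -> str:
    """Return "pytorch" for window-attention models, else "turbomind"."""
    if "use_sliding_window" in config:
        # Qwen-style configs carry an explicit on/off switch.
        uses_window = bool(config["use_sliding_window"])
    else:
        # Mistral-style configs imply window attention by setting a size.
        uses_window = config.get("sliding_window") is not None
    return "pytorch" if uses_window else "turbomind"

# Mistral-7B ships with sliding_window set -> needs the PyTorch engine.
print(choose_engine({"sliding_window": 4096}))    # pytorch
# Llama2 has no window attention -> TurboMind is fine.
print(choose_engine({"model_type": "llama"}))     # turbomind
```

If the helper returns `"pytorch"`, pass a PyTorch engine config (e.g. LMDeploy's `PytorchEngineConfig`) when building the pipeline instead of the default TurboMind backend.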