Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). (C++, updated Mar 15, 2024)
A script for PyTorch multi-GPU multi-process testing
Distributed reinforcement learning for LLM fine-tuning with multi-GPU utilization