Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). (C++, updated Mar 15, 2024)
A script for PyTorch multi-GPU multi-process testing
Distributed reinforcement learning for LLM fine-tuning with multi-GPU utilization