A high-performance, RDMA-based distributed storage system for fast LLM inference and GPU training
python big-data cpp gpu cuda rdma infiniband distributed-cache kv-cache ucx llm-serving vllm llm-framework sglang gpu-multiplexing
Updated Sep 18, 2025 - C++