Releases · ModelTC/lightllm

What's Changed

Benchclient by @shihaobai in #740

fix pause reqs by @shihaobai in #741

add RETURN_LIST for tgi_api by @shihaobai in #742

fix: fix a precision bug in the context_flashattention by @blueswhen in #743

Improve the accuracy of deepseekv3 by @hiworldwzj in #744

deepseekv3 bmm noquant and fix moe gemm bug. by @hiworldwzj in #745

Add Xgrammar Support by @flyinglandlord in #701

fuse fp8 quant in kv copying and add flashinfer decode mla operator in the attention module by @blueswhen in #737

fix: add flashinfer-python in the requirements.txt by @blueswhen in #749

Fix tokens2 by @SangChengC in #748

Fix Unit-test in PR: Add xgrammar by @flyinglandlord in #750

add support for multinode tp by @shihaobai in #751

New Features

Cross-Process Request Object:
- Retained and optimized the previous three-process architecture design.
- Introduced a request object that can be accessed across processes, significantly reducing inter-process communication overhead.
Folding of scheduling and model inference:
- Implemented the folding of scheduling and model inference, significantly reducing communication overhead between the scheduler and modelrpc.
CacheTensorManager:
- New class to manage the allocation and release of Torch tensors within the framework.
- Maximizes tensor sharing across layers at runtime and enhances memory sharing between different CUDA graphs.
- On an 8x80GB H100 machine, using the DeepSeek-v2 model, LightLLM can run 200 CUDA graphs concurrently without out of memory (OOM).
PD-Disaggregation Prototype
- Dynamic registration of P and D nodes
Fastest DeepSeek-R1 performance on H200
- sglang==0.4.3, vllm==0.7.2, trtllm==0.17.0
- num_clients = 100. The input length of the test data is 1024, and the output follows a Gaussian distribution with a mean of 128

For more details, stay tuned to our blog at https://www.light-ai.top/lightllm-blog/. Thanks to outstanding projects like vllm, sglang, and trtllm, LightLLM also leverages some of the high-performance quantization kernels from vllm. We hope to collaborate in driving the growth of the open-source community.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Highlights

What's Changed

Contributors

New Features

Releases: ModelTC/lightllm

v1.0.1

Highlights

What's Changed

Contributors

LightLLM v1.0.0 Release!

New Features