A lightweight LLM inference toolkit focused on minimizing inference latency.
- CUDA Graphs: captures and replays kernel launch sequences to cut per-step launch overhead and reduce inference latency
- PagedAttention: block-based KV-cache management enabling efficient long-sequence inference
- Continuous batching: dynamically adds and removes requests from the running batch to improve throughput
- FlashAttention: IO-aware attention that reduces memory traffic for long sequences
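To illustrate the idea behind PagedAttention's KV-cache management, here is a minimal sketch (not osc-llm's actual implementation; `BLOCK_SIZE`, `BlockTable`, and all names are illustrative assumptions): logical token positions are mapped to fixed-size physical cache blocks, so memory is allocated on demand rather than reserved up front for the maximum sequence length.

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value)


class BlockTable:
    """Toy PagedAttention-style block table for one sequence."""

    def __init__(self, num_physical_blocks=1024):
        self.free_blocks = list(range(num_physical_blocks))  # physical block ids
        self.blocks = []   # logical block index -> physical block id
        self.length = 0    # tokens cached so far

    def append_token(self):
        # Allocate a new physical block only when the current one is full
        if self.length % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop(0))
        self.length += 1

    def physical_slot(self, pos):
        # (physical block id, offset within block) for logical position pos
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE
```

Because blocks are fixed-size and allocated lazily, fragmentation stays bounded and many sequences can share one physical pool.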
💡 All core functionality is built on osc-transformers; see that project for technical details.
- Install PyTorch
- Install flash-attn: recommended to use the official prebuilt wheel to avoid build issues
- Install osc-llm
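Since flash-attn builds often fail in mismatched environments, a quick sanity check that the prerequisites are importable can save debugging time. This is a generic sketch (the module names `torch`, `flash_attn`, and `osc_llm` are the usual import names, assumed here, not confirmed by this README):

```python
import importlib.util


def check_env(modules=("torch", "flash_attn", "osc_llm")):
    """Return {module_name: importable?} without actually importing anything heavy."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}


print(check_env())
```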
```bash
pip install osc-llm --upgrade
```

```python
from osc_llm import LLM, SamplingParams

# Initialize the model
llm = LLM("checkpoints/Qwen/Qwen3-0.6B", gpu_memory_utilization=0.5, device="cuda:0")

# Chat
messages = [
    {"role": "user", "content": "Hello! What's your name?"}
]
sampling_params = SamplingParams(temperature=0.5, top_p=0.95, top_k=40)
result = llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=False)
print(result)

# Streaming generation
for token in llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=True):
    print(token, end="", flush=True)
```

Supported models:
- Qwen3ForCausalLM
- Qwen2ForCausalLM
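The `temperature`, `top_p`, and `top_k` parameters passed to `SamplingParams` correspond to the standard sampling filters used by most inference engines. A minimal pure-Python sketch of how such filtering works (the function `sample_filter` is illustrative, not part of osc-llm's API):

```python
import math


def sample_filter(logits, temperature=0.5, top_k=40, top_p=0.95):
    """Apply temperature, top-k, and nucleus (top-p) filtering to raw logits,
    returning (token_index, probability) pairs over the surviving tokens."""
    # Temperature scaling: lower temperature sharpens the distribution
    scaled = [l / temperature for l in logits]
    # Top-k: keep only the k highest-scoring tokens
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the retained logits (max-subtracted for stability)
    mx = max(scaled[i] for i in order)
    exps = [math.exp(scaled[i] - mx) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p
    kept, cum = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize the surviving tokens into a proper distribution
    z = sum(p for _, p in kept)
    return [(i, p / z) for i, p in kept]
```

The engine would then draw the next token from this filtered distribution; lowering `top_p` or `top_k` makes generation more deterministic.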