
Refactor ATOM for top-k top-p sampling support#227

Open
aryaman-gupta wants to merge 6 commits into main from aryaman/topk-topp-support

Conversation


@aryaman-gupta aryaman-gupta commented Feb 20, 2026

Summary

Adds top-k and top-p sampling support to ATOM, complementing existing temperature-based sampling.

Changes

| File | Changes |
| --- | --- |
| `sampling_params.py` | Added `top_k: int = -1` and `top_p: float = 1.0` fields with validation |
| `sequence.py` | Store `top_k` and `top_p` from sampling params |
| `scheduler.py` | Added `top_ks` and `top_ps` lists to `ScheduledBatch` |
| `model_runner.py` | Added GPU buffers; updated `prepare_sample` with a CPU-side uniformity optimization |
| `sampler.py` | Added top-k/top-p filtering with aiter integration. Default temperature-based Gumbel-Max path when filtering is disabled (no overhead). Native PyTorch fallback when `aiter.ops.sampling` is unavailable (marked experimental). |
| `openai_server.py` | Wired up `top_k` and `top_p` parameters |
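For readers unfamiliar with the filtering semantics, here is a minimal NumPy sketch of combined top-k/top-p (nucleus) masking. The function name and details are illustrative only, not the PR's actual `sampler.py` code, which uses aiter/PyTorch on GPU:

```python
import numpy as np

def top_k_top_p_filter(logits, top_k=-1, top_p=1.0):
    """Mask logits outside the top-k / top-p set with -inf.

    top_k = -1 and top_p = 1.0 disable the respective filter,
    matching the defaults described in sampling_params.py.
    """
    logits = np.asarray(logits, dtype=np.float64).copy()
    if top_k > 0:
        # Keep only the k largest logits.
        kth_largest = np.sort(logits)[-top_k]
        logits[logits < kth_largest] = -np.inf
    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative
        # probability reaches top_p (always at least one token).
        order = np.argsort(logits)[::-1]
        probs = np.exp(logits[order] - np.max(logits))
        probs /= probs.sum()
        cutoff = np.searchsorted(np.cumsum(probs), top_p) + 1
        logits[order[cutoff:]] = -np.inf
    return logits
```

Masked positions get probability zero after softmax, so sampling proceeds only over the surviving tokens.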

Basic Testing

Run a model:

python3 -m atom.entrypoints.openai_server --model <model_path> <additional_params> --host 0.0.0.0 --port 8000 

Query with top-k and top-p parameters:

import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "/it-share/gpt-oss-120b/",
        "prompt": "The capital city of France is",
        "max_tokens": 32,
        "temperature": 0.8,
        "top_k": 10,
        "top_p": 0.8,
        "stream": True,
    },
    stream=True,
)
for line in response.iter_lines():
    print(line)
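As background on the default path mentioned above: Gumbel-Max sampling works because `argmax(logits / T + g)`, with `g` drawn i.i.d. from Gumbel(0, 1), is an exact categorical sample from `softmax(logits / T)`, so no explicit softmax or CDF inversion is needed. A minimal NumPy sketch (illustrative only, not the PR's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_max_sample(logits, temperature=1.0):
    """Sample a token index from softmax(logits / temperature)
    via the Gumbel-Max trick."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Gumbel(0, 1) noise from inverse-CDF of uniform samples.
    gumbel = -np.log(-np.log(rng.uniform(size=scaled.shape)))
    return int(np.argmax(scaled + gumbel))
```

When top-k/top-p filtering is disabled, this path adds no masking work, which is the "no overhead" property noted in the change summary.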

@aryaman-gupta aryaman-gupta marked this pull request as ready for review February 26, 2026 22:16