[data][llm] Use numpy arrays for embeddings to avoid torch.Tensor serialization overhead #59919
Description
Currently, Ray Data LLM stores embeddings from pooling tasks (e.g., classify) as torch.Tensor in Ray dataset columns, which incurs redundant copies during serialization and deserialization. NumPy arrays avoid this overhead because Ray's shared-memory object store supports zero-copy deserialization for them.
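As a rough illustration of the change (the actual stage code isn't reproduced here; the to_numpy_embeddings function and the "pooled_output" / "embeddings" column names are hypothetical), a postprocessing step can hand Ray Data an np.ndarray instead of a torch.Tensor:

```python
import torch

def to_numpy_embeddings(batch: dict) -> dict:
    # "pooled_output" is a hypothetical column name for the model's pooled
    # embeddings; in the real stage they come from vLLM's pooling task.
    pooled: torch.Tensor = batch.pop("pooled_output")
    # Storing the tensor directly forces pickle-based copies; converting to
    # an ndarray lets Ray's object store deserialize it zero-copy.
    batch["embeddings"] = pooled.detach().cpu().numpy()
    return batch

# Example batch; shapes are illustrative only.
batch = {"pooled_output": torch.randn(4, 768)}
print(to_numpy_embeddings(batch)["embeddings"].dtype)  # float32
```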
When embeddings are stored in Ray Data columns:
- np.array: data is stored once in shared memory, and multiple workers read it via pointers (zero-copy).
- torch.Tensor: data is copied into the pickle stream, then copied again on deserialization (2x copies).
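This difference can be sanity-checked outside Ray Data with the core object store API (a minimal sketch assuming a local Ray installation; not code from this PR):

```python
import numpy as np
import ray
import torch

ray.init()

emb_np = np.random.rand(1024, 768).astype(np.float32)
emb_pt = torch.from_numpy(emb_np.copy())

# np.ndarray: ray.get returns a read-only view backed by the shared-memory
# object store, so deserialization copies no data.
restored_np = ray.get(ray.put(emb_np))
assert not restored_np.flags.writeable  # read-only -> zero-copy view

# torch.Tensor: the tensor's data travels through the pickle stream, so each
# ray.get materializes a fresh copy (the 2x-copy cost this PR avoids).
restored_pt = ray.get(ray.put(emb_pt))

ray.shutdown()
```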
Benchmark Result

Using np.array, classification tasks with vLLMEngineStage show a ~9% throughput improvement.

Related issues
Additional information