Conversation

@jeffreywang-anyscale (Contributor)
Description

Ray Data LLM currently stores embeddings from pooling tasks (e.g., classify) as torch.Tensor values in Ray Dataset columns. This forces a copy on serialization and another on deserialization. NumPy arrays avoid both: Ray's shared-memory object store supports zero-copy deserialization for them.
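The fix amounts to converting pooled outputs to NumPy before writing them to the dataset column. A minimal sketch of the idea (hypothetical function name; the actual change lives in vllm_engine_stage.py):

```python
import numpy as np
import torch

def embedding_to_column(embedding: torch.Tensor) -> np.ndarray:
    # Detach from the autograd graph, move to host memory, and return a
    # NumPy array that Ray can place in shared memory for zero-copy reads.
    return embedding.detach().cpu().numpy()

emb = torch.randn(768)
print(type(embedding_to_column(emb)))  # <class 'numpy.ndarray'>
```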

When embeddings are stored in Ray Data columns:

  • np.array: data is stored once in shared memory; multiple workers read it through pointers (zero-copy)
  • torch.Tensor: tensor data is copied into the pickle stream, then copied again on deserialization (two copies)
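The mechanism behind the zero-copy path can be illustrated with plain pickle protocol 5, which Ray's serializer builds on: a NumPy array's buffer is handed over out of band instead of being embedded in the pickle stream. This is a standalone sketch, not code from the PR:

```python
import pickle
import numpy as np

arr = np.arange(1_000_000, dtype=np.float32)  # ~4 MB of embedding-like data

# Serialize with out-of-band buffers: the pickle payload holds only metadata,
# while the raw array bytes travel separately and need not be copied into it.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

print(len(payload) < arr.nbytes)  # True: payload is a small metadata blob
print(len(buffers))               # 1: one out-of-band buffer with the raw bytes

restored = pickle.loads(payload, buffers=buffers)
print(np.array_equal(arr, restored))  # True
```

A torch.Tensor, by contrast, is pickled through its storage, which is why the data ends up duplicated in the stream and again on the receiving side.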

Benchmark Result

Using np.array, classification tasks with vLLMEngineStage show a ~9% throughput improvement.

| Metric             | torch.Tensor | np.array |
| ------------------ | ------------ | -------- |
| Throughput (row/s) | 953          | 1038     |
  • Hardware: L4
  • Repro command:
python benchmark_processor.py --mode classify --batch-size 2048 --concurrency 1 --num-prompts 102400 --model HuggingFaceTB/fineweb-edu-classifier

@jeffreywang-anyscale jeffreywang-anyscale requested a review from a team as a code owner January 7, 2026 06:21

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully improves performance by switching from torch.Tensor to numpy.ndarray for embeddings, which enables zero-copy deserialization in Ray. The core change in vllm_engine_stage.py is correct and well-targeted. The addition of a new classification mode to the benchmark script, along with a corresponding test case, effectively validates the performance gain and correctness of the new functionality.

I have one suggestion regarding an import statement in the benchmark script to improve its portability and maintainability. Overall, this is a solid contribution that delivers a measurable performance improvement.

@ray-gardener ray-gardener bot added data Ray Data-related issues llm labels Jan 7, 2026

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale jeffreywang-anyscale added the go add ONLY when ready to merge, run all tests label Jan 7, 2026
@kouroshHakha kouroshHakha merged commit a1bfd6a into ray-project:master Jan 7, 2026
6 checks passed
AYou0207 pushed a commit to AYou0207/ray that referenced this pull request Jan 13, 2026
…ialization overhead (ray-project#59919)

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: jasonwrwang <jasonwrwang@tencent.com>

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects