Limitation: S3 query result files must be fully downloaded to memory before streaming #1744

@coderabbitai

Description

Currently, when streaming query results from S3 storage, each result file must be fully downloaded into memory before it is deserialized and streamed to the client. Peak memory use therefore scales with the size of the largest result file, which limits the benefit of streaming when individual result files are large.

Location

components/api-server/src/client.rs in the fetch_results_from_s3 method:

let bytes = obj.body.collect().await?.into_bytes();
let mut deserializer = rmp_serde::Deserializer::from_read_ref(bytes.as_ref());

While multiple files can be processed incrementally (providing some streaming behavior across files), each individual file is loaded entirely into memory.

Potential Solutions

  1. Implement a custom msgpack deserializer that can work with async byte streams without requiring the entire buffer in memory
  2. Switch to a more async-friendly format for serializing search results that supports incremental deserialization
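
As a rough illustration of option 2, the sketch below shows one possible incremental format. It assumes a hypothetical framing in which each result is written as a 4-byte big-endian length followed by that many payload bytes (for example, one msgpack-encoded record). The names `FrameDecoder` and `encode_frame` are invented for this example and do not exist in the codebase. The decoder accepts chunks as they arrive and yields complete frames as soon as they are available, so it buffers only unconsumed bytes rather than the whole file.

```rust
/// Hypothetical framing (an assumption, not the current on-disk format):
/// every record is a 4-byte big-endian length followed by that many
/// payload bytes, e.g. one msgpack-encoded search result.
pub struct FrameDecoder {
    /// Bytes received so far that have not yet been returned as a frame.
    buf: Vec<u8>,
}

impl FrameDecoder {
    pub fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Append one chunk as it arrives (for example, from an S3 body stream).
    pub fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Return the next complete frame, or `None` if more bytes are needed.
    pub fn next_frame(&mut self) -> Option<Vec<u8>> {
        if self.buf.len() < 4 {
            return None; // length prefix not complete yet
        }
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&self.buf[..4]);
        let len = u32::from_be_bytes(len_bytes) as usize;
        if self.buf.len() < 4 + len {
            return None; // payload not complete yet
        }
        let frame = self.buf[4..4 + len].to_vec();
        // Discard the consumed prefix and payload, keeping any trailing bytes.
        self.buf.drain(..4 + len);
        Some(frame)
    }
}

/// Writer-side helper for the same hypothetical framing.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}
```

In fetch_results_from_s3, a decoder like this could be fed each chunk of the object body as it arrives, with every returned frame deserialized and forwarded to the client immediately. Whether the writer side can adopt such a framing is an open question for this issue.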
