Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support HTTP responses containing multiple ZSTD frames #2583

Merged

Conversation

Andrey36652
Copy link
Contributor

proper tests and fix for #2574

Copy link

@yanns yanns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me.
Of course, we can multiply the tests, like

  • multiple frames in multiple chunks, where one frame = one chunk
  • multiple frames in multiple chunks, where frames and chunks do not align

Here is the one I was working on:

#[tokio::test]
async fn test_chunked_fragmented_multiple_frames_in_multiple_chunks() {
    const DELAY_BETWEEN_RESPONSE_PARTS: tokio::time::Duration =
        tokio::time::Duration::from_millis(1000);
    const DELAY_MARGIN: tokio::time::Duration = tokio::time::Duration::from_millis(50);
    const RESPONSE_CONTENT_2: &str = "another message here";

    let server = server::low_level_with_response(|_raw_request, client_socket| {
        Box::new(async move {
            let zstded_content_1 = zstd_compress(RESPONSE_CONTENT.as_bytes());
            let zstded_content_2 = zstd_compress(RESPONSE_CONTENT_2.as_bytes());

            let response_first_part = [
                COMPRESSED_RESPONSE_HEADERS,
                format!(
                    "Transfer-Encoding: chunked\r\n\r\n{:x}\r\n",
                    zstded_content_1.len() + zstded_content_2.len()
                )
                .as_bytes(),
                &zstded_content_1,
            ]
            .concat();

            client_socket
                .write_all(response_first_part.as_slice())
                .await
                .expect("response_first_part write_all failed");
            client_socket
                .flush()
                .await
                .expect("response_first_part flush failed");

            tokio::time::sleep(DELAY_BETWEEN_RESPONSE_PARTS).await;

            client_socket
                .write_all(zstded_content_2.as_slice())
                .await
                .expect("response_second_part write_all failed");
            client_socket
                .flush()
                .await
                .expect("response_second_part flush failed");

            client_socket
                .write_all(b"\r\n0\r\n\r\n")
                .await
                .expect("write_all failed");
            client_socket.flush().await.expect("flush failed");
        })
    });

    let start = tokio::time::Instant::now();
    let res = reqwest::Client::new()
        .get(format!("http://{}/", server.addr()))
        .send()
        .await
        .expect("response");

    assert_eq!(
        res.text().await.expect("text"),
        format!("{RESPONSE_CONTENT}{RESPONSE_CONTENT_2}")
    );
    assert!(start.elapsed() >= DELAY_BETWEEN_RESPONSE_PARTS - DELAY_MARGIN);
}

But I guess we cannot test all combinations here. The code is generic enough.

@seanmonstar
Copy link
Owner

Thanks! Something I'd like to know before we merge:

My question is, is that the right place to do this? Is it expected in zstd to have multiple members? I believe in other algorithms that's supported but atypical. If you want to investigate that, it'd be good to have a solid answer before we blindly flip that switch.

@yanns
Copy link

yanns commented Mar 6, 2025

Thanks! Something I'd like to know before we merge:

My question is, is that the right place to do this? Is it expected in zstd to have multiple members? I believe in other algorithms that's supported but atypical. If you want to investigate that, it'd be good to have a solid answer before we blindly flip that switch.

I've tried to find more information about it, but could not really find something good.
What I found:

Zstandard compressed data is made up of one or more frames. Each frame is independent and can be decompressed independently of other frames. The decompressed content of multiple concatenated frames is the concatenation of each frame's decompressed content.

Since the official tools transparently handle multiple frames, I'm leaning towards that being the default for python-zstandard - at least for streaming operations

I could not find anything in the HTTP spec specifying if a client must handle multiple zstd frames.

@Andrey36652
Copy link
Contributor Author

Is it expected in zstd to have multiple members?

Yes. Per https://www.rfc-editor.org/rfc/rfc8878.html#section-3.1-1

There are examples in the past when lib not supporting multi framed ZSTD decompression by default leads to confusion:

There are cases when developers want fine control over compressed ZSTD content (like indygreg/python-zstandard#59 and https://docs.rs/zstd-framed/latest/zstd_framed/ ) but I don't think that's the intended usecase for reqwest

Co-authored-by: Yann Simon <yann.simon.fr@gmail.com>
@yanns
Copy link

yanns commented Mar 7, 2025

I could try this patch on an internal service.
We could not activate zstd before because of the issue.
Now it's working perfectly.

Copy link
Owner

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@seanmonstar seanmonstar merged commit 6f9d0ee into seanmonstar:master Mar 7, 2025
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants