
[WIP] Add compression middleware #65

Closed
wants to merge 5 commits into from
Closed

Conversation

Nehliin

@Nehliin Nehliin commented Sep 4, 2019

Description

The middleware inserts an Accept-Encoding header with "gzip, br, deflate, zstd" as its value and then decodes the response from the server depending on the Content-Encoding header. The implementation is heavily inspired by the decompression middleware implemented in tide. To implement a similar approach for surf I wrapped the Body used in requests and responses in a BufReader and used the different AsyncBufRead decoders from async_compression.
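The flow described above can be sketched with plain std types. This is a hypothetical stand-in, not the PR's code: the Headers map, ACCEPTED constant, and choose_decoder helper are made up for illustration, and in the real middleware each arm would wrap the buffered body in the matching AsyncBufRead decoder from async-compression.

```rust
use std::collections::HashMap;

// Hypothetical header map standing in for surf's request/response types.
type Headers = HashMap<String, String>;

// Encodings the middleware advertises, mirroring the PR's header value.
const ACCEPTED: &str = "gzip, br, deflate, zstd";

// Before sending: advertise the supported encodings.
fn add_accept_encoding(req_headers: &mut Headers) {
    req_headers.insert("Accept-Encoding".to_string(), ACCEPTED.to_string());
}

// After receiving: pick a decoder based on `Content-Encoding`. Here we only
// name the decoder; the real code wraps the body (a `BufReader`) in it.
fn choose_decoder(res_headers: &Headers) -> Option<&'static str> {
    match res_headers.get("Content-Encoding").map(|s| s.as_str()) {
        Some("gzip") => Some("GzipDecoder"),
        Some("br") => Some("BrotliDecoder"),
        Some("deflate") => Some("DeflateDecoder"),
        Some("zstd") => Some("ZstdDecoder"),
        _ => None, // identity or unknown: pass the body through untouched
    }
}
```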

Motivation and Context

Fixes #24

How Has This Been Tested?

I have added a StubClient so the middleware can be tested with integration tests. More tests are needed though, for example responses with multiple layers of compression.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.


Member

@yoshuawuyts yoshuawuyts Sep 5, 2019


Ah, so we should probably use the AsyncRead variant in async-compression. This might be related to #28 so the body can be read without an extra memcpy, but I definitely do not want to expose tokio-rs/bytes in any public API.

Author


I think it is, I don't think we will need bytes if AsyncBufRead is used.

```diff
@@ -20,6 +20,9 @@ wasm-client = ["js-sys", "web-sys", "wasm-bindgen", "wasm-bindgen-futures"]
 middleware-logger = []

 [dependencies]
+async-compression = {version = "0.1.0-alpha.1"}
+bytes = "0.4.12"
+accept-encoding = "0.2.0-alpha.2"
```
Member


We probably should make the dependencies part of a middleware-compression feature.
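A sketch of what that could look like in Cargo.toml, assuming the feature is named middleware-compression to match the existing middleware-logger convention (the dependency names and versions are the ones from this diff):

```toml
[features]
middleware-compression = ["async-compression", "accept-encoding"]

[dependencies]
async-compression = { version = "0.1.0-alpha.1", optional = true }
accept-encoding = { version = "0.2.0-alpha.2", optional = true }
```

Marking the dependencies `optional = true` means they are only compiled when the feature is enabled.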

Member

@yoshuawuyts yoshuawuyts left a comment


Awesome work, seems like you've been making a lot of progress on this!

I guess the biggest point is we should be using the AsyncRead variant of async-compression so we can keep a consistent end-user interface; but overall super excited about this!

@Nehliin
Author

Nehliin commented Sep 5, 2019

Thank you! I presume you mean the AsyncBufRead version?

@yoshuawuyts
Member

@Nehliin haa, yes I did!

@Nehliin
Author

Nehliin commented Sep 8, 2019

I'm not sure if I am misreading the documentation for async_compression, but it seems like only the Deflate and Zlib encoders are available to use with AsyncBufRead, which makes it hard for me to do anything until support for decoders and/or other compression algorithms is available. I also tried to experiment with changing the reader in the Body to use AsyncBufRead but had a hard time getting anywhere. I presume that would be out of scope for this PR and another one would need to be made for it.

What would you suggest the next steps to be? Should I wait for more work on async_compression and for the Body to be able to make use of AsyncBufRead, or should I try to clean up the current implementation so Bytes aren't exposed to the users? I'm not sure if that's possible though, since the Decoders are dependent on it. Maybe some internal wrapper type could help with that, but I'm not sure.

@yoshuawuyts
Member

What would you suggest the next steps to be?

Hmm, yeah, I'd really like us to not rely on bytes, as I haven't had pleasant experiences debugging it in the past. Indeed, ideally the way forward is to expose AsyncBufRead on the body.

To be fair though, if there's a problem with encoding, I think it's fine to only do decoding. Encoding is a lot harder for clients, because unless you control both client + server you can't really know ahead of time which compression algorithms are supported, so there needs to be a bit of back-and-forth to figure it out, which is not standardized in any HTTP protocol, and thus is hard to write generic middleware for.

Do you think this is something you can work with?

Also cc/ @fairingrey @Nemo157, this might be interesting to read along on.

@Nemo157
Collaborator

Nemo157 commented Sep 12, 2019

I'm not sure if I am misreading the documentation for async_compression but it seems like only the Deflate and Zlib encoders are available to use with AsyncBufRead.

Yeah, the selection outside of stream is rather limited at the moment, as that was the only module used by Tide when this was being developed. I implemented those encoders as a PoC to see whether they would be simpler or more complicated than the stream-based ones.

I also tried to experiment with changing the reader in the Body to use AsyncBufRead but had a hard time getting anywhere.

The easiest way to do that might be to wrap the body in a BufReader before wrapping it with the compression. I was contemplating an async-compression::read module that would do this internally, but I don't think it's worth hiding that detail from the user (probably should mention it in the documentation though).
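The suggested layering can be sketched synchronously with std::io types. This is only a stand-in for the shape of the code: PassthroughDecoder and decode_body are made up for illustration (the real decoders come from async-compression and operate on AsyncBufRead rather than BufRead).

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

// Stand-in "decoder" generic over any BufRead, mirroring how the
// async-compression bufread decoders are generic over the buffered reader.
// It performs no real decompression; it just forwards bytes.
struct PassthroughDecoder<R: BufRead> {
    inner: R,
}

impl<R: BufRead> Read for PassthroughDecoder<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.inner.read(buf)
    }
}

// The suggested layering: body (plain Read) -> BufReader -> decoder.
fn decode_body<R: Read>(body: R) -> impl Read {
    PassthroughDecoder { inner: BufReader::new(body) }
}
```

The point is that any unbuffered body can be lifted into the buffered interface with one BufReader wrapper before handing it to the decoder.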


If there's a use case for the bufread decoders now, I can definitely find some time to work on/mentor them. I just opened Nullus157/async-compression#32 about an idea I've had for a while that would greatly simplify getting them done; I can probably experiment with that over the next couple of days and see how it goes.

@Nehliin
Author

Nehliin commented Sep 12, 2019

To be fair though, if there's a problem with encoding, I think it's fine to only do decoding. Encoding is a lot harder for clients, because unless you control both client + server you can't really know ahead of time which compression algorithms are supported, so there needs to be a bit of back-and-forth to figure it out, which is not standardized in any HTTP protocol, and thus is hard to write generic middleware for.

Yes, I also only want to decode, but the problem was that there were only encoders while I was looking for a decoder :)

The easiest way to do that might be to wrap the body in a BufReader before wrapping it with the compression. I was contemplating an async-compression::read module that would do this internally, but I don't think it's worth hiding that detail from the user (probably should mention it in the documentation though).

I could try and make that work! I'm a bit busy the upcoming week but hopefully I can get something started.

Thanks for the help!

@Nemo157
Collaborator

Nemo157 commented Sep 16, 2019

@Nehliin just published async-compression 0.1.0-alpha.5 with full encoder/decoder support for AsyncBufRead

@Nehliin
Author

Nehliin commented Sep 16, 2019

Great thanks! Hopefully I can have some new commits done this weekend :)

@Nehliin
Author

Nehliin commented Sep 21, 2019

@Nemo157 I can't use the latest version since it depends on futures-preview v0.3.0-alpha.18 and chttp (latest version) has pinned the same dependency to version = "=0.3.0-alpha.17", so cargo can't find a compatible version :( Unfortunately the chttp crate has been renamed to isahc, so we would need to swap out that dependency. But when I tried to replace the old one with isahc I ran into other issues, like compilation errors in futures-timer v0.2.1, and even if I specify a newer version in Cargo.toml I end up with the same issue, so some other dependency seems to depend on 0.2.1. I'm a bit unsure how to fix this 😕

@Nemo157
Collaborator

Nemo157 commented Sep 21, 2019

As a workaround until chttp/isahc can be updated you could use this as your dependency, it's 0.1.0-alpha.5 but with the dependency changed back to futures alpha.17:

```toml
async-compression = { git = "https://github.com/Nemo157/async-compression-rs", branch = "0.17-override" }
```
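An alternative sketch of the same override using Cargo's [patch] mechanism, which keeps the normal requirement in [dependencies] and redirects the source; the version numbers here are assumed to match the branch:

```toml
[dependencies]
async-compression = "0.1.0-alpha.5"

[patch.crates-io]
async-compression = { git = "https://github.com/Nemo157/async-compression-rs", branch = "0.17-override" }
```

With [patch], the git branch's package version must satisfy the requirement in [dependencies] for Cargo to apply the override.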

@Nehliin
Author

Nehliin commented Sep 22, 2019

That's great! Thanks for the quick reply :)

@Nehliin
Author

Nehliin commented Sep 22, 2019

I updated the code so it now uses the AsyncBufRead version of async-compression and added the middleware-compression feature. I also updated the tests to include some "onion layered" compression that the middleware should handle. Those tests currently fail, but I think it's because I don't set the header correctly in the mocked response from the stub client. That should hopefully be an easy fix that I'll do at the beginning of next week.
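For the "onion layered" case, Content-Encoding lists codings in the order they were applied (RFC 7231), so the layers must be decoded in reverse. A std-only sketch of deriving the decode order (decode_order is a hypothetical helper, not code from this PR):

```rust
// Parse a Content-Encoding value like "gzip, br" and return the codings
// in the order they should be *decoded*: the last-applied (outermost)
// layer first.
fn decode_order(content_encoding: &str) -> Vec<String> {
    let mut codings: Vec<String> = content_encoding
        .split(',')
        .map(|c| c.trim().to_ascii_lowercase())
        .filter(|c| !c.is_empty())
        .collect();
    codings.reverse();
    codings
}
```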

@Nehliin
Author

Nehliin commented Sep 28, 2019

I have now updated the documentation a bit and added errors. The integration tests pass, but the doc test that makes a request to https://httpbin.org/gzip fails with the error Error: Custom { kind: InvalidData, error: "Invalid gzip header" }, which confuses me. It's as if the body of the response is not encoded in a valid way. But I've tried to curl it and then decode the response with GzDecoder directly in a separate test I used while debugging, and that works. My code only extracts the body, wraps it into a BufReader as previously suggested, and gives that to the GzipDecoder from async-compression.

I've been stuck on this for a while; do you have any suggestions on things I could try?
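One cheap check for an "Invalid gzip header" error is whether the bytes reaching the decoder still start with the gzip magic header at all. A gzip stream begins with the two magic bytes 0x1f 0x8b followed by the compression method (8 = deflate). A std-only sketch (looks_like_gzip is a hypothetical helper, not code from this PR):

```rust
// Returns true if the buffer plausibly starts a gzip member:
// magic bytes 0x1f 0x8b and compression method 8 (deflate).
fn looks_like_gzip(body: &[u8]) -> bool {
    body.len() >= 3 && body[0] == 0x1f && body[1] == 0x8b && body[2] == 8
}
```

If this returns false on the first bytes handed to the decoder, the body was already decompressed (or mangled) before the middleware saw it.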

@Nemo157
Collaborator

Nemo157 commented Sep 28, 2019

When you say it decoded ok using GzDecoder are you referring to the synchronous one or the one from async-compression? It could be an async-compression bug, flate2 doesn't expose the gzip header parsing code so we had to reimplement that, and it's definitely not 100% complete.

EDIT: Either way it's probably an async-compression bug. If it parses ok in a test, then it could be related to the exact sequence of pending/partial buffers being read from the network. It would be nice to be able to instrument the BufReader somehow so it spits out the exact sets of bytes it makes available on each call, to generate a test from it. I can probably take a look at what's happening sometime tomorrow.
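A synchronous sketch of such instrumentation, using a std Read wrapper that records the size of every chunk it hands out (InstrumentedReader is hypothetical; the real body reader is async, so the actual wrapper would implement AsyncRead instead):

```rust
use std::io::{Cursor, Read};

// Wraps any reader and logs the size of each chunk returned, so the
// exact read pattern can be replayed in a regression test.
struct InstrumentedReader<R: Read> {
    inner: R,
    chunk_sizes: Vec<usize>,
}

impl<R: Read> Read for InstrumentedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.chunk_sizes.push(n);
        Ok(n)
    }
}
```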

@Nehliin
Author

Nehliin commented Sep 29, 2019

I used the flate2::bufread::GzDecoder. Ah, then it's probably a bug as you said, makes sense. I should have said something earlier and perhaps saved myself some confusion 😄. I found this, if it may be of interest: https://docs.rs/gzip-header/0.3.0/gzip_header/

@Nemo157
Collaborator

Nemo157 commented Sep 30, 2019

So, throwing some debugging in I found this:

```
> cargo test --doc --features middleware-compression -- middleware::compression
[...]
[src/middleware/compression/compression.rs:23] (type_name::<T>(), reader.poll_read(cx, buf)) = (
    "surf::http_client::Body",
    Ready(
        Ok(
            285,
        ),
    ),
)
[src/middleware/compression/compression.rs:24] std::str::from_utf8(buf) = Ok(
    "{\n  \"gzipped\": true, \n  \"headers\": {\n    \"Accept\": \"*/*\", \n    \"Accept-Encoding\": \"gzip, br, deflate, zstd\", \n    \"Content-Length\": \"0\", \n    \"Host\": \"httpbin.org\", \n    \"User-Agent\": \"curl/7.65.1-DEV chttp/0.5.5\"\n  }, \n  \"method\": \"GET\", \n  \"origin\": \"92.116.115.71, 92.116.115.71\"\n}\n\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u
[...]
```

that's before the data is passed into the GzipDecoder, so it appears the underlying body implementation is already gzip decoding the content, and it's correctly failing when attempting to double-decode it.

@Nehliin
Author

Nehliin commented Oct 1, 2019

Oh, well that explains it. It turns out that isahc already decodes automatically. Reading through this discussion, it looks like hyper doesn't do any compression. I wanted to try it myself but couldn't because of a compiler error with the latest surf release, which isn't that nice:

```
error[E0277]: the trait bound `ext::TimeoutStream<S>: futures_core::stream::Stream` is not satisfied
   --> /home/oskar/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-timer-0.2.1/src/ext.rs:170:9
    |
170 | impl<S> TryStream for TimeoutStream<S>
    |         ^^^^^^^^^ the trait `futures_core::stream::Stream` is not implemented for `ext::TimeoutStream<S>`
```

And if I use 1.0.1 or 1.0.0 I get undeclared module errors for hyper_tls together with native_tls, because I think someone forgot to add those dependencies to the hyper-client feature in Cargo.toml.

@Nehliin
Author

Nehliin commented Oct 14, 2019

Maybe this PR isn't necessary now that isahc already does this under the hood? Should I close the PR? I learned a lot either way, so I'm happy anyways :)

@yoshuawuyts
Member

Maybe this PR isn't necessary now that isahc already does this under the hood

Apologies for the slow reply. I do think this would be fantastic to have, even if Isahc already does this. Not all client backends support compression, and having an easy way to enable it is probably a win for everyone involved!

I'm not sure if now is the right time to pick this back up, since #131 will likely change the signature of our middleware, but overall this would be fantastic to have!

@yoshuawuyts yoshuawuyts added the enhancement New feature or request label Jan 10, 2020
@goto-bus-stop goto-bus-stop changed the base branch from master to main August 9, 2020 12:57
@Fishrock123
Member

Hi @Nehliin, if you still want to pursue this, could you open a new PR with something that is more like https://github.com/Fishrock123/tide-compress?

Client decompression is likely a necessary feature for Surf to switch to using async-h1 as the default backend: #217

@Nehliin
Author

Nehliin commented Feb 24, 2021

@Fishrock123 I won't have time to work on this for a while :/

@Nehliin Nehliin closed this Nov 13, 2021