Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible fix for handling invalid http paths #492

Merged
merged 5 commits into from
Jan 29, 2025

Conversation

ritesh
Copy link
Contributor

@ritesh ritesh commented Jan 24, 2025

Hello! This is an incomplete attempt at trying to fix the bug that we encountered. I was unable to cargo build the project as I'm not familiar with py03.

What I think is happening is that when the path is parsed the percent_decode function, it returns a Result type and granian is currently unwrapping the result, causing the error to cause the the process to crash.

If instead of unwrapping it, we return unwrap_or with a Cow<_, str> with an empty value that should fix it (which is what I've done). Although I suspect this may not be the best way to approach it as there may be other instances of path parsing where we need to decode valid UTF8 but return either an empty string or some HTTP client error?

Issue: #490

@gi0baro
Copy link
Member

gi0baro commented Jan 24, 2025

Hello! This is an incomplete attempt at trying to fix the bug that we encountered.

Thank you for your contribution ❤️

I was unable to cargo build the project as I'm not familiar with py03.

Yeah, I'm sorry about that: I should really take the time to add contributing guidelines in the repo.
In the meantime, the following should be enough:

  • create a virtualenv and activate it
  • install maturin (pip install maturin)
  • build with maturin develop --extras dev

What I think is happening is that when the path is parsed the percent_decode function, it returns a Result type and granian is currently unwrapping the result, causing the error to cause the the process to crash.

Correct.

If instead of unwrapping it, we return unwrap_or with a Cow<_, str> with an empty value that should fix it (which is what I've done). Although I suspect this may not be the best way to approach it as there may be other instances of path parsing where we need to decode valid UTF8 but return either an empty string or some HTTP client error?

Yeah, while your change would fix the issue in itself, it would introduce a different issue which is messing with the routing of the application (as you're practically altering the request path and make it go to the root).
What I think it should happen is Granian should return a 400 response instead, without interacting with Python at all. This would probably require to move out the scope_native_parts macro invocation out of the blocking runtime call (brt.run) and deal with the error there. I'm not sure wether to also change the return value of the call_ methods to be a result itself and consequentially deal with that in the handle_http_response macro, as the other option would be to send the 400 to the receiver but feels a bit weird to me..

@ritesh
Copy link
Contributor Author

ritesh commented Jan 24, 2025

That makes sense. 100% agree that users should get a 400 error instead of the path being mutated/mangled.

Have you also considered if trying to decode it lossily might work? The rationale here would be that the server tries to decode it and assumes it's a user error in encoding the URL. If garbage comes out, you get a 404 since that invalid path doesn't exist. Again, I don't know what the spec says you should do in these cases (i.e. complain about invalid UTF-8 or just say the path doesn't exist).

If you get garbage UTF-8 sequences, you get the infamous UTF-8 replacement character.

https://docs.rs/percent-encoding-rfc3986/latest/percent_encoding_rfc3986/struct.PercentDecode.html#method.decode_utf8_lossy

@gi0baro
Copy link
Member

gi0baro commented Jan 24, 2025

Have you also considered if trying to decode it lossily might work? The rationale here would be that the server tries to decode it and assumes it's a user error in encoding the URL. If garbage comes out, you get a 404 since that invalid path doesn't exist. Again, I don't know what the spec says you should do in these cases (i.e. complain about invalid UTF-8 or just say the path doesn't exist).

I think the best option here is to check ASGI spec and/or what other servers do in order to drive the final decision.

@gi0baro
Copy link
Member

gi0baro commented Jan 24, 2025

Have you also considered if trying to decode it lossily might work? The rationale here would be that the server tries to decode it and assumes it's a user error in encoding the URL. If garbage comes out, you get a 404 since that invalid path doesn't exist. Again, I don't know what the spec says you should do in these cases (i.e. complain about invalid UTF-8 or just say the path doesn't exist).

I think the best option here is to check ASGI spec and/or what other servers do in order to drive the final decision.

Seems like both uvicorn and hypercorn do that. Thus, I'm fine switching to utf8 lossy here.

@ritesh
Copy link
Contributor Author

ritesh commented Jan 24, 2025

Sounds good - thanks for the info regarding setting up a dev environment. It is a trivial change but I'd like to add a test or two and learn more about this whole rust in python business. Hope to get it done today if I get some focused time away from work, but if not it will likely be early next week. If anyone else beats me to the punch that's OK too.

@ritesh
Copy link
Contributor Author

ritesh commented Jan 28, 2025

I've made an attempt at fixing the failing tests but I'm not sure if this is the right approach.
The server returns a 500 instead of a 404, and I'm not sure where to make the change where invalid UTF-8 paths are treated as 404s.

@gi0baro
Copy link
Member

gi0baro commented Jan 29, 2025

@ritesh hey, sorry I directly intervened on the code here, but I wanted this to be merged for the next patch release

Thank you for your contribution

@gi0baro gi0baro merged commit 4a461dc into emmett-framework:master Jan 29, 2025
15 of 16 checks passed
@gi0baro gi0baro linked an issue Jan 30, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Granian crashes with a UTF-8 error when you send it %c0
2 participants