feat(percent-encoding): add support for preserving characters when decoding #970
base: main
Updated from a6302c8 to a782c3c.
Copied from #969 to make this the canonical place to discuss: I wasn't 100% sure where to put the constant. I made EMPTY an associated constant on AsciiSet because being empty seems like an inherent property of the type, whereas the other constants seem like usages of AsciiSet. I was only about 70/30 on this being right, so I wouldn't object to changing it to be consistent with the other constants. The rationale for making the constants references rather than plain values seemed odd to me. What was that necessary for?
@@ -79,7 +79,7 @@ const BITS_PER_CHUNK: usize = 8 * mem::size_of::<Chunk>();
 impl AsciiSet {
     /// An empty set.
-    pub const EMPTY: AsciiSet = AsciiSet {
+    pub const EMPTY: &'static AsciiSet = &AsciiSet {
@joshka (continuing here to have a thread)
I don't think there's a reason it needs to be a reference, but given that the existing constants are references, I think it makes sense to just have all of them be the same.
Changing the existing ones to not be references would be a breaking change, so this kinda seems like the only option to me.
I guess you could consider this a special case, but then the functions would have to be called with &AsciiSet::EMPTY, unlike the others, no?
Another option would be to change the functions to take impl AsRef<AsciiSet>.
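To make the AsRef option concrete, here is a minimal sketch. It uses a simplified, hypothetical stand-in for AsciiSet (one bit per ASCII byte in four u32 chunks) and a hypothetical helper, count_in_set; neither is the crate's actual code. Implementing AsRef<AsciiSet> for AsciiSet is enough, because the standard library's blanket AsRef impl for references then covers &AsciiSet too, so one signature accepts both a reference constant and a by-value set. One caveat: a struct that keeps the set around (as PercentEncode does) would then need to be generic over the argument type or copy the set out.

```rust
// Hypothetical, simplified stand-in for percent_encoding::AsciiSet:
// one bit per ASCII byte, stored as four u32 chunks.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    /// Returns a copy of the set with `byte` added (panics if `byte` is not ASCII).
    pub const fn add(&self, byte: u8) -> AsciiSet {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    /// True if `byte` is an ASCII byte present in the set.
    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask[byte as usize / 32] & (1 << (byte as usize % 32))) != 0
    }
}

// The std blanket impl `AsRef<U> for &T where T: AsRef<U>` extends this
// to `&AsciiSet`, so both forms satisfy `impl AsRef<AsciiSet>`.
impl AsRef<AsciiSet> for AsciiSet {
    fn as_ref(&self) -> &AsciiSet {
        self
    }
}

/// Hypothetical helper: counts the bytes of `input` that are in `set`.
pub fn count_in_set(input: &[u8], set: impl AsRef<AsciiSet>) -> usize {
    let set = set.as_ref();
    input.iter().filter(|&&b| set.contains(b)).count()
}

/// A reference constant in the style of the crate's existing ones.
pub const SPACE_AND_QUOTE: &AsciiSet = &AsciiSet::EMPTY.add(b' ').add(b'"');
```

Both count_in_set(input, SPACE_AND_QUOTE) and count_in_set(input, AsciiSet::EMPTY) compile, with no & needed at the call site for the value form.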
I don't know the design decisions behind this well enough to give a useful answer; I'd defer to the library maintainers for context on that. Making a constant that is a reference to a constant value, instead of just the value, seems a little odd to me. It also seems unlikely that an EMPTY const would ever be used except in some other constant expression, so a non-reference value still seems more correct to me here.
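The point about EMPTY mostly appearing inside other constant expressions can be checked with a small sketch, again using a simplified, hypothetical AsciiSet rather than the crate's code. Both the value form and a reference form (EMPTY_REF, a made-up name) compose in const items, since method calls auto-dereference the reference:

```rust
// Hypothetical, simplified stand-in for percent_encoding::AsciiSet.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    /// Value form, as originally proposed in #969.
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    /// Returns a copy of the set with `byte` added (panics if `byte` is not ASCII).
    pub const fn add(&self, byte: u8) -> AsciiSet {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    /// True if `byte` is an ASCII byte present in the set.
    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask[byte as usize / 32] & (1 << (byte as usize % 32))) != 0
    }
}

/// Reference form of the same constant (hypothetical name), in the style of
/// the crate's existing `&AsciiSet` constants.
pub const EMPTY_REF: &AsciiSet = &AsciiSet::EMPTY;

// Both forms compose in const items; `EMPTY_REF.add(..)` auto-dereferences.
pub const QUERY_FROM_VALUE: AsciiSet = AsciiSet::EMPTY.add(b'#').add(b'&');
pub const QUERY_FROM_REF: &AsciiSet = &EMPTY_REF.add(b'#').add(b'&');
```

In const contexts the two forms end up equivalent; the difference only shows up at call sites whose parameter type is a reference.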
Rather than using AsRef as suggested, if I were fixing this up to work with either values or references, I'd derive Clone and Copy, add impl Into<AsciiSet> for &'_ AsciiSet { fn into(self) -> AsciiSet { *self } }, then change the methods to accept Into<AsciiSet> and the PercentEncoding struct to store the value instead of a reference. This would be both backward compatible and obvious. The caveat is that I'm unsure whether this code is called in performance-sensitive situations (e.g. where nanoseconds matter): it copies 16 bytes instead of an 8-byte reference. I'd expect a copy to be fine in general, but it might not be in a very high-volume scenario (e.g. a firewall or proxy). There aren't any benchmarks suggesting such performance requirements.
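A minimal sketch of the approach described above, assuming a simplified, hypothetical AsciiSet and a hypothetical Encoder struct standing in for PercentEncoding; the names encoder and escaped_count are also made up. It implements From<&AsciiSet> for AsciiSet, which is the idiomatic way to obtain the Into impl mentioned above, while the standard library's blanket From<T> for T covers the by-value case.

```rust
// Hypothetical, simplified stand-in for percent_encoding::AsciiSet:
// 128 bits of mask as four u32 chunks, i.e. 16 bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    /// Returns a copy of the set with `byte` added (panics if `byte` is not ASCII).
    pub const fn add(&self, byte: u8) -> AsciiSet {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    /// True if `byte` is an ASCII byte present in the set.
    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask[byte as usize / 32] & (1 << (byte as usize % 32))) != 0
    }
}

// Implementing From gives the matching `Into<AsciiSet> for &AsciiSet` for free.
impl From<&AsciiSet> for AsciiSet {
    fn from(set: &AsciiSet) -> AsciiSet {
        *set
    }
}

/// Hypothetical encoder state that owns a copy of its set instead of borrowing it.
pub struct Encoder<'a> {
    input: &'a [u8],
    set: AsciiSet,
}

/// Accepts either `AsciiSet` or `&AsciiSet`, so existing callers keep compiling.
pub fn encoder(input: &[u8], set: impl Into<AsciiSet>) -> Encoder<'_> {
    Encoder { input, set: set.into() }
}

impl Encoder<'_> {
    /// Number of input bytes that would be percent-encoded
    /// (non-ASCII bytes always are, ASCII bytes only if in the set).
    pub fn escaped_count(&self) -> usize {
        self.input
            .iter()
            .filter(|&&b| b >= 0x80 || self.set.contains(b))
            .count()
    }
}

/// Existing-style reference constant, still accepted by `encoder`.
pub const SPACE: &AsciiSet = &AsciiSet::EMPTY.add(b' ');
```

With this layout the set is 16 bytes, while a reference is pointer-sized (8 bytes on a 64-bit target), which matches the trade-off described in the comment.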
BTW, I meant to add that I'm definitely no expert on this crate. I added the functionality in the recent PR while trying to work out how to represent RFC-defined percent-encodings, which are defined in terms of combinations of sets rather than individual characters, so it made sense to have that available here.
The performance shouldn't matter much either way. The more pressing concern with static vs. const is code-size bloat if the const gets duplicated at every use site, but this is a small const. The compiler can also optimize either form in some cases. I'd say the reference is slightly better for consistency with the existing consts, but for small consts in general a plain const value is better overall.
#976 does the refactoring mentioned
Refs take 8 bytes, whereas the values are only 16 bytes, so there is not a huge benefit to using references rather than values. PercentEncoding is changed to store the AsciiSet as a value, and the functions that took &'static AsciiSet now take Into<AsciiSet>. This allows existing code to continue to work without modification. The AsciiSet consts (CONTROLS and NON_ALPHANUMERIC) are also changed to be values, which is a breaking change, but it only affects code that dereferences them. Discussion about the rationale for this change is at <servo#970 (comment)>
Refs take 8 bytes, whereas the values are only 16 bytes, so there is not a huge benefit to using references rather than values. PercentEncoding is changed to store the AsciiSet as a value, and the functions that previously accepted a reference now accept a value. This is a breaking change for users who pass a reference to AsciiSet to the functions in the public API. The AsciiSet consts (CONTROLS, NON_ALPHANUMERIC, etc.) are also changed to be values. This is an alternative to the non-breaking change in <servo#976>. Discussion about the rationale for this change is at <servo#970 (comment)>
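For comparison with the non-breaking variant, a sketch of the by-value signature, under the same simplified, hypothetical AsciiSet. The function names needs_encoding and needs_encoding_ref are made up. A caller that still holds a reference has to copy the set out with *set, which is cheap because the type is Copy:

```rust
// Hypothetical, simplified stand-in for percent_encoding::AsciiSet.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    /// Returns a copy of the set with `byte` added (panics if `byte` is not ASCII).
    pub const fn add(&self, byte: u8) -> AsciiSet {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    /// True if `byte` is an ASCII byte present in the set.
    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask[byte as usize / 32] & (1 << (byte as usize % 32))) != 0
    }
}

/// Hypothetical by-value signature: passing `&AsciiSet` here is a type error.
pub fn needs_encoding(input: &[u8], set: AsciiSet) -> bool {
    // Non-ASCII bytes always need encoding; ASCII bytes only if in `set`.
    input.iter().any(|&b| b >= 0x80 || set.contains(b))
}

/// Value constant, as the consts become in this variant.
pub const SLASH: AsciiSet = AsciiSet::EMPTY.add(b'/');

/// A caller that still holds a reference copies the 16-byte set out.
pub fn needs_encoding_ref(input: &[u8], set: &AsciiSet) -> bool {
    needs_encoding(input, *set)
}
```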
This is useful to match the behavior of JavaScript's decodeURI, for example. I've also changed the functions to no longer require a static reference to the ASCII set, so that inverted sets can be passed inline.
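A minimal sketch of decoding while preserving selected characters, assuming a simplified, hypothetical AsciiSet and a hypothetical decode_preserving function; this is not the API added by the PR. Escapes whose decoded byte is in the preserve set are copied through verbatim, which is how decodeURI treats escapes for reserved characters such as %2F. Unlike decodeURI, this byte-level sketch passes malformed escapes through instead of throwing and does not validate UTF-8. Because the parameter is &AsciiSet rather than &'static AsciiSet, an inverted set can be built inline at the call site.

```rust
// Hypothetical, simplified stand-in for percent_encoding::AsciiSet.
#[derive(Clone, Copy, Debug)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    /// Returns a copy of the set with `byte` added (panics if `byte` is not ASCII).
    pub const fn add(&self, byte: u8) -> AsciiSet {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    /// Every ASCII byte not in `self`.
    pub const fn complement(&self) -> AsciiSet {
        let m = self.mask;
        AsciiSet { mask: [!m[0], !m[1], !m[2], !m[3]] }
    }

    /// True if `byte` is an ASCII byte present in the set.
    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask[byte as usize / 32] & (1 << (byte as usize % 32))) != 0
    }
}

fn hex_value(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical decoder: `%XX` is decoded unless the decoded byte is in
/// `preserve`, in which case the original escape is kept as-is.
pub fn decode_preserving(input: &[u8], preserve: &AsciiSet) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'%' {
            if let (Some(&hi), Some(&lo)) = (input.get(i + 1), input.get(i + 2)) {
                if let (Some(h), Some(l)) = (hex_value(hi), hex_value(lo)) {
                    let byte = h * 16 + l;
                    if preserve.contains(byte) {
                        out.extend_from_slice(&input[i..i + 3]);
                    } else {
                        out.push(byte);
                    }
                    i += 3;
                    continue;
                }
            }
        }
        // Ordinary byte or malformed escape: copy it unchanged.
        out.push(input[i]);
        i += 1;
    }
    out
}

/// Characters that decodeURI leaves encoded: ; / ? : @ & = + $ , #
pub const DECODE_URI_RESERVED: &AsciiSet = &AsciiSet::EMPTY
    .add(b';').add(b'/').add(b'?').add(b':').add(b'@').add(b'&')
    .add(b'=').add(b'+').add(b'$').add(b',').add(b'#');
```

For example, decode_preserving(b"%41%42", &AsciiSet::EMPTY.add(b'A').complement()) decodes only the A and yields A%42. With a &'static AsciiSet parameter, that inline temporary would not be promoted to 'static, so the call would not compile.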
And I've changed AsciiSet::EMPTY (added in #969) to be a reference to match the existing constants (I'm not sure they need to be references, but I feel it's better to have the same behavior everywhere).
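A sketch of why the reference form is convenient, assuming a simplified, hypothetical AsciiSet and a hypothetical encode function with a &'static AsciiSet parameter like the signatures discussed in this PR. EMPTY can be passed directly, exactly like the other reference constants, with no & at the call site. Derived sets still need a const item (SPACE here, a made-up name) to obtain a 'static reference.

```rust
// Hypothetical, simplified stand-in for percent_encoding::AsciiSet.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    /// Reference form, matching the crate's other reference constants.
    pub const EMPTY: &'static AsciiSet = &AsciiSet { mask: [0; 4] };

    /// Returns a copy of the set with `byte` added (panics if `byte` is not ASCII).
    pub const fn add(&self, byte: u8) -> AsciiSet {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    /// True if `byte` is an ASCII byte present in the set.
    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask[byte as usize / 32] & (1 << (byte as usize % 32))) != 0
    }
}

/// Hypothetical encoder with a `&'static AsciiSet` parameter: non-ASCII bytes
/// and bytes in `set` become uppercase `%XX` escapes.
pub fn encode(input: &[u8], set: &'static AsciiSet) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input {
        if b >= 0x80 || set.contains(b) {
            out.push_str(&format!("%{:02X}", b));
        } else {
            out.push(b as char);
        }
    }
    out
}

/// Derived sets still go through a const item to get a `'static` reference.
pub const SPACE: &AsciiSet = &AsciiSet::EMPTY.add(b' ');
```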