Add AsciiSet::EMPTY and boolean operators #969

Merged
3 commits merged into servo:main on Sep 19, 2024

@joshka (Contributor) commented Sep 19, 2024

In RFCs, the sets of characters to percent-encode are often defined as
the union of multiple sets. This change adds an EMPTY constant to
AsciiSet and implements the Add trait for AsciiSet so that sets
can be combined with the + operator. The ! negation operator is also
implemented (via Not), along with equivalent const functions (union() and
complement()) that can be used in constant definitions.

AsciiSet now derives Debug, PartialEq, and Eq so that it can be
used in tests.
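To make the described API concrete, here is a minimal sketch of a set type with an EMPTY constant, const union() and complement() functions, and the + and ! operators. It is a hypothetical stand-in, not the crate's code: it stores the 128 ASCII bits in a single u128, whereas the real AsciiSet uses an array of smaller chunks.

```rust
/// Hypothetical, simplified stand-in for `percent_encoding::AsciiSet`.
/// Bit `b` of `mask` is set when ASCII byte `b` is in the set; the real
/// crate stores the same 128 bits as an array of smaller chunks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AsciiSet {
    mask: u128,
}

impl AsciiSet {
    /// The set containing no bytes.
    pub const EMPTY: AsciiSet = AsciiSet { mask: 0 };

    /// Returns a copy of the set with `byte` added (ASCII bytes only).
    pub const fn add(&self, byte: u8) -> Self {
        assert!(byte < 0x80, "AsciiSet only holds ASCII bytes");
        AsciiSet { mask: self.mask | (1u128 << byte) }
    }

    /// Returns true if `byte` is an ASCII byte contained in the set.
    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask & (1u128 << byte)) != 0
    }

    /// Every byte that is in `self` or in `other`.
    pub const fn union(&self, other: Self) -> Self {
        AsciiSet { mask: self.mask | other.mask }
    }

    /// Every ASCII byte that is not in `self`.
    pub const fn complement(&self) -> Self {
        AsciiSet { mask: !self.mask }
    }
}

// The operator traits are named by path rather than imported, so a call
// like `set.add(b'x')` still resolves to the inherent `add` method above.
impl std::ops::Add for AsciiSet {
    type Output = Self;

    /// `a + b` is the union of `a` and `b`.
    fn add(self, other: Self) -> Self {
        self.union(other)
    }
}

impl std::ops::Not for AsciiSet {
    type Output = Self;

    /// `!a` is the complement of `a` within the ASCII range.
    fn not(self) -> Self {
        self.complement()
    }
}
```

Because the sketch type is Copy, a + b and !a leave their operands usable. Trait operators cannot be evaluated in const items on stable Rust, which is presumably why union() and complement() are provided as const fn alongside them.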


Example: https://www.rfc-editor.org/rfc/rfc3986#section-2.2 defines

   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

Using the new EMPTY constant and union() method, this can be represented as:

const SUB_DELIMS: AsciiSet = AsciiSet::EMPTY.add(b'!').add(b'$') ...
const GEN_DELIMS: AsciiSet = AsciiSet::EMPTY.add(b':').add(b'/') ...
const RESERVED: AsciiSet = GEN_DELIMS.union(SUB_DELIMS);
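The ... in the two definitions above stands for the remaining characters of each ABNF rule. Spelled out in full, using a hypothetical u128-backed stand-in for AsciiSet (not the crate's actual type) so the sketch compiles on its own:

```rust
/// Hypothetical minimal stand-in for `percent_encoding::AsciiSet`:
/// bit `b` of `mask` is set when ASCII byte `b` is in the set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AsciiSet {
    mask: u128,
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: 0 };

    pub const fn add(&self, byte: u8) -> Self {
        assert!(byte < 0x80, "AsciiSet only holds ASCII bytes");
        AsciiSet { mask: self.mask | (1u128 << byte) }
    }

    pub const fn union(&self, other: Self) -> Self {
        AsciiSet { mask: self.mask | other.mask }
    }

    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask & (1u128 << byte)) != 0
    }
}

// gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
pub const GEN_DELIMS: AsciiSet = AsciiSet::EMPTY
    .add(b':').add(b'/').add(b'?').add(b'#')
    .add(b'[').add(b']').add(b'@');

// sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
pub const SUB_DELIMS: AsciiSet = AsciiSet::EMPTY
    .add(b'!').add(b'$').add(b'&').add(b'\'')
    .add(b'(').add(b')').add(b'*').add(b'+')
    .add(b',').add(b';').add(b'=');

// reserved = gen-delims / sub-delims
pub const RESERVED: AsciiSet = GEN_DELIMS.union(SUB_DELIMS);
```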

Similarly, the set of characters that must be encoded is the complement of the set of characters a component allows:

https://www.rfc-editor.org/rfc/rfc3986#section-2.2

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component.
If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

So a component such as the query is defined in https://www.rfc-editor.org/rfc/rfc3986#appendix-A as:

   query         = *( pchar / "/" / "?" )
   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   pct-encoded   = "%" HEXDIG HEXDIG
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

which can be translated to:

const QUERY: AsciiSet = PCHAR.add(b'/').add(b'?');
const PCHAR: AsciiSet = UNRESERVED.union(SUB_DELIMS).add(b':').add(b'@');
const UNRESERVED: AsciiSet = ...

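// which can then be used like this (utf8_percent_encode takes a
// &'static AsciiSet, so the complement is stored in a const first):
const QUERY_ENCODE_SET: &AsciiSet = &QUERY.complement();
let encoded_query = utf8_percent_encode("foo?:@/=bar", QUERY_ENCODE_SET);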
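A self-contained sketch of the whole query example follows. It again uses a hypothetical u128-backed stand-in for AsciiSet, a hypothetical add_range helper (not part of the crate) for the ALPHA and DIGIT ranges, and a hypothetical percent_encode_str function in place of the crate's utf8_percent_encode. The pct-encoded rule is deliberately left out of the allowed set, so a literal % in the data is escaped.

```rust
/// Hypothetical stand-in for `percent_encoding::AsciiSet`: bit `b` of
/// `mask` is set when ASCII byte `b` is in the set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AsciiSet {
    mask: u128,
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: 0 };

    pub const fn add(&self, byte: u8) -> Self {
        assert!(byte < 0x80, "AsciiSet only holds ASCII bytes");
        AsciiSet { mask: self.mask | (1u128 << byte) }
    }

    /// Hypothetical helper (not in the crate): adds every byte in `lo..=hi`.
    pub const fn add_range(&self, lo: u8, hi: u8) -> Self {
        let mut set = *self;
        let mut b = lo;
        while b <= hi {
            set = set.add(b);
            b += 1;
        }
        set
    }

    pub const fn union(&self, other: Self) -> Self {
        AsciiSet { mask: self.mask | other.mask }
    }

    pub const fn complement(&self) -> Self {
        AsciiSet { mask: !self.mask }
    }

    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask & (1u128 << byte)) != 0
    }
}

// unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pub const UNRESERVED: AsciiSet = AsciiSet::EMPTY
    .add_range(b'A', b'Z')
    .add_range(b'a', b'z')
    .add_range(b'0', b'9')
    .add(b'-').add(b'.').add(b'_').add(b'~');

// sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
pub const SUB_DELIMS: AsciiSet = AsciiSet::EMPTY
    .add(b'!').add(b'$').add(b'&').add(b'\'').add(b'(').add(b')')
    .add(b'*').add(b'+').add(b',').add(b';').add(b'=');

// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
// ("%" from pct-encoded is left out, so a literal "%" gets escaped)
pub const PCHAR: AsciiSet = UNRESERVED.union(SUB_DELIMS).add(b':').add(b'@');

// query = *( pchar / "/" / "?" )
pub const QUERY: AsciiSet = PCHAR.add(b'/').add(b'?');

// The bytes to escape in a query are the complement of the allowed bytes.
pub const QUERY_ENCODE_SET: AsciiSet = QUERY.complement();

/// Hypothetical helper: escapes every byte in `encode_set` as %XX, and
/// always escapes non-ASCII bytes (which no AsciiSet can contain).
pub fn percent_encode_str(input: &str, encode_set: &AsciiSet) -> String {
    let mut out = String::with_capacity(input.len());
    for &byte in input.as_bytes() {
        if byte >= 0x80 || encode_set.contains(byte) {
            out.push_str(&format!("%{:02X}", byte));
        } else {
            out.push(char::from(byte));
        }
    }
    out
}
```

Every character in the sample input "foo?:@/=bar" is allowed in a query, so it comes back unchanged; a space, #, or % would be escaped, and non-ASCII bytes are always escaped.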

@joshka joshka changed the title Add AsciiSet::EMPTY and impl ops::Add for AsciiSet Add AsciiSet::EMPTY and impl Add and Not for AsciiSet Sep 19, 2024
@joshka joshka changed the title Add AsciiSet::EMPTY and impl Add and Not for AsciiSet Add AsciiSet::EMPTY and operators Sep 19, 2024
@joshka joshka changed the title Add AsciiSet::EMPTY and operators Add AsciiSet::EMPTY and boolean operators Sep 19, 2024

codecov bot commented Sep 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@9404ff5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #969   +/-   ##
=======================================
  Coverage        ?   81.85%           
=======================================
  Files           ?       21           
  Lines           ?     3560           
  Branches        ?        0           
=======================================
  Hits            ?     2914           
  Misses          ?      646           
  Partials        ?        0           

@valenting valenting added this pull request to the merge queue Sep 19, 2024
Merged via the queue into servo:main with commit 5505565 Sep 19, 2024
14 checks passed

@joshka (Contributor, Author) commented Sep 19, 2024

Thanks!

@@ -77,6 +78,11 @@ const ASCII_RANGE_LEN: usize = 0x80;
const BITS_PER_CHUNK: usize = 8 * mem::size_of::<Chunk>();

impl AsciiSet {
/// An empty set.
pub const EMPTY: AsciiSet = AsciiSet {

@ForsakenHarmony commented Sep 19, 2024

This seems like it's now inconsistent with the existing constants and functions taking &'static AsciiSet.

@joshka (Contributor, Author) commented Sep 20, 2024

Yeah, I wasn't 100% sure about that. I made EMPTY a constant on AsciiSet because emptiness seems like an inherent property of the type, whereas the other constants look like particular uses of AsciiSet. I was about 70/30 on this being right, so I wouldn't object to changing it to be consistent with the other constants.

The rationale for making the constants references rather than plain values seemed odd to me. What was that necessary for?

Edit: let's discuss this on #970 instead of here.
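For context on the consistency question: the crate's predefined sets (for example NON_ALPHANUMERIC) are declared as &AsciiSet constants, and functions such as utf8_percent_encode take a &'static AsciiSet. A value constant like EMPTY can still feed such functions through a const reference. The sketch below uses a hypothetical u128-backed stand-in type and a hypothetical count_escaped function rather than the crate itself.

```rust
/// Hypothetical stand-in for `percent_encoding::AsciiSet` (one u128 bitmask).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AsciiSet {
    mask: u128,
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: 0 };

    pub const fn add(&self, byte: u8) -> Self {
        assert!(byte < 0x80, "AsciiSet only holds ASCII bytes");
        AsciiSet { mask: self.mask | (1u128 << byte) }
    }

    pub const fn contains(&self, byte: u8) -> bool {
        byte < 0x80 && (self.mask & (1u128 << byte)) != 0
    }
}

// Reference-style constant: the outer `&` in a `const` initializer
// produces a `&'static AsciiSet`.
pub const SPACE_AND_PLUS: &AsciiSet = &AsciiSet::EMPTY.add(b' ').add(b'+');

// Value-style constant, like `EMPTY`: it can be extended with `add` in
// other constants and exposed as `&'static` through a second constant.
pub const SLASH: AsciiSet = AsciiSet::EMPTY.add(b'/');
pub const SLASH_REF: &AsciiSet = &SLASH;

/// Hypothetical consumer with the same `&'static AsciiSet` parameter shape
/// as the crate's encoding functions: counts bytes that would be escaped.
pub fn count_escaped(input: &str, set: &'static AsciiSet) -> usize {
    input.bytes().filter(|&b| b >= 0x80 || set.contains(b)).count()
}
```

The sketch only shows that the two styles can interoperate; which one the crate should standardize on is the question deferred to #970.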

kodiakhq bot pushed a commit to pdylanross/fatigue that referenced this pull request Nov 5, 2024
Bumps url from 2.5.2 to 2.5.3.

Release notes
Sourced from url's releases.

v2.5.3
What's Changed

fix: enable wasip2 feature for wasm32-wasip2 target by @brooksmtownsend in servo/rust-url#960
Fix idna tests with no_std by @cjwatson in servo/rust-url#963
Fix debugger_visualizer test failures. by @valenting in servo/rust-url#967
Add AsciiSet::EMPTY and boolean operators by @joshka in servo/rust-url#969
mention why we pin unicode-width by @Manishearth in servo/rust-url#972
refactor and add tests for percent encoding by @joshka in servo/rust-url#977
Add a test for and fix issue #974 by @hansl in servo/rust-url#975
no_std support for the url crate by @domenukk in servo/rust-url#831
Normalize URL paths: convert /.//p, /..//p, and //p to p by @theskim in servo/rust-url#943
support Hermit by @m-mueller678 in servo/rust-url#985
fix: support wasm32-wasip2 on the stable channel by @brooksmtownsend in servo/rust-url#983
Improve serde error output by @konstin in servo/rust-url#982
OSS-Fuzz: Add more fuzzer by @arthurscchan in servo/rust-url#988
Merge idna-v1x to main by @hsivonen in servo/rust-url#990

New Contributors

@brooksmtownsend made their first contribution in servo/rust-url#960
@cjwatson made their first contribution in servo/rust-url#963
@joshka made their first contribution in servo/rust-url#969
@hansl made their first contribution in servo/rust-url#975
@theskim made their first contribution in servo/rust-url#943
@m-mueller678 made their first contribution in servo/rust-url#985
@konstin made their first contribution in servo/rust-url#982
@arthurscchan made their first contribution in servo/rust-url#988

Full Changelog: servo/rust-url@v2.5.2...v2.5.3



Commits

8a683ff Merge idna-v1x to main (#990)
08a3268 OSS-Fuzz: Add more fuzzers (#988)
5d363cc Improve serde error output (#982)
30e6258 fix: support wasm32-wasip2 on stable channel (#983)
bf089c4 support hermit (#985)
b08a655 Normalize URL paths: convert /.//p, /..//p, and //p to p (#943)
ebd5cfb no_std support for the url crate (#831)
7eccac9 Add a test for and fix issue #974 (#975)
710e1e7 refactor and add tests for percent encoding (#977)
6050a6e mention why we pin unicode-width (#972)
Additional commits viewable in compare view