fix: validate atoms modulo leading and trailing whitespace #6012

david-christiansen · 2024-11-08T12:10:20Z

This PR improves the validation of new syntactic tokens. Previously, the validation code had inconsistencies: some atoms would be accepted only if they had a leading space as a pretty printer hint. Additionally, atoms with internal whitespace are no longer allowed.
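The whitespace part of the rule described above can be sketched as a small standalone function. This is only an illustration: `atomOk` is a hypothetical name, it is not the actual `isValidAtom` code, and it leaves out every other rule the real validator applies.

```lean
/-- Hypothetical, simplified check (not the real validator):
strip leading and trailing whitespace, which serves only as a
pretty-printer hint, then require a non-empty result that has no
whitespace inside it. -/
def atomOk (s : String) : Bool :=
  let t := s.trim
  !t.isEmpty && !t.any Char.isWhitespace

-- Only the surrounding spaces differ, so both get the same answer.
#eval atomOk " '' "
#eval atomOk "''"
-- Internal spaces remain after trimming, so the sketch rejects this one.
#eval atomOk " <I like apples, they are delicious> "
```

The point of trimming first is that the answer no longer depends on whether the pretty-printer hint is present.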

Closes #6011

david-christiansen · 2024-11-08T12:42:15Z

The failing test defines this operator:

/-- `f '' s` denotes the image of `s : Set α` under the function `f : α → β`. -/
infixl:80 " '' " => image

Should this be allowed? I would think not, because deleting the leading space makes current versions of Lean reject it, and I would expect that pretty printing instructions were orthogonal to token rules. On that assumption, I'll update the test.

david-christiansen · 2024-11-08T12:49:40Z

This notation is also used in Mathlib.

digama0 · 2024-11-08T20:14:17Z

> Should this be allowed? I would think not, because deleting the leading space makes current versions of Lean reject it, and I would expect that pretty printing instructions were orthogonal to token rules. On that assumption, I'll update the test.

Why wouldn't this be allowed? Whitespace aside, it seems like a perfectly reasonable infix declaration. (Well, I do recall there was a hack added specifically to allow '' to be a token rather than an empty character literal, but besides that it's a regular token.)

david-christiansen · 2024-11-10T08:35:45Z

If it should be allowed, then shouldn't it be allowed whether or not the pretty-printer is instructed to insert whitespace before it?

Today, "''" is rejected as an atom, but " '' " is accepted. I think that either both or neither should be accepted here. Does that make sense?
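For concreteness, the two spellings being compared look roughly like this. It is an untested sketch that uses `List.map` as a stand-in for the `image` function from the failing test; the two declarations are alternatives and are not meant to be used together.

```lean
-- With pretty-printer whitespace hints: accepted at the time of this comment.
infixl:80 " '' " => List.map

-- Same atom without the hints: rejected at the time, per the comment above.
-- infixl:80 "''" => List.map

#eval (· + 1) '' [1, 2, 3]
```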

digama0 · 2024-11-10T08:45:08Z

I agree. They should both be accepted.

digama0 · 2024-11-10T08:50:48Z

The exception for empty character literals was added in #1931. I think we need a similar exception at `Lean.Elab.Term.toParserDescr.isValidAtom`.

Kha · 2024-11-13T09:48:49Z

> The exception for empty character literals was added in #1931. I think we need a similar exception at `Lean.Elab.Term.toParserDescr.isValidAtom`.

agreed

david-christiansen · 2024-11-13T12:55:38Z

Great, '' is now allowed with or without whitespace as of b2a0735.

While I'm at it, we presently allow atoms with internal spaces. Should we?

This works, and I'm not sure it should:

infix:70 " <I like apples, they are delicious> " => Nat.add
#eval 3 <I like apples, they are delicious> 4

david-christiansen · 2024-11-13T13:02:21Z

Just to see if it breaks Mathlib, I gave it a go

david-christiansen · 2024-11-13T15:41:44Z

It seems that `open private` is a single atom.

david-christiansen · 2024-11-14T09:21:30Z

leanprover-community/batteries#1050 makes batteries build for me with this. I think we're OK.

digama0 · 2024-11-14T09:30:05Z

While I think batteries#1050 is reasonable, I don't see a particular reason to ban tokens including whitespace? That is a thing you might want to do, it gives additional flexibility to DSL writers.

david-christiansen · 2024-11-14T17:03:02Z

This was a balance of footgun vs flexibility. A consequence of the way `open private` was specified in Batteries was that `open  private` (with two spaces) or `open /- Extracting the secret treasures! -/ private` was banned, which could be surprising. This is also an easy mistake to make: it seems to work well in most cases, and then suddenly there are weird error messages for a few downstream users.
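A minimal sketch of the alternative, using two separate atoms so that extra whitespace or a comment between the keywords is fine. The command names `demo_open` and `demo_private` are hypothetical and chosen only for this illustration; this is not how Batteries defines `open private`.

```lean
-- Two atoms instead of the single atom "demo_open demo_private ".
-- The single-atom form would only match exactly one space between the words.
macro "demo_open " "demo_private " x:ident : command => `(#check $x)

demo_open demo_private Nat
demo_open /- Extracting the secret treasures! -/ demo_private Nat
```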

If someone really needs precisely that level of flexibility for their DSL, it's still possible to drop down and write the corresponding `ParserFn` and parser info. But if your DSL has token rules that are very different from Lean's, then you're probably better off working mostly at that level anyway (like I do in Verso).

digama0 · 2024-11-14T17:51:03Z

> This was a balance of footgun vs flexibility. A consequence of the way `open private` was specified in Batteries was that `open  private` (with two spaces) or `open /- Extracting the secret treasures! -/ private` was banned, which could be surprising. This is also an easy mistake to make: it seems to work well in most cases, and then suddenly there are weird error messages for a few downstream users.

I agree on this, it's not the first time this mistake has been made and I don't think there are any commands in batteries or mathlib that would want to do this. But that's justification for a warning at best.

kim-em · 2024-11-17T09:08:17Z

@david-christiansen, I'm seeing

/-- `f ''ᵁ U` is notation for the image (as an open set) of `U` under an open immersion `f`. -/
scoped[AlgebraicGeometry] notation3:90 f:91 " ''ᵁ " U:90 => (Scheme.Hom.opensFunctor f).obj U

erroring with "invalid atom" now. Can we add some further flexibility to allow this?

david-christiansen · 2024-11-18T06:03:21Z

> I agree on this, it's not the first time this mistake has been made and I don't think there are any commands in batteries or mathlib that would want to do this. But that's justification for a warning at best.

If a real-world use case arises, then we can revisit this. Right now, it doesn't seem like a good use of development resources - "no internal whitespace" is easy to implement and solves the problem, while making it a warning would require more time and increase the complexity of the code overall.

Follow-up in #6114: that PR liberalizes atom rules by allowing `''` to be a prefix of an atom, after #6012 only added an exception for `''` alone, and also adds some unit tests for atom validation.

digama0 · 2024-11-23T15:26:14Z

I had a use case for this come up:

axiom SProp : Type
axiom SProp.ip : UInt64 → SProp
prefix:80 "RIP ↦ " => SProp.ip

from some separation logic project I'm working on. As I understand it, this will not work in the next version of Lean, and the `prefix` command does not allow adding multiple string literals in that position. Could we make the `prefix` command either split the input on whitespace (preserving formatting), or allow multiple string tokens in that position?

The other unfortunate side effect of using two tokens here is that it makes RIP into a keyword, which is okay but suboptimal in this situation (I would prefer it is only special when immediately followed by ↦ in this notation).

david-christiansen · 2024-11-25T05:22:07Z

I'm not totally convinced by this use case for spaces in the tokens. It's not that it doesn't seem like what's being asked for is useful, it's just that it seems to me that the usefulness really comes from new features rather than from undoing this new validation step.

Keeping the old behavior would have resulted in a keyword with a single space in it. `RIP ↦` would be valid syntax, while `RIP  ↦` (two spaces) or `RIP /- This does the thing! -/ ↦` would not. This is precisely the footgun to be avoided, and I don't think that it's a compelling case for whitespace characters in tokens.

It doesn't seem bonkers to allow multi-token operators, though I'd like to see why a notation doesn't do what you want here. I think it's OK if operators make easy things easy, but don't cover a wide range of syntax, and from what I can see, you'd get all the benefits of a two-token prefix operator from a notation. RIP would indeed be a keyword, of course, but this is another one of those balancing acts between flexibility and consistency that doesn't strike me as a huge problem, though I of course don't have a ton of insight into the specifics of your project. What am I missing?
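The notation-based version suggested here might look like the following untested sketch. It reuses the two axioms from the earlier comment and splits the operator into two atoms; as already noted, this turns `RIP` into a keyword.

```lean
axiom SProp : Type
axiom SProp.ip : UInt64 → SProp

-- Two atoms rather than one atom with an internal space, so whitespace
-- and comments between `RIP` and `↦` are handled by the tokenizer.
notation:80 "RIP " "↦ " x:80 => SProp.ip x

example : SProp := RIP ↦ 0x1000
example : SProp := RIP /- This does the thing! -/ ↦ 0x1000
```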

digama0 · 2024-11-25T09:48:02Z

> It's not that it doesn't seem like what's being asked for is useful, it's just that it seems to me that the usefulness really comes from new features rather than from undoing this new validation step.

No disagreement from me on this! I just wanted to give you a reasonable use case to center your thinking on this one.

> RIP would indeed be a keyword, of course, but this is another one of those balancing acts between flexibility and consistency that doesn't strike me as a huge problem, though I of course don't have a ton of insight into the specifics of your project. What am I missing?

The idea here is that `RIP` would be a normal identifier (in fact it may even be a definition), but when followed by `↦` (I don't care whether comments are allowed between the tokens or not) it would be treated as the above composite syntax. There are a few tricks already in use in Lean core to avoid e.g. tactic names being treated as tokens, which would ban their use in other positions, and this is a similar issue. I'm not fussed about the particular implementation choices needed to get there, but I don't think `notation &"RIP" " ↦ " x => SProp.ip x` works currently.

Context on the project isn't too relevant to this feature request, but to explain a bit more: RIP ↦ x is a separating proposition which says that the instruction pointer points to x, but RIP isn't a memory location, and it's not even a regular register name, which is why it has special syntax here. RIP is however the name of a projection out of the machine state (in a relatively separate part of the project), and these two things clash if you require RIP to be a token.

fix: validate atoms modulo leading and trailing whitespace

2c372e9

Closes leanprover#6011

david-christiansen added the changelog-language Language features, tactics, and metaprograms label Nov 8, 2024

github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Nov 8, 2024

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Nov 8, 2024

Update lean-toolchain for testing leanprover/lean4#6012

acdb8f4

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Nov 8, 2024

Update lean-toolchain for testing leanprover/lean4#6012

b54a1da

leanprover-community-bot added the breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan label Nov 8, 2024

fix: update test

93c157c

david-christiansen requested review from kim-em and Kha November 8, 2024 12:49

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Nov 8, 2024

Trigger CI for leanprover/lean4#6012

78d982c

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Nov 8, 2024

Trigger CI for leanprover/lean4#6012

a2f9c62

david-christiansen added 2 commits November 13, 2024 13:47

Revert "fix: update test"

b57fa2c

This reverts commit 93c157c.

fix: allow "''" as an atom

b2a0735

david-christiansen added 2 commits November 13, 2024 14:01

chore: disallow whitespace inside atoms

71ab126

chore: update test

f399a94

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Nov 13, 2024

Trigger CI for leanprover/lean4#6012

c49507e

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Nov 13, 2024

Trigger CI for leanprover/lean4#6012

cfe6129

david-christiansen added the will-merge-soon …unless someone speaks up label Nov 14, 2024

david-christiansen added this pull request to the merge queue Nov 14, 2024

Merged via the queue into leanprover:master with commit 8e1ddbc Nov 14, 2024
20 checks passed

david-christiansen deleted the atom-validation-ws branch November 14, 2024 16:16

david-christiansen mentioned this pull request Nov 18, 2024

fix: liberalize rules for atoms by allowing leading '' #6114

Merged