Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree-sitter rolling fixes, 1.118 edition #1010

Merged

Conversation

savetheclocktower
Copy link
Contributor

More fixes for TypeScript/JavaScript, plus fixes in grammar-matching logic for PHP. More to come, no doubt.

…from getting a `constant.other` scope applied.

Also ensure correct scoping of JSX tags even when the tag name is a member expression like `Foo.Bar`.
…so that the grammar-choosing logic picks the modern Tree-sitter grammar more consistently.
…to gain support for type annotations on class constants, among other things.
…in anticipation of the new auto-generated API docs. (These are some tweaks that I've applied locally because they make the docs better.)
@Psychosynthesis
Copy link

I wonder how to find out what has changed in the binary files? Or are they automatically generated from the rest?

I'm just starting to slowly understand the entire codebase, so I don't understand a lot.

@savetheclocktower
Copy link
Contributor Author

I wonder how to find out what has changed in the binary files? Or are they automatically generated from the rest?

I'm just starting to slowly understand the entire codebase, so I don't understand a lot.

The WASM files are generated from their various Tree-sitter repos. This is partly why we introduced the parserSource convention in the grammar files — it tells you which commit SHA the WASM was built from.

Hence if you wanted to know what's changed in tree-sitter-php, you could compare b569a5f2c0d592e67430520d1a0e1f765d83ceb0 to 27afeb02e49ff30acd17b67897b1c0114561a38c. Not the easiest thing to do, but it's at least possible.

@Psychosynthesis
Copy link

@savetheclocktower thanx, got it.

We made this change in one place, but not in another. The real fix is to make both places use the same copy of `resolveNodeDescriptor`, but that'll have to wait another month.
…so that it doesn’t try to look up a child node that we know won’t exist.

Most Ruby comments aren’t foldable because they start and end on the same line — by their very nature! But there is one obscure multi-line syntax that we can support, so this tells Pulsar to fold to the end of the comment itself.
* Ensure private methods like `#foo` are included in lists of symbols.
* Scope `void` as a unary operator.
* Properly scope rest object properties in function definitions.
@savetheclocktower
Copy link
Contributor Author

I never put this PR into draft status, but if I had, I'd be taking it out of draft status right now. It's ready for review.

Copy link
Member

@DeeDeeG DeeDeeG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me!

  • Docs fixes
  • A parser update for php
  • Miscellaneous grammar fixes and improvements.

Comment on lines -17 to +18
'common/highlights.scm'
'tree-sitter-tsx/highlights.scm'
'common/highlights.scm'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change meaningful?

Any insights into the purpose of it if so? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order matters here. The contents of each file is concatenated into one query file, and earlier queries are matched before later queries.

I needed a few JSX-specific rules to act before the ordinary rules in highlights.scm — the “Block off identifiers within complex JSX tag names…” stuff matches some things so that it can prevent them from being matched by a later rule.

In this instance, it didn't otherwise matter which query file was included first. But if it had, I'd have needed to create a third file, put only those rules into it, and configure it to be included first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a pretty useful and practical explanation, with the examples.

Comment on lines +8 to +16
# The TextMate grammar has an elaborate `firstLineMatch` pattern defined with
# Oniguruma; it catches hashbang syntax and complex vim/emacs modelines along
# with the typical `<?php` prefix.
#
# We can't compete with all that, but we can catch obvious instances of
# shebangs and `<?php`s. In fact, we must, or else the TextMate grammar will be
# ranked higher for certain files.
firstLineRegex: '^\\s*<\\?(?:php|PHP|=|\\s|$)|^\\#!.*(?:\\s|\\/)php\\d?'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd say users are better served by using the Tree-sitter grammar than the TextMate one in these cases, right? Despite the less-sophisticated first line regex?

TextMate won't have issued with newer Electron if I understand correctly, so we can let TextMate handle it if it'd be better? But if I also recall correctly, Tree-sitter is Nifty ™️ in Ways ™️ so this is probably desirable.

Just checking!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, they're better served by using the (modern) Tree-sitter grammars; that's why we prefer Tree-sitter grammars over TextMate grammars when all else is equal.

The goal of the TextMate PHP grammar's firstLineRegex — I'm nearly certain — is to help it detect things that don't have .php extensions but are nonetheless PHP files. That would allow Pulsar to auto-assign files that it otherwise didn't know what to do with. I don't think the intent was to identify patterns that that grammar would be better at parsing than another PHP grammar, since no such grammar existed until quite recently.

It's good to have the two different PHP grammars’ firstLineRegexes match as much as possible so that they're not competing aginst one another in the grammar assignment process — only against non-PHP grammars. But in this case it's not possible.

(If it really mattered, we could use Oniguruma for this task even when the grammar is a Tree-sitter grammar, but I'd rather not do that if we can avoid it.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes enough sense to me! Again, appreciate the explanation.

@savetheclocktower savetheclocktower merged commit ecbedf9 into pulsar-edit:master Jun 16, 2024
104 checks passed
@savetheclocktower savetheclocktower deleted the tree-sitter-june branch June 16, 2024 02:10
// * `filePath`: A {String} path to the file.
// * `contents`: The {String} contents of the file.
//
// Returns a {Number}.
getGrammarScore(grammar, filePath, contents) {
if (contents == null && fs.isFileSync(filePath)) {
Copy link

@Psychosynthesis Psychosynthesis Jun 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be contents === null?

I apologize for writing after the merge, my comment was not sent due to the idiotic functionality of Github, it was in pending status, and I didn’t even know about it

Copy link
Member

@DeeDeeG DeeDeeG Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it depends on if there is a meaningful difference between "either null or undefined" here (==) vs "strictly null, deliberately excluding undefined" (===).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants