Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method to Checker for cached parsing of stringified type annotations #13158

Merged
merged 6 commits into from
Sep 2, 2024

Conversation

AlexWaygood
Copy link
Member

Summary

This PR adds a method to ruff_linter::checkers::ast::Checker for cached parsing of stringified type annotations. It was decided in review of #12951 that this would be desirable, since ruff_python_parser::typing::parse_type_annotation can be quite expensive. However, since we already have a number of linter rules that call ruff_python_parser::typing::parse_type_annotation, it seemed to me like this would make sense as a standalone PR. (Adding caching was also more complicated than I expected, so separating this into its own PR should make life easier for reviewers.) If this is accepted, I'll rebase #12951 on top of it.

The PR should be easiest to review commit-by-commit. Each commit on its own passes the entire test suite. The first two commits do some refactoring to lay the groundwork for adding the new method. The third commit adds the new method and makes use of it in crates/ruff_linter/src/checkers/ast/mod.rs; the final commit makes use of the new method in various linter rules that currently use ruff_python_parser::typing::parse_type_annotation.

Test Plan

cargo test

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Aug 30, 2024

This doesn't seem to have any impact on the Codspeed benchmarks one way or another as a standalone PR (but I don't know how many stringified annotations we have in those benchmarks?)

Copy link
Contributor

github-actions bot commented Aug 30, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser
Copy link
Member

This doesn't seem to have any impact on the Codspeed benchmarks one way or another as a standalone PR (but I don't know how many stringified annotations we have in those benchmarks?)

I don't think we have any benchmark containing stringified annotations. You could try to run some hyperfine benchmarks over a project that does use stringified annotations.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Looks good to me.

The only part that's unclear to me is why we need to reset the cache. It would be nice if it could be avoided. If not, then it would help to extend the comment so that it explains what data gets invalidated.

crates/ruff_linter/src/checkers/ast/mod.rs Outdated Show resolved Hide resolved
crates/ruff_linter/src/checkers/ast/mod.rs Outdated Show resolved Hide resolved
crates/ruff_linter/src/checkers/ast/mod.rs Outdated Show resolved Hide resolved
crates/ruff_linter/src/checkers/ast/mod.rs Outdated Show resolved Hide resolved
crates/ruff_linter/src/checkers/ast/mod.rs Outdated Show resolved Hide resolved
@Daverball
Copy link

This is more of a general comment, since I thought about adding something similar for my ongoing work on flake8-type-checking.

I am not sure to what degree this would be viable, but why not just eagerly parse string annotations, store them all on the checker as a struct that contains both the string and the parsed expression add that struct to the deferred string annotations vector instead of just the string and then visit the expression where we currently parse the annotation in the checker. That should preserve the same semantics for bindings and references.

I realize there are some corner cases with nested strings like 'list["str"]' where parsing either would still need to be deferred or the nested case would need to be handled eagerly while parsing the annotation, adding an arbitrary number of string / expression pairs, rather than only a single one. But it still seems viable to me, unless I'm forgetting an important detail.

@MichaReiser
Copy link
Member

I thought about that too because we always end up parsing all type annotations.

My conclusion was that doing it inside of Checker would require one additional AST pass, which is unfortunate. It would be nice if we could do this directly in the parser but that's probably more involved.

@AlexWaygood AlexWaygood enabled auto-merge (squash) September 2, 2024 12:40
@AlexWaygood AlexWaygood merged commit b7c7b4b into main Sep 2, 2024
16 checks passed
@AlexWaygood AlexWaygood deleted the alex/cached-annotation-parsing branch September 2, 2024 12:44
@dhruvmanila dhruvmanila added the internal An internal refactor or improvement label Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants