Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for TSX #409

Merged
merged 5 commits into from
Mar 12, 2024
Merged

Conversation

eyakubovich
Copy link
Contributor

Support TSX dialect of the TypeScript.
Because the TSX is almost a superset of TypeScript, it makes sense to reuse the .tsg file. However it
requires a bit of preprocessing to handle the differences.

I'm posting this PR as a straw-man to solicit input on how to best proceed. The tree-sitter has TSX support as a dialect of typescript. While the two dialects are very similar, TSX does not support type assertions in the form of:

<MyType>x

Therefore the .tsg file fails to run for TSX as it has a (type_assertion) tree-sitter query.

In this patch I went with using "askama" templating engine to strip out the offending stanza before feeding it into the TSG parser. The biggest issue with this approach is that it messes up the line numbers, making debugging difficult. Other alternatives include:

  1. Creating a small custom preprocessor that will replace the removed lines with newlines to preserve the line numbers.
  2. Adding support for "conditional compilation" into the TSG language itself.

Curious to hear what others think would be the best approach.

@eyakubovich eyakubovich requested a review from a team as a code owner February 29, 2024 01:34
@ghost
Copy link

ghost commented Feb 29, 2024

Maybe I am wrong here but I would expect type_assertion to exist in tree_sitter_typescript::language_tsx().

https://github.com/tree-sitter/tree-sitter-typescript/blob/198e2ea43d1c4ddd76ee883f4eae15f4201cd241/common/define-grammar.js#L439-L442

@eyakubovich
Copy link
Contributor Author

I think the rule exists but it's not referenced by any other rule because it gets filtered out for TSX dialect: https://github.com/tree-sitter/tree-sitter-typescript/blob/198e2ea43d1c4ddd76ee883f4eae15f4201cd241/common/define-grammar.js#L214-L221

And I guess tree-sitter is smart enough to filter out unused productions as type_assertion is not in the generated parser code: https://github.com/tree-sitter/tree-sitter-typescript/blob/v0.20.2/tsx/src/parser.c

@hendrikvanantwerpen
Copy link
Collaborator

Thanks for this proposal!

Before getting into the details of it, I have to say I'm a little surprised that TSX is not simply a superset of TypeScript? I tried finding a reference for TSX but couldn't find one quickly. If it should be a superset, this is something we should try to get fixed in the grammar.

Now for the proposal. I'm in favor of trying to get TSX supported, but I'm not sure about the right approach here. You already mention the problem with changed line numbers and resulting debugging. Besides that, I think there are two more aspects that are not optimal:

  • The primary source file is not valid TSG anymore because of the templating directives in there.
  • The templating engine is in the runtime and becomes a required dependency for all users of the crate.

My ideal solution would be something along the lines of tree-sitter/tree-sitter-graph#144, where we could split up the spec into multiple files, controlling which parts are used in toplevel files for the two dialects. However, it's unlikely that we can spent time to implement that any time soon.

I think we can change your solution a bit to overcome the most important downsides.

  • Instead of doing the preprocessing at runtime, we could write a build script to preprocess src/stack-graphs.tsg into stack-graphs-{typescript,tsx}.tsg. We export two languages from the crate, each using a different file.

  • Instead of pulling in a whole templating library, we write some simple preprocessing code as part of the build script. It would go through all the lines of the file, matching pairs of directive indicating the dialect (#dialect XXX and #end or something), but in such a way that it can be part of a comment (i.e., after ;). The lines from the opening to the closing directive (including the directive lines) are replaced by whitespace (or at least empty lines), like you suggested to keep the line numbers correct.

Text manipulation like this is not ideal, but I think it would work, not affect the runtime code or the debugging capabilities, and be self-contained in this crate.

/cc @dcreager Do you think this an acceptable solution?

Support TSX dialect of the TypeScript.
Because the TSX is almost a superset of TypeScript,
it makes sense to reuse the .tsg file. However it
requires a bit of preprocessing to handle the differences.
Preprocesses the .tsg file into typescript and tsx specific
.tsg files. The preprocessing is done in the build script.

Add a --dialect|-d option to the CLI to select the dialect.
@eyakubovich
Copy link
Contributor Author

@hendrikvanantwerpen Thank you for the suggestions. I updated the PR with the build script preprocessing as you suggested.

Please note that although the .tsg file now loads for the TSX dialect, it doesn't successfully process .tsx files. Unlike the javascript TSG file, it's missing all the jsx_element related stanzas (next up for me to do).

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look very nice! I left some comments below, to iron out some details and simplify the code a little, but overall I'm happy with the direction.

languages/tree-sitter-stack-graphs-typescript/rust/bin.rs Outdated Show resolved Hide resolved
languages/tree-sitter-stack-graphs-typescript/rust/bin.rs Outdated Show resolved Hide resolved
languages/tree-sitter-stack-graphs-typescript/rust/lib.rs Outdated Show resolved Hide resolved
languages/tree-sitter-stack-graphs-typescript/rust/lib.rs Outdated Show resolved Hide resolved
languages/tree-sitter-stack-graphs-typescript/build.rs Outdated Show resolved Hide resolved
fn main() {
let out_dir = std::env::var_os("OUT_DIR").unwrap();
for dialect in DIALECTS {
let input = std::fs::File::open(TSG_SOURCE).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use expect instead of unwrap to give a little context to the resulting error.

let input = std::fs::File::open(TSG_SOURCE).unwrap();

let out_filename = Path::new(&out_dir).join(format!("stack-graphs-{dialect}.tsg"));
let output = std::fs::File::create(out_filename).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

let out_filename = Path::new(&out_dir).join(format!("stack-graphs-{dialect}.tsg"));
let output = std::fs::File::create(out_filename).unwrap();

preprocess(input, output, dialect).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

languages/tree-sitter-stack-graphs-typescript/build.rs Outdated Show resolved Hide resolved
languages/tree-sitter-stack-graphs-typescript/build.rs Outdated Show resolved Hide resolved
- Remove dialect stack
- Better error handling
- Use clap::ValueEnum
- Append _typescript to {try}_language_configuration
@eyakubovich eyakubovich changed the title WIP: Support for TSX Support for TSX Mar 11, 2024
Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few final nits, but otherwise this is good to go!

let directive_start = Regex::new(r";\s*#dialect\s+(\w+)").unwrap();

// Matches: ; #end
let directirve_end = Regex::new(r";\s*#end").unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
let directirve_end = Regex::new(r";\s*#end").unwrap();
let directive_end = Regex::new(r";\s*#end").unwrap();


filter = Some(dialect == directive);
output.write_all(line.as_bytes())?;
} else if directirve_end.is_match(line) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem:

Suggested change
} else if directirve_end.is_match(line) {
} else if directive_end.is_match(line) {

}

filter = Some(dialect == directive);
output.write_all(line.as_bytes())?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the directives out of the generated file where they have already been applied.

Suggested change
output.write_all(line.as_bytes())?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this for two reasons:

  1. Since I process a full line at a time there's a chance that a comment / directive follows valid TSG code:
(function) { ...} ;  #dialect tsx

always writing out the line ensures that the valid code is not lost.

  1. When looking at the generated code, I found it convenient to still see the directives as they highlight dialect specific sections.

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that in the generated code the the directives are already processed, and them not being there makes that obvious.

I see your point about the comment coming after some other code. (And this is always the trouble with text-based preprocessing instead of something properly supported in the language---but that's the cost of going for an expedient solution here ;).) My expectation with markers in pairs like this is that they operate on blocks of lines. A tricky example with keeping the lines would be:

; #dialect tsx
(function) {
  ...
} ; #end

If this is processed with the current algorithm, it would leave an umatched } I think. I think treating them as operating on lines simplifies things. If you're worried that people may mistakes, you can add an extra check that the line before the matched directive matches [\s;]+, or throw an error that a directive cannot be on the same line as code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're correct that I accounted for one edge case but not the mirror image of it. I'll add a check to ensure that directives are not mixed with code. And then I won't output the directives.

}

filter = None;
output.write_all(line.as_bytes())?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem:

Suggested change
output.write_all(line.as_bytes())?;

@@ -17,9 +17,10 @@ pub mod tsconfig;
pub mod util;

/// The stacks graphs tsg path for this language.
pub const STACK_GRAPHS_TSG_PATH: &str = "src/stack-graphs.tsg";
pub const STACK_GRAPHS_TSG_PATH: &str = "./stack-graphs.tsg";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub const STACK_GRAPHS_TSG_PATH: &str = "./stack-graphs.tsg";
pub const STACK_GRAPHS_TSG_PATH: &str = "src/stack-graphs.tsg";

This should be relative to the crate root for VSCode to handle it correctly when running cargo test e.g.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

Comment on lines 22 to 23
pub const STACK_GRAPHS_TSG_TS_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-typescript.tsg"));
pub const STACK_GRAPHS_TSG_TSX_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-tsx.tsg"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub const STACK_GRAPHS_TSG_TS_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-typescript.tsg"));
pub const STACK_GRAPHS_TSG_TSX_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-tsx.tsg"));
const STACK_GRAPHS_TSG_TS_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-typescript.tsg"));
const STACK_GRAPHS_TSG_TSX_SOURCE: &str = include_str!(concat!(env!("OUT_DIR"), "/stack-graphs-tsx.tsg"));

No need to expose the generated file paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're completely right. I thought this was exporting the path, but it's exporting the source. In that case, yes, keep it public!

@eyakubovich
Copy link
Contributor Author

@hendrikvanantwerpen if you want, I can squash the commits. The first one has that dependency on "askama" so not great if you ever run git bisect.

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

If you can share, I'd love to hear what you are using stack graps for. We're always curious where this gets applied :)

@hendrikvanantwerpen hendrikvanantwerpen merged commit c09f746 into github:main Mar 12, 2024
8 checks passed
@eyakubovich
Copy link
Contributor Author

We're interested in analyzing JS/TS apps in their entirety. As such, a single real world app typically has a good mix of all 4: JS/JSX/TS/TSX.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants